
rfc (api): revisit the option to return a single scalar when calling a metric #1259

Closed
glemaitre opened this issue Jan 29, 2025 · 7 comments

@glemaitre (Member) commented Jan 29, 2025

In EstimatorReport and CrossValidationReport, it can happen that a metric returns a dataframe containing a single scalar. Several discussions have surfaced the following trade-off:

  • Getting a one-cell dataframe can be annoying; one would often prefer the Python float directly
  • Having the output type depend on what is being returned is also annoying

A fully explicit solution would be to introduce a parameter on metrics that can return a single value, for instance as_type_if_scalar, taking "frame" or "float".

However, I'm seeking feedback on whether this is the way to go.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from skore import EstimatorReport

X, y = make_classification()
estimator = LogisticRegression()
report = EstimatorReport(estimator, X_train=X, y_train=y, X_test=X, y_test=y)
report.metrics.accuracy()
         Metric 	Accuracy (↗︎)
LogisticRegression 	0.97

vs.

report.metrics.accuracy().iloc[0, 0]
np.float64(0.97)
@glemaitre glemaitre changed the title api: revisit the option to return a single scalar when calling a metric rfc (api): revisit the option to return a single scalar when calling a metric Jan 29, 2025
@MarieSacksick (Contributor)

I'd be happy with a solution like this:

report.metrics.accuracy(as_float=True)  # False by default; returns a float
report.metrics.precision(as_float=True)  # raises an error if positive_label is not defined
report.metrics.precision(as_float=True, positive_label=True)  # returns a float

Another possible solution, to take the various dataframe libraries into account:

report.metrics.accuracy(render_as=float)  # returns a float
# render_as is "pandas" by default but can be one of ["float", "pandas", "polars"]
report.metrics.precision(as_float=True)  # raises an error if positive_label is not defined
report.metrics.precision(as_float=True, positive_label=True)  # returns a float

But, phew, it seems less fluid for essentially no additional value.
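For concreteness, here is a minimal toy sketch of what an as_float-style flag could look like. This is not the skore implementation; the function body and the frame-like dict stand in for the real metric and dataframe:

```python
# Hypothetical sketch, not the skore API: a metric that returns a
# one-cell, frame-like mapping by default, or a plain float on request.
def accuracy(y_true, y_pred, as_float=False):
    value = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    if as_float:
        return value  # plain Python float
    # stand-in for the one-cell dataframe (row: estimator, column: metric)
    return {"LogisticRegression": {"Accuracy": value}}

as_frame = accuracy([0, 1, 1, 0], [0, 1, 0, 0])
as_scalar = accuracy([0, 1, 1, 0], [0, 1, 0, 0], as_float=True)  # 0.75
```

The flag keeps the default behaviour untouched while giving users an explicit opt-in to the scalar, which is the spirit of both proposals above.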

@auguste-probabl (Contributor)

Personally, I like the consistency of all metrics methods returning things of the same type, and that it makes it possible to combine them using pandas methods. Admittedly, we're limited by the pandas API here (for example, in PyTorch you can convert a single-element tensor to a float using tensor.item()).

@adrinjalali (Contributor)

As a user, I'd find it very odd if report.metrics.accuracy() returns a dataframe, for two reasons:

  • I might not like pandas, and I might want to handle my data in a different dataframe (like polars)
  • I expect a single number out of such a metric, the same way I expect a very complex output from roc, or something like what a fairlearn.MetricFrame would give me

So to me the natural thing here is to return the simplest form that each metric returns.

When it comes to showing several metrics in the same table, we can have a separate method that returns a table, such as:

skore.set_config(dataframe="polars")
report.metrics[["accuracy", "balanced_accuracy", "f1", "precision", "recall"]]
# or
report.metrics.as_frame(["accuracy", "f1", ...], dataframe="polars")

@glemaitre (Member, Author)

Summarizing some thoughts from an IRL discussion with @adrinjalali that could simplify the API and answer these needs. I'll open dedicated follow-up issues reflecting some of the discussions here:

People have expectations when calling:

report.metrics.subsequent_metric(...)

It should most probably be a plain Python object: a float for a scalar, or a dict when more information is needed, e.g. about the pos_label. In this thread, we are also convinced that the output type should be consistent. So here is a proposed API that looks like a middle ground:

For metrics, we should enforce the least ambiguous Python type, and that type should not change depending on the input. Let's look at two examples:

report.metrics.accuracy()

will return a float but

report.metrics.precision()

will return a dict. The number of entries in the dict depends on the parameters of precision: in the binary case, if pos_label=None we report both the positive and negative labels, while if it is specified we report a single value. In the multiclass case, we report the metric for all classes. But the type stays consistent.
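A toy sketch of this convention (illustrative only, not the actual skore code): precision always returns a dict, with one entry per reported label, regardless of whether the input is binary or multiclass and whether pos_label narrows it down:

```python
# Toy implementation of the proposed convention: the return type is
# always a dict, only the number of entries varies with the parameters.
def precision(y_true, y_pred, pos_label=None):
    labels = sorted(set(y_true)) if pos_label is None else [pos_label]
    result = {}
    for label in labels:
        # precision for `label` = true positives / predicted positives
        predicted = [t for t, p in zip(y_true, y_pred) if p == label]
        result[label] = (
            sum(t == label for t in predicted) / len(predicted)
            if predicted else 0.0
        )
    return result  # consistently a dict

both = precision([0, 1, 1, 0], [0, 1, 1, 1])              # two entries
only_pos = precision([0, 1, 1, 0], [0, 1, 1, 1], pos_label=1)  # one entry
```

Callers can then rely on dict access unconditionally instead of branching on the return type.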

When it comes to consolidating the metrics into a report, report.metrics.report_metrics() makes a lot of sense now. It saves you from having to create a dataframe for each individual metric; we do the concatenation for you.
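The consolidation step could be sketched as follows. report_metrics is skore's real method name, but the flattening logic below is purely illustrative:

```python
# Illustrative sketch: merge scalar results (float) and per-class
# results (dict) into one flat row, ready to become a dataframe.
def report_metrics(metric_results):
    row = {}
    for name, value in metric_results.items():
        if isinstance(value, dict):  # per-class metrics like precision
            for label, per_class in value.items():
                row[f"{name}[{label}]"] = per_class
        else:  # scalar metrics like accuracy
            row[name] = value
    return row

row = report_metrics({"accuracy": 0.97, "precision": {0: 1.0, 1: 0.95}})
# → {'accuracy': 0.97, 'precision[0]': 1.0, 'precision[1]': 0.95}
```

This keeps the individual metric methods returning simple Python types while the tabular view is built in one place.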

In the end, we no longer need a new parameter, which is what I like about this. Most probably not perfect for everyone, but it answers the needs of many.

@sylvaincom (Contributor) commented Feb 4, 2025

Thanks!

+1: the pandas dataframe with all metrics accessible via report.metrics.report_metrics(), but for specific metrics, what you suggested above.

@MarieSacksick (Contributor)

What's the usual process for an RFC issue? Can we close it now that we have #1275 to implement the decision?

@auguste-probabl (Contributor)

Yep, the Definition of Done for this is that a decision was made.
