How does ROCAUC work in score_array()? #137

Open
janosh opened this issue May 12, 2022 · 1 comment
Labels: code (Anything having to do with matbench python package code), high priority

Comments

@janosh
Member

janosh commented May 12, 2022

Seems like there's something wrong with score_array() in the classification case.

def score_array(true_array, pred_array, task_type):
    """
    Score an array according to multiple metrics.
    Args:
        true_array (list or np.array): The ground truth array
        pred_array (list or np.array): The predicted (test) array
        task_type (str): Either regression or classification.
    Returns:
        (dict): dictionary of the scores, according to all defined
            metrics.
    """
    computed = {}
    if task_type == REG_KEY:
        metrics = REG_METRICS
    elif task_type == CLF_KEY:
        metrics = CLF_METRICS
    else:
        raise ValueError(
            f"'task_type' must be on of {[REG_KEY, CLF_KEY]}, not '{task_type}'"
        )
    for metric in metrics:
        mfunc = METRIC_MAP[metric]
        if metric == "rocauc":
            # Both arrays must be in probability form
            # if pred. array is given in probabilities
            if isinstance(pred_array[0], float):
                true_array = homogenize_clf_array(true_array, to_probs=True)
        # Other clf metrics always be converted to labels
        elif metric in CLF_METRICS:
            if isinstance(pred_array[0], float):
                pred_array = homogenize_clf_array(pred_array, to_labels=True)
        computed[metric] = mfunc(true_array, pred_array)
    return computed

accuracy comes before rocauc in CLF_METRICS:

CLF_METRICS = ["accuracy", "balanced_accuracy", "f1", "rocauc"]

That means this code will convert the predictions to labels:

# Other clf metrics always be converted to labels
elif metric in CLF_METRICS:
    if isinstance(pred_array[0], float):
        pred_array = homogenize_clf_array(pred_array, to_labels=True)

in which case afterwards

if metric == "rocauc":
    # Both arrays must be in probability form
    # if pred. array is given in probabilities
    if isinstance(pred_array[0], float):
        true_array = homogenize_clf_array(true_array, to_probs=True)

will never be true, because pred_array has already been converted to labels, so you'd be computing an ROCAUC from true labels vs. predicted labels? Maybe I'm missing something?
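
To make the suspected order dependence concrete, here is a minimal, self-contained sketch. It does not call matbench: `to_labels` is a hypothetical stand-in for `homogenize_clf_array(pred_array, to_labels=True)` (the 0.5 threshold is an assumption), and `roc_auc` and `accuracy` are small pure-Python substitutes for whatever `METRIC_MAP` points to. Only the loop structure is meant to mirror the quoted code.

```python
def roc_auc(y_true, y_score):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t]
    neg = [s for t, s in zip(y_true, y_score) if not t]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def to_labels(probs, threshold=0.5):
    # Hypothetical stand-in for homogenize_clf_array(pred_array, to_labels=True)
    return [p > threshold for p in probs]


METRIC_FUNCS = {"accuracy": accuracy, "rocauc": roc_auc}


def score_like_issue(true_array, pred_array, metrics):
    """Mimics the quoted loop: pred_array is rebound inside the loop body."""
    computed = {}
    for metric in metrics:
        if metric == "rocauc":
            pass  # true_array conversion omitted; boolean labels work with roc_auc here
        elif isinstance(pred_array[0], float):
            pred_array = to_labels(pred_array)  # rebinding persists into later iterations
        computed[metric] = METRIC_FUNCS[metric](true_array, pred_array)
    return computed


y_true = [True, True, False, False]
y_prob = [0.9, 0.4, 0.6, 0.1]

# Same data, only the metric order differs
labels_first = score_like_issue(y_true, y_prob, ["accuracy", "rocauc"])
rocauc_first = score_like_issue(y_true, y_prob, ["rocauc", "accuracy"])
print(labels_first["rocauc"], rocauc_first["rocauc"])
```

With these toy inputs the ROCAUC is 0.5 when accuracy runs first (the score is computed from thresholded labels) and 0.75 when rocauc runs first (computed from the probabilities), while accuracy is 0.5 in both cases.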

@ardunn
Collaborator

ardunn commented May 20, 2022

@janosh I think you are correct. I will fix this ASAP
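
One possible shape for such a fix, as a sketch only and not taken from the matbench repository: convert the predictions into a per-metric local variable so the original probabilities are still available when rocauc is computed, regardless of metric order. `to_labels`, `roc_auc`, `accuracy` and `score_array_sketch` are hypothetical names used for illustration, and the 0.5 threshold is an assumption.

```python
def roc_auc(y_true, y_score):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t]
    neg = [s for t, s in zip(y_true, y_score) if not t]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def to_labels(probs, threshold=0.5):
    # Hypothetical stand-in for homogenize_clf_array(pred_array, to_labels=True)
    return [p > threshold for p in probs]


METRIC_FUNCS = {"accuracy": accuracy, "rocauc": roc_auc}


def score_array_sketch(true_array, pred_array, metrics):
    computed = {}
    for metric in metrics:
        # Per-metric local: the caller's pred_array is never overwritten
        metric_pred = pred_array
        if metric != "rocauc" and isinstance(pred_array[0], float):
            metric_pred = to_labels(pred_array)
        computed[metric] = METRIC_FUNCS[metric](true_array, metric_pred)
    return computed


y_true = [True, True, False, False]
y_prob = [0.9, 0.4, 0.6, 0.1]

scores_a = score_array_sketch(y_true, y_prob, ["accuracy", "rocauc"])
scores_b = score_array_sketch(y_true, y_prob, ["rocauc", "accuracy"])
print(scores_a == scores_b)
```

In this version the result no longer depends on where rocauc sits in the metric list.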

ardunn added the high priority and code labels Jul 27, 2022
ardunn self-assigned this Jul 27, 2022
robinruff added a commit to robinruff/matbench that referenced this issue Mar 14, 2023