Resolve Inference Selection Bug Affecting Transcription Quality #1377
When the avg_logprob condition isn't satisfied and the result is re-computed with a greater temperature, the best option is returned
I've been testing with this PR, and the improvement is bigger than I thought. @jongwook, could you please review this PR?
This change makes sense, but I think it should take into account [...]. For example, only consider the results where the compression ratio is below [...].
This PR was merged into SYSTRAN/faster-whisper#356 and seems to somewhat improve transcription quality. Is there anything I can do to help the review process?
Hello guys, I have also encountered this bug recently, so let me share my opinion on how to solve it. Let's assume that none of the predictions meets the criteria for [...]
Let me show you a real-data example that I found with whisper small, version 2 (it would be easy to create a lot of mock examples, but I did not want to do that):
So in this case, the first result would be selected, although it has very bad [...]. So I suggest calculating something called tradeoff_factor. In the current case, it will be: [...]

Possible implementation:

```python
from typing import List, Optional

from whisper.decoding import DecodingResult


def _select_best_prediction(
    decoded_results: List[DecodingResult],
    logprob_threshold: Optional[float] = -1,
    compression_ratio_threshold: Optional[float] = 2.4,
) -> DecodingResult:
    """Select best prediction from decoded results with various temperatures."""
    assert len(decoded_results) > 0
    predictions_meeting_compression = []
    for pred in decoded_results:
        if pred.compression_ratio <= compression_ratio_threshold:
            predictions_meeting_compression.append(pred)
    # Case 1: at least one prediction has a compression ratio at or below
    # the threshold.
    # Then select the prediction with the best avg_logprob.
    if len(predictions_meeting_compression) > 0:
        return max(predictions_meeting_compression, key=lambda x: x.avg_logprob)
    # Case 2: no prediction has a compression ratio at or below the threshold.
    # Then calculate tradeoff_factor between avg_logprob and compression ratio as
    # (avg_logprob of the prediction / logprob threshold) *
    # (compression_ratio of the prediction / compression_ratio threshold)
    # and select the prediction with the lowest value of this factor.
    else:
        tradeoff_factors = []
        for pred in decoded_results:
            factor = (pred.avg_logprob / logprob_threshold) * (
                pred.compression_ratio / compression_ratio_threshold
            )
            tradeoff_factors.append(factor)
        best_index = tradeoff_factors.index(min(tradeoff_factors))
        # Index into the list (the original called it like a function).
        return decoded_results[best_index]
```
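To make the second case more concrete, here is a small self-contained sketch of the same selection rule. `Pred` is a hypothetical stand-in for whisper's `DecodingResult` that carries only the fields used here, and the numbers are made up for illustration (they are not the real-data example mentioned above). The thresholds are assumed to be the defaults from the signature, -1 and 2.4.

```python
from dataclasses import dataclass
from typing import List

# Assumed defaults, matching the signature of the proposed function.
LOGPROB_THRESHOLD = -1.0
COMPRESSION_RATIO_THRESHOLD = 2.4


@dataclass
class Pred:
    """Hypothetical stand-in for DecodingResult (only the fields used here)."""
    temperature: float
    avg_logprob: float
    compression_ratio: float


def tradeoff_factor(pred: Pred) -> float:
    # avg_logprob and the threshold are both negative, so the first ratio is
    # positive and grows as the prediction gets less likely. The second ratio
    # grows as the text gets more repetitive. Lower is therefore better.
    return (pred.avg_logprob / LOGPROB_THRESHOLD) * (
        pred.compression_ratio / COMPRESSION_RATIO_THRESHOLD
    )


def select(preds: List[Pred]) -> Pred:
    ok = [p for p in preds if p.compression_ratio <= COMPRESSION_RATIO_THRESHOLD]
    if ok:
        # Case 1: best avg_logprob among the non-repetitive predictions.
        return max(ok, key=lambda p: p.avg_logprob)
    # Case 2: nothing is below the threshold, fall back to the tradeoff factor.
    return min(preds, key=tradeoff_factor)


# Made-up values: both predictions exceed the compression ratio threshold.
preds = [
    Pred(temperature=0.0, avg_logprob=-0.5, compression_ratio=12.0),
    Pred(temperature=0.2, avg_logprob=-1.2, compression_ratio=2.6),
]
best = select(preds)
print(best.temperature)
```

With these made-up values, picking by highest avg_logprob alone would keep the very repetitive first prediction, while the tradeoff factor prefers the second one because its compression ratio is only slightly above the threshold.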
Currently, when none of the inferences made at different temperatures satisfies all of the "non-fallback" conditions, the last one is returned (the one with the highest temperature with default args).
This can lead to weird behaviors, for example:
This PR doesn't change the behaviour when one of the inferences satisfies all of the conditions.
When that's not the case, the result that is returned is the one with the highest avg_logprob.
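The behaviour described above can be sketched roughly as follows. This is a simplified illustration, not the code changed by the PR: `Result`, `needs_fallback` and `decode_with_fallback_sketch` are hypothetical names, only the avg_logprob and compression ratio checks are modelled, and the decoder outputs are made up.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Result:
    """Hypothetical stand-in for a decoding result."""
    temperature: float
    avg_logprob: float
    compression_ratio: float


def needs_fallback(
    result: Result,
    logprob_threshold: float = -1.0,
    compression_ratio_threshold: float = 2.4,
) -> bool:
    if result.compression_ratio > compression_ratio_threshold:
        return True  # output is too repetitive
    if result.avg_logprob < logprob_threshold:
        return True  # average log probability is too low
    return False


def decode_with_fallback_sketch(
    decode: Callable[[float], Result],
    temperatures: Sequence[float],
) -> Result:
    assert len(temperatures) > 0
    attempts: List[Result] = []
    for t in temperatures:
        result = decode(t)
        if not needs_fallback(result):
            # Unchanged behaviour: the first result meeting all conditions wins.
            return result
        attempts.append(result)
    # No temperature met every condition: instead of returning the last
    # attempt, return the one with the highest avg_logprob.
    return max(attempts, key=lambda r: r.avg_logprob)


# Made-up decoder outputs keyed by temperature; none passes the logprob check.
fake_outputs = {
    0.0: Result(0.0, -1.1, 2.0),
    0.2: Result(0.2, -1.5, 2.0),
    0.4: Result(0.4, -2.3, 2.0),
}
chosen = decode_with_fallback_sketch(fake_outputs.__getitem__, [0.0, 0.2, 0.4])
print(chosen.temperature)
```

In this made-up case, returning the last attempt would yield the temperature 0.4 result, which has the lowest avg_logprob of the three, whereas the sketch returns the temperature 0.0 result.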