🐛(back) manage subtitles content starting with a BOM #2604

Merged (1 commit) on Jul 31, 2024

Conversation

@lunika lunika commented Jul 31, 2024

Purpose

When reading an uploaded subtitle file, the content sometimes starts with a Byte Order Mark (BOM), and the SRT reader then fails to detect the content as SRT.
We have to remove the BOM before calling detect_format from the pycaption library.
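
A minimal sketch of the problem and the workaround, using only the standard library. The `looks_like_srt` and `strip_bom` helpers are hypothetical: `looks_like_srt` is a stand-in for a format detector that expects the first line to be a numeric cue index, and it is not pycaption's actual detection logic.

```python
import re

BOM = "\ufeff"  # U+FEFF, what a UTF-8 BOM becomes once decoded as plain UTF-8

SRT_SAMPLE = "1\n00:00:01,000 --> 00:00:02,000\nHello\n"


def looks_like_srt(content: str) -> bool:
    """Hypothetical detector: first line is a cue number, second is a timing line."""
    lines = content.splitlines()
    if len(lines) < 2:
        return False
    return re.fullmatch(r"\d+", lines[0]) is not None and "-->" in lines[1]


def strip_bom(content: str) -> str:
    """Remove BOM characters, mirroring the replace() call used in this PR."""
    return content.replace(BOM, "")


with_bom = BOM + SRT_SAMPLE

# The BOM is glued to the cue number, so the first line is no longer all digits.
detected_raw = looks_like_srt(with_bom)
# After stripping, the content is identical to the clean sample.
detected_clean = looks_like_srt(strip_bom(with_bom))
```

Under these assumptions, detection fails on the raw content and succeeds once the BOM is removed.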

Proposal

  • Manage subtitle content starting with a BOM

@lunika lunika added the bug label Jul 31, 2024
@lunika lunika requested review from jbpenrath and wilbrdt July 31, 2024 14:03
@lunika lunika self-assigned this Jul 31, 2024
Comment on lines +65 to 66
timed_text = timed_text_file.read().replace("\ufeff", "")
reader = detect_format(timed_text)

@jbpenrath jbpenrath Jul 31, 2024

Looks good as a workaround, but I feel this should be a contribution to pycaption, no?


@lunika lunika Jul 31, 2024

I don't know... pycaption accepts a string as input, not how you retrieve it. I will open an issue to find out whether they are interested in this contribution.
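
To illustrate the point that the BOM comes from how the string is retrieved, here is a hedged sketch of handling it at read time with Python's `utf-8-sig` codec instead of stripping it afterwards. How the project actually opens the file is not shown in this thread, so the in-memory `raw` bytes below are an assumption made only for the example.

```python
import codecs
import io

# Assumed input: UTF-8 encoded SRT bytes that start with a BOM.
raw = codecs.BOM_UTF8 + "1\n00:00:01,000 --> 00:00:02,000\nHello\n".encode("utf-8")

# Decoding as plain UTF-8 keeps the BOM as a leading U+FEFF character.
plain = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8").read()

# The utf-8-sig codec drops a leading BOM if present and is a no-op otherwise.
sig = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8-sig").read()
```

Note that `utf-8-sig` only removes a BOM at the very start of the stream, whereas `replace("\ufeff", "")` removes every occurrence in the content.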

Issue created: pbs/pycaption#341

@lunika lunika enabled auto-merge (rebase) July 31, 2024 14:30
@lunika lunika merged commit 13b1686 into master Jul 31, 2024
32 of 33 checks passed
@lunika lunika deleted the pycatption_srt branch July 31, 2024 14:37
sentry-io bot commented Aug 2, 2024

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 11: invalid start byte marsha.core.tasks.timed_text_track.convert_time... View Issue
