how do i deal with SRT files containing HTML? #163

Open
keredson opened this issue Feb 15, 2018 · 2 comments

Comments

@keredson

example:

1
00:00:00,970 --> 00:00:03,000
<font face="Serif" size="18">Jellyfish at the Monterey Aquarium</font>

2
00:00:04,080 --> 00:00:06,080
<font face="Serif" size="18">Dude - get out of the way!</font>

3
00:00:09,350 --> 00:00:13,350
<font face="Serif" size="18">Shaky Hands...</font>

4
00:00:17,000 --> 00:00:22,000
<font face="Serif" size="18">Ah yes, this is better...</font>

5
00:00:24,825 --> 00:00:27,825
<font face="Serif" size="18">Pro Tip: Turn off the camera flash!</font>

6
00:00:33,000 --> 00:00:45,446
<font face="Serif" size="18">Thanks for watching and I hope you'll have fun with the VideoSub library!</font>

If I convert it to WebVTT, the tags come out HTML-escaped:

WEBVTT

00:00.970 --> 00:03.000
&lt;font face="Serif" size="18">Jellyfish at the Monterey Aquarium&lt;/font>

00:04.080 --> 00:06.080
&lt;font face="Serif" size="18">Dude - get out of the way!&lt;/font>

00:09.350 --> 00:13.350
&lt;font face="Serif" size="18">Shaky Hands...&lt;/font>

00:17.000 --> 00:22.000
&lt;font face="Serif" size="18">Ah yes, this is better...&lt;/font>

00:24.825 --> 00:27.825
&lt;font face="Serif" size="18">Pro Tip: Turn off the camera flash!&lt;/font>

00:33.000 --> 00:45.446
&lt;font face="Serif" size="18">Thanks for watching and I hope you'll have fun with the VideoSub library!&lt;/font>

I'm converting like this:

      import pycaption

      # Detect the input format (SRT here), then write the captions out as WebVTT
      converter = pycaption.CaptionConverter()
      converter.read(srt, pycaption.detect_format(srt)())
      subtitles = converter.write(pycaption.WebVTTWriter())

thanks!

@kdHub

kdHub commented Feb 15, 2018

This has been my workaround so far: unescape the output after conversion. I would also be interested in a way to do this within pycaption itself.


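try:
    # Python 3 (HTMLParser.unescape was removed in Python 3.9)
    from html import unescape
except ImportError:
    # Python 2
    from HTMLParser import HTMLParser
    unescape = HTMLParser().unescape

from pycaption import DFXPReader, WebVTTWriter

# Convert the source captions (DFXP in my case, held in `dfxp`) to WebVTT
vtt = WebVTTWriter().write(DFXPReader().read(dfxp))

# Undo the HTML escaping applied during conversion
vtt = unescape(vtt)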
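
For anyone who wants to see the effect of the unescape step without installing pycaption, here is a minimal standard-library sketch. The `escaped_vtt` string is copied from the first cue of the converted output in this issue; the variable names are only illustrative. Note that this restores the raw `<font>` tags, and WebVTT does not define a `font` tag, so how a player treats them is a separate question.

```python
import html

# First cue of the escaped WebVTT output shown earlier in this issue.
escaped_vtt = (
    "WEBVTT\n"
    "\n"
    "00:00.970 --> 00:03.000\n"
    '&lt;font face="Serif" size="18">Jellyfish at the Monterey Aquarium&lt;/font>\n'
)

# html.unescape turns entities such as &lt; back into literal characters.
# Timing lines contain no entities, so they pass through unchanged.
restored = html.unescape(escaped_vtt)

print(restored)
```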

@keredson
Author

I did a similar workaround, but from the other end (preventing the escaping to begin with):
https://github.com/keredson/gnomecast/blob/9bbb32ef3028dda480d893204aa71be7ea38ccaf/gnomecast.py#L19
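
A different way to attack it from the input side, in case it helps someone: strip the `<font>` wrappers from the SRT text before handing it to the converter, so there is nothing left to escape. This is not what the linked code does and it does not touch pycaption at all; it is a rough sketch that assumes the only markup in the file is `<font>` tags like those in the example above. `strip_font_tags` is a made-up helper name.

```python
import re

# Matches opening and closing <font ...> tags only; any other markup is left alone.
FONT_TAG = re.compile(r"</?font\b[^>]*>", re.IGNORECASE)


def strip_font_tags(srt_text):
    """Return the SRT text with <font> wrappers removed, keeping the cue text."""
    return FONT_TAG.sub("", srt_text)


# First cue from the example SRT in this issue.
srt = (
    "1\n"
    "00:00:00,970 --> 00:00:03,000\n"
    '<font face="Serif" size="18">Jellyfish at the Monterey Aquarium</font>\n'
)

cleaned = strip_font_tags(srt)
print(cleaned)
```

The cleaned text can then go through the same `CaptionConverter` call as in the original post. The trade-off is that the font face and size information is dropped rather than preserved.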
