[duboku] add new extractor #26467
Passes tests with all suggested changes applied on top of 530f458.
I'll apply them if you like.
IE_NAME = 'duboku'
IE_DESC = 'www.duboku.co'

_VALID_URL = r'(?:https?://[^/]+\.duboku\.co/vodplay/)(?P<id>[0-9]+-[0-9-]+)\.html.*'

Require n-n-n in the id field; no need to match the tail:
- _VALID_URL = r'(?:https?://[^/]+\.duboku\.co/vodplay/)(?P<id>[0-9]+-[0-9-]+)\.html.*'
+ _VALID_URL = r'(?:https?://[^/]+\.duboku\.co/vodplay/)(?P<id>(?:[0-9]+-){2}[0-9]+)\.html'
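To illustrate why the stricter id group matters, here is a minimal standalone sketch comparing the two patterns from the suggestion above. The helper name match_id and the malformed URL are assumptions made up for the illustration; they are not part of the PR.

```python
import re

# Both patterns are copied from the review suggestion above.
OLD = r'(?:https?://[^/]+\.duboku\.co/vodplay/)(?P<id>[0-9]+-[0-9-]+)\.html.*'
NEW = r'(?:https?://[^/]+\.duboku\.co/vodplay/)(?P<id>(?:[0-9]+-){2}[0-9]+)\.html'


def match_id(pattern, url):
    # Hypothetical helper: return the 'id' group, or None if there is no match.
    mobj = re.match(pattern, url)
    return mobj.group('id') if mobj else None


good = 'https://www.duboku.co/vodplay/1575-1-1.html'
# Hypothetical malformed URL with only two numeric parts.
malformed = 'https://www.duboku.co/vodplay/1575-1.html'

print(match_id(OLD, malformed))  # old pattern accepts a two-part id
print(match_id(NEW, malformed))  # new pattern rejects it
print(match_id(NEW, good))
```

Because re.match() only anchors at the start of the string, dropping the trailing `.*` does not stop URLs with a query string or fragment from matching.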
'url': 'https://www.duboku.co/vodplay/1575-1-1.html',
'info_dict': {
    'id': '1575-1-1',
    'ext': 'ts',

Fix test:
- 'ext': 'ts',
+ 'ext': 'mp4',
'url': 'https://www.duboku.co/vodplay/1588-1-1.html',
'info_dict': {
    'id': '1588-1-1',
    'ext': 'ts',

Fix test:
- 'ext': 'ts',
+ 'ext': 'mp4',
'id': '1588-1-1',
'ext': 'ts',
'series': '亲爱的自己',
'title': 'contains:预告片',

Page has changed:
- 'title': 'contains:预告片',
+ 'title': '亲爱的自己 第1集',
temp = video_id.split('-')
series_id = temp[0]
season_id = temp[1]
episode_id = temp[2]

Simpler:
- temp = video_id.split('-')
- series_id = temp[0]
- season_id = temp[1]
- episode_id = temp[2]
+ series_id, season_id, episode_id = video_id.split('-')
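A small sketch of the unpacking form, assuming the id always has exactly three dash-separated parts (which the tightened _VALID_URL is meant to guarantee). The function name split_video_id is hypothetical.

```python
def split_video_id(video_id):
    # Tuple unpacking raises ValueError unless split() yields exactly
    # three parts, so a malformed id fails loudly here.
    series_id, season_id, episode_id = video_id.split('-')
    return series_id, season_id, episode_id


print(split_video_id('1575-1-1'))
```

Unlike indexing temp[0] to temp[2], unpacking also rejects ids with more than three parts instead of silently ignoring the extras.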
href = extract_attributes(mobj.group(0)).get('href')
if href:
    mobj1 = re.search(r'/(\d+)\.html', href)
    if mobj1 and mobj1.group(1) == series_id:
        series_title = clean_html(mobj.group(0))
        series_title = re.sub(r'[\s\r\n\t]+', ' ', series_title)
        title = clean_html(html)
        title = re.sub(r'[\s\r\n\t]+', ' ', title)
        break

- use the resulting match object
- avoid excessive indentation
- r'\s' includes any whitespace
- simplify clean_html() expressions

- href = extract_attributes(mobj.group(0)).get('href')
- if href:
-     mobj1 = re.search(r'/(\d+)\.html', href)
-     if mobj1 and mobj1.group(1) == series_id:
-         series_title = clean_html(mobj.group(0))
-         series_title = re.sub(r'[\s\r\n\t]+', ' ', series_title)
-         title = clean_html(html)
-         title = re.sub(r'[\s\r\n\t]+', ' ', title)
-         break
+ href = extract_attributes(html[mobj.start(0):mobj.start('content')]).get('href')
+ if not href:
+     continue
+ mobj1 = re.search(r'/(?P<s_id>\d+)\.html', href)
+ if mobj1 and mobj1.group('s_id') == series_id:
+     series_title = clean_html(re.sub(r'\s+', ' ', mobj.group('content')))
+     title = clean_html(re.sub(r'\s+', ' ', html))
+     break
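The points in this suggestion can be tried out with the standard library alone. The sketch below is an approximation under stated assumptions: ANCHOR_RE and the sample HTML are hypothetical (the PR's actual pattern is not shown in this review), and a plain regex stands in for youtube-dl's extract_attributes().

```python
import re

# Hypothetical anchor pattern with a named 'content' group.
ANCHOR_RE = re.compile(r'<a\b[^>]*>(?P<content>.*?)</a>', re.DOTALL)


def opening_tag(html, mobj):
    # Slice from the start of the whole match to the start of the
    # 'content' group, which leaves just the opening <a ...> tag.
    return html[mobj.start(0):mobj.start('content')]


def series_id_from_tag(tag):
    # Stdlib stand-in for extract_attributes(tag).get('href').
    href = re.search(r'href="([^"]*)"', tag)
    if not href:
        return None
    mobj1 = re.search(r'/(?P<s_id>\d+)\.html', href.group(1))
    return mobj1.group('s_id') if mobj1 else None


def collapse_whitespace(text):
    # r'\s' already covers \r, \n and \t, so no extra class members are needed.
    return re.sub(r'\s+', ' ', text)


# Hypothetical sample markup.
html = '<a href="/voddetail/1588.html">亲爱的自己</a>\r\n\t 第1集'
mobj = ANCHOR_RE.search(html)
tag = opening_tag(html, mobj)
print(tag, series_id_from_tag(tag))
print(collapse_whitespace('a\r\n\t b'))
```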
    'episode_id': episode_id,
}

formats = self._extract_m3u8_formats(data_url, video_id, 'mp4')

Pass Referer header to avoid 403:
- formats = self._extract_m3u8_formats(data_url, video_id, 'mp4')
+ headers = {'Referer': 'https://www.duboku.co/static/player/videojs.html'}
+ formats = self._extract_m3u8_formats(data_url, video_id, 'mp4', headers=headers)
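As a rough illustration of the idea outside youtube-dl, the sketch below attaches the same Referer header to a standard-library request object and reuses the same dict in an info dict. It does not perform any network access; the playlist URL and the build_request helper are hypothetical, and whether the server really answers 403 without the header is taken from the review comment, not verified here.

```python
import urllib.request

# Referer value taken from the review suggestion above.
PLAYER_REFERER = 'https://www.duboku.co/static/player/videojs.html'


def build_request(url, headers):
    # Hypothetical helper: build (but do not send) a request with extra headers.
    return urllib.request.Request(url, headers=headers)


headers = {'Referer': PLAYER_REFERER}
# Hypothetical playlist URL, used only to construct the request object.
req = build_request('https://example.invalid/playlist.m3u8', headers)

# The same dict can be reused for the download step, as the later
# 'http_headers' suggestion in this review does.
info = {'formats': [], 'http_headers': headers}
print(req.get_header('Referer'))
```

Defining the dict once keeps the manifest request and the later segment downloads consistent.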
'episode_number': int_or_none(episode_id),
'episode_id': episode_id,
'formats': formats,
'http_headers': {'Referer': 'https://www.duboku.co/static/player/videojs.html'}

Use headers as introduced above:
- 'http_headers': {'Referer': 'https://www.duboku.co/static/player/videojs.html'}
+ 'http_headers': headers,
'url': 'https://www.duboku.co/voddetail/1554.html#playlist2',
'info_dict': {
    'id': '1554#playlist2',

#playlist2 has gone: use #playlist1 instead:
- 'url': 'https://www.duboku.co/voddetail/1554.html#playlist2',
- 'info_dict': {
-     'id': '1554#playlist2',
+ 'url': 'https://www.duboku.co/voddetail/1554.html#playlist1',
+ 'info_dict': {
+     'id': '1554#playlist1',
mobj = re.match(self._VALID_URL, url)
if mobj is None:
    raise ExtractorError('Invalid URL: %s' % url)
series_id = mobj.group('id')

Simplify:
- mobj = re.match(self._VALID_URL, url)
- if mobj is None:
-     raise ExtractorError('Invalid URL: %s' % url)
- series_id = mobj.group('id')
+ series_id = self._match_id(url)
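For readers unfamiliar with the helper, the following is a simplified approximation of what a _match_id-style classmethod does: match the URL against the class's _VALID_URL and return the id group. It is not the youtube-dl implementation; the class name PlaylistSketch and its _VALID_URL are assumptions, since the playlist extractor's real pattern is not shown in this review.

```python
import re


class PlaylistSketch(object):
    # Hypothetical pattern for the playlist pages.
    _VALID_URL = r'https?://[^/]+\.duboku\.co/voddetail/(?P<id>[0-9]+)\.html'

    @classmethod
    def _match_id(cls, url):
        # Approximation: match against _VALID_URL and return the 'id' group.
        mobj = re.match(cls._VALID_URL, url)
        assert mobj, 'Invalid URL: %s' % url
        return mobj.group('id')


print(PlaylistSketch._match_id('https://www.duboku.co/voddetail/1554.html#playlist1'))
```

Since an extractor is only invoked for URLs that already matched its _VALID_URL, the explicit invalid-URL check in the original code adds little.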
@dirkf, can you please help me apply the changes? My original repo was deleted by GitHub.
Unfortunately the GH website logic says "diff is outdated" if I try to do that, presumably because the source branch is blocked. Contact GH support to get your repo unblocked (mention #27013). Much the easiest. Or perhaps:
You can't delete blocked repo without contacting support. You can create a new fork and a new PR from it though
When I contacted support a while ago to get my fork (not yt-dlp) restored, they only provided the option to delete it. I had to let them delete it and then re-fork. Luckily, I had a local copy of all the branches.
Resolves #22125.