Added geonodedaili.py to crawl proxies #186

Merged: 2 commits merged on Mar 1, 2023

Conversation

MGMCN (Contributor) commented Feb 27, 2023

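Hello, I am an international student in Japan. I recently needed some proxy IP services and came across your library.
I have contributed a new site to crawl; its proxy list is very comprehensive.
However, in the class I wrote I overrode the crawl and fetch methods, because I needed to change the interval between requests and also wanted to pass more accurate header parameters. (Almost all of the proxies on this site are usable, but crawling it is also easily detected, so the request interval and the headers need to be adjusted.)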
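
The override described in this comment can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the pull request: `BaseCrawler`, `GeonodeLikeCrawler`, the injected `transport` callable, the example URLs, and the header values are all hypothetical stand-ins, and the project's real base class may look different.

```python
import time
from typing import Callable, Dict, Iterator, List, Optional


class BaseCrawler:
    """Hypothetical stand-in for a base crawler; not the project's real API."""

    urls: List[str] = []

    def __init__(self, transport: Callable[[str, Dict[str, str]], str]):
        # transport(url, headers) -> response text.
        # It is injected so the sketch runs without network access.
        self.transport = transport

    def fetch(self, url: str, headers: Optional[Dict[str, str]] = None) -> str:
        return self.transport(url, headers or {})

    def parse(self, text: str) -> Iterator[str]:
        raise NotImplementedError

    def crawl(self) -> Iterator[str]:
        # Default behaviour: no delay, no custom headers.
        for url in self.urls:
            yield from self.parse(self.fetch(url))


class GeonodeLikeCrawler(BaseCrawler):
    """Subclass that overrides crawl() to add a delay and explicit headers."""

    # Placeholder URLs and headers, assumed for illustration only.
    urls = ["https://example.invalid/page/1", "https://example.invalid/page/2"]
    headers = {"User-Agent": "Mozilla/5.0", "Accept": "application/json"}
    delay = 0.5  # seconds to wait between consecutive requests

    def parse(self, text: str) -> Iterator[str]:
        # Assume one "host:port" entry per non-empty line.
        for line in text.splitlines():
            line = line.strip()
            if line:
                yield line

    def crawl(self) -> Iterator[str]:
        for index, url in enumerate(self.urls):
            if index:
                # Sleep before every request except the first.
                time.sleep(self.delay)
            yield from self.parse(self.fetch(url, headers=self.headers))
```

Keeping the delay and headers as class attributes means they can be tuned per site without touching the base class.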

MGMCN (Contributor, Author) commented Feb 27, 2023

Alternatively, could I pass the headers directly into the fetch function that crawl calls? Then I would not need to override fetch.
I look forward to your reply.
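
One way the alternative raised here could look, assuming `fetch` accepts arbitrary keyword arguments: `crawl` simply forwards them, so only `crawl` needs changing. `ForwardingCrawler`, the `transport` callable, and the URL are hypothetical names for illustration, not the library's actual interface.

```python
from typing import Any, Callable, Iterator, List


class ForwardingCrawler:
    """Hypothetical crawler whose crawl() forwards keyword arguments to fetch()."""

    # Placeholder URL, assumed for illustration only.
    urls: List[str] = ["https://example.invalid/list"]

    def __init__(self, transport: Callable[..., str]):
        # transport(url, **kwargs) -> response text; injected to keep the sketch offline.
        self.transport = transport

    def fetch(self, url: str, **kwargs: Any) -> str:
        # fetch stays generic: whatever it receives goes to the transport.
        return self.transport(url, **kwargs)

    def crawl(self, **fetch_kwargs: Any) -> Iterator[str]:
        # Headers (or any other option) pass straight through to fetch.
        for url in self.urls:
            yield self.fetch(url, **fetch_kwargs)
```

With this shape, a call such as `crawler.crawl(headers={"User-Agent": "demo"})` reaches the transport unchanged, and `fetch` itself needs no override.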

MGMCN (Contributor, Author) commented Feb 28, 2023

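I tested here in Japan, and even with time.sleep(.5) the crawl completes without missing a single entry, but I do not know what the access speed is like from within China.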

Germey (Member) commented Mar 1, 2023

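Thank you for your contribution, @MGMCN. The crawl function itself leaves room for only limited functionality (it has no provision for adding a delay, for example), so I think overriding it makes sense. I ran it and it works well; very useful! Merged.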

@Germey Germey merged commit 4c50711 into Python3WebSpider:master Mar 1, 2023
@MGMCN MGMCN deleted the MGMCN branch March 3, 2023 01:52