Crawl succeeds, but the db file is empty #3

Open
wangbiao92 opened this issue Apr 9, 2017 · 8 comments

Comments

@wangbiao92

Login succeeded too. I was crawling Beijing, and there were no errors, but the db file is empty. Where did it go wrong?
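
For anyone hitting the same symptom, a quick way to confirm whether the .db file is really empty is to open it with Python's built-in sqlite3 module. A minimal sketch, assuming the spider writes to a SQLite file; the filename "lianjia.db" here is a placeholder, check LianJiaSpider.py for the actual path:

import sqlite3

# The filename is an assumption; check LianJiaSpider.py for the real path.
conn = sqlite3.connect("lianjia.db")
cur = conn.cursor()

# List the tables the spider created; an empty result means the schema
# was never written, not just that the rows are missing.
cur.execute("SELECT name FROM sqlite_master WHERE type='table'")
tables = [row[0] for row in cur.fetchall()]
print("tables:", tables)

# Count rows per table to see whether any records actually landed.
for name in tables:
    cur.execute("SELECT COUNT(*) FROM %s" % name)
    print(name, cur.fetchone()[0])

conn.close()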

@wangbiao92
Author

I tried crawling just Dongcheng, and the database is still empty:
E:\Git\LianJiaSpider-master>python LianJiaSpider.py
d:\Anaconda2\lib\site-packages\bs4\__init__.py:166: UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

To get rid of this warning, change this:

 BeautifulSoup([your markup])

to this:

 BeautifulSoup([your markup], "lxml")

  markup_type=markup_type))
Crawled all community (xiaoqu) info for Dongcheng district
done
all done ^_^
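
(That UserWarning is harmless and unrelated to the empty database; it only asks that the parser be named explicitly, exactly as the warning text says. A minimal sketch of the fix, assuming lxml is installed:)

from bs4 import BeautifulSoup

html = "<html><body><p>hello</p></body></html>"

# Passing the parser name explicitly ("lxml" here, assuming lxml is
# installed) silences the UserWarning and pins the parsing behavior
# across machines that may have different parsers available.
soup = BeautifulSoup(html, "lxml")
print(soup.p.get_text())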

@lanbing510
Owner

Lianjia recently added strict anti-crawling restrictions (captchas and rate limiting), and the code hasn't been updated for that yet.

@wangbiao92
Author

I modified the code and managed to crawl the community info, but as soon as I crawl the transaction records the IP gets flagged as abnormal. I'll keep looking for a workaround. Thanks!
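
(A common first mitigation for per-IP rate limits is simply slowing down and randomizing request timing. A minimal sketch using requests; the URL pattern and User-Agent string are placeholder assumptions, not taken from the spider:)

import random
import time

import requests

headers = {"User-Agent": "Mozilla/5.0"}  # placeholder UA string

for page in range(1, 6):
    # Assumed URL pattern for illustration only.
    url = "http://bj.lianjia.com/chengjiao/pg%d/" % page
    resp = requests.get(url, headers=headers, timeout=10)
    print(page, resp.status_code)
    # Sleep a random 3-8 seconds between requests to stay under rate limits.
    time.sleep(random.uniform(3, 8))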

@pfsun

pfsun commented Apr 11, 2017

@lanbing510 @wangbiao92 The first time I crawled, my db was empty too. How did you end up fixing the empty database? On my second run I kept hitting captchas and rate limits and couldn't log in at all. Could you share the relevant code or a fix? Thanks!

@wangbiao92
Author

@pfsun Lianjia changed their pages, so the code needs updating. But I still haven't solved the abnormal-traffic problem; using an IP proxy didn't help.
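
(If a single fixed proxy didn't help, rotating through a pool of proxies per request sometimes does, since each request then appears to come from a different IP. A minimal sketch with requests; the proxy addresses and URL are placeholders:)

import random

import requests

# Placeholder proxy pool; real, working proxies would go here.
PROXIES = [
    "http://10.0.0.1:8080",
    "http://10.0.0.2:8080",
]

def fetch(url):
    # Pick a different proxy for each request.
    proxy = random.choice(PROXIES)
    return requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=10)

resp = fetch("http://bj.lianjia.com/xiaoqu/dongcheng/")  # assumed URL
print(resp.status_code)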

@XuefengHuang

You can try my crawler; it stores the data in MySQL. https://github.com/XuefengHuang/lianjia-scrawler

@pfsun

pfsun commented Apr 14, 2017

@wangbiao92 OK, I'll give it another try. Thanks!

@pfsun

pfsun commented Apr 14, 2017

@XuefengHuang Thanks, I'll give it a try.
