IMOOCSpider's issues #10

Open
xiaohua9 opened this issue Jul 28, 2019 · 0 comments

I read through your code and learned a great deal from it; it was a pleasure to read.

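I would also like to report one problem:
  • Problem:
    You store the page source in the string variable result, but it is declared as the fixed-length String type. I looked at the page you crawl: it has nearly a thousand lines (call that n). With the code written this way, you waste n-1 blocks of memory. Note: pages with several thousand lines of source are not uncommon.
  • Proposed solutions:
    1. Change the fixed-length string to a variable-length string, which avoids the memory waste described above. (However, if you keep storing the source in a string, one issue remains: a page with enough data will exceed the string's capacity limit.)
    2. Introduce file operations and use a local file as temporary storage for the page source. It is a little more work, but safer and more reliable.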
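
The two suggestions above could be sketched roughly as follows. This is only an illustration under assumptions: the issue does not quote the original code, so Java is assumed from the mention of String, "fixed-length" is read as appending to an immutable String inside a read loop, and the class and method names (PageSourceBuffer, readAll, spoolToTempFile) are hypothetical rather than taken from the project.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the original project.
final class PageSourceBuffer {

    // Option 1: accumulate lines in a growable StringBuilder instead of
    // doing `result += line`, which allocates a new String on every pass.
    static String readAll(BufferedReader reader) throws IOException {
        StringBuilder result = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            result.append(line).append('\n');
        }
        return result.toString();
    }

    // Option 2: stream the page into a temporary local file so the whole
    // source never has to sit in a single in-memory string.
    static Path spoolToTempFile(BufferedReader reader) throws IOException {
        Path tmp = Files.createTempFile("page-source-", ".html");
        try (BufferedWriter writer =
                 Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line);
                writer.newLine();
            }
        }
        return tmp;
    }
}
```

Either method takes whatever BufferedReader already wraps the HTTP response stream. The file-based variant trades a little disk I/O for a bounded memory footprint, and the caller is responsible for deleting the temporary file when finished.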

This is just my humble opinion; apologies if I have gotten anything wrong.
