Capturing doc documents #608

Open
zhumingyu opened this issue Jan 15, 2025 · 2 comments

Comments

@zhumingyu

Hello author, can this kind of Doc or zip document be captured? https://www.zxxk.com/soft/45705861.html

@xifangczy
Owner

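Uh, no... in principle that can't be done.
What these sites serve is never the original file. The site you linked, for example, converts the doc into SVG files. If you add the svg suffix in the settings, you can capture those.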
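
As a rough illustration of what suffix-based capture amounts to, the sketch below filters a list of observed resource URLs by file extension. It is not the extension's actual code: the function names `has_extension` and `filter_by_extension` and the example.com URLs are hypothetical, chosen only for this example.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def has_extension(url, extensions):
    """Return True if the URL's path ends in one of the given extensions.

    `extensions` is a set of lowercase suffixes without the leading dot.
    The query string and fragment are ignored, so "page.svg?token=1" matches.
    """
    path = urlparse(url).path
    suffix = PurePosixPath(path).suffix.lower().lstrip(".")
    return suffix in extensions


def filter_by_extension(urls, extensions):
    """Keep only the URLs whose path has a wanted extension (hypothetical helper)."""
    wanted = {ext.lower().lstrip(".") for ext in extensions}
    return [url for url in urls if has_extension(url, wanted)]


if __name__ == "__main__":
    # Hypothetical requests a document preview page might make.
    observed = [
        "https://example.com/preview/page-1.svg?token=abc",
        "https://example.com/static/app.js",
        "https://example.com/preview/page-2.svg",
    ]
    for url in filter_by_extension(observed, ["svg"]):
        print(url)
```

Matching on the URL path rather than the whole URL keeps query parameters from hiding the suffix. A real capture tool may also inspect the response's content type, which this sketch does not do.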

@louiesun

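> Uh, no... in principle that can't be done. What these sites serve is never the original file. The site you linked, for example, converts the doc into SVG files. If you add the svg suffix in the settings, you can capture those.

This is one of the better cases: you can end up with a PDF (a PDF is, on the whole, vector graphics).
With sites like doc88 you only get bitmaps, so OCR is the only way to restore the content as closely as possible.
There should be projects of this kind on Greasy Fork; you could search there.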


3 participants