
Composer slowing down #78

Open
carlos54 opened this issue Mar 30, 2022 · 2 comments

Comments

@carlos54

Hi, the Composer slows down exponentially as the document grows.
Appending two 1000-page documents takes 15 minutes.
Can we reduce this time with some parameter, like in VB:

ActiveWindow.View = wdNormalView
Options.Pagination = False

Testing code:

from docx import Document
from docxcompose.composer import Composer

master = Document(
    "./1000pages_1.docx"
)
composer = Composer(master)

doc1 = Document(
    "./1000pages_2.docx"
)
composer.append(doc1)
composer.save(
    "./combined_big.docx"
)

1000pages_2.docx
1000pages_1.docx
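
Whether the slowdown is truly exponential or polynomial (for example quadratic) can be checked by timing the append at a few document sizes and fitting a line on a log-log scale. The sketch below assumes you have test inputs of several sizes; `time_call` and `loglog_slope` are hypothetical helpers, not part of docxcompose, and the quadratic loop in the `__main__` block is only a stand-in for `composer.append`.

```python
import math
import time
from typing import Callable, Sequence


def time_call(func: Callable[[], object]) -> float:
    """Return wall-clock seconds for a single call of func."""
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


def loglog_slope(sizes: Sequence[float], seconds: Sequence[float]) -> float:
    """Least-squares slope of log(seconds) against log(size).

    A slope near 1 suggests linear growth, near 2 quadratic, and so on.
    Exponential growth would instead show a slope that keeps increasing
    as the sizes get larger.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in seconds]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


if __name__ == "__main__":
    # Synthetic stand-in for composer.append: a deliberately quadratic loop.
    sizes = [200, 400, 800]
    seconds = [
        time_call(lambda n=n: sum(i * j for i in range(n) for j in range(n)))
        for n in sizes
    ]
    print(f"estimated exponent: {loglog_slope(sizes, seconds):.2f}")
```

To time the real operation, one might load the documents first and pass something like `lambda: composer.append(doc1)` for each size being tested. A slope that settles near a constant points to polynomial growth; one that keeps climbing as the documents get bigger is more consistent with exponential growth.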

@carlos54
Author

carlos54 commented Apr 5, 2022

Anyone?

@BryceStevenWilley
Contributor

Looked at this for a bit. Those documents are pretty big, even for Word or LibreOffice (I had to force-quit my LibreOffice instance before it was able to open them).

I was able to profile things using cProfile (script at bottom). On my machine it takes ~20 minutes to combine those docs while profiling, and notably, 13 of those minutes are spent in add_footnotes(), 8 of which are spent parsing XML. I don't know how many footnotes those documents have, but I've personally run into other issues with footnotes in this library as well. It might be much faster with fewer footnotes.
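
Since the footnote count seems to matter here, one quick way to check how many footnotes a document has, without opening it in Word or LibreOffice, is to read the footnotes part straight out of the .docx zip archive. This sketch assumes the part sits at the conventional path `word/footnotes.xml` (strictly, its location is defined by the package relationships) and that separator entries are the ones with a non-`normal` `w:type`; `count_footnotes` is a hypothetical helper.

```python
import sys
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace behind the w: prefix in word/footnotes.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_footnotes(docx_path: str) -> int:
    """Count regular footnotes in a .docx (0 if there is no footnotes part).

    Assumes the footnotes part uses the conventional name word/footnotes.xml.
    """
    with zipfile.ZipFile(docx_path) as zf:
        if "word/footnotes.xml" not in zf.namelist():
            return 0
        root = ET.fromstring(zf.read("word/footnotes.xml"))
    # Separator and continuation-separator entries carry a w:type attribute;
    # regular footnotes omit it or use "normal".
    return sum(
        1
        for fn in root.iter(f"{{{W_NS}}}footnote")
        if fn.get(f"{{{W_NS}}}type") in (None, "normal")
    )


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, count_footnotes(path))
```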

I have a fix in another branch that speeds things up, but only by a few minutes. After factoring out that parse_xml call, it still takes ~18 minutes, because it only eliminates about 1000 of the 6000 calls to parse_xml, so not that much.
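
The general idea of factoring a repeated parse out of a hot loop can be sketched as follows. This is not the branch mentioned above, whose code isn't shown here; it only illustrates parsing a hypothetical XML template once and deep-copying the result so each copy can still be modified independently. `parse_xml`, `TEMPLATE`, `stats`, and the two build functions are illustrative stand-ins, not docxcompose code.

```python
import copy
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical XML snippet that would otherwise be re-parsed on every iteration.
TEMPLATE = "<footnote><p><r><t>placeholder</t></r></p></footnote>"

# Counts how many times the parser actually runs.
stats = {"parse_calls": 0}


def parse_xml(text: str) -> ET.Element:
    """Illustrative stand-in for an expensive XML parse."""
    stats["parse_calls"] += 1
    return ET.fromstring(text)


def build_naive(n: int) -> List[ET.Element]:
    """Re-parse the same template for every item: n parser calls."""
    return [parse_xml(TEMPLATE) for _ in range(n)]


def build_hoisted(n: int) -> List[ET.Element]:
    """Parse once, then deep-copy: exactly one parser call."""
    template = parse_xml(TEMPLATE)
    # deepcopy gives each item its own tree, so editing one copy
    # cannot change the others.
    return [copy.deepcopy(template) for _ in range(n)]


if __name__ == "__main__":
    build_naive(100)
    print("naive parse calls:", stats["parse_calls"])
    stats["parse_calls"] = 0
    build_hoisted(100)
    print("hoisted parse calls:", stats["parse_calls"])
```

Deep-copying is not free either, so the gain depends on how many calls can actually be hoisted, which fits the modest saving reported above. python-docx builds on lxml, whose elements also support `copy.deepcopy`.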

I've attached the zipped profile output so others can look into it further. You can view it with snakeviz.
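
For anyone who wants to produce a similar profile, here is a minimal sketch using only the standard library. The profiling script referred to above isn't included on this page, so this is an assumed equivalent rather than that script; `profile_to_file` and `print_top` are hypothetical names. The saved file uses the standard `pstats` format, which snakeviz reads.

```python
import cProfile
import pstats
from typing import Any, Callable


def profile_to_file(func: Callable[..., Any], out_path: str, *args: Any, **kwargs: Any) -> Any:
    """Run func(*args, **kwargs) under cProfile, save stats to out_path, return the result."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    # The dumped file can be opened with: snakeviz <out_path>
    profiler.dump_stats(out_path)
    return result


def print_top(out_path: str, limit: int = 15) -> None:
    """Print the `limit` entries with the highest cumulative time (plain-text view)."""
    stats = pstats.Stats(out_path)
    stats.sort_stats("cumulative").print_stats(limit)
```

For this issue that might look like `profile_to_file(composer.append, "append.prof", doc1)`, followed by `snakeviz append.prof` in a shell, or `print_top("append.prof")` for a quick text summary.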

main.zip

Also, I'm not sure what you mean by the VB options. This library isn't using the Microsoft Word API to compose documents the way Visual Basic does; it does most of the work through python-docx, so options like those wouldn't have any effect.
