ENH: Add Cloning #1371

pubpub-zz · 2022-09-27T18:14:10Z

The method .clone(pdf_dest, [force_duplicate]) clones the object and all objects it references.

If an object has already been cloned, the existing clone is returned (unless force_duplicate is set).
It is mainly for internal use but can also be called on a page.
For PageObject/DictionaryObject/[Encoded/Decoded/Content]Stream, an extra parameter ignore_fields lists the fields that should not be cloned.
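To make the semantics described above concrete, here is a minimal toy sketch of memoized cloning. It does not use PyPDF2: `Node`, `Dest`, and the free function `clone` are hypothetical stand-ins. As a simplification, force_duplicate in this sketch only applies to the object clone is called on, and referenced objects are still resolved through the memo. The real implementation may behave differently.

```python
from typing import Any, Dict, Iterable


class Node:
    """Toy stand-in for a dictionary-like PDF object (not a PyPDF2 class)."""

    def __init__(self, **fields: Any) -> None:
        self.fields: Dict[str, Any] = dict(fields)


class Dest:
    """Toy stand-in for the destination document (hypothetical)."""

    def __init__(self) -> None:
        # Maps each source object to the clone already made for this destination.
        # Node uses the default identity-based hash, so lookups are by identity.
        self.already_cloned: Dict[Node, Node] = {}


def clone(
    obj: Any,
    dest: Dest,
    force_duplicate: bool = False,
    ignore_fields: Iterable[str] = (),
) -> Any:
    """Clone obj and everything it references into dest."""
    if not isinstance(obj, Node):
        return obj  # plain values are returned unchanged
    if not force_duplicate and obj in dest.already_cloned:
        return dest.already_cloned[obj]  # reuse the existing clone
    ignored = set(ignore_fields)
    copy = Node()
    # Register the clone before descending so cyclic references terminate.
    dest.already_cloned[obj] = copy
    for key, value in obj.fields.items():
        if key in ignored:
            continue  # this field is deliberately not cloned
        # Simplification: referenced objects always go through the memo.
        copy.fields[key] = clone(value, dest, False, ignored)
    return copy


# Usage: a "page" that shares a resource object and refers back to itself.
dest = Dest()
shared = Node(name="font")
page = Node(resources=shared, parent=Node(kind="pages"))
page.fields["self"] = page
first = clone(page, dest, ignore_fields=("parent",))
again = clone(page, dest)
forced = clone(page, dest, force_duplicate=True)
```

Here `again` is the same object as `first`, while `forced` is a new copy that still shares the already cloned resource.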

When one exists, the pointer to an object is exposed through the indirect_obj attribute.

New API for add_page/insert_page that:

returns the cloned page object
accepts ignore_fields as a parameter
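The practical consequence of returning the clone is that later edits should go to the returned page, not to the source page. The sketch below illustrates this with plain dicts. `ToyWriter` and the page keys are assumptions for illustration only, not the PyPDF2 classes or their exact signatures.

```python
import copy
from typing import Any, Dict, Iterable, List


class ToyWriter:
    """Hypothetical stand-in for a writer; pages are plain dicts here."""

    def __init__(self) -> None:
        self.pages: List[Dict[str, Any]] = []

    def add_page(
        self, page: Dict[str, Any], ignore_fields: Iterable[str] = ()
    ) -> Dict[str, Any]:
        """Store a copy of page without the ignored keys and return that copy."""
        ignored = set(ignore_fields)
        cloned = {
            key: copy.deepcopy(value)
            for key, value in page.items()
            if key not in ignored
        }
        self.pages.append(cloned)
        return cloned


source_page = {"/Rotate": 0, "/Annots": ["link"], "/B": ["bead"]}
writer = ToyWriter()
added = writer.add_page(source_page, ignore_fields=("/B",))
added["/Rotate"] = 90  # edits the writer's copy, not the source page
```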

Others

The file is closed at the end of PdfWriter.write when a filename is provided
Breaking change: add_outline_item now has a parameter, before, which is not the last parameter, so positional calls may need updating
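For the file-closing item, the usual pattern is that a function closes only the file it opened itself and leaves caller-owned streams open. A minimal sketch under that assumption follows. `write_payload` is a hypothetical helper, not the actual PdfWriter.write code.

```python
from pathlib import Path
from typing import IO, Union


def write_payload(payload: bytes, target: Union[str, Path, IO[bytes]]) -> bool:
    """Write payload to target; return True if this function opened the file."""
    if isinstance(target, (str, Path)):
        # We opened the file, so we are responsible for closing it.
        with open(target, "wb") as stream:
            stream.write(payload)
        return True
    # The caller owns the stream: write to it but leave it open.
    target.write(payload)
    return False
```

With this split, a caller passing a filename does not leak a file handle, while a caller passing an io.BytesIO can still read the buffer afterwards.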

Update

The public API of PdfMerger has been added to PdfWriter (ready to make PdfMerger an alias of it)
Outline merging is now handled properly
Named destinations are now handled properly

Deals with #1194, #1322, #471, #1337


codecov · 2022-10-15T08:49:25Z

Codecov Report

Base: 94.14% // Head: 92.70% // Decreases project coverage by -1.43% ⚠️

Coverage data is based on head (4ccfbff) compared to base (7633477).
Patch coverage: 84.45% of modified lines in pull request are covered.

❗ Current head 4ccfbff differs from pull request most recent head afebcab. Consider uploading reports for the commit afebcab to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1371      +/-   ##
==========================================
- Coverage   94.14%   92.70%   -1.44%     
==========================================
  Files          31       29       -2     
  Lines        5480     5691     +211     
  Branches     1037     1112      +75     
==========================================
+ Hits         5159     5276     +117     
- Misses        193      267      +74     
- Partials      128      148      +20

Impacted Files	Coverage Δ
PyPDF2/_merger.py	`97.60% <ø> (+4.42%)`	⬆️
PyPDF2/generic/_data_structures.py	`89.75% <79.08%> (-5.57%)`	⬇️
PyPDF2/_protocols.py	`81.25% <81.25%> (ø)`
PyPDF2/_writer.py	`86.12% <84.11%> (-3.43%)`	⬇️
PyPDF2/generic/_base.py	`99.64% <98.36%> (-0.36%)`	⬇️
PyPDF2/_page.py	`92.23% <100.00%> (+0.28%)`	⬆️
PyPDF2/_reader.py	`90.33% <100.00%> (+0.04%)`	⬆️
PyPDF2/types.py	`100.00% <100.00%> (ø)`
... and 11 more


MartinThoma

mypy didn't complain when I checked. As you asked me to look at mypy, I checked all 'type: ignore' comments. Several were not necessary at all. In some cases mypy needed an 'assert variable is not None' as a hint. And in some cases I could at least narrow the ignore down to be a bit more specific.
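As an illustration of the assert hint mentioned above (a generic sketch, not code from this PR; `find_parent` and `parent_upper` are made-up names):

```python
from typing import Dict, Optional


def find_parent(tree: Dict[str, str], name: str) -> Optional[str]:
    """Return the parent of name, or None if it has none."""
    return tree.get(name)


def parent_upper(tree: Dict[str, str], name: str) -> str:
    parent = find_parent(tree, name)
    # Without this assert, mypy reports that Optional[str] has no attribute
    # "upper". The assert narrows the type to str, so no ignore is needed.
    assert parent is not None
    return parent.upper()
```

Where an ignore is still required, mypy also accepts an error code in brackets after 'type: ignore', which silences only that one kind of error on the line.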


MartinThoma · 2022-12-11T07:39:12Z

Finally! I'll have another quick look at the code and then merge today :-)

MartinThoma · 2022-12-11T07:51:13Z

@pubpub-zz Thank you so much for this moonshot extension 🙏 ❤️

xilopaint · 2022-12-11T15:55:24Z

@pubpub-zz thanks for all the effort you've put into this PR!

BREAKING CHANGES:
- Deprecate features with PyPDF2==3.0.0 (#1489)
- Refactor Fit / Zoom parameters (#1437)

New Features (ENH):
- Add Cloning (#1371)
- Allow int for indirect_reference in PdfWriter.get_object (#1490)

Documentation (DOC):
- How to read PDFs from S3 (#1509)
- Make MyST parse all links as simple hyperlinks (#1506)
- Changed 'latest' for 'stable' generated docs (#1495)
- Adjust deprecation procedure (#1487)

Maintenance (MAINT):
- Use typing.IO for file streams (#1498)

[Full Changelog](2.12.1...3.0.0)

pubpub-zz added 2 commits September 27, 2022 19:11

Add Cloning capability

7d2a74b

add cloning capability includes:
* add clone function
* new API for add_page/insert_page that returns the cloned page object
* close file when a file name is provided to PdfWriter.write

Merge remote-tracking branch 'py-pdf/main' into cloning

661b6bf

pubpub-zz marked this pull request as draft September 27, 2022 18:34

pubpub-zz added 2 commits September 27, 2022 22:25

exclude_fields can be propagated

c6ac1e2

BUG : write reuse

f9d7d19

fix py-pdf#1338

pubpub-zz mentioned this pull request Sep 27, 2022

Reusing PdfMerger after write generates PDF with extra pages #1337

Closed

pubpub-zz added 7 commits October 5, 2022 23:48

cloning, part2

2c78419

w.merge and w.append

cloning part3

54abc77

Fix flake8+ "/Count"

0506ae4

flake8

bd0c855

Flake 8

ffc8e53

Sort DestNames + add page cleanup for annots

a66bcc2

To comply with the PDF spec, add page cleanup for destinations in NameObject that do not match a TextStringObject in Names/Dests

flake8

90c95b7

pubpub-zz mentioned this pull request Oct 11, 2022

HTML links to document page broken after merge #471

Closed

pubpub-zz added 10 commits October 12, 2022 23:48

mypy 1/n

52e8bcd

add test for iis py-pdf#471

1e55376

flake8

506f35e

mypy

2abe7e9

mypy

6ee6859

Merge remote-tracking branch 'py-pdf/main' into cloning

39e4f9f

B006 fix 1

9bdde0f

B006 fix 2

803becb

mypy

1727985

mypy

f498373

MartinThoma reviewed Oct 16, 2022

pubpub-zz and others added 4 commits October 16, 2022 10:34

Martin's recommendation

1c60786

Co-authored-by: Martin Thoma <info@martin-thoma.de>

Martin's recommendation

198ada8

Co-authored-by: Martin Thoma <info@martin-thoma.de>

Update PyPDF2/generic/_data_structures.py

f0fdd4a

Co-authored-by: Martin Thoma <info@martin-thoma.de>

Martin's suggestion

e56555d

Co-authored-by: Martin Thoma <info@martin-thoma.de>