PdfDocument::bookmarks::iter skips the root bookmark #120

Closed
xVanTuring opened this issue Nov 7, 2023 · 5 comments
@xVanTuring
Contributor

The docs say that iteration starts from the top-level root bookmark. I assume that means the root (first) bookmark is included.

Code

use pdfium_render::prelude::*;

pub fn main() -> Result<(), PdfiumError> {
    let bindings = Pdfium::bind_to_library(Pdfium::pdfium_platform_library_name_at_path("./"))
        .or_else(|_| Pdfium::bind_to_system_library())?;

    let pdfium = Pdfium::new(bindings);
    let document: PdfDocument<'_> = pdfium.load_pdf_from_file(
        "F:/archive/pdf/NET-Microservices-Architecture-for-Containerized-NET-Applications.pdf",
        None,
    )?;

    let bookmarks = document.bookmarks();
    println!("root: {}", bookmarks.root().unwrap().title().unwrap());
    println!("Iter:");
    for (idx, bookmark) in bookmarks.iter().enumerate() {
        println!("{idx}: {}", bookmark.title().unwrap());
    }
    Ok(())
}

Output

root: Introduction to Containers and Docker
Iter: ## skipped the root bookmark
0: Choosing Between .NET and .NET Framework for Docker Containers
1: Architecting container and microservice-based applications
2: Development process for Docker-based applications
3: Designing and Developing Multi-Container and Microservice-Based .NET Applications
4: Tackle Business Complexity in a Microservice with DDD and CQRS Patterns
5: Implement resilient applications
6: Make secure .NET Microservices and Web Applications
7: .NET Microservices Architecture key takeaways
@xVanTuring
Contributor Author

Also, iter_all_descendants does not seem to work as described (it should iterate over all nodes and their children).

Code

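// `bookmarks` is `document.bookmarks()`, as in the first example.
let root = bookmarks.root().unwrap();
println!("root: {}", root.title().unwrap());
for (idx, bookmark) in root.iter_all_descendants().enumerate() {
    println!("    {idx}: {}", bookmark.title().unwrap());
}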

Output

root: Introduction to Containers and Docker
    0: What is Docker?
    1: Docker terminology
    2: Docker containers, images, and registries

But "0: What is Docker?" has sub-bookmarks of its own, and they are missing from the output.

[Image: bookmark]

@ajrcarey ajrcarey self-assigned this Nov 7, 2023
@ajrcarey
Owner

ajrcarey commented Nov 7, 2023

Hi @xVanTuring , thank you for reporting the issue. Let's focus on this issue first, since you have a work-around for your other issue. Are you able to provide a non-copyrighted sample document that demonstrates the problem?

@xVanTuring
Contributor Author

Bookmark.pdf
Here is a simple PDF I made that contains only some bookmarks.

ajrcarey pushed a commit that referenced this issue Nov 9, 2023
@ajrcarey
Owner

ajrcarey commented Nov 9, 2023

I agree, the traversal methodology used by the PdfBookmarksIterator is rather peculiar and it gives unexpected results. I have rewritten the iterator to use a standard depth-first graph traversal technique. Using a slightly adjusted version of your sample code:

use pdfium_render::prelude::*;

pub fn main() -> Result<(), PdfiumError> {
    let bindings =
        Pdfium::bind_to_library(Pdfium::pdfium_platform_library_name_at_path("../pdfium/"))
            .or_else(|_| Pdfium::bind_to_system_library())?;

    let pdfium = Pdfium::new(bindings);
    let document = pdfium.load_pdf_from_file("Bookmark.pdf", None)?;

    let bookmarks = document.bookmarks();
    println!("root: {}", bookmarks.root().unwrap().title().unwrap());
    println!("Iter root direct children:");
    for (idx, bookmark) in bookmarks.root().unwrap().iter_direct_children().enumerate() {
        println!("{idx}: {}", bookmark.title().unwrap());
    }
    println!("Iter root all descendants:");
    for (idx, bookmark) in bookmarks.root().unwrap().iter_all_descendants().enumerate() {
        println!("{idx}: {}", bookmark.title().unwrap());
    }
    println!("Iter entire tree from root:");
    for (idx, bookmark) in bookmarks.iter().enumerate() {
        println!("{idx}: {}", bookmark.title().unwrap());
    }
    Ok(())
}

and applying it to your sample document, I now get the following output:

root: Chapter 1
Iter root direct children:
0: Chapter 1
1: 1.1
2: 1.2
3: 1.3
Iter root all descendants:
0: Chapter 1
1: 1.1
2: 1.2
3: 1.2.1
4: 1.2.2
5: 1.2.2.1
6: 1.2.2.2
7: 1.3
Iter entire tree from root:
0: Chapter 1
1: 1.1
2: 1.2
3: 1.2.1
4: 1.2.2
5: 1.2.2.1
6: 1.2.2.2
7: 1.3
8: Chapter 2
9: 2.1
10: 2.2
11: 2.2.1
12: 2.2.2
13: 2.2.2.1
14: 2.2.2.2
15: 2.3
16: 2.3.1
17: 2.3.2

which looks more like the expected result.
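
For readers unfamiliar with the technique, here is a minimal sketch of a pre-order depth-first traversal over a bookmark-like tree. It is not the actual PdfBookmarksIterator code: the `Node` type, `depth_first_titles`, and `sample_tree` are hypothetical names made up for illustration, and the tree is an abbreviated copy of the sample document's outline.

```rust
// Hypothetical stand-in for a bookmark node; NOT a pdfium-render type.
struct Node {
    title: String,
    children: Vec<Node>,
}

impl Node {
    fn new(title: &str, children: Vec<Node>) -> Self {
        Node { title: title.to_string(), children }
    }
}

// Pre-order depth-first traversal using an explicit stack:
// each node is visited before its children, and children are
// visited before the node's next sibling.
fn depth_first_titles(roots: &[Node]) -> Vec<String> {
    let mut out = Vec::new();
    // Push in reverse so the first top-level node is popped first.
    let mut stack: Vec<&Node> = roots.iter().rev().collect();
    while let Some(node) = stack.pop() {
        out.push(node.title.clone());
        // Children are pushed in reverse so they pop in document order.
        for child in node.children.iter().rev() {
            stack.push(child);
        }
    }
    out
}

// Abbreviated version of the outline in the sample document (assumption:
// Chapter 2 is truncated to a single child to keep the sketch short).
fn sample_tree() -> Vec<Node> {
    vec![
        Node::new("Chapter 1", vec![
            Node::new("1.1", vec![]),
            Node::new("1.2", vec![
                Node::new("1.2.1", vec![]),
                Node::new("1.2.2", vec![
                    Node::new("1.2.2.1", vec![]),
                    Node::new("1.2.2.2", vec![]),
                ]),
            ]),
            Node::new("1.3", vec![]),
        ]),
        Node::new("Chapter 2", vec![Node::new("2.1", vec![])]),
    ]
}
```

Calling `depth_first_titles(&sample_tree())` visits the nodes in the same order as the "Iter entire tree from root" listing: the first top-level bookmark comes first, and every nested level is reached before the traversal moves on to the next sibling.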

@ajrcarey
Owner

ajrcarey commented Nov 10, 2023

Extended sample code to check siblings as well:

use pdfium_render::prelude::*;

pub fn main() -> Result<(), PdfiumError> {
    let bindings =
        Pdfium::bind_to_library(Pdfium::pdfium_platform_library_name_at_path("../pdfium/"))
            .or_else(|_| Pdfium::bind_to_system_library())?;

    let pdfium = Pdfium::new(bindings);
    let document = pdfium.load_pdf_from_file("Bookmark.pdf", None)?;

    let bookmarks = document.bookmarks();
    println!("root: {}", bookmarks.root().unwrap().title().unwrap());
    println!("Iter root siblings:");
    for (idx, bookmark) in bookmarks.root().unwrap().iter_siblings().enumerate() {
        println!("{idx}: {}", bookmark.title().unwrap());
    }
    println!("Iter root direct children:");
    for (idx, bookmark) in bookmarks.root().unwrap().iter_direct_children().enumerate() {
        println!("{idx}: {}", bookmark.title().unwrap());
    }
    println!("Iter root all descendants:");
    for (idx, bookmark) in bookmarks.root().unwrap().iter_all_descendants().enumerate() {
        println!("{idx}: {}", bookmark.title().unwrap());
    }
    println!("Iter entire tree from root:");
    for (idx, bookmark) in bookmarks.iter().enumerate() {
        println!("{idx}: {}", bookmark.title().unwrap());
    }
    Ok(())
}

Made a small change to PdfBookmarksIterator to ensure a skip sibling is never yielded as part of iteration. This avoids a bookmark being included in its own list of siblings. The sample code output is now:

root: Chapter 1
Iter root siblings:
0: Chapter 2
Iter root direct children:
0: Chapter 1
1: 1.1
2: 1.2
3: 1.3
Iter root all descendants:
0: Chapter 1
1: 1.1
2: 1.2
3: 1.2.1
4: 1.2.2
5: 1.2.2.1
6: 1.2.2.2
7: 1.3
Iter entire tree from root:
0: Chapter 1
1: 1.1
2: 1.2
3: 1.2.1
4: 1.2.2
5: 1.2.2.1
6: 1.2.2.2
7: 1.3
8: Chapter 2
9: 2.1
10: 2.2
11: 2.2.1
12: 2.2.2
13: 2.2.2.1
14: 2.2.2.2
15: 2.3
16: 2.3.1
17: 2.3.2

Updated README. Ready to release as part of 0.8.16.
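
The sibling rule described above (a bookmark never appears in its own list of siblings) can be illustrated with a small sketch. This only shows the idea and is an assumption about the behavior, not the library's internals; `siblings_of` is a hypothetical helper that works on plain titles rather than pdfium-render types.

```rust
// Hypothetical helper, NOT part of pdfium-render: given every title at one
// level of the tree, return the siblings of the entry at `skip_index`,
// i.e. every entry at that level except the entry itself.
fn siblings_of<'a>(level: &'a [&'a str], skip_index: usize) -> Vec<&'a str> {
    level
        .iter()
        .enumerate()
        // Drop the entry whose siblings are being requested.
        .filter(|(i, _)| *i != skip_index)
        .map(|(_, title)| *title)
        .collect()
}
```

With the top level of the sample document, `siblings_of(&["Chapter 1", "Chapter 2"], 0)` yields only "Chapter 2", which matches the "Iter root siblings" output above.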
