[BUGFIX] Respect language based style names on reading Word files #2597

Open
wants to merge 1 commit into
base: master

@sbuerk sbuerk commented Apr 2, 2024

Microsoft Office saves Office documents with language-based style
mappings for the default styles. For example, a German version of
Word writes the following to word/styles.xml in the container
archive (*.docx):

<w:style w:type="paragraph" w:styleId="berschrift1">
  <w:name w:val="heading 1"/>
  ...
</w:style>

whereas an English version would write:

<w:style w:type="paragraph" w:styleId="Heading1">
  <w:name w:val="heading 1"/>
  ...
</w:style>

The value of <w:name /> defines the internal native
identifier, whereas the w:styleId attribute on the outer
<w:style /> tag holds the virtual or alias name.
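
To make the relationship concrete, here is a minimal sketch in Python (not the PHPWord reader code) that collects a styleId-to-name map from a styles.xml fragment. The function name `collect_style_aliases` and the trimmed sample document are illustrative assumptions; only the WordprocessingML namespace and the element and attribute names come from the file format.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/styles.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Trimmed, hypothetical sample mixing a German and an English style ID.
SAMPLE_STYLES_XML = f"""
<w:styles xmlns:w="{W_NS}">
  <w:style w:type="paragraph" w:styleId="berschrift1">
    <w:name w:val="heading 1"/>
  </w:style>
  <w:style w:type="paragraph" w:styleId="Heading2">
    <w:name w:val="heading 2"/>
  </w:style>
</w:styles>
"""


def collect_style_aliases(styles_xml):
    """Map each w:styleId (alias) to its w:name value (native name)."""
    root = ET.fromstring(styles_xml.strip())
    aliases = {}
    for style in root.iter(f"{{{W_NS}}}style"):
        style_id = style.get(f"{{{W_NS}}}styleId")
        name_el = style.find(f"{{{W_NS}}}name")
        if style_id is None or name_el is None:
            continue  # skip styles without an ID or a name element
        native_name = name_el.get(f"{{{W_NS}}}val")
        if native_name is not None:
            aliases[style_id] = native_name
    return aliases


aliases = collect_style_aliases(SAMPLE_STYLES_XML)
```

With the sample above, both the German and the English style IDs end up mapped to their language-independent names.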

Later parsing of the document structure, for example the
paragraphs, references a style by its alias (w:styleId).
The reader code matches these names against hardcoded,
case-insensitive RegEx patterns written for the English
variant (Header\s+d), which do not match the
language-based aliases at all.
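
The mismatch can be illustrated with a short Python sketch. The pattern below is a hypothetical stand-in for an English-only heading expression, not the exact expression used by the reader, and `heading_depth` is an illustrative helper name.

```python
import re

# Hypothetical English-only pattern (assumption; the reader's real
# expression may differ). Matching is case-insensitive.
ENGLISH_HEADING = re.compile(r"heading\s*(\d+)", re.IGNORECASE)


def heading_depth(style_name):
    """Return the heading level encoded in a style name, or None."""
    match = ENGLISH_HEADING.fullmatch(style_name)
    return int(match.group(1)) if match else None


english = heading_depth("Heading1")     # English style ID matches
german = heading_depth("berschrift1")   # German style ID does not match
```

Case-insensitivity does not help here: the German style ID simply does not contain the English word, so the heading is not recognised.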

Therefore, this change contains the following tasks:

  • An alias map is implemented and used to register title
    aliases, along with a corresponding lookup method.
  • The lookup method is used to resolve the alias wherever
    the hardcoded English RegEx needs to be applied.
  • All style alias names are gathered while reading the
    Word file's style settings, for all possible styles.
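
The three tasks above could be sketched as follows in Python. This is a language-neutral illustration under stated assumptions, not the code in this change: the class `StyleAliasMap`, its `register` and `resolve` methods, and the heading pattern are all hypothetical names chosen for the example.

```python
import re

# Hypothetical English-only heading pattern (assumption for illustration).
HEADING_PATTERN = re.compile(r"heading\s*(\d+)", re.IGNORECASE)


class StyleAliasMap:
    """Registers style aliases (w:styleId) and resolves them to native names."""

    def __init__(self):
        self._aliases = {}

    def register(self, style_id, native_name):
        # Called once per style while reading the styles part.
        self._aliases[style_id] = native_name

    def resolve(self, style_id):
        # Unknown IDs fall back to the ID itself, keeping the old behaviour.
        return self._aliases.get(style_id, style_id)


def heading_depth(style_id, alias_map):
    """Resolve the alias first, then apply the English-only pattern."""
    match = HEADING_PATTERN.fullmatch(alias_map.resolve(style_id))
    return int(match.group(1)) if match else None


alias_map = StyleAliasMap()
alias_map.register("berschrift1", "heading 1")
alias_map.register("Heading1", "heading 1")

depth_de = heading_depth("berschrift1", alias_map)
depth_en = heading_depth("Heading1", alias_map)
depth_unregistered = heading_depth("Heading3", alias_map)
depth_body = heading_depth("Standard", alias_map)
```

Falling back to the unresolved ID in `resolve` is a design choice for this sketch: documents whose style IDs already match the English pattern keep working even if a style was never registered.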

coveralls commented Apr 2, 2024

Coverage Status

coverage: 97.171% (-0.05%) from 97.217%
when pulling 13a5d65 on sbuerk:stefan-1
into 8b891bb on PHPOffice:master.

@Progi1984
Member

@sbuerk Have you got a sample file with 🇩🇪 styles ?

@Progi1984 added the "Status: Waiting for feedback" label (question has been asked, waiting for response from PR author) on Aug 15, 2024
Contributor Author

sbuerk commented Oct 11, 2024

> @sbuerk Have you got a sample file with 🇩🇪 styles ?

Sorry, I was kind of busy maintaining other open source stuff, and will be until Thursday. I will try to update this and my other PRs in the next 2 weeks; sorry for the delay.
