officeextractor

Test Status Build Status Coverage Status
Version Info PyPI Version PyPI Downloads
Compatibility Python Versions
Style Code Style: Black pre-commit

About

officeextractor is a Python library for extracting media files (images, audio and video) from office documents created with Microsoft Office or LibreOffice.


Supported File Types

| Application | Supported File Types | Supported Media Formats |
| --- | --- | --- |
| Microsoft Word | docx, docm, dotm, dotx | images |
| Microsoft Excel | xlsx, xlsb, xlsm, xltm, xltx | images |
| Microsoft PowerPoint | potx, ppsm, ppsx, pptm, pptx, potm | images, video & audio |
| LibreOffice Writer | odt, ott | images |
| LibreOffice Calc | ods, ots | images |
| LibreOffice Impress | odp, otp, odg | images |

NOTE: Microsoft Office 2003 files (doc, dot, xls, xlt, ppt, pot) are not supported.
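
Because unsupported formats such as the Office 2003 ones are easy to pass in by accident, it can help to filter file names before extraction. The following is a minimal sketch, not part of officeextractor: the names `SUPPORTED_EXTENSIONS` and `is_supported` are hypothetical, and the extension list is simply copied from the table above.

```python
from pathlib import PurePath

# Extensions copied from the supported-file-types table above.
# Office 2003 formats (doc, dot, xls, xlt, ppt, pot) are deliberately absent.
SUPPORTED_EXTENSIONS = {
    "docx", "docm", "dotm", "dotx",                     # Microsoft Word
    "xlsx", "xlsb", "xlsm", "xltm", "xltx",             # Microsoft Excel
    "potx", "ppsm", "ppsx", "pptm", "pptx", "potm",     # Microsoft PowerPoint
    "odt", "ott",                                       # LibreOffice Writer
    "ods", "ots",                                       # LibreOffice Calc
    "odp", "otp", "odg",                                # LibreOffice Impress
}


def is_supported(path):
    """Return True if the file extension appears in the table above.

    Hypothetical helper: it only inspects the file name, not the contents.
    """
    extension = PurePath(path).suffix.lower().lstrip(".")
    return extension in SUPPORTED_EXTENSIONS
```

A list of candidate paths could then be reduced with `[p for p in paths if is_supported(p)]` before being handed to the library.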

Installation

pip install officeextractor

Usage

>>> import officeextractor

>>> officeextractor.extract(src=("File1.docx", "Folder/File2.xlsx"), dest="Path/To/Output/Folder")

4 media files extracted from File1.docx:
- 2 jpeg
- 1 gif
- 1 png

1 media file extracted from Folder/File2.xlsx:
- 1 png

Parameters

officeextractor.extract(src, dest, log=True)

src : str, list of str or tuple of str

Either a single file (string) or several files (list or tuple of strings), given as relative or absolute paths.

dest : str

Output directory, given as a relative or absolute path. If the directory doesn't exist, it will be created.

log : bool, optional

Whether logging should be activated. If True, a summary of the extraction is printed. Default is True.
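
The parameters above can be combined to process a whole folder in one call. The sketch below is only an illustration under stated assumptions: `extract_folder` is a hypothetical helper, not part of officeextractor, and the extraction function is passed in as `extract_fn` so the helper can be tried without the library installed. It relies only on the documented `src`, `dest` and `log` keyword arguments.

```python
from pathlib import Path


def extract_folder(folder, dest, extract_fn,
                   suffixes=(".docx", ".xlsx", ".pptx"), log=False):
    """Pass every matching file in `folder` to `extract_fn` in a single call.

    Hypothetical helper: `extract_fn` is expected to be
    `officeextractor.extract`, called with the documented keyword
    arguments `src`, `dest` and `log`.
    """
    files = sorted(
        str(p) for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )
    if files:
        # src is given as a list of paths; log=False suppresses the summary.
        extract_fn(src=files, dest=str(dest), log=log)
    return files


# Typical call, assuming officeextractor is installed:
#   import officeextractor
#   extract_folder("reports", "extracted_media", officeextractor.extract)
```

Returning the list of files makes it easy to see which documents were actually handed over, which is useful when `log` is set to False.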


Release Notes

Release notes can be found on GitHub.


Licence

GNU General Public License v3.0