officeextractor


About

officeextractor is a Python library to extract media files like images, audio and video from office documents (Microsoft Office & LibreOffice).
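Modern Microsoft Office (Office Open XML) and LibreOffice (OpenDocument) files are ZIP containers, so embedded media can be read with Python's standard zipfile module. The sketch below illustrates the idea. It is not necessarily how officeextractor works internally. The helpers is_media_entry and extract_media are hypothetical. The sketch assumes that media sits under a "<part>/media/" folder (for example word/media/ or ppt/media/) in Office files and under "Pictures/" in LibreOffice files.

```python
import zipfile
from pathlib import Path, PurePosixPath


def is_media_entry(name: str) -> bool:
    """Assumed layout: OOXML keeps media in '<part>/media/', ODF in 'Pictures/'."""
    if name.endswith("/"):  # skip directory entries
        return False
    parts = PurePosixPath(name).parts
    in_ooxml_media = len(parts) == 3 and parts[1] == "media"  # e.g. word/media/image1.png
    in_odf_pictures = len(parts) == 2 and parts[0] == "Pictures"  # e.g. Pictures/abc.png
    return in_ooxml_media or in_odf_pictures


def extract_media(src: str, dest: str) -> list[str]:
    """Copy embedded media from one document into dest and return the written file names."""
    out_dir = Path(dest)
    out_dir.mkdir(parents=True, exist_ok=True)  # create dest if it is missing
    written = []
    with zipfile.ZipFile(src) as archive:
        for name in archive.namelist():
            if not is_media_entry(name):
                continue
            # Keep only the base name so every file lands directly inside dest.
            target = out_dir / PurePosixPath(name).name
            target.write_bytes(archive.read(name))
            written.append(target.name)
    return written
```

Keeping only the base name keeps every write inside dest. The trade-off is that two entries sharing a file name would overwrite each other.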

Supported File Types

Application	File Types	Supported Media Formats
Microsoft Word	docx, docm, dotm, dotx	images
Microsoft Excel	xlsx, xlsb, xlsm, xltm, xltx	images
Microsoft PowerPoint	potx, ppsm, ppsx, pptm, pptx, potm	images, video & audio
LibreOffice Writer	odt, ott	images
LibreOffice Calc	ods, ots	images
LibreOffice Impress	odp, otp, odg	images

⚠ NOTE: Microsoft Office 2003 files (doc, dot, xls, xlt, ppt, pot) are not supported.
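The table of supported file types can be expressed as a lookup from file extension to application. This is a minimal sketch. The names SUPPORTED_EXTENSIONS and application_for are hypothetical and are not part of the officeextractor API. Treating extensions case-insensitively is also an assumption.

```python
from pathlib import Path
from typing import Optional

# Extensions copied from the Supported File Types table, grouped by application.
SUPPORTED_EXTENSIONS = {
    "Microsoft Word": {"docx", "docm", "dotm", "dotx"},
    "Microsoft Excel": {"xlsx", "xlsb", "xlsm", "xltm", "xltx"},
    "Microsoft PowerPoint": {"potx", "ppsm", "ppsx", "pptm", "pptx", "potm"},
    "LibreOffice Writer": {"odt", "ott"},
    "LibreOffice Calc": {"ods", "ots"},
    "LibreOffice Impress": {"odp", "otp", "odg"},
}


def application_for(path: str) -> Optional[str]:
    """Return the application whose formats include this file's extension, else None."""
    ext = Path(path).suffix.lower().lstrip(".")
    for app, extensions in SUPPORTED_EXTENSIONS.items():
        if ext in extensions:
            return app
    return None  # unsupported, including legacy doc/dot/xls/xlt/ppt/pot files
```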

Installation

pip install officeextractor

Usage

>>> import officeextractor

>>> officeextractor.extract(src=("File1.docx", "Folder/File2.xlsx"), dest="Path/To/Output/Folder")

4 media files extracted from File1.docx:
- 2 jpeg
- 1 gif
- 1 png

1 media file extracted from Folder/File2.xlsx:
- 1 png

Parameters

officeextractor.extract(src, dest, log=True)

src : str, list of str or tuple of str

Either a single file (string) or several files (list/tuple of strings), given as relative or absolute paths.
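Because src accepts either one path or a list/tuple of paths, a caller can normalize it to a tuple before processing. This is a minimal sketch of that idea. The helper normalize_src is hypothetical, and the library's own validation may differ, for example in which error it raises.

```python
from typing import Sequence, Union


def normalize_src(src: Union[str, Sequence[str]]) -> tuple[str, ...]:
    """Turn a single path or a list/tuple of paths into a tuple of paths."""
    if isinstance(src, str):
        return (src,)  # a lone string is one file, not a sequence of characters
    if isinstance(src, (list, tuple)) and all(isinstance(p, str) for p in src):
        return tuple(src)
    raise TypeError("src must be a str, or a list/tuple of str")
```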

dest : str

Output directory, given as a relative or absolute path. If the directory doesn't exist, it will be created.
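The "create if missing" behaviour of dest corresponds to pathlib's mkdir with parents=True and exist_ok=True. This is a minimal sketch. The helper prepare_dest is hypothetical. Resolving the path to an absolute one is an illustrative choice and is not documented library behaviour.

```python
from pathlib import Path


def prepare_dest(dest: str) -> Path:
    """Create the output directory (and any missing parents) and return its absolute path."""
    path = Path(dest).resolve()  # relative paths are resolved against the current directory
    path.mkdir(parents=True, exist_ok=True)  # no error if the directory already exists
    return path
```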

log : bool, optional

Whether to enable logging. If True, a summary of the extraction is printed. Default is True.
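The per-file summary printed when log=True (see Usage) can be reproduced by counting extracted files by extension. This is a minimal sketch that only mimics the format shown in the Usage section. The helper format_summary is hypothetical, and deriving the media type from the file extension is an assumption.

```python
from collections import Counter
from pathlib import PurePath


def format_summary(src: str, extracted: list[str]) -> str:
    """Build a summary in the style of the Usage output from extracted file names."""
    counts = Counter(PurePath(name).suffix.lower().lstrip(".") for name in extracted)
    noun = "media file" if len(extracted) == 1 else "media files"
    lines = [f"{len(extracted)} {noun} extracted from {src}:"]
    # most_common() sorts by count, keeping first-seen order for ties.
    lines += [f"- {n} {ext}" for ext, n in counts.most_common()]
    return "\n".join(lines)
```

For example, format_summary("File1.docx", ["a.jpeg", "b.jpeg", "c.gif", "d.png"]) yields the first summary block shown under Usage.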

Release Notes

Release notes can be found on GitHub.

Licence

GNU General Public License v3.0
