officeextractor is a Python library to extract media files like images, audio and video from office documents (Microsoft Office & LibreOffice).
Application | File Types | Supported Media Formats |
---|---|---|
Microsoft Word | docx, docm, dotm, dotx | images |
Microsoft Excel | xlsx, xlsb, xlsm, xltm, xltx | images |
Microsoft PowerPoint | potx, ppsm, ppsx, pptm, pptx, potm | images, video & audio |
LibreOffice Writer | odt, ott | images |
LibreOffice Calc | ods, ots | images |
LibreOffice Impress / Draw | odp, otp, odg | images |
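
For background, the file types listed above are ZIP containers, and embedded media is normally stored in a dedicated folder inside the archive. The sketch below illustrates that idea with the standard library only. It is not officeextractor's implementation: the function name `extract_media_sketch` and the folder list `MEDIA_PREFIXES` are assumptions made for illustration, based on the usual Office Open XML and OpenDocument layouts.

```python
import zipfile
from pathlib import Path, PurePosixPath

# Assumed media locations inside the archive (usual OOXML / OpenDocument
# layout); these are not taken from officeextractor itself.
MEDIA_PREFIXES = ("word/media/", "xl/media/", "ppt/media/", "Pictures/")


def extract_media_sketch(src, dest):
    """Copy every archive member under a known media folder into dest."""
    dest_dir = Path(dest)
    # Create the output directory if it does not exist yet.
    dest_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(src) as archive:
        for info in archive.infolist():
            if info.is_dir() or not info.filename.startswith(MEDIA_PREFIXES):
                continue
            # Keep only the base name so nothing is written outside dest.
            target = dest_dir / PurePosixPath(info.filename).name
            target.write_bytes(archive.read(info))
            extracted.append(target.name)
    return extracted
```

The library's own `extract` function remains the intended entry point; this sketch only shows why no office suite needs to be installed to read the media out of a document.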
pip install officeextractor
>>> import officeextractor
>>> officeextractor.extract(src=("File1.docx", "Folder/File2.xlsx"), dest="Path/To/Output/Folder")
4 media files extracted from File1.docx:
- 2 jpeg
- 1 gif
- 1 png
1 media file extracted from Folder/File2.xlsx:
- 1 png
officeextractor.extract(src, dest, log=True)
src : str, list of str or tuple of str
Either a single file (string) or several files (list or tuple of strings), each given as a relative or absolute path.
dest : str
Output directory as a relative or absolute path. If the directory doesn't exist, it will be created.
log : bool, optional
Whether logging should be activated or not. If True, print a summary of the extraction. Default is True.
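
Because `src` accepts a list of paths, a whole folder can be processed in one call. The following is a minimal sketch under stated assumptions: `collect_office_files` and `extract_folder` are hypothetical helpers, not part of officeextractor, and the extension set is copied from the support table above. Only the `extract(src, dest, log)` call comes from the documented API.

```python
from pathlib import Path

# Extensions copied from the support table in this README.
SUPPORTED_EXTENSIONS = {
    "docx", "docm", "dotm", "dotx",
    "xlsx", "xlsb", "xlsm", "xltm", "xltx",
    "potx", "ppsm", "ppsx", "pptm", "pptx", "potm",
    "odt", "ott", "ods", "ots", "odp", "otp", "odg",
}


def collect_office_files(folder):
    """Return sorted paths (as str) of supported documents directly in folder."""
    return sorted(
        str(path)
        for path in Path(folder).iterdir()
        if path.is_file()
        and path.suffix.lower().lstrip(".") in SUPPORTED_EXTENSIONS
    )


def extract_folder(folder, dest):
    """Hypothetical wrapper: pass every supported file in folder to extract()."""
    # Imported lazily so collect_office_files works without the library.
    import officeextractor

    files = collect_office_files(folder)
    if files:
        # log=False suppresses the printed summary.
        officeextractor.extract(src=files, dest=dest, log=False)
    return files
```

For example, `extract_folder("Documents", "Path/To/Output/Folder")` would hand every supported file in `Documents` to the library without printing the summary.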