The Media Deduplication Tool is a Python application designed to identify and eliminate duplicate images and videos from specified directories on your disk. It uses perceptual hashing to compare files and creates a report of similar files. This tool is particularly useful for organizing large media libraries and freeing up disk space.
- Deduplication: Detects and removes duplicate images and videos based on perceptual hashing.
- Error Handling: Optionally includes files that could not be processed (e.g., corrupt files) in the copy list.
- Batch Processing: Processes multiple directories concurrently for faster performance.
- CSV Reporting: Generates a CSV report of similar files for future reference.
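To illustrate the idea behind perceptual hashing, here is a minimal sketch of an average hash and a Hamming-distance comparison. It is not the tool's own code: the function names are hypothetical, the input is a tiny grayscale grid rather than a real image, and which hash function the tool actually uses (via the imagehash package) is not stated here.

```python
from typing import Sequence


def average_hash(pixels: Sequence[Sequence[int]]) -> int:
    """Hash a small grayscale grid (values 0-255) into an integer.

    Each pixel contributes one bit: 1 if it is brighter than the
    mean brightness of the whole grid, 0 otherwise.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions in which two hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def is_similar(hash_a: int, hash_b: int, threshold: int = 1) -> bool:
    """Treat two hashes as similar when few bits differ (threshold is an assumption)."""
    return hamming_distance(hash_a, hash_b) <= threshold


# A 2x2 "image", a slightly brighter copy, and an unrelated pattern.
original = [[10, 200], [220, 30]]
brighter = [[20, 210], [230, 40]]
different = [[200, 10], [30, 220]]
```

Because the hash depends on brightness relative to the mean rather than on exact pixel values, `original` and `brighter` hash to the same value, while `different` does not. Real perceptual hashes typically shrink the image to a small grayscale grid first and then apply a step like this.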
The Media Deduplication Tool supports the following file extensions:
- Images: .bmp, .jpg, .jpeg, .png, .gif
- Videos: .mp4, .mov
You can also specify file extensions at runtime that should be handled without hash calculation.
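One way to picture how files might be sorted by extension is the sketch below. The `classify` function, its return labels, and the rule that a user-supplied "no hash" extension takes precedence over the built-in lists are all assumptions made for illustration, not the tool's documented behavior.

```python
from pathlib import PurePath
from typing import Iterable

# Extensions taken from the supported-formats list.
IMAGE_EXTENSIONS = {".bmp", ".jpg", ".jpeg", ".png", ".gif"}
VIDEO_EXTENSIONS = {".mp4", ".mov"}


def classify(path: str, no_hash_extensions: Iterable[str] = ()) -> str:
    """Return 'no_hash', 'image', 'video', or 'unsupported' for a file path.

    Extensions are compared case-insensitively. Entries in
    no_hash_extensions may be written with or without a leading dot.
    """
    suffix = PurePath(path).suffix.lower()
    skip = {
        "." + ext.strip().lstrip(".").lower()
        for ext in no_hash_extensions
        if ext.strip().lstrip(".")
    }
    if suffix in skip:  # assumption: the runtime list wins over the defaults
        return "no_hash"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return "unsupported"
```

For example, `classify("clip.mov")` returns `"video"`, while `classify("clip.mov", ["mov"])` returns `"no_hash"`.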
- Python 3.6 or higher
- Required Python packages:
  - opencv-python
  - Pillow
  - imagehash
  - tqdm
You can install the required packages using pip:
pip install opencv-python Pillow imagehash tqdm
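If you want to confirm the environment before running the tool, a small check like the following may help. It is a sketch, not part of the repository: `missing_packages` and `python_is_supported` are hypothetical helpers. The mapping reflects the module names these packages install (`cv2` for opencv-python, `PIL` for Pillow).

```python
import importlib.util
import sys
from typing import Dict, List

# Maps each pip package name to the top-level module it provides.
REQUIRED_MODULES = {
    "opencv-python": "cv2",
    "Pillow": "PIL",
    "imagehash": "imagehash",
    "tqdm": "tqdm",
}


def missing_packages(required: Dict[str, str] = REQUIRED_MODULES) -> List[str]:
    """Return the pip names of packages whose module cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


def python_is_supported(version_info=sys.version_info) -> bool:
    """Check the interpreter against the stated minimum of Python 3.6."""
    return tuple(version_info[:2]) >= (3, 6)
```

Calling `missing_packages()` with no arguments returns an empty list when all four packages are installed; any names it returns can be passed to `pip install`.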
- Clone the repository:
    git clone https://github.com/happy91512/MediaDeduplicationTool.git
    cd MediaDeduplicationTool
- Run the script:
    python deduplicate.py
- Input Parameters:
  - Enter the disk locations to check (separated by commas).
  - Enter the output folder path where unique files will be saved.
  - Optionally include videos with errors in the copy list.
  - Optionally specify file extensions that should not undergo hash calculation (e.g., mp4, mov).
- Results:
  - The script will output the number of unique images and videos processed.
  - A CSV file named similar_files.csv will be created in the output folder, listing similar files detected during the process.
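As a rough picture of what producing such a report could involve, the sketch below builds CSV text from pairs of similar files using Python's standard `csv` module. The column names (`kept_file`, `similar_file`) and the helper functions are assumptions for illustration; the actual layout of similar_files.csv is not described here.

```python
import csv
import io
from pathlib import Path
from typing import Iterable, Tuple

REPORT_NAME = "similar_files.csv"


def report_path(output_folder: str) -> Path:
    """Return the location of the report inside the output folder."""
    return Path(output_folder) / REPORT_NAME


def render_report(pairs: Iterable[Tuple[str, str]]) -> str:
    """Render (kept file, similar file) pairs as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["kept_file", "similar_file"])  # hypothetical column names
    writer.writerows(pairs)
    return buffer.getvalue()


def write_report(output_folder: str, pairs: Iterable[Tuple[str, str]]) -> Path:
    """Write the report to disk, creating the output folder if needed."""
    path = report_path(output_folder)
    path.parent.mkdir(parents=True, exist_ok=True)
    # newline="" keeps the CSV line endings from being translated again.
    path.write_text(render_report(pairs), encoding="utf-8", newline="")
    return path
```

Note that `Path.write_text` accepts `newline` only on Python 3.10 and later; on older versions, open the file with `open(path, "w", newline="", encoding="utf-8")` instead.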
When prompted, input your disk locations and output folder as follows:
Enter the disk locations to check (separated by commas): D:\Media, E:\Photos
Enter the output folder path: D:\UniqueMedia
Include videos with errors in the copy list? (y/n): y
Enter specific file extensions without hash calculation (separated by commas): mp4, mov
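The answers above are free-form text, so the script has to turn them into usable values. A minimal sketch of that parsing step, using hypothetical helper names and assuming that only "y" or "yes" counts as agreement, might look like this:

```python
from typing import List, Set


def parse_locations(raw: str) -> List[str]:
    """Split a comma-separated answer into trimmed, non-empty paths."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def parse_extensions(raw: str) -> Set[str]:
    """Normalize 'mp4, .MOV' style input to {'.mp4', '.mov'}."""
    cleaned = (part.strip().lstrip(".").lower() for part in raw.split(","))
    return {"." + ext for ext in cleaned if ext}


def parse_yes_no(raw: str) -> bool:
    """Interpret a y/n answer; anything other than y/yes is treated as no."""
    return raw.strip().lower() in {"y", "yes"}
```

With the sample answers, `parse_locations(r"D:\Media, E:\Photos")` yields the two paths as separate list items and `parse_extensions("mp4, mov")` yields a set containing `.mp4` and `.mov`.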