Media Deduplication Tool

Overview

The Media Deduplication Tool is a Python application that finds duplicate images and videos across one or more directories and copies only the unique files to an output folder. It compares files using perceptual hashing and writes a report of the similar files it finds. It is particularly useful for organizing large media libraries and freeing up disk space.

Features

  • Deduplication: Detects duplicate images and videos using perceptual hashing so that only one copy of each is kept.
  • Error Handling: Optionally includes files that could not be processed (e.g., corrupt files) in the output.
  • Batch Processing: Processes multiple directories concurrently for faster performance.
  • CSV Reporting: Generates a CSV report of similar files for future reference.
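To illustrate the idea behind perceptual hashing, here is a minimal pure-Python sketch of an average hash (aHash), one of the simplest perceptual hashes. It assumes the image has already been resized to 8x8 and converted to grayscale (for example with Pillow). The tool itself relies on the imagehash package, and this README does not say which hash function it calls, so the function names and the similarity threshold of 5 below are illustrative assumptions. Two files count as similar when the Hamming distance between their hashes is small.

```python
from typing import Sequence


def average_hash(pixels: Sequence[Sequence[int]]) -> int:
    """Compute a 64-bit average hash from an 8x8 grid of grayscale values (0-255).

    Assumes resizing and grayscale conversion were already done, e.g. with Pillow.
    """
    flat = [p for row in pixels for p in row]
    if len(flat) != 64:
        raise ValueError("expected an 8x8 grayscale grid")
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        # One bit per pixel: 1 if the pixel is brighter than the mean, else 0.
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_similar(a: int, b: int, threshold: int = 5) -> bool:
    """Hypothetical rule: hashes within `threshold` bits are treated as duplicates."""
    return hamming_distance(a, b) <= threshold
```

Because each bit only records whether a pixel is brighter than the mean, a uniform brightness change leaves the hash unchanged, while a structurally different image flips many bits. The imagehash package provides ready-made equivalents (average_hash, phash, dhash); subtracting two of its ImageHash objects gives the Hamming distance directly.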

Supported File Types

The Media Deduplication Tool supports the following file extensions:

  • Images:

    • .bmp
    • .jpg
    • .jpeg
    • .png
    • .gif
  • Videos:

    • .mp4
    • .mov

At runtime you can also specify file extensions for which hash calculation should be skipped.
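One plausible reading of this option is that files whose extension is on the skip list bypass hashing and go straight to the copy step, so they are never compared against other files. The sketch below works under that assumption; partition_by_hashing is a hypothetical name.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def partition_by_hashing(paths: Iterable[str], skip_exts: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split paths into (to_hash, copy_directly) by extension.

    skip_exts may be given with or without a leading dot and in any case
    ("mp4", ".MOV"); they are normalized to ".mp4", ".mov".
    """
    skip = {"." + ext.strip().lower().lstrip(".") for ext in skip_exts if ext.strip().lstrip(".")}
    to_hash: List[str] = []
    copy_directly: List[str] = []
    for path in paths:
        if Path(path).suffix.lower() in skip:
            copy_directly.append(path)  # assumed: no hash, so no duplicate check
        else:
            to_hash.append(path)
    return to_hash, copy_directly
```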

Requirements

  • Python 3.6 or higher
  • Required Python packages:
    • opencv-python
    • Pillow
    • imagehash
    • tqdm

You can install the required packages using pip:

pip install opencv-python Pillow imagehash tqdm
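Two of these packages are imported under a different name from the one used with pip: opencv-python provides the cv2 module and Pillow provides PIL. The short check below is a sketch that is not part of the repository; it reports which packages are still missing, without importing them, by calling importlib.util.find_spec.

```python
import importlib.util
from typing import Dict, List

# pip package name -> module name used in `import` statements.
REQUIRED_MODULES = {
    "opencv-python": "cv2",
    "Pillow": "PIL",
    "imagehash": "imagehash",
    "tqdm": "tqdm",
}


def missing_packages(required: Dict[str, str] = REQUIRED_MODULES) -> List[str]:
    """Return the pip names of packages whose top-level module cannot be found."""
    return [pkg for pkg, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```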

How to Use

  1. Clone the repository:

    git clone https://github.com/happy91512/MediaDeduplicationTool.git
    cd MediaDeduplicationTool
  2. Run the script:

    python deduplicate.py
  3. Input Parameters:

    • Enter the disk locations to check (separated by commas).
    • Enter the output folder path where unique files will be saved.
    • Choose whether videos that produced errors during processing should still be included in the copy list.
    • Optionally specify file extensions for which hash calculation should be skipped (e.g., mp4, mov).
  4. Results:

    • The script prints the number of unique images and videos it processed.
    • A CSV file named similar_files.csv is created in the output folder, listing the similar files detected during the run.
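The reporting step can be sketched as follows, given a mapping from file path to an integer perceptual hash. This is an illustrative assumption rather than the tool's actual code: the greedy keep-the-first-file strategy, the threshold, and the column names kept_file, similar_file and distance are all hypothetical, since this README does not document the layout of similar_files.csv.

```python
import csv
import io
from typing import Dict, List, Tuple

Row = Tuple[str, str, int]


def find_similar(hashes: Dict[str, int], threshold: int = 5) -> Tuple[List[str], List[Row]]:
    """Greedy pass: compare each file with the unique files kept so far.

    Returns (unique_files, rows), where each row is
    (kept_file, similar_file, hamming_distance).
    """
    unique: List[str] = []
    rows: List[Row] = []
    for path, h in hashes.items():
        for kept in unique:
            distance = bin(h ^ hashes[kept]).count("1")
            if distance <= threshold:
                rows.append((kept, path, distance))
                break
        else:
            # No kept file was close enough, so this one is unique.
            unique.append(path)
    return unique, rows


def format_report(rows: List[Row]) -> str:
    """Render the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["kept_file", "similar_file", "distance"])
    writer.writerows(rows)
    return buffer.getvalue()
```

Each new file is compared with every file kept so far, so the pass is O(n²) in the worst case, which is manageable for thousands of files but slows down on very large libraries. To save the report, write the returned text to a file opened with newline="", as the csv module documentation recommends, e.g. open(os.path.join(output_folder, "similar_files.csv"), "w", newline=""), where output_folder is a hypothetical variable.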

Usage Example

When prompted, enter your disk locations, output folder, and options, for example:

Enter the disk locations to check (separated by commas): D:\Media, E:\Photos
Enter the output folder path: D:\UniqueMedia
Include videos with errors in the copy list? (y/n): y
Enter specific file extensions without hash calculation (separated by commas): mp4, mov
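The answers in this session are plain comma-separated strings. Below is a minimal sketch of how they might be turned into settings, assuming surrounding whitespace is ignored and anything other than y/yes counts as "no"; RunConfig, split_csv_answer and parse_answers are hypothetical names, not the script's API. In the script the four strings would come from input().

```python
from typing import List, NamedTuple, Set


class RunConfig(NamedTuple):
    sources: List[str]
    output_folder: str
    include_error_videos: bool
    skip_hash_exts: Set[str]


def split_csv_answer(answer: str) -> List[str]:
    """Split "a, b ,c" into ["a", "b", "c"], dropping empty entries."""
    return [part.strip() for part in answer.split(",") if part.strip()]


def parse_answers(locations: str, output_folder: str,
                  include_errors: str, skip_exts: str) -> RunConfig:
    """Convert the raw prompt answers into a typed configuration."""
    return RunConfig(
        sources=split_csv_answer(locations),
        output_folder=output_folder.strip(),
        include_error_videos=include_errors.strip().lower() in ("y", "yes"),
        # Normalize "mp4" / ".MOV" to ".mp4" / ".mov".
        skip_hash_exts={"." + ext.lower().lstrip(".")
                        for ext in split_csv_answer(skip_exts) if ext.lstrip(".")},
    )
```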
