Chisel is a powerful preprocessing tool that improves image datasets by filtering out low-quality images and preparing the rest for AI training. It helps keep your dataset high-quality and, by preserving attribution metadata, ethically sound, streamlining the workflow for artists, developers, and researchers.
Chisel serves as the preprocessing powerhouse within the MichaelAngel.io pipeline. Its primary function is to filter out low-quality images and prepare them for AI training by performing the following tasks:
- Quality Control: Removes duplicates and low-quality images based on size and blurriness.
- Consistent Resizing: Ensures all images meet specified dimensions for uniformity.
- Metadata Preservation: Retains essential metadata and attribution information.
- Batch Processing: Efficiently handles large volumes of images using parallel processing.
By automating these critical preprocessing steps, Chisel ensures that your image datasets are clean, organized, and ready for AI-driven creative projects.
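Duplicate removal is listed among Chisel's tasks, and the features below describe it as content hashing. The following minimal sketch shows one way to do this. It assumes "duplicate" means a byte-identical file, hashed with SHA-256 from Python's hashlib. The function names are hypothetical, and this is not Chisel's own implementation.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes, read in chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_unique_images(paths: List[Path]) -> Tuple[List[Path], Dict[Path, Path]]:
    """Split paths into first-seen originals and duplicates.

    Returns (unique_paths, duplicates), where duplicates maps each
    duplicate file to the earlier file with identical content.
    """
    seen: Dict[str, Path] = {}
    unique: List[Path] = []
    duplicates: Dict[Path, Path] = {}
    for path in sorted(paths):  # sort so "first seen" is deterministic
        digest = file_digest(path)
        if digest in seen:
            duplicates[path] = seen[digest]
        else:
            seen[digest] = path
            unique.append(path)
    return unique, duplicates
```

Byte-level hashing only catches exact copies. A re-encoded or resized copy of the same photo produces a different hash, so catching those would need a different technique, such as perceptual hashing.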
- Automated Quality Checks: Filters images based on size and blurriness to maintain high dataset quality.
- Metadata Extraction: Extracts and preserves EXIF metadata for comprehensive dataset information.
- Duplicate Detection: Identifies and removes duplicate images using content hashing.
- Image Enhancement: Enhances image properties such as brightness, contrast, color, and sharpness.
- Flexible Output Formats: Supports saving processed images in JPEG, PNG, or NumPy array formats.
- Efficient Processing: Utilizes parallel processing to handle large datasets swiftly.
- Comprehensive Logging: Logs all preprocessing activities for transparency and troubleshooting.
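For the metadata extraction feature, Pillow (already a dependency) can read EXIF tags through Image.getexif(). Here is a minimal sketch that assumes only the base EXIF directory is needed. extract_exif is a hypothetical name, and Chisel may read and store metadata differently.

```python
from typing import Any, Dict

from PIL import ExifTags, Image


def extract_exif(path: str) -> Dict[str, Any]:
    """Return the image's base EXIF tags keyed by human-readable name.

    Tag IDs that Pillow does not recognise are kept as their numeric ID
    converted to a string.
    """
    with Image.open(path) as img:
        exif = img.getexif()  # empty mapping if the file has no EXIF data
        return {ExifTags.TAGS.get(tag_id, str(tag_id)): value
                for tag_id, value in exif.items()}
```

Values can include non-string types such as bytes or Pillow's IFDRational, so convert them before writing metadata to a format like JSON.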
- Python 3.8+
- Git
git clone https://github.com/M1ck4/MichaelAngel.io.git
cd MichaelAngel.io/chisel
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
Ensure you have pip installed, then run:
pip install -r requirements.txt
If you encounter issues with specific packages, ensure you have the necessary system libraries installed, especially for opencv-python.
Chisel can be executed via the command line with various arguments to customize its behavior.
python Chisel.py --image_folder <input_folder> --output_folder <output_folder> [options]
Required arguments:
- --image_folder: Path to the input folder containing images to be processed.
- --output_folder: Path to the output folder where processed images and metadata will be saved.

Optional arguments:
- --target_size: Target size for image resizing, given as width height (for example, 512 512). Default is 256 256.
- --output_format: Output format for processed images. Choices are JPEG, PNG, and numpy. Default is JPEG.
- --min_size: Minimum size (in pixels) for image quality. Images smaller than this are excluded. Default is 128.
- --blur_threshold: Blur threshold for the image quality check. Images whose sharpness score (the variance of the Laplacian) falls below this threshold are treated as blurry and excluded. Default is 100.0.
- --enhancement_factors: Enhancement factors for brightness, contrast, color, and sharpness, given in that order. Default is 1.0 1.0 1.0 1.0.
- --quality: Quality for JPEG output (1-100). Default is 85.
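If you want to reproduce this interface in your own scripts, here is a minimal sketch of a parser that matches the documented flags and defaults. It assumes a standard argparse setup. The source does not show how Chisel.py actually parses its arguments, and build_parser is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the arguments documented for Chisel.py."""
    parser = argparse.ArgumentParser(description="Preprocess a folder of images.")
    parser.add_argument("--image_folder", required=True,
                        help="Input folder containing images to be processed.")
    parser.add_argument("--output_folder", required=True,
                        help="Folder where processed images and metadata are saved.")
    parser.add_argument("--target_size", type=int, nargs=2, default=[256, 256],
                        metavar=("WIDTH", "HEIGHT"))
    parser.add_argument("--output_format", choices=["JPEG", "PNG", "numpy"],
                        default="JPEG")
    parser.add_argument("--min_size", type=int, default=128)
    parser.add_argument("--blur_threshold", type=float, default=100.0)
    parser.add_argument("--enhancement_factors", type=float, nargs=4,
                        default=[1.0, 1.0, 1.0, 1.0],
                        metavar=("BRIGHTNESS", "CONTRAST", "COLOR", "SHARPNESS"))
    # Restrict JPEG quality to the documented 1-100 range.
    parser.add_argument("--quality", type=int, default=85,
                        choices=range(1, 101), metavar="1-100")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(vars(args))
```

With nargs set to a fixed count, argparse rejects inputs such as --target_size 512 (one value) with a usage error instead of failing later in the pipeline.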
python Chisel.py --image_folder ./raw_images --output_folder ./processed_images
This command processes images from ./raw_images, resizes them to 256x256, and saves them in JPEG format.
Save processed images in PNG format instead of the default JPEG.
python Chisel.py --image_folder ./raw_images --output_folder ./processed_images --output_format PNG
Resize images to custom dimensions.
python Chisel.py --image_folder ./raw_images --output_folder ./processed_images --target_size 512 512
Enhance brightness and contrast of images.
python Chisel.py --image_folder ./raw_images --output_folder ./processed_images --enhancement_factors 1.2 1.1 1.0 1.0
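The four enhancement factors map naturally onto Pillow's ImageEnhance classes, which take a multiplier where 1.0 leaves the image unchanged. The following is a minimal sketch of how factors such as 1.2 1.1 1.0 1.0 could be applied. It assumes RGB input and that the factors are applied in the documented order. Whether Chisel uses ImageEnhance this way is an assumption, and enhance_image is a hypothetical name.

```python
from typing import Sequence

from PIL import Image, ImageEnhance

# Enhancer classes in the order the CLI documents its four factors.
ENHANCERS = (
    ImageEnhance.Brightness,
    ImageEnhance.Contrast,
    ImageEnhance.Color,
    ImageEnhance.Sharpness,
)


def enhance_image(img: Image.Image, factors: Sequence[float]) -> Image.Image:
    """Apply brightness, contrast, color, and sharpness factors in turn.

    In Pillow, a factor of 1.0 leaves the property unchanged, values above
    1.0 strengthen it, and values below 1.0 weaken it.
    """
    if len(factors) != len(ENHANCERS):
        raise ValueError("expected 4 factors: brightness contrast color sharpness")
    for enhancer_cls, factor in zip(ENHANCERS, factors):
        if factor != 1.0:  # skip passes that would not change the image
            img = enhancer_cls(img).enhance(factor)
    return img
```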
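Perform advanced preprocessing with custom settings. Note that --quality applies only to JPEG output, so it has no effect when --output_format is numpy.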
python Chisel.py --image_folder ./raw_images --output_folder ./processed_images --target_size 512 512 --output_format numpy --min_size 150 --blur_threshold 120.0 --enhancement_factors 1.2 1.1 1.3 1.0 --quality 90
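The resize and output options can be sketched as a single save step. This sketch assumes that images are converted to RGB and resized directly to target_size, which stretches rather than crops or pads. The source does not say which approach Chisel uses. It also assumes that numpy output means one .npy file per image, written with np.save. save_processed is a hypothetical name.

```python
from pathlib import Path
from typing import Tuple

import numpy as np
from PIL import Image


def save_processed(img: Image.Image, out_stem: Path, output_format: str = "JPEG",
                   target_size: Tuple[int, int] = (256, 256),
                   quality: int = 85) -> Path:
    """Resize an image and save it as JPEG, PNG, or a NumPy .npy array.

    out_stem is the output path without an extension. quality is only
    used for JPEG output.
    """
    img = img.convert("RGB").resize(target_size)  # target_size is (width, height)
    if output_format == "JPEG":
        path = out_stem.with_suffix(".jpg")
        img.save(path, "JPEG", quality=quality)
    elif output_format == "PNG":
        path = out_stem.with_suffix(".png")
        img.save(path, "PNG")
    elif output_format == "numpy":
        path = out_stem.with_suffix(".npy")
        np.save(path, np.asarray(img))  # uint8 array shaped (height, width, 3)
    else:
        raise ValueError(f"unsupported output format: {output_format}")
    return path
```

Note the axis order: Pillow sizes are (width, height), while the saved array is indexed (height, width, channel).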
Chisel provides flexibility through various command-line arguments, allowing you to tailor the preprocessing pipeline to your specific needs.
- Minimum Size (--min_size): Ensures that only images above a certain resolution are included.
- Blur Threshold (--blur_threshold): Filters out blurry images based on the variance of the Laplacian.
- Enhancement Factors (--enhancement_factors): Four values, in this order:
  - Brightness: Adjusts the brightness level.
  - Contrast: Modifies the contrast of the image.
  - Color: Alters the color saturation.
  - Sharpness: Changes the sharpness of the image.
- Target Size (--target_size): Specifies the dimensions to which images will be resized.
- Output Format (--output_format): Determines the format in which processed images are saved.
- Quality (--quality): Sets the quality level for JPEG images.
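The two quality filters can be sketched in NumPy. The Laplacian measures local intensity change. A sharp image has many strong edges, so its Laplacian response varies widely, while a blurry image gives a low variance. This sketch uses the 4-neighbour kernel that OpenCV's cv2.Laplacian applies by default. In OpenCV, the usual one-liner is cv2.Laplacian(gray, cv2.CV_64F).var(), which also handles border pixels, so its values differ slightly. The function names are hypothetical. Comparing min_size against the shorter side is an assumption, since the source only says "minimum size (in pixels)".

```python
import numpy as np


def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the 4-neighbour Laplacian over a 2-D grayscale array.

    Uses the kernel [[0, 1, 0], [1, -4, 1], [0, 1, 0]] on interior pixels
    only. Higher values mean more edge detail; low values suggest blur.
    """
    g = gray.astype(np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1]      # pixels above and below
           + g[1:-1, :-2] + g[1:-1, 2:]    # pixels left and right
           - 4.0 * g[1:-1, 1:-1])          # centre pixel
    return float(lap.var())


def passes_quality_check(gray: np.ndarray, min_size: int = 128,
                         blur_threshold: float = 100.0) -> bool:
    """Return True if the image is large enough and not too blurry."""
    height, width = gray.shape
    if min(height, width) < min_size:  # assumption: test the shorter side
        return False
    return laplacian_variance(gray) >= blur_threshold
```

This also explains the troubleshooting advice: lowering --blur_threshold lets softer images through, and raising it keeps only the sharpest ones.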
Chisel maintains a comprehensive log of all preprocessing activities to facilitate transparency and troubleshooting. The logs are saved in a file named preprocessing_log.txt located in the root directory of the Chisel project.
2024-04-27 12:34:56,789 - INFO - Starting preprocessing for images in ./raw_images
2024-04-27 12:35:01,123 - INFO - Image ./raw_images/image1.jpg is too small.
2024-04-27 12:35:05,456 - ERROR - Error processing ./raw_images/image2.jpg: [Errno 2] No such file or directory
2024-04-27 12:35:10,789 - INFO - Successfully processed and saved ./raw_images/image3.jpg
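These sample lines match what Python's standard logging module produces with the format string "%(asctime)s - %(levelname)s - %(message)s", because asctime renders as 2024-04-27 12:34:56,789 by default. Here is a minimal sketch of that setup, assuming Chisel configures logging this way. The logger name "chisel" and the function setup_logging are hypothetical.

```python
import logging
import os


def setup_logging(log_path: str = "preprocessing_log.txt") -> logging.Logger:
    """Write INFO-and-above messages to log_path as 'time - LEVEL - message'."""
    logger = logging.getLogger("chisel")
    logger.setLevel(logging.INFO)
    target = os.path.abspath(log_path)
    # Avoid attaching a second handler for the same file on repeated calls.
    if not any(getattr(h, "baseFilename", None) == target for h in logger.handlers):
        handler = logging.FileHandler(target, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s - %(levelname)s - %(message)s"))
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = setup_logging()
    log.info("Starting preprocessing for images in ./raw_images")
```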
We welcome contributions to enhance Chisel and make it even more robust and user-friendly. Whether you're fixing bugs, adding new features, or improving documentation, your efforts are highly appreciated!
- Fork the Repository: Click the "Fork" button at the top-right corner of the repository page to create a personal copy.
- Clone Your Fork (replace <your-username> with your GitHub username):
  git clone https://github.com/<your-username>/MichaelAngel.io.git
  cd MichaelAngel.io
- Create a New Branch:
  git checkout -b feature/your-feature-name
- Make Your Changes: Implement your feature or bug fix.
- Commit Your Changes (stage them first so the commit includes them):
  git add -A
  git commit -m "Add feature: your feature description"
- Push to Your Fork:
  git push origin feature/your-feature-name
- Create a Pull Request: Navigate to the original repository and click "Compare & pull request" to submit your changes for review.
- Code Quality: Ensure your code follows Python best practices and is well-documented.
- Testing: Test your changes thoroughly before submitting.
- Documentation: Update the README.md or other documentation files if your changes affect usage or functionality.
- Issue: Images are excluded as too small or too blurry.
- Solution: Lower --min_size and --blur_threshold to make the quality checks more lenient, or make sure your dataset contains high-quality images.
- Issue: Missing Python packages.
- Solution: Ensure all dependencies are installed by running pip install -r requirements.txt. For issues with opencv-python, ensure you have the necessary system libraries.
- Issue: Lack of permissions to read/write files.
- Solution: Check directory permissions and ensure you have the necessary rights to access the folders.
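To catch permission problems before a long run, a pre-flight check like the one below can help. It is a hypothetical helper, not part of Chisel. It relies on os.access, which reflects POSIX permission bits well but may not capture every Windows ACL rule. It also assumes the output folder may not exist yet, so it checks the nearest existing parent instead.

```python
import os
from typing import List


def check_folder_access(image_folder: str, output_folder: str) -> List[str]:
    """Return human-readable access problems (an empty list means none found)."""
    problems: List[str] = []
    if not os.path.isdir(image_folder):
        problems.append(f"input folder not found: {image_folder}")
    elif not os.access(image_folder, os.R_OK | os.X_OK):
        problems.append(f"cannot read input folder: {image_folder}")
    # Walk up to the nearest existing directory of the output path.
    target = os.path.abspath(output_folder)
    while not os.path.exists(target):
        parent = os.path.dirname(target)
        if parent == target:  # reached the filesystem root
            break
        target = parent
    if not os.access(target, os.W_OK | os.X_OK):
        problems.append(f"cannot write to output location: {target}")
    return problems


if __name__ == "__main__":
    for problem in check_folder_access("./raw_images", "./processed_images"):
        print(problem)
```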
We extend our heartfelt gratitude to the following individuals, organizations, and tools that have made Chisel possible:
- Unsplash: Providing high-quality, freely usable images.
- Pixabay: A vast collection of free images and videos.
- Flickr: A platform for sharing creative work and images.
- TensorFlow: An open-source platform for machine learning.
- PyTorch: A deep learning framework that accelerates the path from research prototyping to production deployment.
- Pillow: Python Imaging Library for image processing.
- NumPy: Fundamental package for scientific computing with Python.
- OpenCV: Open Source Computer Vision Library.
- tqdm: Fast, extensible progress bar for Python.
- hashlib: Secure hashes and message digests.
- Contributors: Special thanks to all the contributors who have participated in this project.
- Community Support: Grateful for the continuous support and feedback from our community members.
- Open Source Community: For being a constant source of inspiration and fostering a collaborative spirit.
- Educators, Artists, Developers, and Researchers: Your innovative ideas and creative inputs drive our mission forward.
Ensure all dependencies are installed by running:
pip install -r requirements.txt
The core dependencies are:
- Pillow
- numpy
- opencv-python
- tqdm
Ensure that you have the latest versions of these packages for optimal performance.
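To see which of these packages are installed and at what version, you can use the standard library's importlib.metadata (available in Python 3.8+). Here is a minimal sketch; installed_versions is a hypothetical helper, not part of Chisel. It looks up distribution names, so if cv2 comes from a variant such as opencv-python-headless, opencv-python will show as missing even though OpenCV works.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Distribution names as listed above.
REQUIRED = ("Pillow", "numpy", "opencv-python", "tqdm")


def installed_versions(packages: Iterable[str] = REQUIRED) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```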