Detect Objects in Images With Claude Vision API

Claude Vision Object Detection is a free, open-source Python script that quickly and accurately detects and labels objects in images.

It uses Claude 3.5 Sonnet Vision API to process images, draw bounding boxes, and display precise confidence scores.

Features

Batch Processing: Analyze single images or entire directories.
Automatic Object Detection: Identifies and outlines objects with bounding boxes.
Confidence Scores: Provides precision scores for each detected object.
Distinct Visualizations: Uses a unique color for each identified object.
Annotated Image Saving: Saves processed images with detections overlaid.
Support for various image formats: including JPEG, PNG, GIF, and WebP.
Comprehensive Error Handling: Addresses invalid paths, unsupported formats, API issues, and processing errors.

Use Cases

Content Tagging: Automatically tag images for websites and databases.
Image Analysis for Research: Accelerate data collection for computer vision projects.
Security Systems: Enhance object recognition in surveillance footage.
Retail Analytics: Track products and customer behavior in stores.
Robotics: Improve object recognition for navigation and manipulation.

How To Use It

To use Claude Vision Object Detection, follow these steps:

1. Clone the Repository: Run git clone https://github.com/doriandarko/claude-vision-object-detection.git to download the source code.

2. Navigate to Directory: Move to the project folder using cd claude-vision-detection.

3. Install Dependencies: Install required Python packages with pip install -r requirements.txt.

4. Add API Key: In the project root, create a .env file with your Anthropic API key (ANTHROPIC_API_KEY=your_api_key_here).

5. Run the Script: Launch the script using python main.py, entering the path to either a single image or a directory.

6. Review Results: Annotated images are saved in an output folder for easy access.

Pros

Free and Open-Source: No cost barrier for access and modification.
Easy Setup: Straightforward installation process.
Batch Processing: Efficient handling of multiple images.
Clear Visualizations: Easy-to-interpret object identification.
Accurate Detection: Utilizes the advanced Claude 3.5 model.

Cons

Requires Coding Knowledge: Python proficiency needed for setup and use.
Anthropic API Key Required: Needs access to the Anthropic API.
Performance Dependent on API: Processing speed tied to API responsiveness.

How It Works

1. Initialization and Environment Setup
The ClaudeVisionProcessor initializes by loading necessary environment variables, including the Anthropic API key. It also sets up an output directory to store processed images with detected objects.

2. Encoding Images
Images are read and converted to base64-encoded strings, which allows efficient transmission for analysis. The script detects the image file type (JPEG, PNG, GIF, or WebP) and assigns the appropriate media type.

3. Processing Images for Object Detection
When provided with a single image file or a directory, the ClaudeVisionProcessor identifies and reads all compatible image files. Using the Claude API, the processor sends images for object detection, focusing on identifying objects with high precision. For each image, it receives a JSON response containing detected objects, their bounding box coordinates (normalized), and confidence scores.

4. Random Color Generation
For visual clarity, each bounding box is drawn using a unique, vibrant color. This is achieved by generating random colors in the HSV color space to ensure brightness and color variety.

5. Drawing Bounding Boxes
Bounding boxes are drawn directly on each image using the normalized coordinates. The processor scales the coordinates to the image dimensions and draws rectangles with a thick outline. Each bounding box is labeled with the object’s name and confidence score, displayed against a black background for contrast.

6. Output
Each processed image is saved in the output directory with a filename indicating that it has undergone object detection. The console displays each step, helping users track the progress and any errors.

7. User Interaction
Users provide an image file path or directory for processing. After processing, all annotated images are accessible in the output directory. This setup allows easy visualization and interpretation of detected objects in any provided images.

Related Resources

FAQs

Q: What’s the minimum Python version required?
A: Python 3.7 or higher is required for compatibility.

Q: Can it process videos?
A: Currently, only static images are supported.

Q: What happens if the API is unavailable?
A: The script includes error handling to manage API issues gracefully.

Q: Can I customize the colors of bounding boxes?
A: Yes, the tool uses a vibrant color palette, but you may modify the code if specific color requirements are needed.

Q: Where are labeled images saved?
A: All processed images are saved in an output directory within the current working directory, with the prefix “detected_” added to each filename.

Try Claude Vision Object Detection today and enhance your image analysis workflow. If you find it useful, share your experience and feedback in the comments below.