Analyze And Understand Images With Gemini Pro

VisionGPT is an open-source AI tool for quick and precise image analysis.

You upload a photo, and VisionGPT processes it through a Next.js API route and uses the Gemini Pro Vision model to deliver detailed AI insights within seconds.
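As a rough illustration of that flow, the sketch below builds the kind of JSON body that Gemini's generateContent REST endpoint accepts for a text prompt plus an inline base64 image, and then pulls the generated text out of a response. The helper names (buildVisionRequest, extractText) are hypothetical, and the article does not say whether VisionGPT calls the REST endpoint directly or goes through an SDK, so treat this as an assumption rather than the project's actual code.

```typescript
// One text part and one inline image part, as used in a generateContent request body.
interface TextPart {
  text: string;
}
interface InlineImagePart {
  inline_data: { mime_type: string; data: string };
}
type Part = TextPart | InlineImagePart;

interface GenerateContentRequest {
  contents: { parts: Part[] }[];
}

// Only the slice of the response that this sketch reads.
interface GenerateContentResponse {
  candidates?: { content?: { parts?: { text?: string }[] } }[];
}

// Hypothetical helper: wraps a prompt and a base64-encoded image in a request body.
function buildVisionRequest(
  prompt: string,
  imageBase64: string,
  mimeType: string
): GenerateContentRequest {
  // Browsers often produce data URLs; keep only the raw base64 payload.
  const data = imageBase64.replace(/^data:[^;]+;base64,/, "");
  return {
    contents: [
      { parts: [{ text: prompt }, { inline_data: { mime_type: mimeType, data } }] },
    ],
  };
}

// Hypothetical helper: joins the text parts of the first candidate, or returns "".
function extractText(response: GenerateContentResponse): string {
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  return parts.map((part) => part.text ?? "").join("");
}
```

In a Next.js API route, a body like this would be sent as JSON in a POST request together with the API key, and extractText would be applied to the parsed response before returning it to the page.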

You can deploy it locally on your machine or self-host it on your preferred server by following the tutorial below.

VisionGPT also offers a live demo, where you can register for a free account and try it out before deploying your own instance.

How To Deploy It:

1. Clone the VisionGPT repository from GitHub:

git clone https://github.com/megoxv/visionGPT

2. Create a .env file in the project’s root directory and insert your own API key and other necessary credentials as shown below:

NEXT_PUBLIC_DOMAIN_URL="http://localhost:3000"

# Create your API key here: https://aistudio.google.com/app/apikey
NEXT_PUBLIC_GOOGLE_AI_API_KEY=""

# Google Analytics
NEXT_PUBLIC_MEASUREMENT_ID=G-XXXXXXXXXX

GOOGLE_CLIENT_ID=""
GOOGLE_CLIENT_SECRET=""

# Stripe
STRIPE_SECRET_KEY=""
STRIPE_WEBHOOK_SECRET=""
NEXT_PUBLIC_STRIPE_PUBLISHABLE_KEY=""


# This was inserted by `prisma init`:
# Environment variables declared in this file are automatically made available to Prisma.
# See the documentation for more detail: https://pris.ly/d/prisma-schema#accessing-environment-variables-from-the-schema

# Prisma supports the native connection string format for PostgreSQL, MySQL, SQLite, SQL Server, MongoDB and CockroachDB.
# See the documentation for all the connection string options: https://pris.ly/d/connection-strings

DATABASE_URL="postgresql://megoxv:mypassword@localhost:5432/mydb?schema=sample"

3. Run the following command to install the necessary dependencies:

npm install

4. Start the app in development mode:

npm run dev

You can then access your local instance of VisionGPT at http://localhost:3000.

5. Upload an image to test it. For example, given a photo of a woman wearing a crown and sunglasses in a bathtub surrounded by pink flowers, VisionGPT returned the following analysis:

'This image shows a woman wearing a crown and sunglasses, sitting in a bathtub. She is surrounded by pink flowers. The image is taken from a low angle, which makes the woman look powerful and confident. The woman's expression is serious, which adds to the sense of power and confidence. The image is well-lit, which brings out the details of the woman's face and clothing. The overall effect of the image is one of glamour and luxury.'

VisionGPT identified the key elements of the image (the crown, the sunglasses, the bathtub, and the pink flowers) and also commented on less literal details, such as the woman’s expression and the overall mood of the scene.
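If the app fails to start or the analysis request is rejected, an empty value in the .env file from step 2 is a likely cause. The sketch below parses .env-style text and reports which keys are missing or blank. The function names (parseDotEnv, findMissingEnv) are hypothetical, the parser handles only simple KEY="value" lines, and the choice of which variables count as required is an assumption; the variable names themselves come from the template in step 2.

```typescript
// Keys from the step 2 template that this sketch treats as required (an assumption).
const REQUIRED_KEYS: readonly string[] = [
  "NEXT_PUBLIC_DOMAIN_URL",
  "NEXT_PUBLIC_GOOGLE_AI_API_KEY",
  "DATABASE_URL",
];

type Env = Record<string, string | undefined>;

// Hypothetical helper: parses simple KEY=VALUE lines, skipping blanks and # comments.
function parseDotEnv(text: string): Env {
  const env: Env = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue;
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of surrounding double quotes, if present.
    if (value.length >= 2 && value.startsWith('"') && value.endsWith('"')) {
      value = value.slice(1, -1);
    }
    env[key] = value;
  }
  return env;
}

// Hypothetical helper: returns the keys that are undefined or blank.
function findMissingEnv(env: Env, keys: readonly string[] = REQUIRED_KEYS): string[] {
  return keys.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}
```

Run against the unmodified template from step 2, findMissingEnv would report NEXT_PUBLIC_GOOGLE_AI_API_KEY, since that value is an empty string until you paste in your own key.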