JsonGenius is a self-hosted, AI-powered scraping API written in Go that extracts structured data (defined by a JSON Schema) from any webpage. It renders pages in Chromium like a normal browser, so it also works on sites that rely on JavaScript to display their content.
With JsonGenius and Docker, you can set up a scraping API to pull data from sites in just a few minutes. Define the schema you want, send a URL, and JsonGenius will return extracted data matching your schema. This makes it easy to collect and work with all kinds of web data.
How to use it:
1. Clone JsonGenius from GitHub and navigate to the jsongenius directory:
git clone https://github.com/semanser/jsongenius
cd jsongenius
2. Set your OpenAI API key as an environment variable:
export OPEN_AI_KEY=...
3. Start the services:
docker-compose up
The API will then be available at http://localhost:3001.
4. To scrape a page, send a POST request with its URL and the JSON Schema describing the data you want to extract:
curl -X POST http://localhost:3001/lookup \
  -H "Content-Type: application/json" \
  -d '{
    "url": "/path/to/",
    "schema": {
      "type": "object",
      "properties": {
        "products": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "name": { "type": "string", "description": "The product name" },
              "price": { "type": "number", "description": "The price of the product in USD" }
            }
          }
        }
      }
    }
  }'
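If you would rather call the API from a script, the curl request above can be sketched in Python using only the standard library. This is a minimal sketch, not part of JsonGenius itself: the helper names `build_request` and `scrape`, the `PRODUCT_SCHEMA` constant, and the 120-second timeout are hypothetical, and the code assumes the endpoint replies with a JSON body, which the text above does not spell out.

```python
import json
import urllib.request

# Endpoint taken from the curl example; adjust it if your deployment differs.
ENDPOINT = "http://localhost:3001/lookup"

# Same schema as in the curl example: a list of products with name and price.
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "products": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string", "description": "The product name"},
                    "price": {
                        "type": "number",
                        "description": "The price of the product in USD",
                    },
                },
            },
        }
    },
}


def build_request(url, schema, endpoint=ENDPOINT):
    """Build (but do not send) the POST request for a page URL and schema."""
    body = json.dumps({"url": url, "schema": schema}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scrape(url, schema, endpoint=ENDPOINT, timeout=120):
    """Send the request and decode the reply.

    Assumption: the API answers with a JSON document. The timeout value is
    an arbitrary choice for this sketch, not a documented setting.
    """
    request = build_request(url, schema, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the Docker services running, calling `scrape(page_url, PRODUCT_SCHEMA)` sends the same payload as the curl command, where `page_url` is the address of the page you want to extract data from. Keeping `build_request` separate from `scrape` makes it possible to inspect the payload without touching the network.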