Extract Structured Data from Any Site with JsonGenius and AI

JsonGenius uses schema-based extraction and Docker to let you quickly scrape website data. No more painful scraper development.

JsonGenius is a robust, self-hosted, AI-powered scraping API written in Go that makes it easy to extract structured data (defined by a JSON Schema) from any webpage. It uses Chromium to render pages like a normal browser, so it works on complex sites.

With JsonGenius and Docker, you can set up a scraping API to pull data from sites in just a few minutes. Define the schema you want, send a URL, and JsonGenius will return extracted data matching your schema. This makes it easy to collect and work with all kinds of web data.

How to use it:

1. Clone JsonGenius from GitHub and navigate to the jsongenius directory:

git clone https://github.com/semanser/jsongenius
cd jsongenius

2. Set your OpenAI API key as an environment variable:

export OPEN_AI_KEY=...

3. Run docker-compose up. Once the containers have started, the API will be available at http://localhost:3001.

4. To scrape a page, POST its URL together with a JSON Schema describing the data you want to extract:

curl -X POST -H "Content-Type: application/json" -d '{
  "url": "/path/to/",
  "schema": {
    "type": "object",
    "properties": {
      "products": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "name": {
              "type": "string",
              "description": "The product name"
            },
            "price": {
              "type": "number",
              "description": "The price of the product in USD"
            }
          }
        }
      }
    }
  }
}' http://localhost:3001/lookup
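
The same request can also be sent from a script. Below is a minimal Python sketch using only the standard library. It targets the endpoint and request body from the curl example above. The helper names build_request and scrape are hypothetical, and the code assumes the API replies with a JSON body, which the steps above do not confirm.

```python
import json
import urllib.request

# Endpoint taken from the curl example; adjust it if your setup differs.
API_URL = "http://localhost:3001/lookup"

# The same schema as in the curl example: a list of products with name and price.
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "products": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {
                        "type": "string",
                        "description": "The product name",
                    },
                    "price": {
                        "type": "number",
                        "description": "The price of the product in USD",
                    },
                },
            },
        }
    },
}


def build_request(page_url, schema, api_url=API_URL):
    """Build (but do not send) the POST request (hypothetical helper)."""
    body = json.dumps({"url": page_url, "schema": schema}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scrape(page_url, schema, api_url=API_URL, timeout=120):
    """Send the request and decode the reply.

    Assumption: the response body is JSON. Non-2xx statuses raise
    urllib.error.HTTPError.
    """
    request = build_request(page_url, schema, api_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the containers running, calling scrape(page_url, PRODUCT_SCHEMA) with the address of the page you want to scrape should return the extracted data as a Python object. The generous timeout is a guess, since rendering a page in Chromium and calling the model may take a while.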
