GPT Crawler is a TypeScript library that turns a website into a knowledge file for a custom GPT.
It crawls the site(s) you specify, extracts the content matching your selector, and writes it to a JSON file that you can upload to your own custom GPTs or the OpenAI Assistants API. The result is an assistant with domain-specific knowledge drawn from that site.
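To make the JSON output more concrete, here is a minimal sketch of how such a file could be read and checked before uploading. The record shape (`title`, `url`, `html`) is an assumption not stated in this text, and `parseKnowledge` and `summarize` are hypothetical helper names, not part of GPT Crawler.

```typescript
// Assumed shape of one record in the output file (not confirmed by the text above).
interface CrawledPage {
  title: string;
  url: string;
  html: string; // content extracted from the configured selector
}

// Hypothetical helper: parse the file contents and keep only well-formed records.
function parseKnowledge(json: string): CrawledPage[] {
  const data: unknown = JSON.parse(json);
  if (!Array.isArray(data)) {
    throw new Error("expected a JSON array of pages");
  }
  return data.filter(
    (p): p is CrawledPage =>
      typeof p === "object" &&
      p !== null &&
      typeof (p as CrawledPage).title === "string" &&
      typeof (p as CrawledPage).url === "string" &&
      typeof (p as CrawledPage).html === "string",
  );
}

// Hypothetical helper: report how many pages and characters were captured.
function summarize(pages: CrawledPage[]): { pages: number; characters: number } {
  return {
    pages: pages.length,
    characters: pages.reduce((sum, p) => sum + p.html.length, 0),
  };
}
```

A quick check like this can help confirm that the selector captured real content, since an empty or very small character count suggests the selector did not match.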
How to use it:
1. To get started, make sure you have Node.js (>= 16) and Playwright installed.
2. Clone the GPT Crawler repo from GitHub and install the dependencies.
git clone https://github.com/builderio/gpt-crawler
cd gpt-crawler
npm i
3. Edit the config.ts file to specify the URL and selectors for the content you wish to crawl.
export const config: Config = {
  url: "/path/to/",
  match: "/path/to/docs/**",
  selector: `.main-container`,
  maxPagesToCrawl: 50, // max number of pages to crawl
  outputFileName: "knowledge.json",
};
4. Run the crawler using npm start.
5. Upload the knowledge.json file to your custom GPTs or OpenAI Assistant.

Here is a custom GPT demonstrating how to scrape data from builder.io and create an AI assistant.
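The match pattern and maxPagesToCrawl setting in the config step can be pictured as a filter over the URLs the crawler discovers. The sketch below is a simplified illustration, not the crawler's own matching logic: `globToRegExp` and `selectPages` are hypothetical helpers, the glob rules (`**` matches anything, `*` stops at `/`) are an assumption, and the example.com URLs are placeholders.

```typescript
// Hypothetical helper: convert a glob such as "/path/to/docs/**" into a RegExp.
// Assumed rules: "**" matches any characters, "*" matches any characters except "/".
function globToRegExp(glob: string): RegExp {
  let pattern = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*") {
      if (glob.charAt(i + 1) === "*") {
        pattern += ".*";
        i++; // skip the second "*"
      } else {
        pattern += "[^/]*";
      }
    } else {
      // Escape characters that have a special meaning in regular expressions.
      pattern += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${pattern}$`);
}

// Hypothetical helper: keep URLs that match the glob, capped at maxPagesToCrawl.
function selectPages(
  urls: string[],
  match: string,
  maxPagesToCrawl: number,
): string[] {
  const re = globToRegExp(match);
  return urls.filter((u) => re.test(u)).slice(0, maxPagesToCrawl);
}
```

For example, with match set to a docs glob and maxPagesToCrawl set to 50, only URLs under the docs path would be kept, and at most 50 of them. A narrow match pattern is usually the easiest way to keep the knowledge file focused.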