GPT Crawler is a TypeScript library that turns any website into a knowledge base for AI assistants.
It crawls the site(s) you specify, extracts the content you select, and writes it to a JSON file that you can upload to a custom GPT or to the OpenAI Assistants API. The result is an assistant with domain-specific knowledge drawn from any site.
How to use it:
1. To get started, make sure you have Node.js (>= 16) and Playwright installed.
2. Clone the GPT Crawler repo from GitHub, change into its directory, and install the dependencies.
git clone https://github.com/builderio/gpt-crawler
cd gpt-crawler
npm i
3. Edit the config.ts file to specify the starting URL, the match pattern, and the CSS selector for the content you wish to crawl.
export const config: Config = {
  url: "/path/to/",                  // page the crawl starts from
  match: "/path/to/docs/**",         // glob pattern for links to follow
  selector: `.main-container`,       // CSS selector for the content to extract
  maxPagesToCrawl: 50,               // max number of pages to crawl
  outputFileName: "knowledge.json",  // file the extracted data is written to
};
4. Run the crawler using npm start.
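As a concrete illustration, here is what a filled-in config might look like for crawling a documentation site. The URL, match pattern, selector, and import path below are assumptions chosen for the example, not values mandated by this guide; adapt them to the site you are crawling.

```typescript
// Hypothetical import path; match it to the repo's actual layout
import { Config } from "./src/config";

export const config: Config = {
  // Example values for crawling the Builder.io developer docs (assumed for illustration)
  url: "https://www.builder.io/c/docs/developers",
  match: "https://www.builder.io/c/docs/**",
  selector: `.docs-builder-container`,
  maxPagesToCrawl: 50,
  outputFileName: "knowledge.json",
};
```

A narrow `selector` keeps navigation bars and footers out of the extracted text, which noticeably improves the quality of the resulting knowledge base.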
5. Upload the generated knowledge.json file to your custom GPT or to the OpenAI Assistants API.
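Before uploading, it can help to sanity-check the output. The sketch below assumes each entry in knowledge.json has title, url, and html fields (an assumption about the output shape for illustration purposes) and counts entries with empty content, which usually signals a wrong selector:

```typescript
// Assumed shape of each crawled page in knowledge.json (for illustration)
interface CrawledPage {
  title: string;
  url: string;
  html: string;
}

// Count pages and flag entries with empty content so a bad selector is caught early
function summarize(pages: CrawledPage[]): { pages: number; empty: number } {
  const empty = pages.filter((p) => p.html.trim().length === 0).length;
  return { pages: pages.length, empty };
}

// In practice you would load the real file, e.g.:
//   const pages: CrawledPage[] = JSON.parse(fs.readFileSync("knowledge.json", "utf8"));
const sample: CrawledPage[] = [
  { title: "Intro", url: "https://example.com/docs", html: "<h1>Intro</h1>" },
  { title: "Blank", url: "https://example.com/blank", html: "  " },
];

console.log(summarize(sample)); // { pages: 2, empty: 1 }
```

If `empty` is large relative to `pages`, revisit the `selector` in config.ts before uploading the file.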

6. As an example, a custom GPT was built this way by crawling builder.io, producing an AI assistant that answers questions from the scraped docs.