Transform Websites into LLM-Friendly Markdown – Markdowner

The all-in-one solution for converting websites into LLM-compatible markdown. Free API, detailed output, and easy self-hosting.

Markdowner is a tool designed to convert any website into markdown data optimized for large language models (LLMs).

It uses Cloudflare’s Browser Rendering and Durable Objects to spin up browser instances, then converts the rendered page into markdown with the Turndown library.

How to use it:

1. Send a GET request to https://md.dhr.wtf to start the conversion.

2. Pass the URL of the website you want to convert in the url query parameter, and choose the response format with the Content-Type header:

  • Content-Type: text/plain for a plain-text response.
  • Content-Type: application/json for a JSON response.
$ curl 'https://md.dhr.wtf/?url=https://scriptbyai.com'
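For calling the API from code, the request in steps 1 and 2 can be sketched in TypeScript as below. The helper name buildMarkdownerRequest is hypothetical; it only assembles the URL and headers described above, and the result can be handed to fetch.

```typescript
// Hypothetical helper: builds a GET request for the public Markdowner endpoint.
// The Content-Type header selects the response format, as described above.
type ResponseFormat = "text/plain" | "application/json";

interface MarkdownerRequest {
  url: string;
  init: { method: "GET"; headers: Record<string, string> };
}

function buildMarkdownerRequest(
  targetUrl: string,
  format: ResponseFormat = "text/plain",
): MarkdownerRequest {
  // Percent-encode the target so its own "?", "&" and "#" cannot leak into the outer query string.
  const url = `https://md.dhr.wtf/?url=${encodeURIComponent(targetUrl)}`;
  return { url, init: { method: "GET", headers: { "Content-Type": format } } };
}

// Usage (network call not executed here):
// const { url, init } = buildMarkdownerRequest("https://scriptbyai.com", "application/json");
// const body = await (await fetch(url, init)).text();
```

Encoding the target URL is a defensive choice: the curl example works unencoded, but a target that carries its own query string would otherwise be split into separate parameters.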

3. Optional Parameters:

  • enableDetailedResponse (boolean, default false): Set to true to receive a detailed response that includes the full HTML content.
  • crawlSubpages (boolean, default false): Set to true to automatically crawl and convert up to 10 subpages linked from the provided URL.
  • llmFilter (boolean, default false): Set to true to have an LLM filter out potentially irrelevant content, further optimizing the markdown for your AI projects.
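Because all three flags default to false, a client only needs to send the ones it switches on. A minimal sketch, assuming the flags are passed as query parameters alongside url (the helper buildMarkdownerUrl is hypothetical):

```typescript
// Hypothetical helper: appends Markdowner's optional boolean flags to the query string.
// All three flags default to false, so only flags explicitly set to true are sent.
interface MarkdownerOptions {
  enableDetailedResponse?: boolean;
  crawlSubpages?: boolean;
  llmFilter?: boolean;
}

function buildMarkdownerUrl(targetUrl: string, options: MarkdownerOptions = {}): string {
  const params: string[] = [`url=${encodeURIComponent(targetUrl)}`];
  // Fixed order keeps the generated URL stable (useful when comparing or caching URLs).
  const flags: (keyof MarkdownerOptions)[] = ["enableDetailedResponse", "crawlSubpages", "llmFilter"];
  for (const flag of flags) {
    if (options[flag] === true) params.push(`${flag}=true`);
  }
  return `https://md.dhr.wtf/?${params.join("&")}`;
}

// Example: crawl subpages and apply the LLM filter.
// buildMarkdownerUrl("https://example.com", { crawlSubpages: true, llmFilter: true })
```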

Self-Hosting:

To self-host Markdowner, follow these steps:

1. Clone the repository from GitHub and install dependencies:

git clone https://github.com/dhravya/markdowner
cd markdowner
npm i

2. Create a KV namespace for caching.

npx wrangler kv:namespace create md_cache

3. Open wrangler.toml and replace the KV namespace IDs with the ones printed by the previous command.

4. Deploy the project:

npm run deploy

How It Works:

User Interaction:

  • Users make requests to the Markdowner API, providing the URL of the website they want to convert into markdown.
  • The requests are processed by the Worker, which acts as the core processing unit.
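On the Worker side, the first job is to read the url parameter and the optional flags from the incoming request. The sketch below is hypothetical and not taken from Markdowner's source; parseMarkdownerParams is an illustrative name, and it only assumes the parameter names documented above.

```typescript
// Hypothetical sketch of how a Worker might read the incoming query string.
// Not Markdowner's actual code; parameter names follow the public API above.
interface ParsedParams {
  url: string;
  enableDetailedResponse: boolean;
  crawlSubpages: boolean;
  llmFilter: boolean;
}

function parseMarkdownerParams(requestUrl: string): ParsedParams {
  const params = new URL(requestUrl).searchParams;
  const target = params.get("url");
  if (!target) throw new Error("Missing required 'url' query parameter");
  // Flags are off unless the literal string "true" is passed.
  const flag = (name: string): boolean => params.get(name) === "true";
  return {
    url: target,
    enableDetailedResponse: flag("enableDetailedResponse"),
    crawlSubpages: flag("crawlSubpages"),
    llmFilter: flag("llmFilter"),
  };
}
```

In a Worker's fetch handler this would typically be called as parseMarkdownerParams(request.url).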

IP-based Rate Limiting:

  • To prevent abuse, the Worker implements an IP-based rate limiter. This ensures fair usage and protects against excessive or malicious requests.
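The article does not specify the limit or the algorithm, so the following is only a minimal sketch of one common approach, a fixed-window counter keyed by client IP. The class name IpRateLimiter and its parameters are hypothetical.

```typescript
// Minimal fixed-window rate limiter keyed by client IP. The limit, window length
// and algorithm Markdowner actually uses are not documented; these are assumptions.
class IpRateLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed, false if this IP is over its limit.
  allow(ip: string, now: number = Date.now()): boolean {
    const w = this.windows.get(ip);
    if (!w || now - w.start >= this.windowMs) {
      // First request from this IP, or the previous window has expired: start a new window.
      this.windows.set(ip, { start: now, count: 1 });
      return true;
    }
    if (w.count >= this.limit) return false;
    w.count += 1;
    return true;
  }
}
```

In a Worker, the IP would typically come from request.headers.get("CF-Connecting-IP"). An in-memory Map only lives as long as a single isolate, so a production limiter would presumably need shared state, such as a Durable Object, to enforce limits consistently.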

Browser Rendering:

  • The Worker uses Cloudflare’s Browser Rendering to fetch the page contents. This spins up browser instances that render the webpage as a regular browser would.

Durable Objects:

  • Durable Objects are used to persist browser sessions. This allows sessions to be reused across requests, which can improve performance and reduce resource usage.
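The session-reuse idea can be illustrated with a small holder that keeps one browser session open and replaces it only after it has sat idle too long. This is a simplified, hypothetical sketch: BrowserSession and the launch callback stand in for the real browser API, and the keep-alive period is an assumed parameter, not a documented Markdowner value.

```typescript
// Hypothetical sketch of browser-session reuse. "BrowserSession" and "launch"
// stand in for the real browser API; keepAliveMs is an assumed idle timeout.
interface BrowserSession {
  close(): Promise<void>;
}

class SessionHolder<S extends BrowserSession> {
  private session: S | null = null;
  private lastUsed = 0;
  private readonly launch: () => Promise<S>;
  private readonly keepAliveMs: number;

  constructor(launch: () => Promise<S>, keepAliveMs: number) {
    this.launch = launch;
    this.keepAliveMs = keepAliveMs;
  }

  // Returns the existing session if it is still fresh, otherwise launches a new one.
  async get(now: number = Date.now()): Promise<S> {
    if (this.session && now - this.lastUsed >= this.keepAliveMs) {
      // Idle too long: close the stale session and start fresh.
      await this.session.close();
      this.session = null;
    }
    if (!this.session) this.session = await this.launch();
    this.lastUsed = now;
    return this.session;
  }
}
```

A real Durable Object would more likely close idle sessions proactively (for example with an alarm) rather than lazily on the next request; the lazy check keeps the sketch short.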

Content Processing and Caching:

  • The rendered page contents are processed, and if LLM filtering is enabled, unnecessary information is filtered out.
  • The processed data is saved in KV (key-value) storage. This caching reduces redundant processing and speeds up subsequent requests for the same content.
  • Cached responses have a time-to-live (TTL) of 1 hour, so repeat requests within that hour are served from the cache, while later requests trigger a fresh conversion.
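The caching step follows the usual cache-aside pattern: check KV first, and on a miss convert the page and store the result with a one-hour TTL. The sketch below assumes a KVLike interface mirroring the subset of the Workers KV API it uses (get, and put with an expirationTtl option in seconds). The cache-key format and the getMarkdownCached name are hypothetical.

```typescript
// Minimal cache-aside sketch. KVLike mirrors the subset of the Workers KV API used
// here (get(key) and put(key, value, { expirationTtl })); the key format is an assumption.
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

const CACHE_TTL_SECONDS = 60 * 60; // 1 hour, matching the TTL described above

async function getMarkdownCached(
  kv: KVLike,
  cacheKey: string,
  convert: () => Promise<string>,
): Promise<string> {
  const cached = await kv.get(cacheKey);
  if (cached !== null) return cached; // cache hit: skip rendering entirely
  const markdown = await convert(); // cache miss: render and convert the page
  await kv.put(cacheKey, markdown, { expirationTtl: CACHE_TTL_SECONDS });
  return markdown;
}
```

Since flags such as llmFilter change the output, a cache key would presumably need to include them as well as the target URL.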

Vector Generation:

  • The content may be further refined using Workers AI, which generates vectors that can be used for AI-related tasks such as filtering and content analysis.
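The article does not say exactly how these vectors are used. One common technique, shown here purely as an illustration and not as Markdowner's method, is to keep only the text chunks whose embedding is close to a reference embedding by cosine similarity. The function names and the threshold are hypothetical, and the embeddings are assumed to come from a model such as one hosted on Workers AI.

```typescript
// Illustrative only: keep text chunks whose embedding is similar enough to a
// reference embedding. Embeddings are assumed to be precomputed number arrays.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as dissimilar to everything instead of dividing by zero.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function filterChunks(
  chunks: { text: string; vector: number[] }[],
  reference: number[],
  threshold: number,
): string[] {
  return chunks
    .filter((c) => cosineSimilarity(c.vector, reference) >= threshold)
    .map((c) => c.text);
}
```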

Returning Results:

  • The final markdown is returned to the user as plain text or JSON, depending on the request headers.
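Choosing the output format from the request header can be sketched as follows. The JSON field names (url, content) and the formatResult name are assumptions for illustration, not Markdowner's documented response schema.

```typescript
// Hypothetical sketch: choose the response body and Content-Type from the
// request's Content-Type header. JSON field names are assumptions.
interface FormattedResponse {
  contentType: string;
  body: string;
}

function formatResult(
  requestContentType: string | null,
  url: string,
  markdown: string,
): FormattedResponse {
  // startsWith also accepts values like "application/json; charset=utf-8".
  const wantsJson = (requestContentType ?? "").toLowerCase().startsWith("application/json");
  if (wantsJson) {
    return { contentType: "application/json", body: JSON.stringify({ url, content: markdown }) };
  }
  // Default to plain text when the header is missing or has any other value.
  return { contentType: "text/plain", body: markdown };
}
```

Inside a Worker, the result would typically be wrapped as new Response(body, { headers: { "Content-Type": contentType } }).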

FAQs:

Q: What should I do if I encounter Error 1101: Worker threw exception?

A: This error typically occurs when the website you're trying to convert restricts crawling by bots or automated tools. Markdowner has to read the website's content to convert it to markdown, so if crawling is blocked, the conversion fails.
