Free CLI Tool Converts GitHub Repos to Text for LLMs – git2txt

An open-source CLI tool that converts any GitHub repository to a single text file, perfect for LLMs and code analysis.

GitHub repositories contain a wealth of code and documentation, but transforming them into a format suitable for AI analysis can be challenging.

Git2txt solves this by downloading any public GitHub repository and compiling its contents into a format LLMs can readily process. This open-source CLI tool benefits developers, AI researchers, and anyone working with code analysis, documentation generation, or AI model training.

Features

  • Download any public GitHub repository: Git2txt fetches any public repository via HTTPS, SSH, or short-format URLs.
  • Convert repository contents to a single text file: It compiles all compatible files into one text file for easy LLM ingestion.
  • Automatic binary file exclusion: Git2txt automatically skips binary files, keeping the output text-focused.
  • Configurable file size threshold: You can set a size limit, excluding larger files from the conversion process. The default is 100KB.
  • Cross-platform support: Git2txt works on Windows, macOS, and Linux systems.
  • Recursive file processing: It processes files in all subdirectories, skipping “node_modules” and “.git” by default.
  • Clear file markers: File paths and contents are separated by clear markers in the output file for easy readability and parsing.
  • Relative path preservation: Git2txt maintains relative file paths in the output, preserving the repository’s structure.
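To make the filtering behavior in the list above concrete, here is a minimal Python sketch of the same idea: walk a directory, skip “node_modules” and “.git”, drop oversized and binary-looking files, and write each remaining file under a marker with its relative path. This is not Git2txt's actual implementation. The function names, the NUL-byte binary check, the marker width, and the size formatting are all assumptions made for illustration.

```python
import os

SKIP_DIRS = {"node_modules", ".git"}  # skipped by default, per the feature list
THRESHOLD_BYTES = 100 * 1024          # the stated 100KB default
SEPARATOR = "=" * 32                  # marker line; the exact width is an assumption


def looks_binary(path, sample_size=8192):
    """Heuristic (an assumption): a NUL byte in the first bytes means binary."""
    with open(path, "rb") as handle:
        return b"\0" in handle.read(sample_size)


def compile_repo(root, threshold=THRESHOLD_BYTES):
    """Return one string holding every accepted file under root."""
    sections = []
    for current_dir, dir_names, file_names in os.walk(root):
        # Prune skipped directories in place so os.walk never descends into them.
        dir_names[:] = sorted(d for d in dir_names if d not in SKIP_DIRS)
        for name in sorted(file_names):
            full_path = os.path.join(current_dir, name)
            size = os.path.getsize(full_path)
            if size > threshold or looks_binary(full_path):
                continue
            # Keep the path relative to the repository root, with forward slashes.
            relative = os.path.relpath(full_path, root).replace(os.sep, "/")
            with open(full_path, encoding="utf-8", errors="replace") as handle:
                body = handle.read()
            header = (
                f"{SEPARATOR}\nFile: {relative}\n"
                f"Size: {size / 1024:.1f} KB\n{SEPARATOR}\n"
            )
            sections.append(f"{header}\n{body}\n")
    return "\n".join(sections)
```

Calling compile_repo("path/to/checkout") on a local clone returns the combined text. Passing a different threshold (in bytes here) changes which files are dropped.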

Use Cases

  • AI Model Training: Feed entire codebases into LLMs for code understanding, generation, or bug detection tasks.
  • Codebase Documentation: Generate comprehensive documentation by converting the repository into a text format suitable for documentation tools.
  • Code Analysis and Research: Analyze code structure, style, and content across entire repositories by processing them as a single text file.
  • Repository Summarization: Quickly summarize the contents of a repository by converting it to text and using summarization tools.
  • Code Migration and Refactoring: Facilitate code migration or refactoring projects by converting the codebase to a searchable and analyzable text format.

Installation

Install Git2txt globally using npm with the command:

npm install -g git2txt

Usage

  1. Once installed, convert any public GitHub repository by running:

    git2txt https://github.com/username/repository

  2. Use the --output option to specify a custom filename:

    git2txt username/repository --output=myoutput.txt

  3. Set a file size threshold, in MB, to exclude larger files (this example skips files over 2 MB):

    git2txt username/repository --threshold=2

  4. To include all files without size or type exclusions, use the --include-all flag:

    git2txt username/repository --include-all

  5. If you encounter issues, enable debug mode for more verbose logging:

    git2txt username/repository --debug

  6. Git2txt generates a text file with clear markers separating file paths, sizes, and contents, like this:

    ================================
    File: /path/to/scriptbyai.txt
    Size: 2.5 MB
    ================================

    [File contents here]

    ================================
    File: /path/to/scriptbyai-2.txt
    Size: 5.5 KB
    ================================

    [File contents here]

    ...
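Because the markers are regular, the output file can be split back into per-file pieces before it goes to an LLM. The Python sketch below assumes the layout is exactly as in the sample above: a line of "=" characters, a "File:" line, a "Size:" line, another line of "=" characters, then the contents. The function name split_sections is hypothetical, and a source file that itself contains a matching header would confuse this simple approach.

```python
import re

# One header block: marker line, File line, Size line, marker line.
HEADER = re.compile(
    r"^={3,}\nFile: (?P<path>.+)\nSize: (?P<size>.+)\n={3,}\n",
    re.MULTILINE,
)


def split_sections(text):
    """Return (path, size, contents) tuples parsed from the combined text."""
    matches = list(HEADER.finditer(text))
    sections = []
    for index, match in enumerate(matches):
        # Contents run from the end of this header to the start of the next one.
        if index + 1 < len(matches):
            end = matches[index + 1].start()
        else:
            end = len(text)
        contents = text[match.end():end].strip("\n")
        sections.append((match.group("path"), match.group("size"), contents))
    return sections
```

A typical use would be to read the generated text file, call split_sections on it, and then keep only the paths relevant to a given prompt.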

Pros

  • Simple and Efficient: Git2txt provides a straightforward way to convert repositories.
  • LLM-Ready Output: The generated text format is ideal for processing with Large Language Models.
  • Flexible and Customizable: Various options allow for control over file inclusion and output.
  • Free and Open Source: Benefit from a cost-free tool with community support and transparency.
  • Cross-Platform Compatibility: Works across major operating systems without compatibility issues.

Cons

  • Large Repositories: Processing very large repositories may take time and significant system resources.
  • Limited Preprocessing: The tool does not offer advanced preprocessing options such as code normalization or cleaning beyond basic filtering.
  • Command-Line Only: It lacks a graphical user interface, which might be a barrier for some users.

FAQs

Q: What types of GitHub repositories can Git2txt handle?
A: Git2txt can download and convert any public GitHub repository, whether you reference it with an HTTPS, SSH, or short-format URL.

Q: Can I customize which files are included in the conversion?
A: Yes. You can set a file size threshold with --threshold, or use the --include-all option to bypass size and type filtering.

Q: How does Git2txt handle binary files?
A: Git2txt excludes binary files by default to keep the output text-focused.

Q: Is there a limit to the size of the repository I can convert?
A: There is no strict limit, but very large repositories may require significant time and system resources to process.

Q: What if I encounter errors or need help?
A: You can use the --debug option for verbose logging or consult the tool’s documentation and community forums for support.
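
For readers wondering how the three URL forms from the first FAQ relate to each other, the Python sketch below reduces each of them to the same owner and repository pair. It is an illustration only and not how Git2txt parses its input. The function name parse_repo is hypothetical, and the SSH spelling git@github.com:username/repository.git is assumed from Git's usual convention.

```python
import re

# Patterns for the three reference styles; each captures owner and repo.
PATTERNS = [
    # HTTPS, with optional ".git" suffix and trailing slash
    re.compile(r"^https://github\.com/(?P<owner>[^/\s]+)/(?P<repo>[^/\s]+?)(?:\.git)?/?$"),
    # SSH (assumed spelling)
    re.compile(r"^git@github\.com:(?P<owner>[^/\s]+)/(?P<repo>[^/\s]+?)(?:\.git)?$"),
    # Short format: owner/repo
    re.compile(r"^(?P<owner>[^/\s:@]+)/(?P<repo>[^/\s]+?)(?:\.git)?$"),
]


def parse_repo(reference):
    """Return (owner, repo) for a GitHub reference, or raise ValueError."""
    cleaned = reference.strip()
    for pattern in PATTERNS:
        match = pattern.match(cleaned)
        if match:
            return match.group("owner"), match.group("repo")
    raise ValueError(f"Unrecognized GitHub reference: {reference!r}")
```

With this sketch, parse_repo("https://github.com/username/repository"), parse_repo("git@github.com:username/repository.git"), and parse_repo("username/repository") all return the same pair.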
