GitHub repositories contain a wealth of code and documentation, but transforming them into a format suitable for AI analysis can be challenging.
Git2txt solves this by downloading any public GitHub repository and compiling its contents into a format LLMs can readily process. This open-source CLI tool benefits developers, AI researchers, and anyone working with code analysis, documentation generation, or AI model training.
Features
- Download any public GitHub repository: Git2txt fetches any public repository via HTTPS, SSH, or short-format URLs.
- Convert repository contents to a single text file: It compiles all compatible files into one text file for easy LLM ingestion.
- Automatic binary file exclusion: Git2txt automatically skips binary files, keeping the output text-focused.
- Configurable file size threshold: You can set a size limit so that larger files are excluded from the conversion. The default is 100 KB.
- Cross-platform support: Git2txt works on Windows, macOS, and Linux systems.
- Recursive file processing: It processes files in all subdirectories, skipping “node_modules” and “.git” by default.
- Clear file markers: File paths and contents are separated by clear markers in the output file for easy readability and parsing.
- Relative path preservation: Git2txt maintains relative file paths in the output, preserving the repository’s structure.
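The filtering rules in this list (recursive traversal, skipped directories, a size threshold, and binary exclusion) can be illustrated with a short Python sketch. This is not Git2txt's actual code: the names `select_files` and `looks_binary` are hypothetical, the NUL-byte check is an assumed heuristic for spotting binary files, and whether a file exactly at the threshold is kept is also an assumption.

```python
import os

SKIP_DIRS = {"node_modules", ".git"}   # directories the article says are skipped by default
THRESHOLD_BYTES = 100 * 1024           # the 100 KB default mentioned in the feature list


def looks_binary(path, sample_size=8192):
    """Assumed heuristic: treat a file as binary if its first bytes contain a NUL."""
    with open(path, "rb") as handle:
        return b"\x00" in handle.read(sample_size)


def select_files(root, threshold=THRESHOLD_BYTES):
    """Yield relative paths of files that pass the directory, size, and binary checks."""
    for current, dirs, files in os.walk(root):
        # Prune skipped directories in place so os.walk never descends into them.
        dirs[:] = sorted(d for d in dirs if d not in SKIP_DIRS)
        for name in sorted(files):
            full = os.path.join(current, name)
            if os.path.getsize(full) > threshold:
                continue  # larger than the threshold: excluded
            if looks_binary(full):
                continue  # binary content: excluded
            # Paths are reported relative to the repository root.
            yield os.path.relpath(full, root)
```

Calling `list(select_files("path/to/clone"))` on a local checkout would give the set of files such a filter keeps, which is a quick way to predict roughly what ends up in the output.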
Use Cases
- AI Model Training: Feed entire codebases into LLMs for code understanding, generation, or bug detection tasks.
- Codebase Documentation: Generate comprehensive documentation by converting the repository into a text format suitable for documentation tools.
- Code Analysis and Research: Analyze code structure, style, and content across entire repositories by processing them as a single text file.
- Repository Summarization: Quickly summarize the contents of a repository by converting it to text and using summarization tools.
- Code Migration and Refactoring: Facilitate code migration or refactoring projects by converting the codebase to a searchable and analyzable text format.
Installation
Install Git2txt globally using npm with the command:
npm install -g git2txt
Usage
1. Once installed, you can convert any GitHub repository by running the following command:
git2txt https://github.com/username/repository
2. Use the --output option to specify a custom filename:
git2txt username/repository --output=myoutput.txt
3. Set a file size threshold to exclude large files (in MB):
git2txt username/repository --threshold=2
4. To include all files without size/type exclusions, use the --include-all flag:
git2txt username/repository --include-all
5. If you encounter issues, you can enable debug mode for more verbose logging:
git2txt username/repository --debug
6. Git2txt generates a text file with clear markers separating file paths, sizes, and contents, like this:
================================
File: /path/to/scriptbyai.txt
Size: 2.5 MB
================================
[File contents here]
================================
File: /path/to/scriptbyai-2.txt
Size: 5.5 KB
================================
[File contents here]
...
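Because the markers are regular, the output can be split back into per-file records. The sketch below is based only on the sample layout shown above (a line of equals signs, a "File:" line, a "Size:" line, another line of equals signs, then the contents). The function name `split_dump` is hypothetical, and the code assumes Unix line endings. A file whose own contents happen to contain the same header pattern would be split incorrectly.

```python
import re

# Header block as it appears in the sample: separator, File line, Size line, separator.
HEADER = re.compile(
    r"^={3,}\nFile: (?P<path>.+)\nSize: (?P<size>.+)\n={3,}\n",
    re.MULTILINE,
)


def split_dump(text):
    """Return a list of (path, size, contents) tuples from a Git2txt-style dump."""
    headers = list(HEADER.finditer(text))
    records = []
    for index, match in enumerate(headers):
        # Contents run from the end of this header to the start of the next one.
        if index + 1 < len(headers):
            end = headers[index + 1].start()
        else:
            end = len(text)
        records.append((match["path"], match["size"], text[match.end():end]))
    return records
```

With the records separated, it becomes straightforward to drop individual files or feed them to an LLM one at a time instead of as a single block.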
Pros
- Simple and Efficient: Git2txt provides a straightforward way to convert repositories.
- LLM-Ready Output: The generated text format is ideal for processing with Large Language Models.
- Flexible and Customizable: Various options allow for control over file inclusion and output.
- Free and Open Source: Benefit from a cost-free tool with community support and transparency.
- Cross-Platform Compatibility: Works across major operating systems without compatibility issues.
Cons
- Large Repositories: Processing very large repositories may take time and significant system resources.
- Limited Preprocessing: The tool does not offer advanced preprocessing options such as code normalization or cleaning beyond basic filtering.
- Command-Line Only: It lacks a graphical user interface, which might be a barrier for some users.
FAQs
Q: What types of GitHub repositories can Git2txt handle?
A: Git2txt can download and convert any public GitHub repository, whether accessed via HTTPS, SSH, or short format URLs.
Q: Can I customize which files are included in the conversion?
A: Yes, you can set a file size threshold or use the --include-all option to bypass size and type filtering.
Q: How does Git2txt handle binary files?
A: Git2txt excludes binary files by default to keep the output text-focused.
Q: Is there a limit to the size of the repository I can convert?
A: While there is no strict limit, very large repositories may require significant time and system resources to process.
Q: What if I encounter errors or need help?
A: You can use the --debug option for verbose logging or consult the tool’s documentation and community forums for support.
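For scripted use, the options mentioned in this article (--output, --threshold, --include-all, and --debug) can be assembled into a command line from Python. The helper below is a hypothetical convenience wrapper, not part of Git2txt. It only builds the argument list and uses the flag spellings exactly as they appear in the usage examples.

```python
def build_command(repo, output=None, threshold_mb=None, include_all=False, debug=False):
    """Build a git2txt argument list from the options described in this article."""
    command = ["git2txt", repo]
    if output is not None:
        command.append(f"--output={output}")           # custom output filename
    if threshold_mb is not None:
        command.append(f"--threshold={threshold_mb}")  # size limit in MB
    if include_all:
        command.append("--include-all")                # bypass size/type exclusions
    if debug:
        command.append("--debug")                      # verbose logging
    return command
```

The resulting list can be passed to `subprocess.run(command, check=True)`, assuming git2txt is installed and on the PATH.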









