
lucien-loua/llms.txt


llms.txt

Project Overview

llms.txt is a web application for generating consolidated text files from websites, designed for Large Language Model training and inference. It produces:

  • llms.txt: An index of site pages with AI-generated titles and descriptions.
  • llms-full.txt: The full plain text content of all crawled pages.
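As a rough illustration of how the two outputs relate, the sketch below assembles both files from a list of crawled pages. The `CrawledPage` shape, the function names, and the index layout (a `# Title` heading followed by `- [title](url): description` entries, loosely following the llmstxt.org convention) are assumptions, not taken from this project's code.

```typescript
// Hypothetical record for one crawled page; field names are assumptions.
interface CrawledPage {
  url: string;
  title: string; // AI-generated title
  description: string; // AI-generated one-line description
  content: string; // plain-text content scraped from the page
}

// Build the llms.txt index: one Markdown link per page, followed by its description.
// The layout loosely follows the llmstxt.org convention; this project's exact format may differ.
function buildLlmsTxt(siteName: string, pages: CrawledPage[]): string {
  const entries = pages.map((p) => `- [${p.title}](${p.url}): ${p.description}`);
  return [`# ${siteName}`, "", ...entries, ""].join("\n");
}

// Build llms-full.txt: the full text of every page, each preceded by a header naming its source.
function buildLlmsFullTxt(pages: CrawledPage[]): string {
  return pages
    .map((p) => `# ${p.title}\n\nSource: ${p.url}\n\n${p.content.trim()}\n`)
    .join("\n");
}

const pages: CrawledPage[] = [
  { url: "https://example.com/", title: "Home", description: "Landing page.", content: "Welcome to Example." },
  { url: "https://example.com/docs", title: "Docs", description: "Product documentation.", content: "Getting started..." },
];

console.log(buildLlmsTxt("Example", pages));
console.log(buildLlmsFullTxt(pages));
```

The index stays small enough to fit in a prompt, while the full file carries the complete text when a model needs it.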

The project uses Firecrawl for crawling/scraping and OpenAI for generating titles and descriptions.
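The OpenAI half of that split can be pictured with the sketch below: it asks the Chat Completions endpoint to return a title and a one-sentence description for a single page as JSON. It is not this project's code; the prompt text, the `gpt-4o-mini` model choice, the 4,000-character truncation, and the helper names (`buildSummaryRequest`, `parseSummary`, `summarizePage`) are all assumptions. The Firecrawl crawling step is not shown.

```typescript
// Minimal sketch: ask the OpenAI Chat Completions endpoint for a page title and description.
// The prompt wording, model name, and truncation limit are assumptions for illustration.

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  response_format: { type: "json_object" };
}

interface PageSummary {
  title: string;
  description: string;
}

// Build the request body; the page text is truncated to keep the prompt small.
function buildSummaryRequest(url: string, pageText: string, model = "gpt-4o-mini", maxChars = 4000): ChatRequest {
  return {
    model,
    response_format: { type: "json_object" }, // JSON mode; the prompt must mention JSON
    messages: [
      {
        role: "system",
        content: 'Reply with JSON of the form {"title": string, "description": string}. The description is one sentence.',
      },
      { role: "user", content: `URL: ${url}\n\n${pageText.slice(0, maxChars)}` },
    ],
  };
}

// Validate the model's JSON reply and trim surrounding whitespace.
function parseSummary(raw: string): PageSummary {
  const data = JSON.parse(raw) as { title?: unknown; description?: unknown };
  if (typeof data.title !== "string" || typeof data.description !== "string") {
    throw new Error("Model reply is missing a title or description");
  }
  return { title: data.title.trim(), description: data.description.trim() };
}

// Send the request using the global fetch available in Node.js 20.
async function summarizePage(url: string, pageText: string, apiKey: string): Promise<PageSummary> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify(buildSummaryRequest(url, pageText)),
  });
  if (!res.ok) throw new Error(`OpenAI request failed with status ${res.status}`);
  const json = (await res.json()) as { choices: { message: { content: string } }[] };
  return parseSummary(json.choices[0].message.content);
}
```

Keeping the request builder and the reply parser as pure functions means they can be tested without network access; only `summarizePage` needs a real `OPENAI_API_KEY`.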

Prerequisites

  • Node.js >= 20
  • pnpm (recommended)

API Key Configuration

  • Firecrawl: Get an API key from your Firecrawl account, then provide it either in the UI (Settings) or as FIRECRAWL_API_KEY in a .env file at the project root.
  • OpenAI: Set your key in .env (OPENAI_API_KEY) or export it in your shell.

Example .env file:

FIRECRAWL_API_KEY=fc-...
OPENAI_API_KEY=sk-...
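Whichever source the keys come from, both must be present before a run can start. A minimal sketch of that check, assuming a key entered in the UI Settings overrides FIRECRAWL_API_KEY (this priority order is an assumption, as is the helper name `readApiKeys`):

```typescript
// Hypothetical helper: read both API keys from an environment-like object.
type Env = Record<string, string | undefined>;

interface ApiKeys {
  firecrawlApiKey: string;
  openaiApiKey: string;
}

function readApiKeys(env: Env, uiFirecrawlKey?: string): ApiKeys {
  // Assumed priority: a non-empty key from the UI Settings wins over FIRECRAWL_API_KEY.
  const firecrawlApiKey = uiFirecrawlKey?.trim() || env.FIRECRAWL_API_KEY?.trim() || "";
  const openaiApiKey = env.OPENAI_API_KEY?.trim() || "";

  // Report every missing key at once instead of failing on the first.
  const missing: string[] = [];
  if (!firecrawlApiKey) missing.push("FIRECRAWL_API_KEY");
  if (!openaiApiKey) missing.push("OPENAI_API_KEY");
  if (missing.length > 0) throw new Error(`Missing API key(s): ${missing.join(", ")}`);

  return { firecrawlApiKey, openaiApiKey };
}
```

In Node.js you would call it as `readApiKeys(process.env)` once .env has been loaded, for example with Node's `--env-file=.env` flag (Node 20.6 and later) or a loader such as `dotenv`.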

Installation

  1. Clone the repository and install dependencies:
    git clone https://github.com/lucien-loua/llms.txt.git
    cd llms.txt
    pnpm install
  2. Add your API keys to the .env file at the project root (see previous section).
  3. Start the development server:
    pnpm dev
    The app will be accessible at http://localhost:3000.

Usage

  1. Enter the website URL to crawl in the input field.
  2. Make sure your API keys are configured (see Settings or .env).
  3. Start the generation and monitor progress.
  4. Download the generated files (llms.txt, llms-full.txt).

Customization

  • The maximum number of URLs to crawl is configurable in the UI.
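One plausible way such a limit could be applied, sketched here as an assumption rather than the project's actual logic: normalize each discovered URL, drop fragments and duplicates, and stop once `maxUrls` pages have been selected. The function name `selectUrls` and its normalization rules are hypothetical; Firecrawl's crawl endpoint also accepts a page limit, so the project may simply forward the value there.

```typescript
// Hypothetical helper: normalize, deduplicate, and cap discovered URLs at a user-chosen limit.
function selectUrls(discovered: string[], maxUrls: number): string[] {
  if (!Number.isInteger(maxUrls) || maxUrls < 1) {
    throw new RangeError("maxUrls must be a positive integer");
  }
  const seen = new Set<string>();
  const selected: string[] = [];
  for (const raw of discovered) {
    let url: URL;
    try {
      url = new URL(raw);
    } catch {
      continue; // skip malformed URLs
    }
    url.hash = ""; // a fragment points into the same page, so ignore it
    const key = url.toString();
    if (seen.has(key)) continue;
    seen.add(key);
    selected.push(key);
    if (selected.length === maxUrls) break; // stop as soon as the cap is reached
  }
  return selected;
}
```

Deduplicating before counting keeps the cap meaningful: ten links to the same page with different fragments use up one slot, not ten.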


© 2025 - Open source project under the MIT license.