
    Repositories list

    • Updated Nov 1, 2025
    • Python · Updated Oct 30, 2025
    • unstructured-api fork with GPU inference support
      Python · Updated Oct 6, 2025
    • A guidance language for controlling large language models. (Qwen compatible)
      Jupyter Notebook · Updated Oct 2, 2025
    • HTML · Updated Jul 11, 2025
    • vllm (Public): A high-throughput and memory-efficient inference and serving engine for LLMs. A usage sketch follows after this list.
      Python · Updated Oct 26, 2024
    • Python · Updated Jul 18, 2024
    • qlora (Public): QLoRA: Efficient Finetuning of Quantized LLMs. A setup sketch follows after this list.
      Jupyter Notebook · Updated Nov 20, 2023
    • llm-awq (Public): AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
      Python · Updated Nov 20, 2023
    • OmniQuant (Public): OmniQuant is a simple and powerful quantization technique for LLMs.
      Python · Updated Nov 8, 2023
    • rulm (Public): Language modeling and instruction tuning for Russian
      Jupyter Notebook · Updated Oct 18, 2023
    • AutoAWQ (Public): AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. A quantization sketch follows after this list.
      C++ · Updated Oct 16, 2023
    • [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
      Python · Updated Oct 13, 2023
    • peft (Public): 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. An adapter-loading sketch follows after this list.
      Python · Updated Sep 25, 2023
    • Intel® Neural Compressor (formerly Intel® Low Precision Optimization Tool): unified APIs for network compression techniques such as low-precision quantization, sparsity, pruning, and knowledge distillation across different deep learning frameworks, targeting optimal inference performance.
      Python · Updated Aug 16, 2023