WARNING: THIS SITE IS A MIRROR OF GITHUB.COM / IT CANNOT LOGIN OR REGISTER ACCOUNTS / THE CONTENTS ARE PROVIDED AS-IS / THIS SITE ASSUMES NO RESPONSIBILITY FOR ANY DISPLAYED CONTENT OR LINKS / IF YOU FOUND SOMETHING MAY NOT GOOD FOR EVERYONE, CONTACT ADMIN AT ilovescratch@foxmail.com

Compressa.ai

All

15 repositories

compressa-deploy
Public
0•0•0•1•Updated Nov 1, 2025Nov 1, 2025
compressa-perf
Public
Python
•
MIT License
•1•2•0•0•Updated Oct 30, 2025Oct 30, 2025
compressa-unstructured-api
Public
unstructured-api fork with GPU inference support
Python
•
Apache License 2.0
•181•0•0•0•Updated Oct 6, 2025Oct 6, 2025
compressa-guidance
Public
A guidance language for controlling large language models. (Qwen compatible)
Jupyter Notebook
•
MIT License
•1.1k•0•0•0•Updated Oct 2, 2025Oct 2, 2025
compressa-ai.github.io
Public
HTML
•0•0•0•0•Updated Jul 11, 2025Jul 11, 2025
vllm
Public
A high-throughput and memory-efficient inference and serving engine for LLMs
Python
•
Apache License 2.0
•12k•0•0•0•Updated Oct 26, 2024Oct 26, 2024
langchain_compressa
Public
Python
•
MIT License
•1•0•0•0•Updated Jul 18, 2024Jul 18, 2024
qlora
Public
QLoRA: Efficient Finetuning of Quantized LLMs
Jupyter Notebook
•
MIT License
•870•0•0•0•Updated Nov 20, 2023Nov 20, 2023
llm-awq
Public
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Python
•
MIT License
•286•1•0•0•Updated Nov 20, 2023Nov 20, 2023
OmniQuant
Public
OmniQuant is a simple and powerful quantization technique for LLMs.
Python
•72•0•0•0•Updated Nov 8, 2023Nov 8, 2023
rulm
Public
Language modeling and instruction tuning for Russian
Jupyter Notebook
•
Apache License 2.0
•49•0•0•0•Updated Oct 18, 2023Oct 18, 2023
AutoAWQ
Public
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
C++
•
MIT License
•295•0•0•0•Updated Oct 16, 2023Oct 16, 2023
smoothquant
Public
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Python
•
MIT License
•190•0•0•0•Updated Oct 13, 2023Oct 13, 2023
peft
Public
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
Python
•
Apache License 2.0
•2.1k•0•0•0•Updated Sep 25, 2023Sep 25, 2023
neural-compressor
Public
Intel® Neural Compressor (formerly known as Intel® Low Precision Optimization Tool), targeting to provide unified APIs for network compression technologies, such as low precision quantization, sparsity, pruning, knowledge distillation, across different deep learning frameworks to pursue optimal inference performance.
Python
•
Apache License 2.0
•290•0•0•0•Updated Aug 16, 2023Aug 16, 2023