@kvcache-ai

kvcache.ai

KVCache.AI is a joint research project between MADSys and top industry collaborators, focusing on efficient LLM serving.

Pinned

  1. Mooncake Public

    Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

    C++ · 4.4k stars · 471 forks

  2. ktransformers Public

    A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations

    Python · 16.2k stars · 1.2k forks

  3. TrEnv-X Public

    Go · 70 stars · 2 forks

Repositories

Showing 10 of 10 repositories
  • Mooncake Public

    Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

    C++ · 4,419 stars · Apache-2.0 · 471 forks · 217 issues (12 need help) · 69 pull requests · Updated Dec 15, 2025
  • sglang Public Forked from sgl-project/sglang

    SGLang is a fast serving framework for large language models and vision language models.

    Python · 4 stars · Apache-2.0 · 3,758 forks · 0 issues · 1 pull request · Updated Dec 15, 2025
  • ktransformers Public

    A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations

    Python · 16,201 stars · Apache-2.0 · 1,183 forks · 382 issues (1 needs help) · 3 pull requests · Updated Dec 12, 2025
  • sglang_awq Public Forked from sgl-project/sglang

    SGLang is a fast serving framework for large language models and vision language models.

    Python · 1 star · Apache-2.0 · 3,761 forks · 0 issues · 0 pull requests · Updated Dec 10, 2025
  • gpustack Public Forked from gpustack/gpustack

    GPU cluster manager for optimized AI model deployment

    Python · 0 stars · Apache-2.0 · 429 forks · 0 issues · 0 pull requests · Updated Dec 7, 2025
  • TrEnv-X Public
    Go · 70 stars · Apache-2.0 · 2 forks · 0 issues · 0 pull requests · Updated Sep 15, 2025
  • sglang-npu Public Forked from sgl-project/sglang

    SGLang is a fast serving framework for large language models and vision language models.

    Python · 0 stars · Apache-2.0 · 3,761 forks · 0 issues · 0 pull requests · Updated Aug 12, 2025
  • DeepEP_fault_tolerance Public Forked from deepseek-ai/DeepEP

    DeepEP: an efficient expert-parallel communication library that supports fault tolerance

    Cuda · 3 stars · MIT · 1,031 forks · 0 issues · 0 pull requests · Updated Jul 31, 2025
  • custom_flashinfer Public Forked from flashinfer-ai/flashinfer

    FlashInfer: Kernel Library for LLM Serving

    Cuda · 5 stars · Apache-2.0 · 596 forks · 0 issues · 0 pull requests · Updated Jul 24, 2025
  • vllm Public Forked from vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python · 14 stars · Apache-2.0 · 12,081 forks · 0 issues · 0 pull requests · Updated Mar 27, 2025
