Skip to content

zedarvates/CUDAgraveyard

Repository files navigation

CUDAgraveyard ⚰️

The final resting place for obsolete CUDA kernels.
Autonomous AI that discovers faster, more energy-efficient kernels than any human engineer – burying cuBLAS, FlashAttention, and Triton in eternal obsolescence. No mercy, no weekends, just relentless optimization loops.

Body Count (November 2025 – 2026 Goals)

  • GEMM 8k×8k (vs cuBLAS Lt 12.6) – Target: +25% TFLOPS, -40% watts on H100
  • FlashAttention-3 kernels (vs xformers) – Roofline domination
  • Llama.cpp GGUF inference (Q4_K_M + imatrix) – CPU+GPU hybrid speedup
  • Custom LLVM passes for SPECfp – Auto-discover unrolling magic

Each kill includes:

  • Full .cu source + git patch
  • Benchmark tables (TFLOPS, latency, power draw)
  • Flamegraphs + nsight reports
  • Arrogant auto-blogpost: "We just obsoleted 5 years of NVIDIA sweat."

Quickstart

  1. Clone: git clone https://github.com/zedarvates/CUDAgraveyard.git
  2. Setup: pip install -r requirements.txt (Grok-4/Claude API, shinka-evolve, nsight-sdk)
  3. Launch a demon: python launch_demon.py --template templates/gemm_toon_v1.json --gpus 0
  4. Watch the graveyard fill: Logs in graves/ (dead kernels + their epitaphs).

Stack (2025 Edition)

  • Core: AI-Scientist-v2 (SakanaAI fork) + ShinkaEvolve for genetic mutations
  • Tools: nvcc 12.6+, nsight-compute, hyperfine, nvidia-smi power monitoring
  • AI Brain: Grok-4 / Claude-3.5-Sonnet-2025 / o3-pro (reasoning + code gen)
  • Hardware: Tested on H100/L40S/MI300X; Docker for sandbox kills
  • Safety: Strict timeouts + ethical watcher (no infinite loops or self-hacks)

Warning: Running this may void warranties, melt PSUs, and put perf engineers out of work. Proceed with devilish intent.

Roadmap to Armageddon

  • Q4 2025: GEMM graveyard opening
  • Q1 2026: Multi-kernel massacre (attention + transformers)
  • Q2 2026: Hybrid CPU-GPU-Quantique integration (link to CosmosCluster)
  • Beyond: Self-optimizing the optimizer (Darwin Gödel vibes)

Fork it, contribute kills, or just watch the jobs vanish.
Built by zedarvates – because why hire when AI slays?

About

Autonomous AI that automatically discovers faster and more energy-efficient CUDA kernels than human engineers ⚰️ – the graveyard of cuBLAS, FlashAttention and every hand-tuned kernel.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors