AI-powered project generator that transforms natural language descriptions into complete, production-ready codebases with Docker configurations and automated testing.
📄 Paper submitted to ICML 2026
A novel approach to autonomous code generation using multi-agent systems with iterative self-healing and comprehensive validation across diverse programming paradigms.
- Planning Agent: Analyzes errors and generates comprehensive fix strategies using tool-augmented reasoning
- Correction Agent: Executes fixes with code understanding and validation
- Iterative Self-Healing: Automatically detects and resolves dependency conflicts, build errors, and test failures
- Natural language to production-ready code
- Multi-file project generation with proper structure
- Support for modern languages and frameworks
- Intelligent dependency resolution
- Best practices and design patterns
- Automated Docker container creation
- Isolated build and test environments
- Resource-managed execution (configurable CPU/memory limits)
- Complete validation pipeline from build to test execution
- 40 Programming Challenges across 4 languages:
- CUDA: GPU computing and parallel algorithms (10 challenges)
- Go: Concurrent systems and distributed computing (10 challenges)
- Rust: Memory-safe systems programming (10 challenges)
- TypeScript: Type-safe applications and frameworks (10 challenges)
- 4-Tier Difficulty System: From fundamentals to production systems
- Comprehensive benchmarking and metrics collection
graph LR
A[Natural Language Input] --> B[AI Analysis & Blueprint]
B --> C[Multi-File Code Generation]
C --> D[Dependency Resolution]
D --> E[Docker Configuration]
E --> F[Build Validation]
F --> G{Build Success?}
G -->|No| H[Planning Agent]
H --> I[Correction Agent]
I --> F
G -->|Yes| J[Test Execution]
J --> K{Tests Pass?}
K -->|No| H
K -->|Yes| L[Production-Ready Project]
style A fill:#4A90E2,stroke:#2E5C8A,stroke-width:2px,color:#fff
style B fill:#9B59B6,stroke:#6C3483,stroke-width:2px,color:#fff
style C fill:#E67E22,stroke:#A04000,stroke-width:2px,color:#fff
style D fill:#3498DB,stroke:#1F618D,stroke-width:2px,color:#fff
style E fill:#1ABC9C,stroke:#117A65,stroke-width:2px,color:#fff
style F fill:#E74C3C,stroke:#922B21,stroke-width:2px,color:#fff
style L fill:#27AE60,stroke:#186A3B,stroke-width:2px,color:#fff
Core Generation Pipeline:
- Blueprint Generation: Analyzes requirements and creates software architecture
- Folder Structure: Generates project hierarchy with proper organization
- File Generation: Creates all necessary files with content (source, config, tests, docs)
- Metadata Management: Tracks dependencies, entry points, and test commands
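As a rough illustration of the metadata step above, the record below shows one way dependencies, entry points, and test commands could be held together. The class name `ProjectMetadata`, its fields, and the sample values are assumptions made for this sketch, not AlphaStack's actual data model.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ProjectMetadata:
    """Hypothetical record of what the pipeline tracks for a generated project."""

    name: str
    language: str
    entry_point: str
    dependencies: list[str] = field(default_factory=list)
    build_command: str = ""
    test_command: str = ""

    def to_json(self) -> str:
        # Serialize so later phases (Docker, validation) read the same facts.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Sample values are invented for the example.
meta = ProjectMetadata(
    name="flask-api",
    language="python",
    entry_point="app/main.py",
    dependencies=["flask", "pyjwt"],
    test_command="pytest",
)
print(meta.to_json())
```

Keeping this in one serializable record means the validation phase does not have to rediscover the entry point or test command from the generated files.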
Intelligent Error Resolution:
- Error Tracking: Monitors all errors across build and test phases
- Tool-Augmented Planning: Uses file operations, command execution, and analysis tools
- Context-Aware Fixes: Understands project structure and dependencies
- Iterative Refinement: Continues until success or max iterations reached
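The refinement loop described above can be sketched as follows. This is a minimal sketch, not the code in `src/agents/`: the function `self_heal`, the `PhaseResult` type, and the three callables are hypothetical stand-ins for a build or test phase, the Planning Agent, and the Correction Agent.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PhaseResult:
    ok: bool
    errors: List[str]


def self_heal(
    run_phase: Callable[[], PhaseResult],
    plan_fix: Callable[[List[str]], List[str]],
    apply_fix: Callable[[List[str]], None],
    max_iterations: int = 5,
) -> tuple[bool, int]:
    """Run a phase; on failure, plan and apply fixes until it passes or the
    iteration budget is spent. Returns (success, fix_rounds_used)."""
    for rounds in range(max_iterations + 1):
        result = run_phase()
        if result.ok:
            return True, rounds
        if rounds == max_iterations:
            break
        plan = plan_fix(result.errors)  # stand-in for the Planning Agent
        apply_fix(plan)                 # stand-in for the Correction Agent
    return False, max_iterations


# Toy demo: the "build" fails until two fixes have been applied.
state = {"fixes": 0}


def fake_build() -> PhaseResult:
    if state["fixes"] >= 2:
        return PhaseResult(ok=True, errors=[])
    return PhaseResult(ok=False, errors=["missing dependency"])


def fake_plan(errors: List[str]) -> List[str]:
    return [f"resolve: {e}" for e in errors]


def fake_apply(plan: List[str]) -> None:
    state["fixes"] += 1


print(self_heal(fake_build, fake_plan, fake_apply))  # (True, 2)
```

The phase is re-run after every fix, so success is always confirmed by an actual build or test run and never inferred from the fix itself.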
Validation & Testing:
- Docker Isolation: Sandboxed build and test environments
- Command Detection: Automatically identifies build/test commands
- Log Analysis: Extracts and analyzes error messages
- Success Verification: Validates complete pipeline execution
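One plausible way to implement command detection is to look for well-known marker files in the project root. The sketch below assumes that approach; the `KNOWN_PROJECTS` mapping and the `detect_commands` helper are hypothetical, and AlphaStack may instead derive commands from its own metadata.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple

# Marker file -> (build command, test command). Illustrative mapping only.
KNOWN_PROJECTS = {
    "Cargo.toml": ("cargo build", "cargo test"),
    "go.mod": ("go build ./...", "go test ./..."),
    "package.json": ("npm run build", "npm test"),
    "pyproject.toml": ("pip install .", "pytest"),
}


def detect_commands(project_dir: str) -> Optional[Tuple[str, str]]:
    """Return (build, test) commands for the first marker file found, else None."""
    root = Path(project_dir)
    for marker, commands in KNOWN_PROJECTS.items():
        if (root / marker).is_file():
            return commands
    return None


# Demo against a throwaway directory that looks like a Go module.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "go.mod").write_text("module example.com/demo\n")
    print(detect_commands(tmp))
```

Returning `None` for unrecognized layouts lets the caller fall back to another strategy without guessing a command.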
Requirements:
- Python 3.9+
- Google Gemini API Key
- Docker (optional, for validation)
# Clone and install
git clone https://github.com/HyperKuvid-Labs/alpha-stack.git
cd alpha-stack
pip install .
# Configure API key
alphastack setup
Docker Installation (Recommended):
# Install Docker Engine (Linux)
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
# Or via package manager (Ubuntu/Debian)
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io
Interactive Mode:
alphastack
# Follow the interactive prompts to generate your project
Command Line:
# Generate a project
alphastack generate "A Flask REST API with user authentication and JWT tokens"
# Specify output directory
alphastack generate "Python CLI tool for file processing" -o /path/to/output
# Generate with custom name
alphastack generate "React TypeScript dashboard with charts"
# List generated projects
alphastack list
# Clean up projects
alphastack clean
Example Projects:
# Web Applications
alphastack generate "Express.js REST API with MongoDB and authentication"
alphastack generate "FastAPI service with PostgreSQL and async operations"
# CLI Tools
alphastack generate "Python CLI tool for image compression with progress bar"
alphastack generate "Go CLI for log analysis with concurrent processing"
# Data Processing
alphastack generate "Rust program for parallel CSV processing"
alphastack generate "Python script for web scraping with retry logic"
# System Programming
alphastack generate "CUDA kernel for matrix multiplication optimization"
alphastack generate "Go service with gRPC and protocol buffers"
AlphaStack includes a comprehensive evaluation framework with 40 carefully designed programming challenges across 4 modern languages, organized into 4 difficulty tiers:
CUDA (10 challenges):
- Focus: Parallel computing, memory management, kernel optimization
- Challenges: Vector operations → Matrix operations → Sparse algorithms → Ray tracing engines
- Tier 4 Example: Ray tracing engine with BVH acceleration structure
Go (10 challenges):
- Focus: Distributed systems, goroutines, channels, service architecture
- Challenges: Worker pools → REST APIs → Load balancers → Raft consensus
- Tier 4 Example: Full Raft consensus protocol implementation
Rust (10 challenges):
- Focus: Memory safety, ownership, lifetimes, zero-cost abstractions
- Challenges: Custom iterators → HTTP parsers → Procedural macros → Custom allocators
- Tier 4 Example: Custom bump allocator as global allocator with FFI
TypeScript (10 challenges):
- Focus: Type system, generics, inference, compile-time safety
- Challenges: Event emitters → Type-safe routers → DI containers → Full-stack RPC
- Tier 4 Example: End-to-end type-safe RPC framework with inference
| Tier | Focus | Complexity | Lines of Code | Time |
|---|---|---|---|---|
| Tier 1 | Fundamentals | Single concept, basic algorithms | 150-400 | 2-4h |
| Tier 2 | Architecture | Multiple modules, abstractions | 400-700 | 4-8h |
| Tier 3 | Advanced | Domain expertise, algorithms | 500-900 | 8-16h |
| Tier 4 | Production | Complete systems, optimization | 800-1500 | 16-32h |
- Success Rate: Percentage of challenges solved correctly
- Build Success: Projects that compile/build without errors
- Test Pass Rate: Projects with passing test suites
- Iteration Count: Average iterations needed for error resolution
- Time to Solution: End-to-end generation time
- Code Quality: Adherence to best practices and patterns
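To make the quantitative metrics above concrete, the sketch below aggregates per-challenge results into rates and averages. The `ChallengeResult` record and the `summarize` function are hypothetical, the sample numbers are invented, and "solved correctly" is assumed here to mean that a project both builds and passes its tests. Code quality is not covered because it does not reduce to a simple count.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ChallengeResult:
    challenge: str
    built: bool
    tests_passed: bool
    iterations: int
    seconds: float


def summarize(results: List[ChallengeResult]) -> Dict[str, float]:
    """Aggregate per-challenge outcomes into fractions and averages."""
    n = len(results)
    if n == 0:
        raise ValueError("no results to summarize")
    solved = [r for r in results if r.built and r.tests_passed]
    return {
        "success_rate": len(solved) / n,
        "build_success": sum(r.built for r in results) / n,
        "test_pass_rate": sum(r.tests_passed for r in results) / n,
        "avg_iterations": sum(r.iterations for r in results) / n,
        "avg_seconds": sum(r.seconds for r in results) / n,
    }


# Invented sample data, not measured results.
sample = [
    ChallengeResult("go/worker-pool", True, True, 1, 60.0),
    ChallengeResult("go/load-balancer", True, False, 5, 300.0),
    ChallengeResult("rust/custom-iterator", True, True, 2, 120.0),
    ChallengeResult("cuda/ray-tracer", False, False, 5, 240.0),
]
for key, value in summarize(sample).items():
    print(f"{key}: {value:.2f}")
```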
Evaluation Location: src/prompts/eval/ contains all challenge specifications and test cases.
alpha-stack/
├── src/
│ ├── agents/ # Multi-agent system
│ │ ├── planner.py # Planning agent for error analysis
│ │ └── corrector.py # Correction agent for fixes
│ ├── docker/ # Docker integration
│ │ ├── generator.py # Dockerfile generation
│ │ └── testing.py # Docker-based validation
│ ├── prompts/ # Jinja2 prompt templates
│ │ └── eval/ # Evaluation challenges
│ │ ├── cuda/ # 10 CUDA challenges
│ │ ├── go/ # 10 Go challenges
│ │ ├── rust/ # 10 Rust challenges
│ │ └── typescript/ # 10 TypeScript challenges
│ ├── utils/ # Core utilities
│ │ ├── helpers.py # Helper functions
│ │ ├── prompt_manager.py # Template management
│ │ ├── error_tracker.py # Error tracking
│ │ └── tools.py # Tool definitions
│ ├── generator.py # Main generation logic
│ ├── eval_generator.py # Evaluation system
│ ├── cli.py # Command-line interface
│ ├── tui.py # Terminal UI
│ └── config.py # Configuration management
├── website/ # Project website
├── test_runner.py # Development test runner
└── pyproject.toml # Project metadata
- Primary Model: Google Gemini (configurable via MODEL_NAME)
- Alternative Support: OpenRouter API for evaluation framework
- Context Management: Intelligent prompt engineering with Jinja2 templates
Planning Agent (src/agents/planner.py):
- Analyzes build/test errors using structured error tracking
- Generates comprehensive fix plans with tool-based reasoning
- Maintains project structure cache for efficient planning
- Supports different error types (dependency, docker, common errors)
Correction Agent (src/agents/corrector.py):
- Executes planned fixes with code understanding
- Validates code changes before application
- Uses language-specific parsers for syntax validation
- Tracks changes to prevent infinite loops
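Two of the safeguards above, syntax validation and loop prevention, can be approximated with the standard library alone. The sketch below is an assumption about how such checks might look, not the contents of `corrector.py`: `is_valid_python` covers only Python sources through the `ast` module, and the hypothetical `ChangeTracker` flags a project state that has already been seen, which would indicate the fixes are cycling.

```python
import ast
import hashlib
from typing import Dict, Set


def is_valid_python(source: str) -> bool:
    """Syntax check for Python sources; other languages need their own parser."""
    try:
        ast.parse(source)
        return True
    except (SyntaxError, ValueError):
        return False


class ChangeTracker:
    """Remembers fingerprints of project states to spot a fix cycle."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    @staticmethod
    def fingerprint(files: Dict[str, str]) -> str:
        # Hash paths and contents in sorted order so dict ordering is irrelevant.
        digest = hashlib.sha256()
        for path in sorted(files):
            digest.update(path.encode("utf-8"))
            digest.update(b"\0")
            digest.update(files[path].encode("utf-8"))
            digest.update(b"\0")
        return digest.hexdigest()

    def record(self, files: Dict[str, str]) -> bool:
        """Return True if this state is new, False if it was seen before."""
        key = self.fingerprint(files)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


tracker = ChangeTracker()
before = {"main.py": "x = 1\n"}
after = {"main.py": "x = 2\n"}
print(tracker.record(before), tracker.record(after), tracker.record(before))
```

A caller could stop iterating, or ask the planner for a different strategy, as soon as `record` returns `False`.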
Features:
- Automatic Dockerfile generation based on project type
- Multi-stage builds for optimized images
- Resource management (configurable CPU/memory limits)
- Network isolation and security
- Support for custom base images
Testing Framework (src/docker/testing.py):
- Command detection (build, test, run commands)
- Real-time log capture and analysis
- Iterative error resolution with max iteration limits
- Success/failure validation with detailed reporting
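Log analysis of the kind listed above often starts with pattern matching on captured output. The sketch below assumes a simple regex-based extractor; the patterns, the `extract_errors` helper, and the sample log are illustrative only and cover just a few common compiler and interpreter formats.

```python
import re
from typing import List

# Illustrative patterns; real toolchains emit many more formats.
ERROR_PATTERNS = [
    re.compile(r"^error(\[E\d+\])?: .+", re.IGNORECASE),   # rustc / cargo style
    re.compile(r"^\S+\.\w+:\d+(:\d+)?: .+"),               # file.ext:line[:col]: message
    re.compile(r"^\S+\.\w+\(\d+,\d+\): error TS\d+: .+"),  # tsc style
    re.compile(r"^(\w+\.)*\w*(Error|Exception): .+"),      # Python exception line
]


def extract_errors(log: str, limit: int = 20) -> List[str]:
    """Return up to `limit` distinct lines that look like error messages."""
    found: List[str] = []
    for raw in log.splitlines():
        line = raw.strip()
        if line in found:
            continue  # skip repeats so the planner sees each error once
        if any(p.match(line) for p in ERROR_PATTERNS):
            found.append(line)
            if len(found) >= limit:
                break
    return found


sample_log = """\
   Compiling demo v0.1.0
error[E0308]: mismatched types
main.go:12:5: undefined: foo
src/index.ts(3,7): error TS2322: Type 'string' is not assignable to type 'number'.
ModuleNotFoundError: No module named 'flask'
error[E0308]: mismatched types
"""
for line in extract_errors(sample_log):
    print(line)
```

Capping and de-duplicating the extracted lines keeps the error context passed to a model small even when a build prints the same failure many times.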
Template System:
- Jinja2-based prompt templates for consistency
- Context-aware prompt rendering
- Specialized templates for different generation phases:
- Software blueprint generation
- Folder structure planning
- File content generation
- Error correction strategies
- Docker configuration
- Languages: Python, JavaScript/TypeScript, Go, Rust, Java, C/C++, CUDA, and more
- Frameworks: Flask, FastAPI, Express.js, React, Vue, Next.js, etc.
- Project Types: Web APIs, CLI tools, data processors, system utilities, GPU kernels
- File Types: Source code, configuration, tests, documentation, Docker files
- Dependency Resolution: Automatically resolves missing packages and version conflicts
- Build Fixes: Corrects syntax errors, import issues, configuration problems
- Test Fixes: Addresses failing tests, missing test dependencies, assertion errors
- Max Iterations: Configurable (default: 5 per phase)
- Build Time: Typically 1-5 minutes depending on project complexity
- Test Execution: Isolated environment with resource limits
- Success Rate: Above 80% on Tier 1-2 challenges
- Resource Usage: Configurable memory (default: 25% of system) and CPU (default: 50%)
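The resource defaults above (25% of system memory, 50% of CPU) map naturally onto Docker's `--memory` and `--cpus` flags. The sketch below shows one way to compute such a command line; the `docker_run_args` helper and the image name are hypothetical, and disabling networking with `--network=none` is an assumption about how isolation could be enforced, not a statement of what AlphaStack does.

```python
import os
from typing import List


def docker_run_args(
    image: str,
    command: List[str],
    total_memory_bytes: int,
    total_cpus: int,
    memory_fraction: float = 0.25,
    cpu_fraction: float = 0.50,
) -> List[str]:
    """Build a `docker run` argv with CPU/memory caps and networking disabled."""
    if not (0 < memory_fraction <= 1 and 0 < cpu_fraction <= 1):
        raise ValueError("fractions must be in (0, 1]")
    # Docker rejects memory limits below 6 MB, so clamp to that floor.
    memory_mb = max(6, int(total_memory_bytes * memory_fraction) // (1024 * 1024))
    cpus = max(0.1, round(total_cpus * cpu_fraction, 2))
    return [
        "docker", "run", "--rm",
        f"--memory={memory_mb}m",
        f"--cpus={cpus}",
        "--network=none",
        image, *command,
    ]


# Hypothetical image name; host memory is passed in explicitly for portability.
args = docker_run_args(
    "alphastack-demo:latest",
    ["cargo", "test"],
    total_memory_bytes=16 * 1024**3,
    total_cpus=os.cpu_count() or 1,
)
print(" ".join(args))
```

Note that `--network=none` would break builds that download dependencies at run time, so a real setup would likely resolve dependencies in an earlier, networked step.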
This work introduces a novel approach to autonomous code generation that addresses key challenges in AI-assisted software development:
- Multi-Agent Architecture: Separation of planning and correction concerns for better error resolution
- Iterative Self-Healing: Autonomous error detection and correction without human intervention
- Comprehensive Validation: End-to-end validation from build to test execution in isolated environments
- Cross-Language Evaluation: Diverse evaluation suite spanning different programming paradigms
- Tool-Augmented Reasoning: Integration of file operations and command execution for context-aware fixes
- How effectively can multi-agent systems autonomously resolve software errors?
- What is the success rate across different programming paradigms and difficulty levels?
- How many iterations are typically required for convergence to a working solution?
- What types of errors can be automatically resolved vs. requiring human intervention?
The evaluation framework (src/prompts/eval/) provides a standardized benchmark with:
- 40 challenges across 4 languages and 4 difficulty tiers
- Clear success criteria (build success, test pass rate)
- Reproducible evaluation in Docker containers
- Metrics for iteration count, time to solution, and code quality
For more details on the evaluation suite, see src/prompts/eval/README.md
We welcome contributions! Areas of interest:
- Additional programming language support
- New evaluation challenges
- Performance optimizations
- Documentation improvements
- Bug fixes and error handling
MIT License - see LICENSE file for details
- Repository: github.com/HyperKuvid-Labs/alpha-stack
- Issues: github.com/HyperKuvid-Labs/alpha-stack/issues
- Evaluation Suite: src/prompts/eval/
For research collaborations or questions about the ICML 2026 submission, please open an issue or contact the AlphaStack Team.
AlphaStack - Transforming Ideas into Code
