maree217/data-engineering-journey

Data Engineering Journey: From Traditional to AI-Driven Systems

A comprehensive hands-on course for engineering graduates transitioning into AI and data engineering, using GitHub Codespaces, Azure, and the Microsoft ecosystem.

🎯 Course Overview

This course takes students through a practical journey from traditional data systems to modern AI-driven data engineering, with hands-on projects using real tools and platforms.

Target Audience

  • Engineering graduates with Python programming experience
  • Professionals looking to transition into AI and data engineering
  • Students wanting practical, hands-on experience with modern data tools

🏗️ Course Structure

Phase 1: Traditional Data Engineering (Weeks 1-4)

  • Foundation: Data fundamentals, types, and organizational needs
  • Traditional Stack: SQL databases, Python analytics, data warehousing
  • Business Intelligence: Power BI, visualization, reporting
  • Hands-on Project: Interactive data dashboard website

Phase 2: Modern Data Engineering (Weeks 5-8)

  • Cloud Architecture: Azure Data Factory, Synapse Analytics
  • MLOps Foundation: Azure Machine Learning, model deployment
  • Advanced Analytics: Predictive modeling, automated pipelines

Phase 3: AI-Driven Data Systems (Weeks 9-12)

  • Semantic Data: Vector databases, embeddings, Azure Cognitive Search
  • Graph Databases: Knowledge systems with Azure Cosmos DB
  • AI Agents: Automated analysis, memory systems, intelligent workflows

🛠️ Technology Stack

  • Development Environment: GitHub Codespaces
  • Cloud Platform: Microsoft Azure (free tier)
  • Databases: Azure SQL, PostgreSQL with pgvector, Cosmos DB
  • Analytics: Power BI, Azure Machine Learning, Azure Cognitive Search
  • AI Tools: Claude Code, GitHub Copilot
  • Languages: Python, SQL, JavaScript/HTML/CSS

🚀 Getting Started

Prerequisites

  • Basic Python programming knowledge
  • GitHub account
  • Azure account (free tier available)

Setup Instructions

  1. Fork this repository, then clone your fork

    git clone https://github.com/your-username/data-engineering-journey.git
    cd data-engineering-journey
  2. Open in GitHub Codespaces

    • Click "Code" → "Codespaces" → "Create codespace on main"
    • Wait for environment setup to complete
  3. Start with Phase 1

    cd phase1-traditional/html-dashboard
    live-server --port=3000
  4. Access the Interactive Dashboard

    • Open the forwarded port 3000 in your browser
    • Begin your data engineering journey!

📚 Course Modules

Phase 1 Modules

Week 1: Data Fundamentals

  • Interactive Dashboard: phase1-traditional/html-dashboard/
  • Learning Objectives:
    • Understand data types: structured, unstructured, semi-structured
    • Explore traditional storage systems and their evolution
    • Hands-on text analysis and basic data manipulation
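
As a rough illustration of the three data types and a first text-analysis step, the sketch below uses only the Python standard library. The sample records, field names, and the `word_counts` helper are hypothetical and are not taken from the dashboard code.

```python
import csv
import io
import json
import re
from collections import Counter

# Structured: fixed columns, every row has the same fields (made-up sample).
structured = list(csv.DictReader(io.StringIO("id,city,temp_c\n1,Oslo,4\n2,Cairo,31\n")))

# Semi-structured: self-describing records whose fields may differ.
semi_structured = json.loads('[{"id": 1, "tags": ["cold"]}, {"id": 2, "note": "hot day"}]')

# Unstructured: free text with no schema at all.
unstructured = "Data drives decisions. Good data drives good decisions."


def word_counts(text: str) -> Counter:
    """Lowercase the text, extract alphabetic words, and count each one."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


print(structured[0]["city"])               # a column lookup works on every row
print(sorted(semi_structured[1].keys()))   # keys differ from record to record
print(word_counts(unstructured).most_common(3))
```

The structured rows can be queried by column name directly, the semi-structured records need a per-record check for which keys exist, and the unstructured text has to be tokenized before anything can be counted.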

Week 2: SQL Deep Dive

  • Project: Database design and advanced querying
  • Tools: Azure SQL Database, SQL Server Management Studio
  • Practice: Interactive SQL playground with real datasets
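
A minimal sketch of the kind of schema design and join query this week covers. It assumes an in-memory SQLite database as a portable stand-in for Azure SQL Database, and the `customers` and `orders` tables are hypothetical; T-SQL syntax on Azure SQL differs in places.

```python
import sqlite3

# In-memory SQLite stands in for Azure SQL Database (an assumption made for
# portability); table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL NOT NULL
    );
    """
)
conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Ada"), (2, "Linus")],
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(1, 120.0), (1, 80.0), (2, 50.0)],
)


def totals_by_customer(connection):
    """Join orders to customers and aggregate order count and spend per customer."""
    query = """
        SELECT c.name, COUNT(o.id) AS order_count, SUM(o.amount) AS total
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY total DESC
    """
    return connection.execute(query).fetchall()


for name, order_count, total in totals_by_customer(conn):
    print(f"{name}: {order_count} orders, total {total:.2f}")
```

Parameterized inserts (`?` placeholders) are used instead of string formatting so the same habit carries over to production databases.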

Week 3: Python Analytics

  • Project: Data analysis pipeline using pandas and numpy
  • Visualization: matplotlib, seaborn integration
  • Jupyter Labs: Interactive data exploration

Week 4: Business Intelligence

  • Project: Power BI dashboard connected to Azure SQL
  • Skills: DAX formulas, data modeling, report design
  • Integration: Automated refresh and sharing

πŸŽ“ Learning Approach

Hands-On First

  • Every concept introduced with a practical exercise
  • Real-world datasets and scenarios
  • Industry-standard tools and practices

Progressive Complexity

  • Start with familiar concepts (traditional databases)
  • Gradually introduce modern concepts (vector databases, AI agents)
  • Build comprehensive understanding through iteration

Industry Integration

  • Microsoft ecosystem focus (Azure, Power BI, etc.)
  • GitHub-based workflows
  • AI-assisted development with Claude Code and Copilot

πŸ“Š Sample Projects

Phase 1: Traditional Data Dashboard

  • Interactive website demonstrating data concepts
  • SQL query playground
  • Data visualization with Chart.js
  • Text analysis simulation

Phase 2: MLOps Pipeline (Coming Soon)

  • Azure ML pipeline creation
  • Model training and deployment
  • Automated data processing workflows

Phase 3: AI-Driven Analytics (Coming Soon)

  • Vector database implementation
  • AI agent for automated analysis
  • Graph database knowledge system
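
Until the Phase 3 material is published, the core idea behind a vector database can be sketched as nearest-neighbour search over embeddings by cosine similarity. The three-dimensional vectors and document titles below are made up for illustration; a real system would get embeddings from a model and store them in something like PostgreSQL with pgvector.

```python
import math

# Toy 3-dimensional "embeddings"; titles and vectors are invented for this sketch.
DOCS = {
    "quarterly sales report": [0.9, 0.1, 0.0],
    "customer churn analysis": [0.6, 0.6, 0.1],
    "office party photos": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, docs, top_k=2):
    """Return the top_k document titles ranked by similarity to the query."""
    scored = [(cosine_similarity(query_vector, vec), title) for title, vec in docs.items()]
    return [title for _, title in sorted(scored, reverse=True)[:top_k]]


print(search([1.0, 0.2, 0.0], DOCS))
```

This brute-force scan compares the query against every stored vector; dedicated vector stores typically add an index so that search stays fast as the collection grows.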

πŸ”§ Development Environment

The course uses a fully configured development environment with:

  • Python 3.11 with data science libraries
  • Node.js 18 for frontend development
  • Azure CLI for cloud integration
  • VS Code Extensions: Python, Jupyter, Azure tools, GitHub Copilot
  • Pre-configured Ports: 3000 (Dashboard), 8888 (Jupyter), 5432 (PostgreSQL)
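
If a forwarded port does not respond, a quick check like the one below can show which services are actually listening inside the Codespace. This is a sketch rather than part of the course tooling: the `is_port_open` helper is hypothetical, and it assumes the services bind to `127.0.0.1`.

```python
import socket

# Port-to-service mapping taken from the list above.
COURSE_PORTS = {3000: "Dashboard", 8888: "Jupyter", 5432: "PostgreSQL"}


def is_port_open(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


for port, service in COURSE_PORTS.items():
    status = "listening" if is_port_open(port) else "not reachable"
    print(f"{port} ({service}): {status}")
```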

📈 Learning Outcomes

By the end of this course, students will be able to:

  1. Design and implement traditional data systems

    • SQL databases, data warehouses, ETL pipelines
    • Business intelligence and reporting systems
  2. Build modern cloud-based data architectures

    • Azure data services integration
    • MLOps workflows and model deployment
  3. Create AI-driven data solutions

    • Vector databases and semantic search
    • AI agents for automated data analysis
    • Graph databases for knowledge management
  4. Use industry-standard tools effectively

    • GitHub workflows and collaboration
    • AI-assisted development practices
    • Cloud platform management

🤝 Contributing

This is an educational project, and contributions are welcome:

  • Bug fixes and improvements
  • Additional exercises and examples
  • Documentation enhancements
  • New project ideas

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

  • Issues: Use GitHub Issues for technical problems
  • Discussions: Use GitHub Discussions for questions and ideas
  • Documentation: Check the docs/ folder for detailed guides

Ready to start your data engineering journey?

  1. Open GitHub Codespaces
  2. Navigate to phase1-traditional/html-dashboard/
  3. Run live-server --port=3000
  4. Open your browser and begin learning! 🚀
