PDF2Tutorial

PDF2Tutorial is an automated video generation tool that turns PDF slides into narrated educational tech tutorials. It combines AI script refinement, high-quality Text-to-Speech (TTS), and programmatic video rendering.

Warning

Local Deployment Only: This project is designed as a local productivity tool. It has not been tested or secured for use as a public-facing website. Using this application on a public server is unsafe and not recommended.

Features

PDF to Presentation: Upload PDF slides and automatically extract them into a sequence of video scenes.
AI-Powered Scripting: Integrated with Google Gemini AI to transform fragmented slide notes into coherent, professional scripts.
High-Quality TTS: Supports local and cloud-based Text-to-Speech using Kokoro-js.
- Local Inference: Run TTS entirely locally via Dockerized Kokoro FastAPI.
- Hybrid Voices: Create custom voice blends by mixing two models with adjustable weights.
Rich Media Support: Insert MP4 videos and GIFs seamlessly between slides.
Programmatic Video Rendering: Built on Remotion for frame-perfect assembly.
Smart Audio Engineering:
- Auto-Ducking: Background music volume automatically lowers during voiceovers.
- Normalization: Final render is automatically normalized to YouTube standards (-14 LUFS).
Interactive Slide Editor: Drag-and-drop reordering, real-time preview, and batch script updates.
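
As a rough illustration of the Hybrid Voices idea, the sketch below blends two voices by taking a weighted average of their embedding vectors. It assumes each voice can be represented as a numeric vector of equal length; the `blendVoices` name and this vector representation are hypothetical and are not taken from the project's code.

```typescript
// Hypothetical sketch: blend two voice embeddings with adjustable weights.
// Assumes both voices are plain numeric vectors of the same length.
function blendVoices(
  voiceA: number[],
  voiceB: number[],
  weightA: number,
  weightB: number,
): number[] {
  if (voiceA.length !== voiceB.length) {
    throw new Error("Voice vectors must have the same length");
  }
  const total = weightA + weightB;
  if (weightA < 0 || weightB < 0 || total <= 0) {
    throw new Error("Weights must be non-negative and sum to a positive value");
  }
  // Normalize so the two weights always sum to 1.
  const a = weightA / total;
  const b = weightB / total;
  return voiceA.map((value, i) => a * value + b * voiceB[i]);
}

// Example: a 25/75 mix of two toy two-dimensional "voices".
const hybrid = blendVoices([2, 0], [0, 4], 1, 3);
console.log(hybrid);
```

Normalizing the weights means the sliders can use any scale (0 to 1, 0 to 100) without changing the result.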

Getting Started

Prerequisites

Node.js (v20+)
npm or yarn
FFmpeg (required by Remotion for rendering)

Installation

Clone the repository:

```
git clone https://github.com/techcow2/pdf2tutorial.git
cd pdf2tutorial
```

Install dependencies:
```
npm install
```
Start the development server (runs both Vite and the rendering server):
```
npm run dev
```

The application will be available at http://localhost:5173.

Usage

1. Upload & Analyze

Drag and drop your presentation PDF into the main upload area. The application extracts the text from each page and uses it to create the initial slides.

2. Configure & Enhance

Scroll down to the Configure Slides panel to manage your project globally:

Global Settings: Set a global voice (or create a custom Hybrid Voice), adjust post-slide delays, or run batch operations like "Find & Replace".
Media Assets: Click Insert Video to add MP4 clips or GIFs between slides.
Audio Mixing: Upload custom background music or select from the library (e.g., "Modern EDM"). Use the sliders to mix volume levels.
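
To make the interaction between the music sliders and Auto-Ducking more concrete, here is a minimal sketch of a per-frame volume function. It assumes voiceovers are known as frame ranges; `musicVolumeAt`, `FrameRange`, and the default volumes and fade length are hypothetical names and values chosen for illustration, not the project's actual implementation.

```typescript
// Hypothetical sketch: lower background music while a voiceover is playing.
interface FrameRange {
  start: number; // first frame of the voiceover (inclusive)
  end: number;   // frame after the last voiceover frame (exclusive)
}

function musicVolumeAt(
  frame: number,
  voiceovers: FrameRange[],
  baseVolume = 0.5,
  duckedVolume = 0.1,
  fadeFrames = 15,
): number {
  // Distance in frames to the nearest voiceover (0 when inside one).
  let distance = Infinity;
  for (const { start, end } of voiceovers) {
    if (frame >= start && frame < end) {
      distance = 0;
      break;
    }
    const gap = frame < start ? start - frame : frame - end + 1;
    distance = Math.min(distance, gap);
  }
  if (distance === 0) return duckedVolume;
  if (distance >= fadeFrames) return baseVolume;
  // Linear ramp between ducked and base volume across the fade window.
  const t = distance / fadeFrames;
  return duckedVolume + (baseVolume - duckedVolume) * t;
}

// Example: one voiceover from frame 30 up to (not including) frame 60.
console.log(musicVolumeAt(40, [{ start: 30, end: 60 }]));
```

Remotion's audio component accepts a volume callback that receives the current frame, so a pure function of this shape could in principle be passed to it directly.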

3. Crafting the Narrative

In the Slide Editor grid:

AI Scripting: Click the AI Fix Script button (Sparkles icon) to have Gemini rewrite raw slide text into a natural spoken script.
Manual Editing: Edit scripts directly. Highlight a passage to generate or regenerate audio for only that passage.
Generate Output: Click the Generate TTS button (Speech icon) to create voiceovers.
Preview: Click the Play button to hear the result or click the slide thumbnail to expand the visual preview.

4. Render

Click the Download Video button. The server will:

Bundle the Remotion composition.
Render frames in parallel using available CPU cores.
Normalize the final audio mix to -14 LUFS.
Return the resulting MP4 to your browser as a download.
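
The normalization step can be sketched with FFmpeg's `loudnorm` filter. The helper below only builds the argument list; the function name, file paths, and the true-peak (`TP`) and loudness-range (`LRA`) values are assumptions for illustration, and the project may invoke FFmpeg differently.

```typescript
// Hypothetical sketch: build FFmpeg arguments for loudness normalization.
function buildLoudnormArgs(
  inputPath: string,
  outputPath: string,
  targetLufs = -14,
): string[] {
  return [
    "-y",            // overwrite the output file if it exists
    "-i", inputPath,
    // I = integrated loudness target, TP = true peak ceiling, LRA = loudness range.
    "-af", `loudnorm=I=${targetLufs}:TP=-1.5:LRA=11`,
    "-ar", "48000",  // set an explicit output sample rate
    "-c:v", "copy",  // leave the video stream untouched
    outputPath,
  ];
}

// Example: arguments for normalizing a rendered file (paths are placeholders).
console.log(buildLoudnormArgs("render.mp4", "final.mp4").join(" "));
```

The resulting array could be handed to a process spawner such as Node's `child_process.spawn("ffmpeg", args)`. A single `loudnorm` pass is an approximation; a two-pass measurement gives a result closer to the target.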

Configuration

Open the Settings Modal (Gear Icon) to customize the application:

1. API Keys (Script Generation)

Configure the AI model used for script refinement ("AI Fix Script").

Google Gemini: Built-in and recommended. Requires a Google AI Studio API Key.
Custom/OpenAI-Compatible: Point to any OpenAI-compatible endpoint (e.g., LocalAI, Ollama, vLLM).
- Base URL: Enter your provider's URL (e.g., http://localhost:11434/v1).
- Model Name: Specify the model ID (e.g., llama-3).
- API Key: Enter if required by your provider.
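
For orientation, the three settings above map onto a standard OpenAI-style chat completions request roughly as follows. This is a sketch under assumptions: `buildScriptFixRequest`, the interfaces, and the system prompt are hypothetical, and the application's real request may differ.

```typescript
// Hypothetical sketch: turn the provider settings into a chat completions request.
interface ProviderConfig {
  baseUrl: string; // e.g. http://localhost:11434/v1
  model: string;   // e.g. llama-3
  apiKey?: string; // only if the provider requires one
}

interface ChatRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildScriptFixRequest(config: ProviderConfig, slideText: string): ChatRequest {
  // Strip trailing slashes so the path is joined exactly once.
  const url = `${config.baseUrl.replace(/\/+$/, "")}/chat/completions`;
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (config.apiKey) {
    headers["Authorization"] = `Bearer ${config.apiKey}`;
  }
  const body = JSON.stringify({
    model: config.model,
    messages: [
      // Illustrative prompt only; not the prompt used by the application.
      { role: "system", content: "Rewrite the slide text as a natural spoken script." },
      { role: "user", content: slideText },
    ],
  });
  return { url, headers, body };
}

// Example: a local provider that needs no API key.
const request = buildScriptFixRequest(
  { baseUrl: "http://localhost:11434/v1", model: "llama-3" },
  "Intro. What is Docker. Containers vs VMs.",
);
console.log(request.url);
```

The returned object can be sent with `fetch(request.url, { method: "POST", headers: request.headers, body: request.body })`.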

2. Text-to-Speech (TTS)

Engine: Choose between the internal Web Worker (client-side) or a local Dockerized Kokoro instance (faster/server-side).
Audio Defaults: Set default voice models and quantization levels (q4/q8).

3. Background Music Library

You can build your own library of background music tracks that will be available in the dropdown menus:

Navigate to the src/assets/music/ directory.
Copy your .mp3 files into this directory.
The application will automatically detect these files and list them in the UI (e.g., my_cool_track.mp3 becomes "My Cool Track").
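
The filename-to-label conversion in the example can be sketched as below. The function name is hypothetical, and treating hyphens and spaces as separators in addition to underscores is an assumption; the application's own rules may differ.

```typescript
// Hypothetical sketch: derive a display title from a music filename.
function trackTitleFromFilename(filename: string): string {
  return filename
    .replace(/\.mp3$/i, "")            // drop the extension
    .split(/[_\-\s]+/)                 // split on underscores, hyphens, and spaces
    .filter((word) => word.length > 0) // ignore empty pieces from leading separators
    .map((word) => word.charAt(0).toUpperCase() + word.slice(1))
    .join(" ");
}

console.log(trackTitleFromFilename("my_cool_track.mp3")); // "My Cool Track"
```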

Tech Stack

Frontend: React 19, Vite, Tailwind CSS (v4)
Video Engine: Remotion (v4)
AI: Google Gemini API (gemini-2.0-flash-lite)
TTS: Kokoro (FastAPI / Web Worker)
Backend: Express.js (serving as a rendering orchestration layer)
Utilities: Lucide React (icons), dnd-kit (drag & drop), pdfjs-dist (PDF processing)

Project Structure

src/video/: Remotion compositions and video components.
src/components/: React UI components (Slide Editor, Modals, Uploaders).
src/services/: Core logic for AI, TTS, PDF processing, and local storage.
server.ts: Express server handling the @remotion/renderer logic.

Roadmap & TODO

YouTube Metadata Generator: Automatically generate optimized titles and descriptions using Gemini.
Thumbnail Generator: Create custom YouTube thumbnails based on slide content.
Voiceover Recording: Support for recording custom voiceovers directly within the app using a microphone.
Header Layout Optimization: Refactor and organize the application header for better aesthetics and usability.

License

MIT
