AI Integration Guide

A comprehensive guide for setting up, testing, and integrating AI with epic_engine.


Testing the AI

Test if Qwen actually generates text using one of these methods:

Option A: Use curl (in a new terminal)

curl -X POST http://localhost:8000/api/generate \
  -H "Content-Type: application/json" \
  -d "{\"prompt\": \"Write one sentence about a knight.\"}"

Option B: Use the browser

Go to: http://localhost:8000/docs

This opens FastAPI's built-in test interface where you can try the /api/generate endpoint with a form.
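Option C: Use a Python script

For scripted testing, the same request can be built with the standard library. This is a sketch that assumes the request shape from the curl example above:

```python
import json
from urllib import request

def build_generate_request(prompt, base_url="http://localhost:8000"):
    """Build a POST request for /api/generate (payload shape taken from the curl example)."""
    return request.Request(
        f"{base_url}/api/generate",
        data=json.dumps({"prompt": prompt}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_generate_request("Write one sentence about a knight.")
# With the server running: print(request.urlopen(req).read().decode())
```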

Test It

  1. Restart your server (Ctrl+C, then uv run python -m api.main)
  2. Go to http://localhost:8000/docs
  3. Try the new POST /api/generate/stream endpoint

Vector Store Sync

Is Your Existing KG Data Vectorized?

No. Your existing Knowledge Graph data is NOT synced to the vector store yet. Sync notifications only fire for changes made after the sync code was added:

  • New entities are created
  • Existing entities are updated
  • Entities are deleted

To sync existing data, call the full novel sync endpoint for each novel:

POST http://localhost:8000/api/sync/novel
Body: { "novel_id": "your-novel-uuid-here" }

This will:

  • Fetch all entities for that novel from backend
  • Chunk each entity
  • Generate embeddings
  • Store in ChromaDB
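
The four steps above can be sketched as a pipeline. The function names here are illustrative stubs, not the real epic_engine API:

```python
def sync_novel(novel_id, fetch_entities, chunk, embed, store):
    """Fetch all entities for a novel, chunk them, embed each chunk, store the vectors."""
    stored = 0
    for entity in fetch_entities(novel_id):
        for piece in chunk(entity):
            store(novel_id, piece, embed(piece))  # e.g. a ChromaDB upsert
            stored += 1
    return stored

# Toy stubs to show the flow end to end
entities = [{"name": "Arthur", "description": "A knight sworn to the realm."}]
vectors = []
count = sync_novel(
    "novel-1",
    fetch_entities=lambda nid: entities,
    chunk=lambda e: [e["name"], e["description"]],
    embed=lambda text: [float(len(text))],   # stand-in for a real embedding model
    store=lambda nid, piece, vec: vectors.append((piece, vec)),
)
```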

Check Sync Status

GET http://localhost:8000/api/sync/stats

If it returns collection_count: 0, nothing is vectorized yet.


VS Code / IDE Setup

Note: These are VS Code/Pylance IDE warnings, not actual Python errors. The packages are installed correctly (you verified with pip show), but VS Code is looking at a different Python interpreter.

How to Fix the IDE Warnings

  1. Press Ctrl+Shift+P in VS Code
  2. Type "Python: Select Interpreter"
  3. Choose the interpreter at:
C:\Users\arman\AppData\Roaming\Python\Python313\python.exe

After selecting the correct interpreter, the red squiggly lines should disappear within a few seconds as Pylance re-analyzes the files.

Why This Happens

VS Code might be pointing to:

  • A different Python version
  • A virtual environment that doesn't have the packages
  • The wrong system Python

The pip show output confirmed packages are in C:\Users\arman\AppData\Roaming\Python\Python313\site-packages, so you need to tell VS Code to use that same Python.


Note: Log in with armandoblancq@gmail.com (or sign in with Google).

Provider Link
OpenAI https://platform.openai.com/logs
Anthropic https://console.anthropic.com/workspaces/default/logs
Google Gemini https://aistudio.google.com/app/logs and https://console.cloud.google.com/billing
DeepSeek https://platform.deepseek.com/usage
Groq https://console.groq.com/keys
Serper https://serper.dev/logs

Supported Providers

  • OpenAI
  • Anthropic
  • Google Gemini
  • DeepSeek
  • Groq
  • Serper

Package Installation

Additional dependencies for vector operations:

uv add "numba>=0.59" "umap-learn>=0.5.5"

How epic_engine Integrates with aiservice

Step 1: Install epic_engine as a package

cd package/epic_engine
pip install -e .

The -e flag means "editable" - it installs the package in development mode, linking to your source code rather than copying it.

Step 2: Import in aiservice

Your aiservice files would import from epic_engine instead of their local modules:

# Before (current aiservice)
from retrieval.hybrid_retriever import HybridRetriever
from engine.rag_engine import RAGEngine
from vectorstore.vector_store import VectorStore

# After (using epic_engine)
from epic_engine.retrieval import HybridRetriever
from epic_engine.rag import RAGEngine
from epic_engine.vectors import VectorStore

Step 3: aiservice becomes a thin API layer

Your aiservice would only contain:

  • Flask/FastAPI routes (api/routes.py)
  • Request/response schemas (api/schemas.py)
  • Server configuration (api/server.py)
  • Any app-specific logic not in the engine

Will Updates to epic_engine Reflect in aiservice?

YES - if you installed with pip install -e . (editable mode)

Installation Method Updates Reflected?
pip install -e . (editable) Yes - immediately linked to source
pip install . (regular) No - need to reinstall
pip install epic-engine (PyPI) No - need to upgrade

With editable install:

  1. Edit epic_engine/rag/engine.py
  2. Restart aiservice
  3. Changes are immediately available

Without editable install:

  1. Edit epic_engine/rag/engine.py
  2. Must run pip install . again
  3. Then restart aiservice

Scenario Recommendation
During development Use pip install -e . so changes reflect immediately
For production Use regular pip install . or version from PyPI
For other projects They can pip install epic-engine independently

What Stays in aiservice vs epic_engine

aiservice (API Layer) epic_engine (Core Library)
Flask/FastAPI routes RAG engine
HTTP request handling Vector store
Authentication Knowledge graph
API schemas Providers (OpenAI, etc.)
Server startup Agents
App-specific configs Prompts

Key Benefit: This separation means you could build a completely different app (CLI tool, desktop app, another API) using the same epic_engine library.


UV Package Manager

pyproject.toml Configuration

Path dependencies need to be in a separate [tool.uv.sources] section:

[project]
dependencies = [
    "epic-engine",  # keep as string here
    "fastapi",
    "uvicorn",
    "httpx",
]

[tool.uv.sources]
epic-engine = { path = "../package/epic_engine" }

Warning: Putting epic-engine = { path = "..." } inside the dependencies array will fail: PEP 621 requires each dependencies entry to be a plain requirement string, so path sources belong in [tool.uv.sources].

Transitive Dependencies

The uv.lock file contains all transitive dependencies - not just your 4 direct dependencies, but everything those packages depend on:

Even though your pyproject.toml only lists 4 packages, the full dependency tree is ~143 packages.

When to Run uv sync

You only need to run uv sync again if:

  • You add/remove dependencies in epic_engine's pyproject.toml
  • You change the package structure (add new submodules to __init__.py)

For normal code changes (fixing bugs, improving logic, adding functions to existing files), just save and restart the server.

Reinstalling epic-engine

When uv sync uses a cached version that doesn't have your new changes:

uv sync --reinstall-package epic-engine

This tells uv to rebuild and reinstall epic-engine from the source path.


Running the AI Service

Quick Start Commands

# Reinstall epic-engine (after making changes)
uv sync --reinstall-package epic-engine

# Start the AI service
uv run python -m api.main

After starting, test these endpoints:

Endpoint Expected Response
http://localhost:8000/ "Epic AI Service is running"
http://localhost:8000/api/health healthy status

Note: The Qwen model uses lazy loading - it loads on first use, not at startup.
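
Lazy loading means the first request pays the load cost. A minimal version of the pattern (illustrative, not the service's actual code):

```python
class LazyModel:
    """Defer an expensive model load until the first access."""

    def __init__(self, loader):
        self._loader = loader
        self._model = None

    @property
    def model(self):
        if self._model is None:
            self._model = self._loader()  # runs once, on first use
        return self._model

loads = []
def fake_loader():
    loads.append("loaded")
    return "qwen-2.5-7b"

qwen = LazyModel(fake_loader)
# Nothing is loaded at startup; the first .model access triggers the load.
```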

Resync Novel Data

Open a new terminal while aiservice is running:

# PowerShell
Invoke-RestMethod -Uri "http://localhost:8000/api/sync/novel" `
  -Method POST `
  -ContentType "application/json" `
  -Body '{"novel_id": "cmise0o310000h21zkiofct47"}'

Test Interface

# Launch the test GUI
uv run python test_interface.py

# Or the EPIC Tester
uv run python EPIC_Tester.py

Note: Make sure your backend is running (npm run dev in the backend folder) since the test interface needs to connect to the Knowledge Graph API.

Start the backend first:

cd backend
npm run dev

Prisma Migrations

To create the PlotThread table in the database:

cd backend

# Quick development sync
npx prisma db push

# Or proper migration
npx prisma migrate dev --name add_plot_thread

The db push is quicker for development - it syncs your schema without creating migration files.


Retriever Comparison

How to Switch Retrievers

In routes.py line 381, change:

# Current: LLM reranking, dual search
_DEFAULT_RETRIEVER_TYPE = "advanced"

# Alternative: Faster, score-based RRF fusion
_DEFAULT_RETRIEVER_TYPE = "hybrid"
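
One way to make the switch configurable is a small factory keyed on that setting. This is a sketch; the retriever classes below are empty stubs standing in for the real ones:

```python
class HybridRetriever: ...
class AdvancedRetriever: ...
class RVRGRetriever: ...

_RETRIEVERS = {
    "hybrid": HybridRetriever,
    "advanced": AdvancedRetriever,
    "rvrg": RVRGRetriever,
}

def make_retriever(retriever_type="advanced"):
    """Instantiate the retriever named by a _DEFAULT_RETRIEVER_TYPE-style string."""
    try:
        return _RETRIEVERS[retriever_type]()
    except KeyError:
        raise ValueError(f"unknown retriever type: {retriever_type!r}")
```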

HybridRetriever

Feature Has It? Details
Query Rewriting Yes Uses QueryRewriter (lines 60-65 in hybrid.py)
Vector Search Yes Parallel with KG search
KG Search Yes Parallel with vector search
KG Traversal Yes From seed entities
Dual Query Search No Only searches with ONE query (rewritten if enabled)
LLM Reranking No Uses RRF score fusion (mathematical, no LLM)
ThreadPoolExecutor Workers 2

AdvancedRetriever

Feature Has It? Details
Query Rewriting Yes Uses QueryRewriter
Vector Search Yes Parallel
KG Search Yes Parallel
KG Traversal Yes From seed entities
Dual Query Search Yes Searches with BOTH original AND rewritten queries
LLM Reranking Yes Uses Reranker with LLM to judge relevance
LLM Validation No
LLM Reasoning No
ThreadPoolExecutor Workers 3

RVRGRetriever (Retrieve-Validate-Reason-Generate)

The most advanced retriever with full LLM-powered pipeline for highest quality context.

Feature Has It? Details
Query Rewriting Yes Uses QueryRewriter
Vector Search Yes Parallel (original + rewritten queries)
KG Search Yes Parallel
KG Traversal Yes From seed entities
Dual Query Search Yes Searches with BOTH original AND rewritten queries
LLM Reranking Yes Uses Reranker with LLM to judge relevance
LLM Validation Yes Filters out irrelevant results with explanations
LLM Reasoning Yes Extracts insights, connections, and identifies gaps
ThreadPoolExecutor Workers 3

RVRG Pipeline Stages:

  1. Retrieve - Query rewrite → Vector search → KG search → KG traversal → Rerank
  2. Validate - LLM evaluates each result for relevance (score 0-1, with explanations)
  3. Reason - LLM analyzes validated context to extract insights, find connections, identify gaps
  4. Generate - Final answer generation with reasoning summary as guidance
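
The four stages can be wired together as a simple skeleton. The stage functions here are injected stubs, not the real epic_engine implementations:

```python
def rvrg_pipeline(query, retrieve, validate, reason, generate):
    """Retrieve -> Validate -> Reason -> Generate, as described above."""
    candidates = retrieve(query)
    validated = [c for c in candidates if validate(query, c)["relevant"]]
    reasoning = reason(query, validated)   # insights, connections, gaps
    return generate(query, validated, reasoning)

answer = rvrg_pipeline(
    "Who trained the knight?",
    retrieve=lambda q: ["Arthur was trained by Merlin.", "The castle has four towers."],
    validate=lambda q, c: {"relevant": "trained" in c, "score": 0.9},
    reason=lambda q, ctx: "Merlin is the trainer.",
    generate=lambda q, ctx, summary: summary,
)
```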

Unique Features:

  • Validation with explanations - Each result gets a relevance score and reason why it's relevant/irrelevant
  • Gap identification - Detects what information is missing to fully answer the question
  • Confidence scoring - Reports whether context is sufficient to answer (can_answer, answer_confidence)
  • Key insights extraction - Pulls out important facts from context with source attribution
  • Connection discovery - Finds relationships between entities/events in context

The ACTUAL Key Differences

Feature HybridRetriever AdvancedRetriever RVRGRetriever
Queries Used for Search 1 (rewritten only) 2 (original + rewritten) 2 (original + rewritten)
Ranking Method RRF score fusion (math) LLM judges relevance LLM judges relevance
Validation Step No No Yes (filters irrelevant results)
Reasoning Step No No Yes (extracts insights & gaps)
LLM Calls During Retrieval 1 (rewriting only) 2 (rewriting + reranking) 4 (rewrite + rerank + validate + reason)
Speed Fastest Medium Slowest
Cost Lowest Medium Highest
Quality Good Better Best

All three have query rewriting. The key progression is: Hybrid (fast, math-based) → Advanced (adds LLM reranking) → RVRG (adds validation and reasoning for highest quality).

Which Retriever is Better?

Priority Better Choice
Highest Quality RVRGRetriever
Complex queries RVRGRetriever
Good quality + speed AdvancedRetriever
Speed HybridRetriever
Cost (API calls) HybridRetriever
Simple lookups HybridRetriever
Know if answer exists RVRGRetriever (has can_answer flag)

What is ThreadPoolExecutor Workers?

ThreadPoolExecutor is Python's way to run multiple tasks in parallel (at the same time).

  • Workers = 2 means 2 tasks can run simultaneously
  • Workers = 3 means 3 tasks can run simultaneously

In your retrievers:

  • HybridRetriever (2 workers): Runs Vector Search + KG Search in parallel
  • AdvancedRetriever (3 workers): Runs Original Query Search + Rewritten Query Search + KG Search in parallel

More workers = faster retrieval (tasks don't wait for each other).
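
A minimal sketch of the two-worker pattern HybridRetriever uses (the search functions are stand-ins):

```python
from concurrent.futures import ThreadPoolExecutor

def vector_search(query):
    return [f"vec:{query}"]   # stand-in for the real vector store query

def kg_search(query):
    return [f"kg:{query}"]    # stand-in for the real Knowledge Graph query

with ThreadPoolExecutor(max_workers=2) as pool:
    vec = pool.submit(vector_search, "knight")   # both tasks run
    kg = pool.submit(kg_search, "knight")        # at the same time
    results = vec.result() + kg.result()
```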

LLM Reranking vs RRF Score Fusion: Which is More Accurate?

LLM Reranking (AdvancedRetriever) - More Accurate

How it works: An LLM reads each result and the original query, then judges: "Is this result actually relevant to what the user asked?"

Pros:

  • Understands semantic meaning and context
  • Can recognize when a result looks related but isn't actually helpful
  • Handles nuance, synonyms, and intent

Cons:

  • Slower (requires LLM API call)
  • Costs money (API tokens)
  • Can hallucinate or make mistakes

RRF Score Fusion (HybridRetriever) - Faster, but Less Accurate

How it works: Mathematical formula that combines rankings from different sources:

RRF_score = Σ (1 / (k + rank_i))

Where k is typically 60, and rank_i is the position in each result list.
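
In code, the fusion is just a few lines (a sketch, with k=60 as noted above):

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion: score each doc by the sum of 1/(k + rank) across lists."""
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["knight", "castle", "dragon"]   # ranked output of vector search
kg_hits     = ["dragon", "knight", "sword"]    # ranked output of KG search
fused = rrf_fuse([vector_hits, kg_hits])
# "knight" wins: ranked #1 and #2; "dragon" (#3 and #1) comes second
```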

Pros:

  • Instant (pure math, no LLM call)
  • Free (no API costs)
  • Consistent and predictable

Cons:

  • Doesn't understand meaning - just combines numbers
  • A result ranked #1 in both lists wins, even if it's not actually relevant
  • Can't handle cases where high-scoring results are semantically wrong

Bottom Line

Method Accuracy Speed Cost
LLM Reranking Higher Slower Higher
RRF Score Fusion Lower Faster Free

LLM reranking is more accurate because it actually understands the query and results. RRF just does math on rankings without understanding content. For a creative writing app like EPIC where context quality matters, AdvancedRetriever with LLM reranking will give better results - but at the cost of speed and API calls.

Speed & Cost Comparison

Speed Difference

Retriever LLM Calls During Retrieval Estimated Time
HybridRetriever 1 call (query rewriting) ~1-2 seconds
AdvancedRetriever 2 calls (rewriting + reranking) ~3-5 seconds
RVRGRetriever 4 calls (rewrite + rerank + validate + reason) ~6-12 seconds

The validation and reasoning steps add significant time because:

  • Validation sends all reranked results to LLM for relevance scoring with explanations
  • Reasoning analyzes validated context to extract insights, connections, and gaps
  • Each step requires a full LLM inference pass

Rough estimate: AdvancedRetriever is 2-3x slower than Hybrid, and RVRGRetriever is 2-3x slower than Advanced.

API Cost Difference

All three retrievers use the user-selected model from the test interface dropdown (e.g., openai/gpt-4o-mini, anthropic/claude-3-haiku, deepseek/deepseek-chat). The model is passed through the provider/model settings.

Example pricing (GPT-4o-mini as of early 2025):

  • Input: ~$0.15 per 1M tokens
  • Output: ~$0.60 per 1M tokens

Retriever Tokens per Query (estimate) Cost per Query
HybridRetriever ~500 tokens (rewrite only) ~$0.0001
AdvancedRetriever ~2000-3000 tokens (rewrite + rerank 20 results) ~$0.0003-0.0005
RVRGRetriever ~4000-6000 tokens (rewrite + rerank + validate + reason) ~$0.0006-0.001

RVRGRetriever costs roughly 6-10x more per query than HybridRetriever due to the additional LLM calls for validation and reasoning.
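
These per-query figures follow directly from token counts and the per-million rates. A back-of-the-envelope helper, using the GPT-4o-mini prices quoted above and assumed token splits:

```python
def query_cost_usd(input_tokens, output_tokens, in_rate=0.15, out_rate=0.60):
    """Cost of one query given per-1M-token rates (defaults: GPT-4o-mini)."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

hybrid   = query_cost_usd(400, 100)    # rewrite only -> ~$0.0001
advanced = query_cost_usd(2500, 200)   # rewrite + rerank -> ~$0.0005
```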

Real-World Impact

Usage HybridRetriever AdvancedRetriever RVRGRetriever
100 queries/day ~$0.01/day ~$0.03-0.05/day ~$0.06-0.10/day
1000 queries/day ~$0.10/day ~$0.30-0.50/day ~$0.60-1.00/day

Bottom line: The cost difference is negligible for personal use. RVRGRetriever provides the highest quality context with validation and reasoning, but at 2-3x the cost of AdvancedRetriever. The speed difference is more noticeable than the cost.

Can Qwen 2.5 7B Handle AdvancedRetriever?

Yes, but with caveats.

What Qwen 2.5 7B Needs to Do

Task Difficulty for 7B Model
Query Rewriting Easy - Simple text transformation
LLM Reranking Harder - Must evaluate 10-20 results against query

Potential Issues

1. Reranking Quality

  • Reranking requires the model to understand relevance at a deeper level
  • 7B models can do this, but not as well as GPT-4o-mini or Claude
  • You might get slightly worse ranking than with a cloud model

2. Context Window Pressure

  • Reranking sends ALL candidate results (up to 20 entities) to the LLM
  • Each entity has name + description + metadata
  • Could easily be 2000-4000 tokens just for the reranking prompt
  • Your Qwen is configured with 8192 context window - should be fine, but tight

3. Speed

  • Local 7B model on RTX 3050 6GB is slower than cloud APIs
  • Reranking adds another full inference pass
  • Expect 5-15 seconds per RAG query with AdvancedRetriever on local

Recommendation

Scenario Use
Testing/Development HybridRetriever (faster iteration)
Final output quality matters AdvancedRetriever
Using cloud provider (OpenAI, etc.) AdvancedRetriever works great
Using local Qwen only HybridRetriever is probably better tradeoff

Best of Both Worlds

Your current setup already defaults to openai/gpt-4o-mini for the rewriter and reranker in AdvancedRetriever (see advanced_retriever.py:121-122). So even if you use Qwen for the final generation, the retrieval/reranking still uses the cheap, fast cloud model for best quality context selection.


Developer Tools

Tracing Imports & Dependencies

Terminal Commands

grep / ripgrep (rg)

# Find all files importing a specific module
grep -r "from module_name import" .
grep -r "import module_name" .

# ripgrep is faster
rg "from config import"
rg "import config"

Python-specific

# Show module dependencies
python -c "import module_name; print(module_name.__file__)"

# Use pydeps to visualize dependencies
pydeps your_module.py

# Use pipdeptree for package dependencies
pipdeptree

Node.js/TypeScript-specific

# Find imports of a file
grep -r "from './filename'" .
grep -r "require('./filename')" .

# Use madge for dependency graphs
npx madge --circular src/
npx madge src/index.ts

IDE Features

  • VSCode: Right-click -> "Find All References" (Shift+F12)
  • VSCode: Right-click -> "Go to References"
  • PyCharm/WebStorm: Right-click -> "Find Usages" (Alt+F7)

Specialized Tools

Tool Language What it does
madge JS/TS Dependency graphs, circular detection
pydeps Python Visual dependency graphs
import-js JS Import analysis
vulture Python Find unused code/imports
ts-unused-exports TS Find unused exports

Quick One-Liners

# Count how many files import something
rg -l "import.*SomeClass" | wc -l

# See the actual import lines with context
rg -C 2 "from epic_engine"

Practical Examples

Finding Python Imports

# Find everything that imports from epic_engine
rg "from epic_engine" .

# Find what imports the config module specifically
rg "from epic_engine.core.config import"

# Find any file importing the reranker
rg "import.*reranker" --type py

Finding TypeScript/JavaScript Imports

# Find what imports the aiChat service
rg "from.*aiChat" frontend/

# Find all files importing from a specific hooks folder
rg "from.*AIChatModuleHooks" .

# Find require statements
rg "require\(.*aiChat" .

Finding Who Uses a Specific Function/Class

# Find all usages of a function called "retrieve_context"
rg "retrieve_context" --type py

# Find where AdvancedRetriever is used
rg "AdvancedRetriever" .

# Find with surrounding context (2 lines before/after)
rg -C 2 "useAIChat" frontend/

Checking Circular Dependencies

# For JavaScript/TypeScript projects
npx madge --circular frontend/

# For Python
pip install pydeps
pydeps aiservice/ --show-cycles

VS Code Keyboard Shortcuts

Multi-Cursor & Selection

Shortcut What It Does
Ctrl+D Select next occurrence of current word (keep pressing for more)
Ctrl+Shift+L Select ALL occurrences of current word at once
Alt+Click Add cursor at click location
Ctrl+Alt+Up/Down Add cursor above/below current line
Shift+Alt+I Add cursor at end of each selected line
Ctrl+U Undo last cursor operation

Line Manipulation

Shortcut What It Does
Alt+Up/Down Move entire line up/down
Shift+Alt+Up/Down Duplicate line up/down
Ctrl+Shift+K Delete entire line
Ctrl+Enter Insert blank line below
Ctrl+Shift+Enter Insert blank line above
Ctrl+] / Ctrl+[ Indent/outdent line

Navigation

Shortcut What It Does
Ctrl+G Go to specific line number
Ctrl+P Quick open file by name
Ctrl+Shift+O Go to symbol in current file
Ctrl+T Go to symbol across entire workspace
F12 Go to definition
Alt+F12 Peek definition (inline popup)
Shift+F12 Find all references
Ctrl+Shift+\ Jump to matching bracket
Alt+Left/Right Navigate back/forward (history)

Selection Expansion

Shortcut What It Does
Shift+Alt+Right Expand selection (word -> line -> block -> function)
Shift+Alt+Left Shrink selection
Ctrl+L Select entire current line
Ctrl+Shift+[ / ] Fold/unfold code block

Search & Replace

Shortcut What It Does
Ctrl+F Find in file
Ctrl+H Find and replace in file
Ctrl+Shift+F Find across all files
Ctrl+Shift+H Find and replace across all files
F3 / Shift+F3 Next/previous match

Code Actions

Shortcut What It Does
Ctrl+. Quick fix / show code actions (auto-imports, refactors)
F2 Rename symbol (updates all references)
Ctrl+Space Trigger IntelliSense/autocomplete
Ctrl+Shift+Space Show parameter hints
Shift+Alt+F Format entire document
Ctrl+K Ctrl+F Format selected code only

Commenting

Shortcut What It Does
Ctrl+/ Toggle line comment
Shift+Alt+A Toggle block comment

Editor Management

Shortcut What It Does
Ctrl+\ Split editor
Ctrl+1/2/3 Focus editor group 1/2/3
Ctrl+W Close current tab
Ctrl+K Z Zen mode (distraction-free)
Ctrl+B Toggle sidebar visibility
Ctrl+J Toggle terminal panel

Top Recommendations for Speed

  1. Ctrl+D - Essential for quick renames
  2. Alt+Up/Down - Move lines without cut/paste
  3. Ctrl+Shift+K - Delete lines instantly
  4. F2 - Smart rename across files
  5. Ctrl+. - Auto-fix problems, add imports
  6. Ctrl+P - Navigate files without touching mouse
  7. Shift+Alt+Up/Down - Duplicate code instantly

System Resource Monitoring

The EPIC Tester includes a real-time System Resource Monitor panel that displays CPU, RAM, GPU, and VRAM usage as you interact with the AI.

What the Monitor Tracks

Metric Description Update Interval
CPU Overall CPU usage percentage across all cores 1 second
RAM System memory usage (used GB / total GB) 1 second
GPU Graphics card utilization percentage 1 second
VRAM Video memory usage (used GB / total GB) 1 second

The monitor uses:

  • psutil for CPU, RAM, and Disk metrics
  • GPUtil for GPU and VRAM metrics (NVIDIA GPUs)
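
One polling tick of the panel can be sketched with psutil alone (GPU/VRAM would come from GPUtil on NVIDIA hardware; field names here are illustrative):

```python
import psutil

def sample_system():
    """Snapshot CPU and RAM usage, as the monitor polls every second."""
    mem = psutil.virtual_memory()
    return {
        "cpu_percent": psutil.cpu_percent(interval=None),
        "ram_used_gb": round(mem.used / 1e9, 1),
        "ram_total_gb": round(mem.total / 1e9, 1),
    }

snapshot = sample_system()
```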

Understanding GPU/VRAM Usage

When GPU/VRAM Shows 0%

If you see GPU and VRAM staying at 0% while making queries, it means the model is NOT running on your local GPU. This happens when:

Scenario GPU/VRAM Usage
Using cloud providers (OpenAI, Anthropic, etc.) 0% - Model runs on provider's servers
Using local/qwen Active - Model runs on your GPU
Using Ollama with GPU offload Active - Model runs on your GPU

When You WILL See GPU/VRAM Activity

GPU and VRAM graphs will show activity when:

  1. Local Qwen model (local/qwen) - The model loads into VRAM and runs inference on your GPU
  2. Embedding generation - If using a local embedding model
  3. Any local LLM that uses GPU acceleration

Resource Usage by Provider

Provider CPU Usage RAM Usage GPU Usage VRAM Usage
openai/gpt-4o-mini Low (network I/O) Low (Python + response parsing) None None
anthropic/claude-3-haiku Low (network I/O) Low (Python + response parsing) None None
deepseek/deepseek-chat Low (network I/O) Low (Python + response parsing) None None
local/qwen Medium (preprocessing) Medium (model loading) High (inference) High (model weights)

Cloud Provider Resource Pattern

When using cloud providers like OpenAI:

CPU:  [====------] 20-40%  (Python + async networking)
RAM:  [===-------] 15-30%  (Python runtime + response buffers)
GPU:  [----------] 0%      (Not used - inference in cloud)
VRAM: [----------] 0%      (Not used - model not loaded locally)

Local Model Resource Pattern

When using local/qwen:

CPU:  [=====-----] 40-60%  (Token processing + context)
RAM:  [======----] 50-70%  (Model metadata + context window)
GPU:  [========--] 70-95%  (Matrix operations during inference)
VRAM: [=======---] 60-80%  (Model weights + KV cache)

Hardware Requirements

Minimum Requirements for Local Models

Component Minimum Recommended Notes
GPU VRAM 4GB 6GB+ Qwen 2.5 7B needs ~5-6GB VRAM
System RAM 8GB 16GB+ For model loading + context
GPU NVIDIA GTX 1060 RTX 3050+ CUDA support required

VRAM Usage Estimates by Model Size

Model Size Estimated VRAM (FP16) Estimated VRAM (Q4 Quantized)
3B params ~6GB ~2GB
7B params ~14GB ~4-5GB
13B params ~26GB ~8GB
70B params ~140GB ~40GB

Note: EPIC uses quantized models (Q4_K_M) for efficiency. Your RTX 3050 6GB can comfortably run Qwen 2.5 7B quantized.
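
The table values follow a simple rule of thumb: weights-only VRAM is roughly parameter count times bytes per parameter (FP16 = 2 bytes, Q4 roughly 0.5 bytes), with KV cache and runtime overhead on top:

```python
def weights_vram_gb(params_billion, bytes_per_param):
    """Rough VRAM for model weights alone (KV cache and overhead are extra)."""
    return params_billion * bytes_per_param

fp16_7b = weights_vram_gb(7, 2.0)   # ~14 GB, matching the FP16 column
q4_7b   = weights_vram_gb(7, 0.5)   # ~3.5 GB; with overhead, the ~4-5 GB in the table
```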

Troubleshooting GPU Detection

GPU/VRAM Shows 0% Even with Local Model

  1. Check GPUtil installation:

    pip install gputil
    
  2. Verify NVIDIA drivers:

    nvidia-smi
    

    This should show your GPU and current usage.

  3. Check that llama-cpp-python is importable (this confirms the installation; it does not by itself prove GPU offload was compiled in):

    python -c "from llama_cpp import Llama; print('llama-cpp-python installed')"
    
  4. Verify CUDA is available:

    python -c "import torch; print(torch.cuda.is_available())"
    

Common Issues

Issue Cause Solution
GPUtil not detecting GPU Missing NVIDIA drivers Install/update NVIDIA drivers
GPU shows but VRAM is 0 Model not using GPU Reinstall llama-cpp-python with CUDA
High RAM but no GPU usage Model running on CPU Check CUDA installation
Monitor panel missing Import error Check pip install psutil gputil

Force GPU Usage for Local Models

If your local model is running on CPU instead of GPU, you may need to reinstall llama-cpp-python with CUDA support:

# Uninstall existing
pip uninstall llama-cpp-python

# Reinstall with CUDA support
CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python --force-reinstall --no-cache-dir

For Windows:

$env:CMAKE_ARGS="-DLLAMA_CUDA=on"
pip install llama-cpp-python --force-reinstall --no-cache-dir