Configuration
Complete guide to configuring Mimir with clear, organized environment variables
Configuration in Mimir is done through environment variables in a .env file. This guide organizes them by required and optional settings.
Configuration Overview
Mimir uses environment variables organized into these categories:
- Server Configuration - API keys and server settings
- Database Configuration - PostgreSQL connection and settings
- GitHub Repositories - Where to fetch code and documentation
- Parser Configuration - What to extract from code
- Documentation Configuration - Documentation URL generation
- LLM Configuration - Embedding and chat model settings
1. Server Configuration
Server API Key (required)
MIMIR_SERVER_API_KEY=your-generated-api-key
This is your server's authentication key. It protects your API endpoints from unauthorized access. Generate one by running npm run generate-apikey in the mimir-rag directory.
GitHub Webhook Secret (optional)
MIMIR_SERVER_GITHUB_WEBHOOK_SECRET=your-webhook-secret
When GitHub sends webhook events (for example, when code is pushed), this secret verifies that the request actually came from GitHub. It is only needed if you want automatic ingestion when code changes in your repository.
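To make the role of the webhook secret concrete: GitHub signs each delivery with HMAC-SHA256 over the raw request body and sends the result in the X-Hub-Signature-256 header as sha256=<hex digest>. The sketch below shows that check in isolation. It is not taken from Mimir's source, and the function name verify_github_signature is hypothetical.

```python
import hashlib
import hmac


def verify_github_signature(secret: str, payload: bytes, signature_header: str) -> bool:
    """Return True if signature_header matches the HMAC-SHA256 of payload.

    Hypothetical helper for illustration; Mimir's own check may differ.
    """
    digest = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # compare_digest runs in constant time, which avoids timing leaks
    return hmac.compare_digest(expected, signature_header)


# Sign a sample body the way GitHub would, then verify it.
secret = "your-webhook-secret"
body = b'{"ref": "refs/heads/main"}'
signature = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
is_valid = verify_github_signature(secret, body, signature)
```

The comparison has to run over the raw request bytes, before any JSON parsing, because re-serializing the body can change it and invalidate the signature.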
Fallback Ingest Interval (optional)
MIMIR_SERVER_FALLBACK_INGEST_INTERVAL_MINUTES=60
If webhooks fail or aren't configured, this sets up a scheduled backup that periodically re-ingests your repositories. Useful for keeping your index fresh automatically.
2. Database Configuration
Mimir works with any PostgreSQL database that supports the pgvector extension (Supabase, Neon, self-hosted, etc.).
Database URL (required)
MIMIR_DATABASE_URL=postgresql://user:password@host:5432/database
Your PostgreSQL connection string. Mimir uses it to connect directly to your database and to store and query vector embeddings.
Docker and managed Postgres (Supabase, Neon, etc.): If the container cannot reach your database, run with host network: docker run --rm --network host -v $(pwd)/.env:/app/.env:ro mimir-rag:local. That uses the host's network and usually clears connection issues. Alternatively, for Supabase use the Session Pooler connection string (port 6543) from Dashboard → Settings → Database → Connection Pooling → Session mode.
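When a connection fails, it can help to confirm that the connection string parses into the host, port, and database you expect before looking at networking. The following is a small standalone sketch using only the Python standard library. The helper name describe_database_url is hypothetical and not part of Mimir.

```python
from urllib.parse import urlsplit


def describe_database_url(url: str) -> dict:
    """Split a PostgreSQL connection string into its main parts."""
    parts = urlsplit(url)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    return {
        "user": parts.username,
        "host": parts.hostname,
        # Fall back to PostgreSQL's default port when none is given
        "port": parts.port or 5432,
        "database": parts.path.lstrip("/"),
    }


info = describe_database_url("postgresql://user:password@host:5432/database")
```

Passwords that contain characters such as @ or / need to be percent-encoded in the URL, otherwise the host portion is parsed incorrectly.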
Embedding Dimension
The default schema uses vector(3072) for embeddings. If your embedding model uses a different dimension, update the database schema:
- Check your embedding model's dimension (e.g., OpenAI text-embedding-3-small uses 1536, text-embedding-3-large uses 3072)
- Update prisma/migrations/0_init/migration.sql to change vector(3072) to your model's dimension
- Update prisma/schema.prisma to change Unsupported("vector(3072)") to match
- Run migrations: make setup-db
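A mismatch between the model's output size and the vector(N) column usually only surfaces when the first insert fails, so a quick pre-check can save time. The sketch below compares the two. The table definition in sample_sql and both helper names are made up for illustration; only the two OpenAI dimensions come from the list above.

```python
import re

# Dimensions named in this guide; extend for other models as needed
KNOWN_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def schema_dimension(sql: str) -> int:
    """Extract N from the first vector(N) found in a migration file."""
    match = re.search(r"vector\((\d+)\)", sql)
    if match is None:
        raise ValueError("no vector(N) column found")
    return int(match.group(1))


def dimension_matches(model: str, sql: str) -> bool:
    """True if the model's dimension equals the schema's vector size."""
    return KNOWN_DIMENSIONS[model] == schema_dimension(sql)


# Hypothetical table definition, not Mimir's actual schema
sample_sql = "CREATE TABLE chunks (id bigint, embedding vector(3072));"
small_ok = dimension_matches("text-embedding-3-small", sample_sql)
large_ok = dimension_matches("text-embedding-3-large", sample_sql)
```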
Similarity Threshold (optional)
MIMIR_DATABASE_SIMILARITY_THRESHOLD=0.2
Default: 0.2
Range: 0.0 to 1.0
Controls how similar a document chunk must be to your query to be included in results. Lower values (0.1-0.3) return more results but may include less relevant content. Higher values (0.5-0.8) return fewer, more precise matches.
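The effect of the threshold can be seen on a toy example. The sketch below assumes cosine similarity as the score, which is an assumption since this guide does not name the metric, and uses two-dimensional vectors in place of real embeddings. The function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_by_threshold(query, chunks, threshold=0.2):
    """Keep chunks scoring at or above threshold, best match first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in chunks.items()]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


query = [1.0, 0.0]
chunks = {
    "close": [0.9, 0.1],
    "related": [0.5, 0.5],
    "unrelated": [0.0, 1.0],
}
loose = [name for name, _ in filter_by_threshold(query, chunks, 0.2)]
strict = [name for name, _ in filter_by_threshold(query, chunks, 0.8)]
```

With the threshold at 0.2 both "close" and "related" are kept. Raising it to 0.8 leaves only "close".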
Match Count (optional)
MIMIR_DATABASE_MATCH_COUNT=10
Default: 10
Limits how many document chunks are returned per query from vector search. More chunks provide more context but increase API costs and response time. Fewer chunks are faster but may miss relevant information.
BM25 Match Count (optional)
MIMIR_DATABASE_BM25_MATCH_COUNT=10
Default: 10
Limits how many document chunks are returned per query from full-text (BM25) search. Used when hybrid search is enabled.
Enable Hybrid Search (optional)
MIMIR_DATABASE_ENABLE_HYBRID_SEARCH=true
Default: true
When enabled, combines vector similarity search with full-text (BM25) search for better results. Disable if you only want vector search.
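This guide does not say how the two result lists are merged. One common technique for combining ranked lists is reciprocal rank fusion (RRF), sketched below purely as an illustration of the idea. It should not be read as Mimir's actual merging logic, and the chunk IDs are placeholders.

```python
def reciprocal_rank_fusion(vector_hits, bm25_hits, k=60):
    """Merge two ranked lists of chunk IDs with reciprocal rank fusion.

    Each list contributes 1 / (k + rank) per chunk; k=60 is the
    conventional constant, not a Mimir setting.
    """
    scores = {}
    for hits in (vector_hits, bm25_hits):
        for rank, chunk_id in enumerate(hits, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda chunk_id: scores[chunk_id], reverse=True)


vector_hits = ["a", "b", "c"]  # ranked by vector similarity
bm25_hits = ["b", "d", "a"]    # ranked by full-text (BM25) score
fused = reciprocal_rank_fusion(vector_hits, bm25_hits)
```

Chunks that appear in both lists ("a" and "b") rise above chunks found by only one search, which is the main benefit hybrid search aims for.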
3. GitHub Repository Configuration
Mimir fetches code and documentation from GitHub repositories. You can configure single or multiple repositories.
Single Repository
For most projects, start with a single repository that contains both code and documentation:
MIMIR_GITHUB_URL=https://github.com/your-org/your-repo
MIMIR_GITHUB_BRANCH=main
MIMIR_GITHUB_TOKEN=ghp_your_token_here
MIMIR_GITHUB_URL: The main repository URL. Mimir will fetch both code and documentation from here. Required for single-repo setup. Optional if using separate code/docs repos or multiple repos.
MIMIR_GITHUB_BRANCH: Which branch to fetch from. Defaults to main if not specified.
MIMIR_GITHUB_TOKEN: GitHub personal access token. Required for private repositories or to avoid rate limits on public repos.
MIMIR_GITHUB_DIRECTORY: Base directory to start from in the main repo. Optional.
MIMIR_GITHUB_INCLUDE_DIRECTORIES: Comma-separated list of directories to include from the main repo. Optional.
Separate Code and Documentation Repos
If your code and docs are in different repositories, use separate configuration:
# Code repository (TypeScript, Python, Rust, etc.)
MIMIR_GITHUB_CODE_URL=https://github.com/your-org/code-repo
MIMIR_GITHUB_CODE_DIRECTORY=src
MIMIR_GITHUB_CODE_INCLUDE_DIRECTORIES=src,lib
# Documentation repository (MDX files)
MIMIR_GITHUB_DOCS_URL=https://github.com/your-org/docs-repo
MIMIR_GITHUB_DOCS_DIRECTORY=docs
MIMIR_GITHUB_DOCS_INCLUDE_DIRECTORIES=docs,guides
Note: When MIMIR_GITHUB_CODE_URL or MIMIR_GITHUB_DOCS_URL is set, it takes precedence over MIMIR_GITHUB_URL for that type. MIMIR_GITHUB_URL is used as a fallback if neither MIMIR_GITHUB_CODE_URL nor MIMIR_GITHUB_DOCS_URL is set.
DIRECTORY: Base directory to start from. If your code is in src/, set this to src to avoid indexing root-level files.
INCLUDE_DIRECTORIES: Comma-separated list of directories to include. Useful when you only want specific folders indexed.
Multiple Repositories
For larger projects with multiple codebases or documentation sources, use numbered environment variables:
# ============================================
# CODE REPOSITORIES
# ============================================
MIMIR_GITHUB_CODE_REPO_1_URL=https://github.com/your-org/repo1
MIMIR_GITHUB_CODE_REPO_1_DIRECTORY=src
MIMIR_GITHUB_CODE_REPO_1_INCLUDE_DIRECTORIES=src,lib
MIMIR_GITHUB_CODE_REPO_1_EXCLUDE_PATTERNS=*.test.ts,test/
MIMIR_GITHUB_CODE_REPO_2_URL=https://github.com/your-org/repo2
MIMIR_GITHUB_CODE_REPO_2_DIRECTORY=packages
# ============================================
# DOCUMENTATION REPOSITORIES
# ============================================
MIMIR_GITHUB_DOCS_REPO_1_URL=https://github.com/your-org/docs1
MIMIR_GITHUB_DOCS_REPO_1_DIRECTORY=docs
MIMIR_GITHUB_DOCS_REPO_1_INCLUDE_DIRECTORIES=docs,guides
MIMIR_GITHUB_DOCS_REPO_1_BASE_URL=https://docs.example.com
MIMIR_GITHUB_DOCS_REPO_1_CONTENT_PATH=content/docs
MIMIR_GITHUB_DOCS_REPO_2_URL=https://github.com/your-org/docs2
MIMIR_GITHUB_DOCS_REPO_2_BASE_URL=https://docs2.example.com
Note: When using multiple repos (numbered variables), the single-repo variables (MIMIR_GITHUB_CODE_URL, MIMIR_GITHUB_DOCS_URL) are ignored. Number repos starting from 1 and increment sequentially (1, 2, 3, etc.). Each repo can have its own DIRECTORY, INCLUDE_DIRECTORIES, and EXCLUDE_PATTERNS settings.
EXCLUDE_PATTERNS: Comma-separated patterns to skip. Useful for excluding test files, build artifacts, or generated code.
BASE_URL (docs only): The public URL where your documentation is hosted. Used to generate clickable links in search results.
CONTENT_PATH (docs only): The path prefix in your repository where content lives. Used to correctly map repository paths to documentation URLs.
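One plausible reason the numbering must be sequential is that a loader walks the indices upward and stops at the first missing URL. The sketch below shows that pattern. It is an assumption about the behavior rather than Mimir's actual loader, and collect_numbered_repos is a hypothetical name.

```python
def collect_numbered_repos(env: dict, kind: str) -> list:
    """Collect MIMIR_GITHUB_<kind>_REPO_<n>_* settings, kind being CODE or DOCS.

    Assumption: scanning stops at the first index without a URL,
    so a gap in the numbering hides every repo after it.
    """
    fields = ("DIRECTORY", "INCLUDE_DIRECTORIES", "EXCLUDE_PATTERNS",
              "BASE_URL", "CONTENT_PATH")
    repos = []
    index = 1
    while True:
        prefix = f"MIMIR_GITHUB_{kind}_REPO_{index}_"
        url = env.get(prefix + "URL")
        if not url:
            break
        repo = {"url": url}
        for field in fields:
            value = env.get(prefix + field)
            if value:
                repo[field.lower()] = value
        repos.append(repo)
        index += 1
    return repos


env = {
    "MIMIR_GITHUB_CODE_REPO_1_URL": "https://github.com/your-org/repo1",
    "MIMIR_GITHUB_CODE_REPO_1_DIRECTORY": "src",
    "MIMIR_GITHUB_CODE_REPO_2_URL": "https://github.com/your-org/repo2",
    # Index 3 is missing, so repo 4 is never reached in this sketch
    "MIMIR_GITHUB_CODE_REPO_4_URL": "https://github.com/your-org/repo4",
    "MIMIR_GITHUB_DOCS_REPO_1_URL": "https://github.com/your-org/docs1",
    "MIMIR_GITHUB_DOCS_REPO_1_BASE_URL": "https://docs.example.com",
}
code_repos = collect_numbered_repos(env, "CODE")
docs_repos = collect_numbered_repos(env, "DOCS")
```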
4. Parser Configuration
Control what code entities get extracted and indexed from your codebase.
Extract Variables (optional)
MIMIR_EXTRACT_VARIABLES=false
Default: false
Controls whether top-level variable declarations are extracted as separate entities. If false, only functions, classes, and interfaces are indexed. If true, variables like export const config = {...} are also indexed. Useful if you have important configuration objects or constants that developers frequently search for.
Note: Exported const functions (like export const myFunction = () => {}) are always extracted regardless of this setting.
Extract Methods (optional)
MIMIR_EXTRACT_METHODS=true
Default: true
Controls whether class methods are extracted as separate entities. If true, each method becomes its own searchable entity. If false, only the class itself is indexed. Disable if you have many small methods and want to reduce index size, or if you prefer searching at the class level.
Exclude Patterns (optional)
MIMIR_EXCLUDE_PATTERNS=*.test.ts,*.spec.ts,test/,__tests__/,tests/
Prevents test files and other non-production code from being indexed. This keeps your search results focused on actual implementation code and reduces index size.
Patterns supported:
- File patterns: *.test.ts, *.spec.ts, *.d.ts
- Directory patterns: test/, __tests__/, tests/, node_modules/
Common test patterns are excluded automatically if not specified.
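The two pattern kinds can be read as follows: a file pattern is a glob matched against the file name, and a directory pattern (trailing slash) matches any directory component of the path. The sketch below implements that reading with Python's fnmatch module. The exact matching rules Mimir applies are not specified here, so treat this as an approximation, with is_excluded as a hypothetical helper.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_excluded(path: str, patterns: list) -> bool:
    """Return True if a repository path matches any exclude pattern."""
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: compare against whole directory names,
            # so "test/" does not match a folder called "latest"
            if pattern.rstrip("/") in parts[:-1]:
                return True
        elif fnmatchcase(parts[-1], pattern):
            # File pattern: glob against the file name only
            return True
    return False


patterns = "*.test.ts,*.spec.ts,test/,__tests__/,tests/".split(",")
skip_test_file = is_excluded("src/utils.test.ts", patterns)
skip_test_dir = is_excluded("src/__tests__/helper.ts", patterns)
keep_source = is_excluded("src/utils.ts", patterns)
```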
Include Directories (optional)
MIMIR_GITHUB_INCLUDE_DIRECTORIES=src,lib,packages
Comma-separated list of directories to include when parsing. Only files in these directories will be indexed. Useful for large repositories where you only want specific folders.
5. Documentation Configuration (Optional)
Configure how documentation URLs are generated in search results.
Documentation Base URL (optional)
MIMIR_DOCS_BASE_URL=https://docs.example.com
The base URL where your documentation is hosted. Used to generate clickable links in search results when a repository has no per-repo BASE_URL configured.
Documentation Content Path (optional)
MIMIR_DOCS_CONTENT_PATH=content/docs
The path prefix in your repository where documentation content lives. Used to correctly map repository paths to documentation URLs.
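The base URL and content path work together: the content path is stripped from the repository path and the remainder is appended to the base URL. The sketch below shows one way that mapping could work, assuming the file extension is dropped as well. Mimir's real URL rules (for example, handling of index files) may differ, and docs_url is a hypothetical name.

```python
from pathlib import PurePosixPath


def docs_url(repo_path: str, base_url: str, content_path: str) -> str:
    """Map a file path in the repository to a public documentation URL."""
    path = PurePosixPath(repo_path)
    try:
        # Drop the content prefix, e.g. content/docs/guides/x.mdx -> guides/x.mdx
        relative = path.relative_to(PurePosixPath(content_path))
    except ValueError:
        # File lives outside the content path; keep the full path
        relative = path
    # Assumption: published pages have no .md/.mdx extension
    relative = relative.with_suffix("")
    return base_url.rstrip("/") + "/" + relative.as_posix()


url = docs_url(
    "content/docs/guides/setup.mdx",
    "https://docs.example.com",
    "content/docs",
)
```

In this sketch, content/docs/guides/setup.mdx maps to https://docs.example.com/guides/setup.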
6. LLM Configuration
Mimir uses LLMs for two purposes: creating embeddings (vector representations) and generating chat responses. You can use different providers for each.
Embedding Configuration (required)
Embeddings convert your documentation and code into vectors that can be searched semantically.
Embedding Provider (required)
MIMIR_LLM_EMBEDDING_PROVIDER=openai
Which provider to use. Options: openai, google, mistral
- OpenAI: Fast, cost-effective, widely used. Good default choice.
- Google: Alternative option, good quality embeddings.
- Mistral: European provider, good for compliance requirements.
Embedding Model (required)
MIMIR_LLM_EMBEDDING_MODEL=text-embedding-3-small
The specific model to use. Different models have different quality/cost tradeoffs:
- text-embedding-3-small: Fast and cheap, good for most use cases
- text-embedding-3-large: Higher quality, more expensive
- text-embedding-004 (Google): Alternative option
Embedding API Key (required)
MIMIR_LLM_EMBEDDING_API_KEY=sk-your-key-here
Your API key for the chosen provider. Required to make embedding API calls.
Chat Configuration (required)
Chat completions generate natural language answers from retrieved documentation.
Chat Provider (required)
MIMIR_LLM_CHAT_PROVIDER=openai
Which provider to use. Options: openai, google, anthropic, mistral
- OpenAI: Fast, reliable, good default
- Anthropic (Claude): High quality responses, better reasoning
- Google (Gemini): Alternative option
- Mistral: European provider
Chat Model (required)
MIMIR_LLM_CHAT_MODEL=gpt-4
The specific model. Examples:
- OpenAI: gpt-4, gpt-4-turbo, gpt-3.5-turbo
- Anthropic: claude-3-opus, claude-3-sonnet, claude-3-haiku
- Google: gemini-pro
- Mistral: mistral-large
Chat API Key (required)
MIMIR_LLM_CHAT_API_KEY=sk-your-key-here
Your API key for the chosen provider.
Chat Temperature (optional)
MIMIR_LLM_CHAT_TEMPERATURE=0
Default: 0
Controls randomness (0.0 to 2.0). Lower values (0-0.3) give more deterministic, factual answers. Higher values (0.7-1.0) give more creative responses.
Chat Max Output Tokens (optional)
MIMIR_LLM_CHAT_MAX_OUTPUT_TOKENS=8000
Default: 8000
Maximum number of tokens the chat model can generate in a single response. Increase for longer responses, decrease to limit response length.
Custom Base URLs (optional)
MIMIR_LLM_EMBEDDING_BASE_URL=https://api.openai.com/v1
MIMIR_LLM_CHAT_BASE_URL=https://api.openai.com/v1
Override the default API base URL for your LLM provider. Useful for self-hosted models or custom API endpoints.
Mixing Providers
You can use different providers for embeddings and chat:
# Use OpenAI for embeddings (fast, cheap)
MIMIR_LLM_EMBEDDING_PROVIDER=openai
MIMIR_LLM_EMBEDDING_MODEL=text-embedding-3-small
# Use Anthropic for chat (high quality)
MIMIR_LLM_CHAT_PROVIDER=anthropic
MIMIR_LLM_CHAT_MODEL=claude-3-sonnet
This allows you to optimize for cost (cheap embeddings) and quality (better chat model) separately.
Complete Example
Here's a minimal .env file example with required settings:
# Server (Required)
MIMIR_SERVER_API_KEY=your-generated-api-key
# Database (Required)
MIMIR_DATABASE_URL=postgresql://user:password@host:5432/database
# GitHub - Single Repository
MIMIR_GITHUB_URL=https://github.com/your-org/your-repo
MIMIR_GITHUB_BRANCH=main
MIMIR_GITHUB_TOKEN=ghp_your_token_here
# LLM - Embeddings (Required)
MIMIR_LLM_EMBEDDING_PROVIDER=openai
MIMIR_LLM_EMBEDDING_MODEL=text-embedding-3-small
MIMIR_LLM_EMBEDDING_API_KEY=sk-your-openai-key
# LLM - Chat (Required)
MIMIR_LLM_CHAT_PROVIDER=openai
MIMIR_LLM_CHAT_MODEL=gpt-4
MIMIR_LLM_CHAT_API_KEY=sk-your-openai-key
For a complete example with all optional settings, see .env.example in the mimir-rag directory.
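Before starting the server, a short script can confirm that every setting marked required in this guide is present in the .env file. The sketch below is a standalone check and not a Mimir command. Its parser handles only simple KEY=value lines and comments, and the GitHub variables are left out because which ones are needed depends on the repository setup.

```python
# Variables marked "required" in this guide
REQUIRED = [
    "MIMIR_SERVER_API_KEY",
    "MIMIR_DATABASE_URL",
    "MIMIR_LLM_EMBEDDING_PROVIDER",
    "MIMIR_LLM_EMBEDDING_MODEL",
    "MIMIR_LLM_EMBEDDING_API_KEY",
    "MIMIR_LLM_CHAT_PROVIDER",
    "MIMIR_LLM_CHAT_MODEL",
    "MIMIR_LLM_CHAT_API_KEY",
]


def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "="
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_required(text: str) -> list:
    """Return the required variable names that are absent or empty."""
    values = parse_env(text)
    return [name for name in REQUIRED if not values.get(name)]


sample = """
# Server (Required)
MIMIR_SERVER_API_KEY=your-generated-api-key
MIMIR_DATABASE_URL=postgresql://user:password@host:5432/database
"""
missing = missing_required(sample)
```

For a real file, pass the result of open(".env").read() to missing_required and fail the startup script if the returned list is not empty.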
Next Steps
- Learn about Deployment options
- Check the API Reference for programmatic access
- Set up MCP integration for AI assistants