Mimir

Configuration

Complete guide to configuring Mimir with clear, organized environment variables

Mimir is configured through environment variables in a .env file. This guide groups them by category and marks each setting as required or optional.

Configuration Overview

Mimir uses environment variables organized into these categories:

  1. Server Configuration - API keys and server settings
  2. Database Configuration - PostgreSQL connection and settings
  3. GitHub Repository Configuration - Where to fetch code and documentation
  4. Parser Configuration - What to extract from code
  5. Documentation Configuration - Documentation URL generation
  6. LLM Configuration - Embedding and chat model settings

1. Server Configuration

Server API Key (required)

MIMIR_SERVER_API_KEY=your-generated-api-key

This is your server's authentication key. It protects your API endpoints from unauthorized access. Generate one using npm run generate-apikey in the mimir-rag directory.

GitHub Webhook Secret (optional)

MIMIR_SERVER_GITHUB_WEBHOOK_SECRET=your-webhook-secret

When GitHub sends webhook events (like when code is pushed), this secret verifies that the request actually came from GitHub. Only needed if you want automatic ingestion when code changes in your repository.
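As an illustration of what this verification involves: GitHub signs each delivery with HMAC-SHA256 over the raw request body and sends the result in the X-Hub-Signature-256 header as sha256=<hex digest>. The sketch below shows that check in general terms. The function name is hypothetical, and this is not necessarily how Mimir implements it.

```python
import hashlib
import hmac


def signature_is_valid(secret: str, body: bytes, signature_header: str) -> bool:
    """Return True if signature_header matches the HMAC-SHA256 of body."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, signature_header)


# Simulate the header GitHub would send for this body and secret.
secret = "your-webhook-secret"
body = b'{"ref": "refs/heads/main"}'
header = "sha256=" + hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()

print(signature_is_valid(secret, body, header))          # True
print(signature_is_valid("wrong-secret", body, header))  # False
```

The secret configured in the GitHub webhook settings and the value of MIMIR_SERVER_GITHUB_WEBHOOK_SECRET must be identical for this kind of check to pass.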

Fallback Ingest Interval (optional)

MIMIR_SERVER_FALLBACK_INGEST_INTERVAL_MINUTES=60

If webhooks fail or aren't configured, this setting schedules a fallback job that re-ingests your repositories at the given interval, in minutes (every 60 minutes in the example above). It keeps your index fresh without relying on webhook delivery.

2. Database Configuration

Mimir works with any PostgreSQL database that supports the pgvector extension (Supabase, Neon, self-hosted, etc.).

Database URL (required)

MIMIR_DATABASE_URL=postgresql://user:password@host:5432/database

Your PostgreSQL connection string. Mimir uses it to connect directly to your database and store/query vector embeddings.
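A common stumbling block with connection strings is a password containing characters such as @, : or /, which must be percent-encoded to keep the URL parseable. The following sketch shows one way to assemble and sanity-check such a URL. The helper name and the sample credentials are hypothetical.

```python
from urllib.parse import quote, urlsplit


def build_database_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Assemble a PostgreSQL URL, percent-encoding the user and password."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


url = build_database_url("user", "p@ss:word/1", "host", 5432, "database")
print(url)  # postgresql://user:p%40ss%3Aword%2F1@host:5432/database

# Parsing the result back confirms the host, port, and database are intact.
parts = urlsplit(url)
print(parts.hostname, parts.port, parts.path)  # host 5432 /database
```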

Docker and managed Postgres (Supabase, Neon, etc.): If the container cannot reach your database, run it on the host network:

docker run --rm --network host -v $(pwd)/.env:/app/.env:ro mimir-rag:local

The container then uses the host's network, which usually clears connection issues. Alternatively, for Supabase, use the Session Pooler connection string (port 6543) from Dashboard → Settings → Database → Connection Pooling → Session mode.

Embedding Dimension

The default schema uses vector(3072) for embeddings. If your embedding model uses a different dimension, update the database schema:

  1. Check your embedding model's dimension (e.g., OpenAI text-embedding-3-small uses 1536, text-embedding-3-large uses 3072)
  2. Update prisma/migrations/0_init/migration.sql to change vector(3072) to your model's dimension
  3. Update prisma/schema.prisma to change Unsupported("vector(3072)") to match
  4. Run migrations: make setup-db
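Before running migrations, it can help to confirm that the schema and the model agree. The sketch below is a hypothetical helper, not part of Mimir: it reads the dimension out of a vector(N) declaration and compares it with the dimensions listed in step 1. The schema line is an illustrative excerpt, not the full file.

```python
import re

# Dimensions taken from step 1 above; extend for other models.
MODEL_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def schema_dimension(schema_text: str) -> int:
    """Extract N from the first vector(N) occurrence in the schema text."""
    match = re.search(r"vector\((\d+)\)", schema_text)
    if match is None:
        raise ValueError("no vector(N) column found")
    return int(match.group(1))


def dimensions_match(model: str, schema_text: str) -> bool:
    """True if the model's output dimension equals the schema's vector size."""
    return MODEL_DIMENSIONS[model] == schema_dimension(schema_text)


# Hypothetical one-line excerpt; the real schema.prisma has more fields.
schema = 'embedding Unsupported("vector(3072)")'

print(dimensions_match("text-embedding-3-large", schema))  # True
print(dimensions_match("text-embedding-3-small", schema))  # False
```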

Similarity Threshold (optional)

MIMIR_DATABASE_SIMILARITY_THRESHOLD=0.2

Default: 0.2
Range: 0.0 to 1.0

Controls how similar a document chunk must be to your query to be included in results. Lower values (0.1-0.3) return more results but may include less relevant content. Higher values (0.5-0.8) return fewer, more precise matches.

Match Count (optional)

MIMIR_DATABASE_MATCH_COUNT=10

Default: 10

Limits how many document chunks are returned per query from vector search. More chunks provide more context but increase API costs and response time. Fewer chunks are faster but may miss relevant information.
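To make the interaction between the similarity threshold and the match count concrete, here is a small sketch. It assumes cosine similarity as the score and assumes the threshold is applied before the count limit; neither detail is confirmed by this guide. The two-dimensional vectors and chunk names are toy data.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query: List[float], chunks: Dict[str, List[float]],
           threshold: float = 0.2, match_count: int = 10) -> List[str]:
    """Keep chunks scoring at or above threshold, best first, capped at match_count."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in chunks.items()]
    kept = sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)
    return [name for _score, name in kept[:match_count]]


chunks = {
    "close": [1.0, 0.1],      # similarity to the query is about 0.995
    "related": [1.0, 1.0],    # about 0.707
    "unrelated": [0.0, 1.0],  # 0.0
}
query = [1.0, 0.0]

print(search(query, chunks, threshold=0.2))                 # ['close', 'related']
print(search(query, chunks, threshold=0.8))                 # ['close']
print(search(query, chunks, threshold=0.0, match_count=1))  # ['close']
```

Raising the threshold removes the weaker match, while lowering the match count trims the list after ranking.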

BM25 Match Count (optional)

MIMIR_DATABASE_BM25_MATCH_COUNT=10

Default: 10

Limits how many document chunks are returned per query from full-text (BM25) search. Used when hybrid search is enabled.

Enable Hybrid Search (optional)

MIMIR_DATABASE_ENABLE_HYBRID_SEARCH=true

Default: true

When enabled, combines vector similarity search with full-text (BM25) search for better results. Disable if you only want vector search.
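This guide does not say how the two result lists are merged. One widely used technique is reciprocal rank fusion (RRF), sketched below purely as an illustration; Mimir may combine results differently. The constant k=60 is the conventional RRF default, and the chunk IDs are made up.

```python
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked lists of chunk IDs; a higher fused score ranks earlier."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Sort by score descending, then by ID so ties are deterministic.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


vector_hits = ["auth-guide", "login-api", "sessions"]
bm25_hits = ["login-api", "rate-limits", "auth-guide"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)  # ['login-api', 'auth-guide', 'rate-limits', 'sessions']
```

Chunks that appear in both lists rise to the top, which is the main benefit of combining semantic and keyword search.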

3. GitHub Repository Configuration

Mimir fetches code and documentation from GitHub repositories. You can configure single or multiple repositories.

Single Repository

For most projects, start with a single repository that contains both code and documentation:

MIMIR_GITHUB_URL=https://github.com/your-org/your-repo
MIMIR_GITHUB_BRANCH=main
MIMIR_GITHUB_TOKEN=ghp_your_token_here

MIMIR_GITHUB_URL: The main repository URL. Mimir will fetch both code and documentation from here. Required for single-repo setup. Optional if using separate code/docs repos or multiple repos.

MIMIR_GITHUB_BRANCH: Which branch to fetch from. Defaults to main if not specified.

MIMIR_GITHUB_TOKEN: GitHub personal access token. Required for private repositories or to avoid rate limits on public repos.

MIMIR_GITHUB_DIRECTORY: Base directory to start from in the main repo. Optional.

MIMIR_GITHUB_INCLUDE_DIRECTORIES: Comma-separated list of directories to include from the main repo. Optional.

Separate Code and Documentation Repos

If your code and docs are in different repositories, use separate configuration:

# Code repository (TypeScript, Python, Rust, etc.)
MIMIR_GITHUB_CODE_URL=https://github.com/your-org/code-repo
MIMIR_GITHUB_CODE_DIRECTORY=src
MIMIR_GITHUB_CODE_INCLUDE_DIRECTORIES=src,lib

# Documentation repository (MDX files)
MIMIR_GITHUB_DOCS_URL=https://github.com/your-org/docs-repo
MIMIR_GITHUB_DOCS_DIRECTORY=docs
MIMIR_GITHUB_DOCS_INCLUDE_DIRECTORIES=docs,guides

Note: When using MIMIR_GITHUB_CODE_URL or MIMIR_GITHUB_DOCS_URL, those take precedence over MIMIR_GITHUB_URL for that type. MIMIR_GITHUB_URL is used as a fallback if neither MIMIR_GITHUB_CODE_URL nor MIMIR_GITHUB_DOCS_URL is set.

DIRECTORY: Base directory to start from. If your code is in src/, set this to src to avoid indexing root-level files.

INCLUDE_DIRECTORIES: Comma-separated list of directories to include. Useful when you only want specific folders indexed.

Multiple Repositories

For larger projects with multiple codebases or documentation sources, use numbered environment variables:

# ============================================
# CODE REPOSITORIES
# ============================================
MIMIR_GITHUB_CODE_REPO_1_URL=https://github.com/your-org/repo1
MIMIR_GITHUB_CODE_REPO_1_DIRECTORY=src
MIMIR_GITHUB_CODE_REPO_1_INCLUDE_DIRECTORIES=src,lib
MIMIR_GITHUB_CODE_REPO_1_EXCLUDE_PATTERNS=*.test.ts,test/

MIMIR_GITHUB_CODE_REPO_2_URL=https://github.com/your-org/repo2
MIMIR_GITHUB_CODE_REPO_2_DIRECTORY=packages

# ============================================
# DOCUMENTATION REPOSITORIES
# ============================================
MIMIR_GITHUB_DOCS_REPO_1_URL=https://github.com/your-org/docs1
MIMIR_GITHUB_DOCS_REPO_1_DIRECTORY=docs
MIMIR_GITHUB_DOCS_REPO_1_INCLUDE_DIRECTORIES=docs,guides
MIMIR_GITHUB_DOCS_REPO_1_BASE_URL=https://docs.example.com
MIMIR_GITHUB_DOCS_REPO_1_CONTENT_PATH=content/docs

MIMIR_GITHUB_DOCS_REPO_2_URL=https://github.com/your-org/docs2
MIMIR_GITHUB_DOCS_REPO_2_BASE_URL=https://docs2.example.com

Note: When using multiple repos (numbered variables), the single-repo variables (MIMIR_GITHUB_CODE_URL, MIMIR_GITHUB_DOCS_URL) are ignored. Number repos starting from 1 and increment sequentially (1, 2, 3, etc.). Each repo can have its own DIRECTORY, INCLUDE_DIRECTORIES, and EXCLUDE_PATTERNS settings.

EXCLUDE_PATTERNS: Comma-separated patterns to skip. Useful for excluding test files, build artifacts, or generated code.

BASE_URL (docs only): The public URL where your documentation is hosted. Used to generate clickable links in search results.

CONTENT_PATH (docs only): The path prefix in your repository where content lives. Used to correctly map repository paths to documentation URLs.
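The numbering rule can be illustrated with a sketch of how numbered variables might be collected. This is an assumption about the loading logic, not Mimir's actual code: it scans from 1 upward and stops at the first missing URL, which is why a gap in the numbering would hide later repositories under this scheme. The function names are hypothetical.

```python
from typing import Dict, List, Optional


def split_list(value: Optional[str]) -> List[str]:
    """Turn "src, lib" into ["src", "lib"]; unset or empty gives []."""
    if not value:
        return []
    return [item.strip() for item in value.split(",") if item.strip()]


def collect_repos(env: Dict[str, str], kind: str) -> List[dict]:
    """Collect MIMIR_GITHUB_<kind>_REPO_<n>_* settings, starting at n = 1."""
    repos = []
    index = 1
    while True:
        prefix = f"MIMIR_GITHUB_{kind}_REPO_{index}_"
        url = env.get(prefix + "URL")
        if not url:
            break  # first missing number ends the scan (assumed behavior)
        repos.append({
            "url": url,
            "directory": env.get(prefix + "DIRECTORY"),
            "include_directories": split_list(env.get(prefix + "INCLUDE_DIRECTORIES")),
            "exclude_patterns": split_list(env.get(prefix + "EXCLUDE_PATTERNS")),
        })
        index += 1
    return repos


env = {
    "MIMIR_GITHUB_CODE_REPO_1_URL": "https://github.com/your-org/repo1",
    "MIMIR_GITHUB_CODE_REPO_1_EXCLUDE_PATTERNS": "*.test.ts,test/",
    # REPO_2 is missing, so REPO_3 is never reached in this sketch.
    "MIMIR_GITHUB_CODE_REPO_3_URL": "https://github.com/your-org/repo3",
}

repos = collect_repos(env, "CODE")
print(len(repos))                    # 1
print(repos[0]["exclude_patterns"])  # ['*.test.ts', 'test/']
```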

4. Parser Configuration

Control what code entities get extracted and indexed from your codebase.

Extract Variables (optional)

MIMIR_EXTRACT_VARIABLES=false

Default: false

Controls whether top-level variable declarations are extracted as separate entities. If false, only functions, classes, and interfaces are indexed. If true, variables like export const config = {...} are also indexed. Useful if you have important configuration objects or constants that developers frequently search for.

Note: Exported const functions (like export const myFunction = () => {}) are always extracted regardless of this setting.

Extract Methods (optional)

MIMIR_EXTRACT_METHODS=true

Default: true

Controls whether class methods are extracted as separate entities. If true, each method becomes its own searchable entity. If false, only the class itself is indexed. Disable if you have many small methods and want to reduce index size, or if you prefer searching at the class level.

Exclude Patterns (optional)

MIMIR_EXCLUDE_PATTERNS=*.test.ts,*.spec.ts,test/,__tests__/,tests/

Prevents test files and other non-production code from being indexed. This keeps your search results focused on actual implementation code and reduces index size.

Patterns supported:

  • File patterns: *.test.ts, *.spec.ts, *.d.ts
  • Directory patterns: test/, __tests__/, tests/, node_modules/

If MIMIR_EXCLUDE_PATTERNS is not set, common test patterns are excluded automatically.
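The difference between the two pattern kinds can be sketched as follows. This is an assumed interpretation, not Mimir's matcher: file patterns are treated as globs on the file name, and directory patterns (ending in /) match a whole directory segment anywhere in the path.

```python
from fnmatch import fnmatchcase
from typing import List


def is_excluded(path: str, patterns: List[str]) -> bool:
    """Return True if a repository-relative path matches any exclude pattern."""
    segments = path.split("/")
    file_name, directories = segments[-1], segments[:-1]
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: must equal a full directory segment.
            if pattern.rstrip("/") in directories:
                return True
        elif fnmatchcase(file_name, pattern):
            # File pattern: glob against the file name only.
            return True
    return False


patterns = ["*.test.ts", "*.spec.ts", "test/", "__tests__/", "tests/"]

print(is_excluded("src/user.test.ts", patterns))         # True
print(is_excluded("src/__tests__/helper.ts", patterns))  # True
print(is_excluded("src/user.ts", patterns))              # False
print(is_excluded("src/latest/index.ts", patterns))      # False
```

The last case shows why matching whole segments matters: a directory named latest contains the text "test" but is not a test directory.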

Include Directories (optional)

MIMIR_GITHUB_INCLUDE_DIRECTORIES=src,lib,packages

Comma-separated list of directories to include when parsing. Only files in these directories will be indexed. Useful for large repositories where you only want specific folders.

5. Documentation Configuration (Optional)

Configure how documentation URLs are generated in search results.

Documentation Base URL (optional)

MIMIR_DOCS_BASE_URL=https://docs.example.com

The base URL where your documentation is hosted. Used to generate clickable links in search results for any repository that has no per-repo BASE_URL configured.

Documentation Content Path (optional)

MIMIR_DOCS_CONTENT_PATH=content/docs

The path prefix in your repository where documentation content lives. Used to correctly map repository paths to documentation URLs.
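To show how the base URL and the content path could work together, here is a sketch of one plausible mapping: strip the content path prefix, drop the file extension, and append the remainder to the base URL. The exact rule Mimir applies is not specified here, so treat the function and its behavior as assumptions.

```python
import posixpath


def docs_url(repo_path: str, base_url: str, content_path: str) -> str:
    """Map a repository file path to a public documentation URL (assumed rule)."""
    prefix = content_path.strip("/") + "/"
    relative = repo_path[len(prefix):] if repo_path.startswith(prefix) else repo_path
    # Drop the file extension (.mdx, .md, ...) to get the page slug.
    slug, _extension = posixpath.splitext(relative)
    return base_url.rstrip("/") + "/" + slug


print(docs_url("content/docs/guides/setup.mdx", "https://docs.example.com", "content/docs"))
# https://docs.example.com/guides/setup
```

Without the content path, the link would include content/docs in the URL, which is the mismatch this setting is meant to avoid.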

6. LLM Configuration

Mimir uses LLMs for two purposes: creating embeddings (vector representations) and generating chat responses. You can use different providers for each.

Embedding Configuration (required)

Embeddings convert your documentation and code into vectors that can be searched semantically.

Embedding Provider (required)

MIMIR_LLM_EMBEDDING_PROVIDER=openai

Which provider to use. Options: openai, google, mistral

  • OpenAI: Fast, cost-effective, widely used. Good default choice.
  • Google: Alternative option, good quality embeddings.
  • Mistral: European provider, good for compliance requirements.

Embedding Model (required)

MIMIR_LLM_EMBEDDING_MODEL=text-embedding-3-small

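The specific model to use. Models differ in quality, cost, and output dimension, and the dimension must match the database schema (see Embedding Dimension above; the default schema is vector(3072)):

  • text-embedding-3-small: Fast and cheap, good for most use cases (1536 dimensions)
  • text-embedding-3-large: Higher quality, more expensive (3072 dimensions)
  • text-embedding-004 (Google): Alternative option (768 dimensions)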

Embedding API Key (required)

MIMIR_LLM_EMBEDDING_API_KEY=sk-your-key-here

Your API key for the chosen provider. Required to make embedding API calls.

Chat Configuration (required)

Chat completions generate natural language answers from retrieved documentation.

Chat Provider (required)

MIMIR_LLM_CHAT_PROVIDER=openai

Which provider to use. Options: openai, google, anthropic, mistral

  • OpenAI: Fast, reliable, good default
  • Anthropic (Claude): High quality responses, better reasoning
  • Google (Gemini): Alternative option
  • Mistral: European provider

Chat Model (required)

MIMIR_LLM_CHAT_MODEL=gpt-4

The specific model. Examples:

  • OpenAI: gpt-4, gpt-4-turbo, gpt-3.5-turbo
  • Anthropic: claude-3-opus, claude-3-sonnet, claude-3-haiku
  • Google: gemini-pro
  • Mistral: mistral-large

Chat API Key (required)

MIMIR_LLM_CHAT_API_KEY=sk-your-key-here

Your API key for the chosen provider.

Chat Temperature (optional)

MIMIR_LLM_CHAT_TEMPERATURE=0

Default: 0

Controls randomness (0.0 to 2.0). Lower values (0-0.3) give more deterministic, factual answers. Higher values (0.7-1.0) give more creative responses.

Chat Max Output Tokens (optional)

MIMIR_LLM_CHAT_MAX_OUTPUT_TOKENS=8000

Default: 8000

Maximum number of tokens the chat model can generate in a single response. Increase for longer responses, decrease to limit response length.

Custom Base URLs (optional)

MIMIR_LLM_EMBEDDING_BASE_URL=https://api.openai.com/v1
MIMIR_LLM_CHAT_BASE_URL=https://api.openai.com/v1

Override the default API base URL for your LLM provider. Useful for self-hosted models or custom API endpoints.

Mixing Providers

You can use different providers for embeddings and chat:

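# Use OpenAI for embeddings (fast, cheap)
MIMIR_LLM_EMBEDDING_PROVIDER=openai
MIMIR_LLM_EMBEDDING_MODEL=text-embedding-3-small
MIMIR_LLM_EMBEDDING_API_KEY=sk-your-openai-key

# Use Anthropic for chat (high quality); each provider needs its own key
MIMIR_LLM_CHAT_PROVIDER=anthropic
MIMIR_LLM_CHAT_MODEL=claude-3-sonnet
MIMIR_LLM_CHAT_API_KEY=sk-ant-your-anthropic-key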

This allows you to optimize for cost (cheap embeddings) and quality (better chat model) separately.

Complete Example

Here's a minimal .env file example with required settings:

# Server (Required)
MIMIR_SERVER_API_KEY=your-generated-api-key

# Database (Required)
MIMIR_DATABASE_URL=postgresql://user:password@host:5432/database

# GitHub - Single Repository
MIMIR_GITHUB_URL=https://github.com/your-org/your-repo
MIMIR_GITHUB_BRANCH=main
MIMIR_GITHUB_TOKEN=ghp_your_token_here

# LLM - Embeddings (Required)
MIMIR_LLM_EMBEDDING_PROVIDER=openai
MIMIR_LLM_EMBEDDING_MODEL=text-embedding-3-small
MIMIR_LLM_EMBEDDING_API_KEY=sk-your-openai-key

# LLM - Chat (Required)
MIMIR_LLM_CHAT_PROVIDER=openai
MIMIR_LLM_CHAT_MODEL=gpt-4
MIMIR_LLM_CHAT_API_KEY=sk-your-openai-key

For a complete example with all optional settings, see .env.example in the mimir-rag directory.
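A quick way to catch a missing required setting before starting the server is to scan the .env file yourself. The sketch below is a hypothetical helper, not a Mimir command. Its parser is deliberately simplified (no quoting or export handling), and the required list reflects the settings marked required in this guide, leaving out the GitHub variables because they depend on which repository layout you use.

```python
from typing import Dict, List

REQUIRED = [
    "MIMIR_SERVER_API_KEY",
    "MIMIR_DATABASE_URL",
    "MIMIR_LLM_EMBEDDING_PROVIDER",
    "MIMIR_LLM_EMBEDDING_MODEL",
    "MIMIR_LLM_EMBEDDING_API_KEY",
    "MIMIR_LLM_CHAT_PROVIDER",
    "MIMIR_LLM_CHAT_MODEL",
    "MIMIR_LLM_CHAT_API_KEY",
]


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        values[key.strip()] = value.strip()
    return values


def missing_required(values: Dict[str, str]) -> List[str]:
    """List required variables that are absent or empty."""
    return [name for name in REQUIRED if not values.get(name)]


example = """
# Server (Required)
MIMIR_SERVER_API_KEY=your-generated-api-key
MIMIR_DATABASE_URL=postgresql://user:password@host:5432/database
MIMIR_LLM_EMBEDDING_PROVIDER=openai
MIMIR_LLM_EMBEDDING_MODEL=text-embedding-3-small
MIMIR_LLM_EMBEDDING_API_KEY=sk-your-openai-key
MIMIR_LLM_CHAT_PROVIDER=openai
MIMIR_LLM_CHAT_MODEL=
"""

print(missing_required(parse_env(example)))
# ['MIMIR_LLM_CHAT_MODEL', 'MIMIR_LLM_CHAT_API_KEY']
```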

Next Steps