Code Intelligence Spec

Scope

This spec realizes the Code Intelligence PRD, ADR-0004.03, and ADR-0006.09.

It defines the first Mission-native path for GitNexus-like code intelligence using Mission-owned scopes, SurrealDB-backed graph storage, and Agent execution semantic operations over open-mission-mcp.

This document is temporary. Durable vocabulary belongs in CONTEXT.md; durable decisions belong in ADRs. When implementation converges, fold stable details into permanent architecture docs and delete this working spec.

Authoritative Inputs

CONTEXT.md: canonical Mission language.
ADR-0003.03: State store transactions are the canonical write interface for Mission state.
ADR-0001.03: Entity classes own behavior.
ADR-0001.05: Entity commands are the canonical operator surface.
ADR-0006.06: open-mission-mcp is daemon-owned local MCP infrastructure.
ADR-0006.08: AgentExecution interaction journals record semantic runtime truth.
ADR-0006.09: open-mission-mcp exposes Agent execution semantic operations.
ADR-0004.03: Code intelligence index is SurrealDB-backed derived read material.
@flying-pillow/zod-surreal: Zod-first SurrealDB schema metadata and DDL primitives.

Target Runtime Shape

daemon startup
  -> MissionMcpServer
  -> AgentExecutionRegistry
  -> CodeIntelligenceService available for Repository preparation and semantic operations
  -> no eager repository indexing

Repository preparation
  -> Repository.initialize or completed Repository.setup
  -> CodeIntelligenceService.ensureIndex for the prepared Repository root
  -> CodeIndexer scans and extracts provider-backed material
  -> CodeGraphStore writes an active snapshot under the Repository `.mission/runtime`

AgentExecution launch
  -> create AgentExecution protocol descriptor
  -> register open-mission-mcp access
  -> semantic operation catalog filters tools by AgentExecution scope
  -> adapter receives MCP config

Agent calls code_search
  -> open-mission-mcp authorizes AgentExecution token
  -> validates code_search input
  -> AgentExecutionRegistry resolves active execution
  -> AgentExecution invokes AgentExecutionSemanticOperations
  -> semantic operation resolves one Code root from AgentExecution scope
  -> CodeIntelligenceService ensures usable index snapshot
  -> CodeGraphStore runs bounded query
  -> AgentExecution journal records bounded code-search observation
  -> structured result returns through MCP

Ownership Map

AgentExecutionSemanticOperations

Owns the Agent-facing semantic operation catalog.

Responsibilities:

define operation descriptors
define Zod input and result schemas
validate operation availability by AgentExecution scope
invoke operation handlers
record bounded daemon-observed AgentExecution observations
publish journal records for active executions

It must not parse source code, run raw SurrealQL inline, or decide repository/worktree index lifecycle directly.

MissionMcpServer

Owns MCP transport ingress only.

Required change: semantic operation tool registration and stdio bridge support must carry descriptor-provided input schemas. The current bridge shape that assumes every tool maps to AgentSignalPayloadSchema must be split into:

type MissionMcpToolDescriptor =
  | MissionMcpSignalToolDescriptor
  | MissionMcpSemanticOperationToolDescriptor;

Signal tools keep the existing signal wrapping path. Semantic operation tools pass operation input directly with transport fields (agentExecutionId, token, optional eventId) added by the bridge.
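
The split can be sketched as a discriminated union plus one bridge function. This is a minimal illustration, not the actual bridge: the kind discriminator, the parseInput stand-in for a descriptor-provided Zod schema, the signal envelope shape, and buildBridgeRequest are all assumptions made for the example.

```typescript
// Hypothetical descriptor shapes; only the union and member type names come from this spec.
type MissionMcpSignalToolDescriptor = { kind: 'signal'; name: string };
type MissionMcpSemanticOperationToolDescriptor = {
  kind: 'semantic-operation';
  name: string;
  // Stand-in for a descriptor-provided Zod input schema; throws on invalid input.
  parseInput: (raw: unknown) => Record<string, unknown>;
};
type MissionMcpToolDescriptor =
  | MissionMcpSignalToolDescriptor
  | MissionMcpSemanticOperationToolDescriptor;

type BridgeTransport = { agentExecutionId: string; token: string; eventId?: string };

// Builds the request body the bridge would forward for one tool call.
function buildBridgeRequest(
  descriptor: MissionMcpToolDescriptor,
  rawInput: unknown,
  transport: BridgeTransport,
): Record<string, unknown> {
  if (descriptor.kind === 'signal') {
    // Existing path: the payload stays wrapped as a signal (envelope shape is assumed).
    return { ...transport, signal: { type: descriptor.name, payload: rawInput } };
  }
  // Semantic operation path: validated input passes through directly,
  // with transport fields added by the bridge.
  return { ...descriptor.parseInput(rawInput), ...transport };
}
```

Because each descriptor carries its own validation, the bridge no longer needs to assume a single payload schema for every tool.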

CodeIntelligenceService

Owns use cases over code intelligence indexes.

Suggested methods:

type CodeIntelligenceService = {
  ensureIndex(input: EnsureCodeIndexInput): Promise<CodeIndexSnapshot>;
  search(input: CodeSearchInput): Promise<CodeSearchResult>;
  readSymbolContext(input: SymbolContextInput): Promise<SymbolContextResult>;
  analyzeImpact(input: ImpactAnalysisInput): Promise<ImpactAnalysisResult>;
  analyzeChangedCode(input: ChangedCodeImpactInput): Promise<ChangedCodeImpactResult>;
  readRouteImpact(input: RouteImpactInput): Promise<RouteImpactResult>;
  readToolContext(input: ToolContextInput): Promise<ToolContextResult>;
};

It coordinates index freshness, root resolution, graph store reads, Git adapter reads, and result shaping. It does not own MCP transport or AgentExecution journaling.

CodeGraphStore

Owns physical graph persistence and safe graph queries.

It provisions SurrealDB from Mission-owned zod-surreal model definitions. Provisioning must compile the code graph Zod schemas into a zod-surreal model snapshot and render SurrealQL statements from that snapshot. Store code must not depend on hand-maintained SurrealQL table definitions as the source of truth.

Suggested methods:

type CodeGraphStore = {
  provision(): Promise<void>;
  replaceIndex(input: ReplaceCodeIndexInput): Promise<CodeIndexSnapshot>;
  readSnapshot(input: ReadCodeIndexSnapshotInput): Promise<CodeIndexSnapshot | undefined>;
  search(input: StoreCodeSearchInput): Promise<StoreCodeSearchRow[]>;
  readSymbolContext(input: StoreSymbolContextInput): Promise<StoreSymbolContext>;
  traverseImpact(input: StoreImpactTraversalInput): Promise<StoreImpactTraversal>;
  readRouteImpact(input: StoreRouteImpactInput): Promise<StoreRouteImpact>;
  readToolContext(input: StoreToolContextInput): Promise<StoreToolContext>;
};

The store owns parameter binding, table naming, relation table traversal, query limits, and SurrealDB-specific behavior. No caller receives a raw SurrealDB client.

SurrealDB provisioning should use generated statements from the code graph schema module. A package script such as pnpm --filter @flying-pillow/open-mission-core generate:code-graph-schema should refresh any checked-in .surql fixture or generated schema file. Tests should fail when the generated SurrealQL drifts from the committed fixture.

CodeIndexer

Owns source scanning and graph build output.

Responsibilities:

discover indexable files
apply ignore, generated, binary, oversized, and sensitivity filters
parse language-specific syntax
emit code files, symbols, relations, routes, tools, processes, and clusters
produce deterministic output for fixture tests

Phase one should favor TypeScript/JavaScript support for Mission itself. Extraction must be parser-backed rather than regex-driven. The first TS/JS slice may use the TypeScript compiler API because it is already in the runtime toolchain; broader language coverage should follow a GitNexus-like provider model with Tree-sitter grammars, per-language query/capture configuration, and explicit import-resolution hooks behind the same indexer contract.

The CodeIndexer orchestration must be language-provider shaped from the start:

scan eligible text files independently from semantic parser support
honor .gitignore rules while always excluding Mission runtime state such as .mission/
detect language from a centralized registry of extensions and known filenames
write Code file records for supported, unsupported, and unknown text files
delegate symbols, imports, calls, routes, tools, type facts, and scope-resolution facts only to providers that advertise those capabilities
keep provider failures isolated so one unavailable parser does not prevent file-level indexing for the Code root

The TypeScript/JavaScript compiler provider is only the first provider. Future Tree-sitter providers should plug into the same extraction contract rather than adding language branches to the scanner.
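
A minimal sketch of the provider-shaped orchestration follows. The registry contents, the LanguageProvider shape, and the indexFile helper are hypothetical; a real provider would be parser-backed, while the example only shows how file-level records survive when a provider is missing or fails.

```typescript
type Capability = 'symbols' | 'imports' | 'calls';
type CodeFileRecord = { path: string; language: string; symbols: string[]; warnings: string[] };
type LanguageProvider = {
  language: string;
  capabilities: ReadonlySet<Capability>;
  extractSymbols(source: string): string[];
};

// Centralized registry (illustrative entries): extensions and known filenames map to a language id.
const LANGUAGE_BY_EXTENSION: Record<string, string | undefined> = {
  '.ts': 'typescript',
  '.js': 'javascript',
  '.md': 'markdown',
};
const LANGUAGE_BY_FILENAME: Record<string, string | undefined> = { Dockerfile: 'dockerfile' };

function detectLanguage(path: string): string {
  const fileName = path.slice(path.lastIndexOf('/') + 1);
  const dot = fileName.lastIndexOf('.');
  const extension = dot > 0 ? fileName.slice(dot) : '';
  return LANGUAGE_BY_FILENAME[fileName] ?? LANGUAGE_BY_EXTENSION[extension] ?? 'unknown';
}

// Always emits a file record; symbols are delegated only to a provider that advertises them.
function indexFile(path: string, source: string, providers: LanguageProvider[]): CodeFileRecord {
  const language = detectLanguage(path);
  const record: CodeFileRecord = { path, language, symbols: [], warnings: [] };
  const provider = providers.find((p) => p.language === language && p.capabilities.has('symbols'));
  if (!provider) return record; // unsupported or unknown language: file-level record only
  try {
    record.symbols = provider.extractSymbols(source);
  } catch (error) {
    // Provider failure is isolated to a warning on this file's record.
    record.warnings.push(`provider failed: ${error instanceof Error ? error.message : String(error)}`);
  }
  return record;
}
```

The scanner itself holds no language branches: adding a language means adding a registry entry and, optionally, a provider.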

The structural graph target is defined by Code Intelligence Node Edge Graph Spec. The indexer should therefore emit canonical node and edge records with explicit objectKind and relationKind vocabulary rather than treating table names such as CodeFile or CodeSymbol as the durable type system.

Markdown and other non-code text files should still participate in baseline indexing as file/document nodes. Optional Agent-assisted semantic enrichment for those files may run behind the indexer as a provider capability, but it must remain disabled by default and must not be required for a valid graph snapshot.

Code Root Boundary

Repository roots and Mission worktree roots both resolve to Code roots before indexing. A Mission task should resolve to the Mission worktree root when one exists because the active work may differ from the main Repository root, but the indexer, graph store, schemas, and semantic operations treat the resolved path as the same Code root concept.

Root resolution rules belong to a small scope resolver used by semantic operations:

type CodeIntelligenceScopeRoot = {
  rootPath: string;
  repositoryRootPath?: string;
  missionId?: string;
  taskId?: string;
};

repositoryRootPath, missionId, and taskId are resolver context only. They must not become graph table prefixes, alternate schema families, or separate indexer modes.
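
The resolver can stay very small. In the sketch below, AgentExecutionScope and its missionWorktreeRootPath field are assumed input shapes, and resolveCodeRoot is a hypothetical name; only CodeIntelligenceScopeRoot comes from this spec.

```typescript
type CodeIntelligenceScopeRoot = {
  rootPath: string;
  repositoryRootPath?: string;
  missionId?: string;
  taskId?: string;
};

// Hypothetical view of what the resolver knows about an AgentExecution scope.
type AgentExecutionScope = {
  repositoryRootPath: string;
  missionWorktreeRootPath?: string;
  missionId?: string;
  taskId?: string;
};

// Prefers the Mission worktree root when one exists; otherwise falls back to the Repository root.
function resolveCodeRoot(scope: AgentExecutionScope): CodeIntelligenceScopeRoot {
  const root: CodeIntelligenceScopeRoot = {
    rootPath: scope.missionWorktreeRootPath ?? scope.repositoryRootPath,
    repositoryRootPath: scope.repositoryRootPath,
  };
  // Context fields are carried for the resolver only; they never select a schema or indexer mode.
  if (scope.missionId !== undefined) root.missionId = scope.missionId;
  if (scope.taskId !== undefined) root.taskId = scope.taskId;
  return root;
}
```

Downstream components read only rootPath, which is what keeps Repository roots and Mission worktree roots on one Code root code path.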

Schema Model

Use Zod v4 schemas with zod-surreal metadata for every table, relation table, indexed field, analyzer, and generated SurrealDB definition owned by the code intelligence index.

Mission owns these schemas. @flying-pillow/zod-surreal remains a standalone generic package that supplies metadata registries, model compilation, deterministic SurrealQL DDL generation, typed query helpers, and provisioning primitives. It must not import Mission code or Mission vocabulary.

The code graph schema module should export:

Zod schemas for each code graph record shape.
z.infer TypeScript types for validated records.
zod-surreal model definitions for every SurrealDB table and relation table.
a compiled schema snapshot helper for tests and provisioning.
a generated SurQL helper that calls compileDefineStatements(compileSchema({ models })).

The exact field metadata may change during implementation, but the direction is fixed: Zod schemas plus zod-surreal metadata generate the SurrealQL DDL.

The canonical persisted graph shape is now defined by Code Intelligence Node Edge Graph Spec, not by the older CodeFile / CodeSymbol / CodeRelation table split. The broader code-intelligence spec should therefore treat the node-edge model as authoritative for:

code_index_snapshot ownership;
code_object node storage with explicit objectKind;
code_relation edge storage with explicit relationKind;
snapshotId as the canonical graph-record ownership field.

Higher-level vocabulary such as routes, tools, processes, and clusters remains useful at the semantic-operation layer, but their persisted structural representation should follow the node-edge graph model unless a later spec or ADR explicitly specializes them.

Fields for a detected tool record:

id
snapshotId
name
handlerSymbolId
handlerFilePath
description
inputSchemaSummary

CodeProcess And CodeCluster

These are heuristic. Store confidence and derivation metadata, and never treat them as Mission workflow truth.

Semantic Operation Schemas

All operation schemas live with AgentExecutionSemanticOperations or operation-specific modules imported by it. Export TypeScript types with z.infer only.

code_search

Input:

{
  query: string;
  limit?: number;
  includeKinds?: CodeSearchKind[];
  freshness?: 'allow-stale' | 'prefer-fresh' | 'require-fresh';
  eventId?: string;
}

Result includes snapshot metadata, staleness, ranked hits, and suggested follow-up operations.

symbol_context

Input:

{
  symbol?: string;
  symbolId?: string;
  filePath?: string;
  includeProcesses?: boolean;
  eventId?: string;
}

Result includes disambiguation when needed. Do not silently pick among multiple same-name symbols without returning alternatives unless symbolId is provided.
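
One way to express the disambiguation rule is a three-state result. The SymbolRecord shape, the SymbolResolution union, and resolveSymbol below are hypothetical names used only to illustrate the rule.

```typescript
type SymbolRecord = { symbolId: string; name: string; filePath: string };
type SymbolLookup = { symbol?: string; symbolId?: string; filePath?: string };
type SymbolResolution =
  | { status: 'resolved'; symbol: SymbolRecord }
  | { status: 'ambiguous'; alternatives: SymbolRecord[] }
  | { status: 'not-found' };

function resolveSymbol(symbols: SymbolRecord[], lookup: SymbolLookup): SymbolResolution {
  // An explicit symbolId is the only input that bypasses disambiguation.
  if (lookup.symbolId !== undefined) {
    const exact = symbols.find((s) => s.symbolId === lookup.symbolId);
    return exact ? { status: 'resolved', symbol: exact } : { status: 'not-found' };
  }
  // filePath narrows the candidate set but does not force a pick.
  const candidates = symbols.filter(
    (s) => s.name === lookup.symbol && (lookup.filePath === undefined || s.filePath === lookup.filePath),
  );
  const [only] = candidates;
  if (only === undefined) return { status: 'not-found' };
  if (candidates.length === 1) return { status: 'resolved', symbol: only };
  // Never pick silently: return every same-name candidate to the caller.
  return { status: 'ambiguous', alternatives: candidates };
}
```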

impact_analysis

Input:

{
  target?: string;
  id?: string;
  filePath?: string;
  direction: 'upstream' | 'downstream';
  maxDepth?: number;
  relationTypes?: CodeRelationType[];
  minConfidence?: number;
  includeTests?: boolean;
  freshness?: 'prefer-fresh' | 'require-fresh';
  eventId?: string;
}

Result groups affected nodes by depth and includes affected processes/routes/tools/clusters.
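
The depth grouping can be illustrated with a bounded breadth-first traversal over in-memory relations. This is a sketch, not the store query: the Relation shape and traverseImpact are hypothetical, and it assumes that a relation from A to B means A depends on B, so downstream follows from to to and upstream follows the reverse.

```typescript
type Relation = { from: string; to: string; relationType: string; confidence: number };
type ImpactOptions = {
  direction: 'upstream' | 'downstream';
  maxDepth: number;
  relationTypes?: string[];
  minConfidence?: number;
};

// Returns affected node ids grouped by traversal depth (1 = direct neighbours).
function traverseImpact(relations: Relation[], startId: string, options: ImpactOptions): Map<number, string[]> {
  const byDepth = new Map<number, string[]>();
  const visited = new Set<string>([startId]);
  let frontier: string[] = [startId];
  for (let depth = 1; depth <= options.maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const relation of relations) {
      if (options.relationTypes && !options.relationTypes.includes(relation.relationType)) continue;
      if (relation.confidence < (options.minConfidence ?? 0)) continue;
      // Assumed orientation: downstream walks from -> to, upstream walks to -> from.
      const source = options.direction === 'downstream' ? relation.from : relation.to;
      const target = options.direction === 'downstream' ? relation.to : relation.from;
      if (frontier.includes(source) && !visited.has(target)) {
        visited.add(target);
        next.push(target);
      }
    }
    if (next.length > 0) byDepth.set(depth, next);
    frontier = next;
  }
  return byDepth;
}
```

The visited set keeps cycles from inflating results, and maxDepth bounds the work regardless of graph size.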

changed_code_impact

Input:

{
  scope?: 'unstaged' | 'staged' | 'all' | 'compare';
  baseRef?: string;
  maxDepth?: number;
  eventId?: string;
}

Result maps diff hunks to symbols and then to impact results.
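
The hunk-to-symbol step is a line-range overlap check. The DiffHunk and SymbolRange shapes and the symbolsTouchedByHunks helper are assumptions for illustration; the resulting symbol ids would then feed the same traversal used by impact_analysis.

```typescript
type DiffHunk = { filePath: string; startLine: number; lineCount: number };
type SymbolRange = { symbolId: string; filePath: string; startLine: number; endLine: number };

// Returns the sorted ids of symbols whose line range overlaps any hunk in the same file.
function symbolsTouchedByHunks(hunks: DiffHunk[], symbols: SymbolRange[]): string[] {
  const touched = new Set<string>();
  for (const hunk of hunks) {
    // A deletion-only hunk has lineCount 0; treat it as touching its anchor line.
    const hunkEnd = hunk.startLine + Math.max(hunk.lineCount, 1) - 1;
    for (const symbol of symbols) {
      const sameFile = symbol.filePath === hunk.filePath;
      const overlaps = symbol.startLine <= hunkEnd && symbol.endLine >= hunk.startLine;
      if (sameFile && overlaps) touched.add(symbol.symbolId);
    }
  }
  return Array.from(touched).sort();
}
```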

route_impact

Input:

{
  route?: string;
  filePath?: string;
  eventId?: string;
}

Result includes handler, consumers, response keys, middleware, process links, and mismatch warnings when response shape support exists.

tool_context

Input:

{
  tool?: string;
  eventId?: string;
}

Result includes detected tool definitions and handlers.

Semantic Operation Journal Records

Each accepted operation records a bounded daemon-observed AgentExecution observation:

type CodeIntelligenceOperationObservation = {
  observationKind:
    | 'code-search'
    | 'symbol-context-read'
    | 'impact-analysis-read'
    | 'changed-code-impact-read'
    | 'route-impact-read'
    | 'tool-context-read';
  operationName: string;
  snapshotId: string;
  codeRootId: string;
  rootPathHash: string;
  querySummary: Record<string, unknown>;
  resultSummary: Record<string, unknown>;
  stale: boolean;
};

Do not store full result payloads in journal observations by default. Store enough to audit that the Agent used code intelligence and what target/query it used.
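
A bounded observation for code_search might be assembled as below. The builder name, its input shape, the 200-character query cap, and the FNV-1a helper are all assumptions; FNV-1a is a non-cryptographic stand-in used only to keep the sketch dependency-free, and a real implementation would more likely use a cryptographic digest for rootPathHash.

```typescript
type CodeSearchObservation = {
  observationKind: 'code-search';
  operationName: string;
  snapshotId: string;
  codeRootId: string;
  rootPathHash: string;
  querySummary: Record<string, unknown>;
  resultSummary: Record<string, unknown>;
  stale: boolean;
};

// Non-cryptographic 32-bit FNV-1a, used here only as a dependency-free placeholder hash.
function fnv1aHex(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, '0');
}

function buildCodeSearchObservation(input: {
  snapshotId: string;
  codeRootId: string;
  rootPath: string;
  query: string;
  limit: number;
  hitCount: number;
  stale: boolean;
}): CodeSearchObservation {
  return {
    observationKind: 'code-search',
    operationName: 'code_search',
    snapshotId: input.snapshotId,
    codeRootId: input.codeRootId,
    rootPathHash: fnv1aHex(input.rootPath), // the raw path is never journaled
    querySummary: { query: input.query.slice(0, 200), limit: input.limit },
    resultSummary: { hitCount: input.hitCount }, // counts only, no hit payloads
    stale: input.stale,
  };
}
```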

Index Lifecycle

Indexing is owned by the Code intelligence service, not by AgentExecution. AgentExecution can cause indexing to be needed by calling a semantic operation, but it does not parse files, write graph records, watch the filesystem, or decide index freshness policy.

Triggers

The service should support three trigger paths:

Repository preparation trigger: after Repository setup or initialization produces valid Mission repository control state, the daemon may resolve the Repository root as a Code root and enqueue an index build with prefer-fresh priority.
Mission worktree trigger: after a Mission worktree is materialized and initialized, the daemon may resolve the Mission worktree root as a Code root and enqueue an index build because active Mission code may diverge from the Repository root.
Semantic operation trigger: when an Agent execution calls code_search, symbol_context, impact_analysis, or a related operation, the semantic operation delegates to ensureIndex; ensureIndex decides whether to use, rebuild, enqueue, or reject based on freshness policy.

Repository-scoped Agent executions may therefore start after repository preparation without waiting for a complete index. Their first code intelligence call can use a warm index if one exists or request a fresh one through ensureIndex. Task-scoped Agent executions should resolve to the Mission worktree Code root when one exists.

Ensure Index

ensureIndex decides whether an index exists and whether its freshness is acceptable for the requested operation.

Freshness policy:

allow-stale: use existing snapshot and report staleness.
prefer-fresh: rebuild if cheap and no active rebuild is running; otherwise return stale with warning.
require-fresh: rebuild or return an explicit stale/unavailable result.
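
The policy can be read as a small decision function. The IndexState fields, the decision labels, and decideEnsureIndex are hypothetical; the handling of a missing snapshot is an assumption, since the policy above only describes the stale case.

```typescript
type FreshnessPolicy = 'allow-stale' | 'prefer-fresh' | 'require-fresh';
type IndexState = {
  hasSnapshot: boolean;
  stale: boolean;
  rebuildRunning: boolean;
  rebuildIsCheap: boolean;
};
type EnsureIndexDecision = 'use-snapshot' | 'use-stale-with-warning' | 'rebuild' | 'unavailable';

function decideEnsureIndex(policy: FreshnessPolicy, state: IndexState): EnsureIndexDecision {
  // Assumption: with no snapshot at all, every policy needs a build before it can answer.
  if (!state.hasSnapshot) return state.rebuildRunning ? 'unavailable' : 'rebuild';
  if (!state.stale) return 'use-snapshot';
  switch (policy) {
    case 'allow-stale':
      return 'use-stale-with-warning';
    case 'prefer-fresh':
      // Rebuild only when cheap and nothing is already rebuilding.
      return state.rebuildIsCheap && !state.rebuildRunning ? 'rebuild' : 'use-stale-with-warning';
    case 'require-fresh':
      // Never serve stale data silently: rebuild, or report an explicit unavailable result.
      return state.rebuildRunning ? 'unavailable' : 'rebuild';
  }
}
```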

Rebuild

The rebuild flow:

resolve scoped root
  -> compute root fingerprint
  -> scan eligible files
  -> parse source files
  -> emit graph records
  -> replace graph records for index id atomically in store
  -> write CodeIndexSnapshot
  -> return snapshot

If graph replacement cannot be atomic in the first SurrealDB implementation, write a new snapshot id and mark it active only after all records load. Readers use the active snapshot id.
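
The fallback amounts to staging records under a new snapshot id and flipping an active pointer last. The in-memory class below is only a sketch of that ordering; InMemorySnapshotStore and its method names are hypothetical and say nothing about how the SurrealDB store would persist the pointer.

```typescript
type GraphRecord = { id: string; snapshotId: string };

class InMemorySnapshotStore {
  private readonly recordsBySnapshot = new Map<string, GraphRecord[]>();
  private activeSnapshotId: string | undefined;

  // Step 1: load all records under a new snapshot id. Readers cannot see them yet.
  loadSnapshot(snapshotId: string, records: GraphRecord[]): void {
    this.recordsBySnapshot.set(snapshotId, records);
  }

  // Step 2: flip the pointer only after loading finished, then drop the previous snapshot.
  activate(snapshotId: string): void {
    if (!this.recordsBySnapshot.has(snapshotId)) {
      throw new Error(`snapshot ${snapshotId} was never loaded`);
    }
    const previous = this.activeSnapshotId;
    this.activeSnapshotId = snapshotId;
    if (previous !== undefined && previous !== snapshotId) this.recordsBySnapshot.delete(previous);
  }

  // Readers always resolve through the active snapshot id.
  readActive(): GraphRecord[] {
    if (this.activeSnapshotId === undefined) return [];
    return this.recordsBySnapshot.get(this.activeSnapshotId) ?? [];
  }
}
```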

Updates

Index updates are snapshot replacements, not in-place domain mutations. The first implementation should rebuild the active snapshot when the root fingerprint changes. Later incremental updates may re-parse changed files only, but they must still publish a coherent snapshot and keep concurrent readers away from half-written graph state.

Freshness inputs should include Git commit, dirty worktree status, indexed file hashes, and indexer version. For Mission worktrees, dirty file changes matter even when the branch commit has not changed.
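
A root fingerprint over these inputs could look like the sketch below. FreshnessInputs and computeRootFingerprint are hypothetical names, and the fingerprint is a plain canonical string rather than a digest so the example stays dependency-free.

```typescript
type FreshnessInputs = {
  gitCommit: string;
  dirty: boolean;
  fileHashes: Record<string, string>; // relative path -> content hash of each indexed file
  indexerVersion: string;
};

// Produces a canonical string; two equal fingerprints mean the snapshot is still fresh.
function computeRootFingerprint(inputs: FreshnessInputs): string {
  // Sort paths so that object key order never changes the fingerprint.
  const files = Object.keys(inputs.fileHashes)
    .sort()
    .map((path) => `${path}:${inputs.fileHashes[path]}`);
  return JSON.stringify([inputs.indexerVersion, inputs.gitCommit, inputs.dirty, files]);
}
```

Including file hashes is what lets a dirty Mission worktree invalidate the snapshot even though the branch commit is unchanged.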

File watching is optional and should be treated as an optimization. The authoritative freshness check remains ensureIndex, because daemon restart, missed watcher events, Git operations, or external editors can all bypass a watcher.

Background Work

Index rebuilds may be long-running. Phase one can run rebuilds synchronously for tests and explicit operation calls. A later daemon background worker may prebuild indexes after Repository initialization or Mission worktree creation.

Query Safety

Use parameterized SurrealDB queries.
Keep label/table/relation names from fixed schema registries.
Enforce max result limits and max traversal depth.
Reject path traversal and absolute paths from operation input.
Redact sensitive path names or content according to the indexer’s sensitive-file policy.
Return structured errors, not stack traces.
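
Several of these rules can be sketched as small guards inside the store. The constants, helper names, and the query text are illustrative assumptions; only the code_object and code_relation table names come from this spec.

```typescript
const MAX_LIMIT = 50; // assumed ceiling
const DEFAULT_LIMIT = 20; // assumed default

// Table names come from a fixed registry, never from operation input.
const TABLES = { object: 'code_object', relation: 'code_relation' } as const;

function clampLimit(requested?: number): number {
  if (requested === undefined || !Number.isFinite(requested)) return DEFAULT_LIMIT;
  return Math.min(MAX_LIMIT, Math.max(1, Math.floor(requested)));
}

// Accepts only root-relative paths without traversal segments.
function assertSafeRelativePath(filePath: string): string {
  const normalized = filePath.replace(/\\/g, '/');
  if (normalized.startsWith('/') || /^[A-Za-z]:/.test(normalized)) {
    throw new Error('absolute paths are not accepted');
  }
  if (normalized.split('/').includes('..')) {
    throw new Error('path traversal is not accepted');
  }
  return normalized;
}

// User text travels only as a bound variable; the statement text is built from fixed parts.
function buildSearchQuery(tableKey: keyof typeof TABLES, text: string, limit?: number) {
  return {
    sql: `SELECT * FROM ${TABLES[tableKey]} WHERE name CONTAINS $text LIMIT $limit`,
    vars: { text, limit: clampLimit(limit) },
  };
}
```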

Open Mission Web Visualization

Open Mission web will eventually render a visual representation of the Code graph. This is a surface feature over daemon-owned code intelligence, not a separate graph product or an alternate graph authority.

The visualization should read from bounded daemon APIs that expose active snapshot metadata, visible nodes, visible relations, impact paths, and freshness status. It must not receive a raw SurrealDB client, run raw SurrealQL, mutate index records, create relationship semantics, or choose Code roots independently of daemon scope resolution.

Baseline interactions:

select a Code root snapshot
filter by file, symbol, relation type, impact depth, stale/fresh status, or search query
inspect a node or relation summary
trace an impact path returned by semantic operations
open a related file or Artifact through existing daemon/Open Mission navigation affordances

This view should follow the Agent-facing semantic operation model. The Agent path proves the graph, scope, staleness, and runtime fact contracts first; Open Mission web visualization becomes a read-only operator lens after those contracts are stable.

Implementation Sequence

Update open-mission mcp connect bridge to support semantic operation descriptors.
Refactor read_artifact to prove signal and semantic operation tools both work through the bridge.
Add code intelligence schemas and graph store interface with an in-memory fake for tests.
Add zod-surreal model definitions for code index snapshots, files, symbols, relation tables, routes, tools, processes, and clusters.
Add a generator that compiles those models and emits deterministic SurQL provisioning statements.
Add SurrealDB-backed graph store provisioning behind the interface using generated statements.
Add parser-backed TypeScript/JavaScript indexer fixture support.
Implement code_search and symbol_context over fake store, then SurrealDB store.
Implement impact_analysis traversal.
Add changed_code_impact, route_impact, and tool_context.
Add staleness diagnostics and optional explicit rebuild command.
Add read-only Open Mission web visual graph APIs and UI after the Agent path is stable.

Testing

Required tests:

semantic operation descriptor materialization includes signal and semantic operation tools.
stdio bridge proxies semantic operation inputs without signal wrapping.
unauthorized semantic operation calls reject before service invocation.
scope resolver chooses the scoped Code root for task-scoped executions.
path traversal and out-of-scope roots reject.
code graph schema compiles through zod-surreal.
generated SurQL provisioning output is deterministic and covered by a committed snapshot or fixture comparison.
fixture index output is deterministic.
code_search returns bounded ranked results.
symbol_context returns disambiguation for duplicate names.
impact_analysis respects depth, direction, relation filters, and confidence.
stale indexes are reported in operation results.
bounded daemon-observed AgentExecution observations are appended for accepted operations.
Open Mission web visual graph APIs expose only daemon-owned read models and reject mutation or raw query behavior.

Open Implementation Questions

Should the first graph store share the daemon in-memory datastore instance or use a separate SurrealDB database namespace behind the same daemon process?
Which TypeScript parser path should phase one use: TypeScript compiler API, tree-sitter, or a smaller local extractor tuned for Mission’s codebase?
Should code graph records include snippets, or should snippets always be fetched through read_artifact after graph narrowing?
How should index rebuild progress and visual graph loading state be surfaced to Open Mission without making Open Mission the owner of index lifecycle?