# Async ingest jobs
Large ingests (whole GitHub repos, big local directories, batches of URLs) used to block the tool-use call for hours — and because the interactive session sat on that, it also quietly burned through Opus quota while waiting. As of v0.37, ingest is an async job: it enqueues, returns quickly, and a worker inside the MCP server processes it with Haiku, independent of whatever model powers your conversation.
## TL;DR

```python
# Enqueue an ingest from an agent / MCP client; returns either the
# result (small job) or a job handle (big job).
ingest(source="https://github.com/vllm-project/vllm", scope="docs")
```

From the terminal:

```bash
alxia jobs list
alxia jobs status job-20260422-abcd1234efgh
alxia jobs tail job-20260422-abcd1234efgh    # stream progress
alxia jobs cancel job-20260422-abcd1234efgh  # cooperative cancel
```
## Why this shape

Three problems with the old synchronous ingest:

- No visibility. `ingest()` held the tool call for hours. The agent had no stdout to read, no partial output to share. You had to trust silence for 10 hours. That's the bug that triggered this change.
- Opus burn. An Opus session orchestrating long ingests stayed active the whole time, consuming Opus tokens while it waited on work that was itself running on Haiku.
- No cancellation. The only way out was killing the MCP server, which orphaned state.
The async model fixes all three.
## Architecture

```text
agent / CLI → ingest() → enqueue_ingest() → jobs table (SQLite)
                                                ↑ claim
                          worker loop (asyncio task inside MCP server)
                                                ↓
                          runs ingest_file / ingest_repo
                          updates progress + run_ids
                          honours cancel between files
```
The worker is an asyncio task started alongside the MCP server (stdio or HTTP). One worker per server process, single concurrency — this is the smallest useful unit and keeps rate-limit math simple.
Jobs persist in SQLite. If the server dies mid-ingest, the next server
picks up queued work. Jobs left in `running` state are reclaimed at the
next worker startup (see "Stale-job reclaim at worker startup" below).
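In sketch form, the claim-and-run cycle looks roughly like this, assuming an illustrative `jobs` table with `id`, `status`, `payload`, and timestamp columns (not the actual schema):

```python
import asyncio
import json
import sqlite3

async def run_ingest(conn, job_id: str, payload: dict) -> None:
    ...  # stand-in for dispatch to ingest_file / ingest_repo; updates
         # progress per file and honours the cancel flag between files

async def worker_loop(db_path: str, poll_interval_s: float = 1.0) -> None:
    conn = sqlite3.connect(db_path)
    while True:
        # Claim the oldest queued job. Single worker per home, so there
        # is no claim race to guard against here.
        row = conn.execute(
            "SELECT id, payload FROM jobs WHERE status = 'queued' "
            "ORDER BY created_at LIMIT 1"
        ).fetchone()
        if row is None:
            await asyncio.sleep(poll_interval_s)   # idle: poll the queue
            continue
        job_id, payload = row
        conn.execute(
            "UPDATE jobs SET status = 'running', started_at = datetime('now') "
            "WHERE id = ?",
            (job_id,),
        )
        conn.commit()
        await run_ingest(conn, job_id, json.loads(payload))
```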
## The ingest tool's new contract

```python
ingest(source,
       workspace=None,
       topic=None,
       no_merge=False,
       scope="all",   # or "docs"
       wait_s=60)     # 0 = always async
```
Behavior:

- Enqueues a job immediately, regardless of `wait_s`.
- Polls the job for up to `wait_s` seconds. If it finishes in time, returns the full result inline (so short URL ingests still feel synchronous).
- If it doesn't finish in time, returns a compact job handle with the `job_id` and progress so far. The agent is expected to call `jobs_status` later; a sketch of this flow follows.
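A sketch of that contract, with `enqueue_ingest` and `get_job` standing in for the real queue helpers (illustrative names, not the actual implementation):

```python
import asyncio
import time

TERMINAL = {"done", "failed", "cancelled"}

async def ingest(source: str, wait_s: float = 60, **opts) -> dict:
    job_id = enqueue_ingest(source, **opts)   # always enqueue first
    deadline = time.monotonic() + wait_s      # wait_s=0: skip the poll loop
    while time.monotonic() < deadline:
        job = get_job(job_id)
        if job["status"] in TERMINAL:
            return job["result"]              # finished in time: inline result
        await asyncio.sleep(1.0)
    job = get_job(job_id)
    # Timed out: hand back a compact handle; the agent polls jobs_status later.
    return {"job_id": job_id, "status": job["status"], "progress": job["progress"]}
```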
## Scope control

Repositories and directories default to `scope="all"`: every supported
file goes through the pipeline. For large codebases, use `scope="docs"`
to restrict ingestion to (a sketch of the filter follows the list):

- `README*` at the root
- top-level `*.md`, `*.rst`, `*.txt`
- everything under `docs/`, `doc/`, `documentation/`
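A minimal sketch of the kind of predicate that implements the docs scope; the pattern list is taken from above, the function name is illustrative:

```python
from pathlib import Path

DOC_DIRS = {"docs", "doc", "documentation"}
DOC_SUFFIXES = {".md", ".rst", ".txt"}

def in_docs_scope(path: Path, repo_root: Path) -> bool:
    rel = path.relative_to(repo_root)
    if len(rel.parts) == 1:                   # a top-level file
        if rel.name.startswith("README"):     # README* at the root
            return True
        return rel.suffix in DOC_SUFFIXES     # top-level *.md / *.rst / *.txt
    return rel.parts[0] in DOC_DIRS           # under docs/, doc/, documentation/
```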
Typical repo sizes with the two scopes:

| Repo | `scope="all"` | `scope="docs"` |
|---|---|---|
| vllm-project/vllm | 4,698 files / ~40h | 23 files / ~20min |
| llm-d/llm-d | 467 files / ~6h | 12 files / ~10min |
Rule of thumb: default to `scope="docs"` for repos unless the user explicitly wants the whole codebase.
## Model pinning

In `~/.alexandria/config.toml`:

```toml
[jobs]
model = "haiku"        # pinned on the ingest subprocess env
poll_interval_s = 1.0
default_wait_s = 60
```
The worker sets `ALEXANDRIA_CLAUDE_MODEL=<jobs.model>` on its process
environment before running the LLM work for each job. That env var is
picked up by the Claude Code SDK provider in `llm_ingest.py` and
passed as `--model` to `claude -p`. This happens regardless of what the
MCP server was registered with, so accidental Opus ingestion is
impossible when `jobs.model = "haiku"`.
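Pinning via the environment can be as simple as setting the variable on the subprocess env before the LLM pass. A sketch, with the provider's env-to-`--model` translation collapsed into one call for illustration:

```python
import os
import subprocess

def run_llm_pass(prompt: str, jobs_model: str = "haiku") -> str:
    # Pin the model on the subprocess environment; this wins regardless
    # of what model the MCP server itself was registered with.
    env = dict(os.environ, ALEXANDRIA_CLAUDE_MODEL=jobs_model)
    result = subprocess.run(
        ["claude", "-p", prompt, "--model", env["ALEXANDRIA_CLAUDE_MODEL"]],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout
```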
## Cancellation

Cancellation is cooperative. `jobs_cancel` (MCP) or
`alxia jobs cancel <id>` (CLI) marks the job as cancelled in the
database. The worker checks this flag between files and exits the
ingest loop cleanly. Already-committed wiki pages stay.
If the worker is stuck on a single long file (big PDF, slow LLM call), cancellation takes effect as soon as that file completes — not mid-file.
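The cancel side is just a flag write; the worker reads it at the next boundary. A sketch against the same illustrative schema as above:

```python
import sqlite3

def jobs_cancel(conn: sqlite3.Connection, job_id: str) -> None:
    # Mark the job; nothing is interrupted here. The worker notices the
    # flag between files and exits its ingest loop cleanly.
    conn.execute(
        "UPDATE jobs SET cancel_requested = 1 WHERE id = ? "
        "AND status IN ('queued', 'running')",
        (job_id,),
    )
    conn.commit()
```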
## CLI reference

```bash
alxia jobs list               # recent jobs in the current workspace
alxia jobs list -s running    # only running
alxia jobs list -w other      # specific workspace
alxia jobs list -n 100        # more rows
alxia jobs status <job_id>    # full detail including progress + ETA
alxia jobs tail <job_id>      # stream updates until terminal
alxia jobs cancel <job_id>    # cooperative cancel
```
## Observed behaviour you might wonder about

- Jobs that never leave `queued`. The worker runs inside the MCP server. If no one has started `alxia mcp serve` (or your agent has not connected to one), nothing processes the queue. Start a server and the queue drains.
- `started_at` set but no progress. The worker has picked up the job and is in the initial scope-counting phase (cloning a repo, walking a directory). Progress updates begin once the first file is processed.
- ETA shows `~XXh`. Derived from files done / elapsed seconds; it gets more accurate as more files complete. Early ETAs are noisy; the arithmetic is sketched below.
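The ETA is a linear extrapolation from throughput so far (a sketch; names illustrative):

```python
def eta_seconds(files_done: int, files_total: int, elapsed_s: float) -> float | None:
    if files_done == 0:
        return None                    # nothing finished yet: no estimate
    per_file = elapsed_s / files_done  # observed average so far
    # e.g. 3 of 23 files in 180 s → 60 s/file → ~1200 s (~20 min) remaining
    return per_file * (files_total - files_done)
```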
## Reliability (v0.37.3+)

### Single worker per home, enforced by a file lock

The worker takes an exclusive `fcntl` lock on `~/.alexandria/jobs.lock`
at startup. If another worker is already running (e.g. a previous
Claude Code session that didn't quit), the new process exits cleanly
rather than spawning a second worker that would race on the queue.
This removes the "N concurrent MCP servers → N workers hammering the queue → Claude rate-limit saturation" failure mode.
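The single-worker guarantee can be sketched like this; the lock path is from above, the rest is illustrative:

```python
import fcntl
import os

LOCK_PATH = os.path.expanduser("~/.alexandria/jobs.lock")

def try_acquire_worker_lock(path: str = LOCK_PATH) -> int | None:
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        # Exclusive, non-blocking: raises if another process holds the lock.
        # The OS releases it automatically when the holder exits, so a
        # crashed worker can never leave the lock wedged.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        return None   # another worker is running; don't start a second one
    return fd         # keep this fd open for the lifetime of the worker
```

A `None` return means exit cleanly instead of racing on the queue; the caller keeps the returned descriptor open for the life of the process.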
### Heartbeat during long-running jobs

Every `jobs.heartbeat_s` seconds (default 30) a background task
touches `updated_at` on the in-progress job. So even a big ingest that
spends minutes inside `claude -p` shows recent activity, with no
false-positive "stuck" reads from `jobs_status`.
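A heartbeat of that shape, as an asyncio task running alongside the main work (a sketch against the same illustrative schema):

```python
import asyncio
import sqlite3

async def heartbeat(conn: sqlite3.Connection, job_id: str,
                    interval_s: float = 30.0) -> None:
    # Started with asyncio.create_task() next to the ingest work and
    # cancelled in a finally: block once the job reaches a terminal state.
    while True:
        conn.execute(
            "UPDATE jobs SET updated_at = datetime('now') WHERE id = ?",
            (job_id,),
        )
        conn.commit()
        await asyncio.sleep(interval_s)
```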
### Stale-job reclaim at worker startup

After acquiring the lock, the worker checks for `running` jobs whose
`updated_at` is older than `jobs.stale_after_s` (default 300). Those
are jobs from a previous worker that died without marking them terminal;
they get reset to `queued` and picked up by the fresh worker.
Combined with the heartbeat, a healthy worker never trips the stale threshold; only crashed/killed workers leave reclaimable jobs.
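The reclaim step amounts to a single conditional update at startup (a sketch, using the default 300 s threshold):

```python
import sqlite3

def reclaim_stale_jobs(conn: sqlite3.Connection, stale_after_s: int = 300) -> int:
    # Any 'running' job whose heartbeat stopped long enough ago belongs to
    # a dead worker; put it back in the queue for this worker to claim.
    cur = conn.execute(
        "UPDATE jobs SET status = 'queued' "
        "WHERE status = 'running' AND updated_at < datetime('now', ?)",
        (f"-{stale_after_s} seconds",),
    )
    conn.commit()
    return cur.rowcount   # number of jobs reclaimed
```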
### Stage-aware messages

The job `message` field now advances through stages, rather than showing "fetching" for the whole life of the job:

```text
fetching <url>           ← before the HTTP request
extracting <filename>    ← after fetch, during LLM + verify
committed: <filename>    ← after successful commit
```

`jobs_status` and `alxia jobs list` show the current stage at a glance.
### Mid-URL cancellation

Cancel-before-side-effects works on single-URL ingests, not just repo
ingests. The worker checks `should_cancel` before the fetch and before
the LLM extraction pass. A job cancelled between stages finalises as
`cancelled`, with committed-so-far recorded in `result`.
The one place cancel still can't interrupt is inside the blocking LLM call itself; that's slated for a future release (see "What's not yet built" below).
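Putting the stage messages and the checkpoints together, a single-URL pass might be structured like this; `fetch`, `extract`, and `commit` are hypothetical stand-ins for the real pipeline steps:

```python
import sqlite3

class JobCancelled(Exception):
    """Raised at a stage boundary when the job's cancel flag is set."""

def checkpoint(conn: sqlite3.Connection, job_id: str) -> None:
    row = conn.execute(
        "SELECT cancel_requested FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
    if row and row[0]:
        raise JobCancelled(job_id)

def set_stage(conn: sqlite3.Connection, job_id: str, message: str) -> None:
    conn.execute("UPDATE jobs SET message = ? WHERE id = ?", (message, job_id))
    conn.commit()

def ingest_url(conn: sqlite3.Connection, job_id: str, url: str) -> None:
    filename = url.rsplit("/", 1)[-1]
    checkpoint(conn, job_id)                           # before any side effect
    set_stage(conn, job_id, f"fetching {url}")
    data = fetch(url)                                  # hypothetical fetch step
    checkpoint(conn, job_id)                           # before the LLM pass
    set_stage(conn, job_id, f"extracting {filename}")
    page = extract(data)                               # hypothetical: LLM + verify
    commit(page)                                       # hypothetical commit
    set_stage(conn, job_id, f"committed: {filename}")
```

In this shape, the worker catches `JobCancelled`, finalises the job as `cancelled`, and records the committed-so-far pages in `result`.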
## Config knobs

```toml
[jobs]
model = "haiku"        # LLM model pinned on the ingest subprocess
poll_interval_s = 1.0  # queue poll cadence when idle
default_wait_s = 60    # default ingest(wait_s=...) for the MCP tool
heartbeat_s = 30.0     # how often updated_at gets touched while busy
stale_after_s = 300    # reclaim 'running' jobs idle longer than this
```
## What's not yet built

- Parallel workers (per-workspace or shared). Today it's one job at a time; the file lock guarantees this even when multiple MCP servers run.
- Mid-LLM-call cancellation. Cancel requests honour stage boundaries, not the inside of a `claude -p` invocation.
- Priority queue. Jobs run strictly FIFO.