Where the engineering lives
The System chapter walks the architecture end to end. This one picks the five pieces where the engineering was hardest to get right, and walks them at the level a reviewer who wants to push back would need.
The syllogism node
Legal reasoning has a classical shape: a major premise (the norm), a minor premise (the facts), and a conclusion that subsumes one under the other. Most legal AI projects treat rulings as text and lose this structure on ingestion. Silo materializes it.
The syllogism is a first-class node in the graph, labelled Subsuncao (the Portuguese word for subsumption) and created every time a decision actually performs the operation. Its properties are the premises and the result: artigo_raw and norma_raw hold the cited article and its parent norm (the major premise); interpretacao holds the specific reading the court applied in this case; resultado holds the conclusion; confianca tracks the extractor's confidence in the reading.
The node is wired to the rest of the graph in three places. A Decisao realizes it via REALIZA_SUBSUNCAO. The subsumption points at the norm it applied via SOBRE_NORMA, connecting to a DispositivoLegal. The facts the decision weighed sit one hop away through Decisao's own CONSIDERA_FATO edges to FatoRelevante.
What this unlocks is a class of queries impossible over flat text. "Which rulings have applied Article X with interpretation Y, and with what result?" is a single traversal over Subsuncao nodes. "Which interpretations of this norm get reversed on appeal?" is two hops. "Which panels tend to read this article narrowly?" joins through RELATADA_POR. None of these can be expressed over raw passages.
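The first of those queries can be sketched over an in-memory stand-in for the graph. This is a minimal illustration and not Silo's query layer: the property names and the REALIZA_SUBSUNCAO relationship come from the text above, while the id field, the edge-triple representation, the function name rulings_applying, and every sample value are assumptions invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subsuncao:
    """Stand-in for a Subsuncao node; property names follow the text."""
    id: str  # assumed identifier, not named in the text
    artigo_raw: str
    norma_raw: str
    interpretacao: str
    resultado: str
    confianca: float


def rulings_applying(nodes, edges, artigo, interpretacao):
    """Return sorted (decisao_id, resultado) pairs for every decision that
    realizes a subsumption of `artigo` under `interpretacao`.

    `edges` is a list of (source_id, relationship, target_id) triples.
    """
    matching = {
        n.id: n
        for n in nodes
        if n.artigo_raw == artigo and n.interpretacao == interpretacao
    }
    return sorted(
        (src, matching[dst].resultado)
        for src, rel, dst in edges
        if rel == "REALIZA_SUBSUNCAO" and dst in matching
    )


# Placeholder data, invented for illustration only.
NODES = [
    Subsuncao("s1", "art. X", "norma N", "leitura Y", "provido", 0.91),
    Subsuncao("s2", "art. X", "norma N", "leitura Z", "nao provido", 0.84),
    Subsuncao("s3", "art. X", "norma N", "leitura Y", "nao provido", 0.77),
]
EDGES = [
    ("dec-1", "REALIZA_SUBSUNCAO", "s1"),
    ("dec-2", "REALIZA_SUBSUNCAO", "s2"),
    ("dec-3", "REALIZA_SUBSUNCAO", "s3"),
    ("dec-1", "CONSIDERA_FATO", "f1"),
]

hits = rulings_applying(NODES, EDGES, "art. X", "leitura Y")
```

The point of the sketch is that the filter runs on structured properties of the node, so the interpretation and the result come back as fields and do not need to be re-read out of passage text.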
The syllogism node is why the graph is load-bearing, not decorative. It is where the reasoning actually lives.
Cross-tribunal entity resolution
Without cross-tribunal entity resolution, the graph is really several graphs: one per tribunal, with no edges between them. A legal concept like "inversão do ônus da prova" that appears in a TJPR ruling would be a completely different node from the same concept in an STJ decision, and the litigator question "how has this doctrine been treated across tribunals?" would be structurally unanswerable.
Silo resolves the three instance-heavy types (Criterio, Tese, and FatoRelevante) with a two-stage pipeline. First, sentence-level embeddings are computed per instance; candidate pairs above a 0.92 cosine-similarity threshold are short-listed, and pairs between 0.85 and 0.92 are flagged for human review rather than silently merged. Second, Sonnet 4.6 adjudicates each short-listed pair, rejecting false positives that the embedding space happens to pull close but that mean different things in context.
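The two stages can be sketched as a triage function plus a loop that defers to an adjudicator. The 0.92 and 0.85 thresholds come from the text; everything else is an assumption: the function names, the handling of a score that falls exactly on a threshold, and the adjudicate callable, which stands in for the LLM call and is passed in so that the sketch runs without any model access.

```python
import math

SHORTLIST_ABOVE = 0.92  # from the text: above this, the pair goes to adjudication
REVIEW_FLOOR = 0.85     # from the text: 0.85 to 0.92 goes to human review


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def triage(similarity):
    """Bucket a pair by similarity. Boundary handling is an assumption:
    exactly 0.92 is treated as review, exactly 0.85 as review."""
    if similarity > SHORTLIST_ABOVE:
        return "shortlist"
    if similarity >= REVIEW_FLOOR:
        return "human_review"
    return "discard"


def resolve(pairs, adjudicate):
    """pairs: iterable of (id_a, id_b, similarity).
    adjudicate: hypothetical callable (id_a, id_b) -> bool standing in for
    the second-stage LLM judgment."""
    confirmed, review = [], []
    for a, b, similarity in pairs:
        bucket = triage(similarity)
        if bucket == "shortlist":
            if adjudicate(a, b):
                confirmed.append((a, b))
        elif bucket == "human_review":
            review.append((a, b))
    return confirmed, review
```

Keeping the review band as its own bucket is what prevents a silent merge: nothing below the short-list threshold can reach the adjudicator, and nothing in the band is dropped.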
When a pair is confirmed, a SAME_AS edge is written. The canonical instance is picked by a three-level tie-breaker: support_count (how many decisions reference it) wins first; a hard-coded tribunal rank (STF > STJ > TRF > TJ) wins second; text length wins third.
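The tie-breaker maps naturally onto a composite sort key. The sketch below is hedged in two places where the text is silent: it assumes that longer text wins the third level, and it assumes each instance already carries its tribunal class (STF, STJ, TRF, or TJ) instead of a specific court name. The Instance type and pick_canonical are illustrative names.

```python
from dataclasses import dataclass

# Rank order from the text; a lower number is a higher rank.
TRIBUNAL_RANK = {"STF": 0, "STJ": 1, "TRF": 2, "TJ": 3}


@dataclass(frozen=True)
class Instance:
    id: str
    text: str
    tribunal: str       # assumed to hold the tribunal class, not the court name
    support_count: int  # how many decisions reference the instance


def pick_canonical(instances):
    """Pick the canonical instance: highest support_count, then highest
    tribunal rank, then longest text (the direction is an assumption)."""
    return min(
        instances,
        key=lambda i: (-i.support_count, TRIBUNAL_RANK[i.tribunal], -len(i.text)),
    )
```

Expressing the three levels as one tuple keeps the precedence explicit: a later level is consulted only when every earlier level ties.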
The graph currently holds roughly 6,294 SAME_AS edges generated by this pipeline. Every edge carries provenance identifying which decisions sat on each side of the merge, so the resolution history is auditable after the fact.
Ministros and judges are normalized separately, via deterministic name normalization on the Julgador.nome_normalized field, not through the SAME_AS pipeline.
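What deterministic name normalization might look like is sketched below. The actual rules behind Julgador.nome_normalized are not described in this document, so every step here is an assumption: accent stripping, lowercasing, whitespace collapsing, and dropping a leading honorific from a hypothetical list. The sample name is invented.

```python
import unicodedata

# Hypothetical honorific list, compared after accents are stripped.
HONORIFICS = (
    "ministro", "ministra", "min.",
    "desembargador", "desembargadora", "des.",
    "juiz", "juiza",
)


def normalize_name(nome):
    """Map spelling variants of a judge's name to one deterministic key."""
    decomposed = unicodedata.normalize("NFKD", nome)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    tokens = without_accents.lower().split()
    while tokens and tokens[0] in HONORIFICS:
        tokens.pop(0)
    return " ".join(tokens)
```

Because the function is pure and has no similarity threshold, two records either share a key or they do not, which is why this path needs neither embeddings nor adjudication.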
Hybrid search
Neither lexical search nor vector retrieval alone answers real legal queries. Lexical search catches exact token overlap — strong for "Article 20 of the CPC," useless for "reversal of the burden of proof" when the ruling uses a synonym. Vector retrieval catches semantic similarity — strong for paraphrase, weak when the question asks for a specific statute or party name that the embedding collapses into its neighborhood. Neither knows anything about the structure of the graph.
Silo's retriever combines three signals into a single ranking. The lexical signal handles exact-token precision. The semantic signal handles paraphrase and concept matching. The structural signal — derived from the graph — gives credit to candidate documents whose decisions match the query along axes a similarity metric cannot see: which article the decision interprets, which criterion it applies, which panel decided it, which precedents it cites.
The three signals are normalized, weighted, and fused into a final score, with the lexical and semantic components carrying most of the weight and the structural boost acting as a tie-breaker that pulls the right legal documents to the top when the surface text alone is ambiguous.
What matters more than the specific weights is the discipline: every score in Silo carries a breakdown identifying exactly which signal contributed how much. A reviewer who does not trust the ranking can inspect the breakdown and see why a document ranked where it did. That is the price of traceability — every component of every score is readable — and it stays true regardless of which retrieval engine is running underneath.
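A minimal sketch of fusion with a per-signal breakdown follows. It is not Silo's retriever: the text says only that the signals are normalized, weighted, and fused, so min-max normalization, the weight values, and the function names are all assumptions chosen to make the idea concrete.

```python
# Hypothetical weights: lexical and semantic dominate, structural is a small boost.
WEIGHTS = {"lexical": 0.45, "semantic": 0.45, "structural": 0.10}


def min_max(scores):
    """Rescale raw scores to [0, 1]; an assumed normalization scheme."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(lexical, semantic, structural, weights=WEIGHTS):
    """Fuse three {doc_id: raw_score} maps into a ranked list in which every
    entry carries the weighted contribution of each signal."""
    signals = {
        "lexical": min_max(lexical),
        "semantic": min_max(semantic),
        "structural": min_max(structural),
    }
    docs = set(lexical) | set(semantic) | set(structural)
    ranked = []
    for doc in docs:
        breakdown = {
            name: weights[name] * normalized.get(doc, 0.0)
            for name, normalized in signals.items()
        }
        ranked.append(
            {"doc": doc, "score": sum(breakdown.values()), "breakdown": breakdown}
        )
    # Highest score first; doc id breaks exact ties deterministically.
    ranked.sort(key=lambda r: (-r["score"], r["doc"]))
    return ranked
```

With two documents tied on the lexical and semantic signals, the structural contribution is the only term that differs in their breakdowns, so a reviewer can see directly that it decided the order.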
The MCP tool surface
The Model Context Protocol (MCP) is an open standard, introduced by Anthropic, for letting any LLM client call out to external tools and data with a stable schema. Silo's intelligence is exposed as 45 MCP tools: structural search across the graph, citation chain traversal, minister and panel profiling, optimal argument retrieval, contested criteria, gap detection, dispositivo divergence, case diagnosis briefs, and the case analysis proxies that pipe a raw PDF through to the autonomous agent and back. The server runs production OAuth 2.0, tier enforcement (free versus pro), and daily rate limiting per user.
What MCP unlocks is the contract between intelligence and orchestration. Before MCP, integrating a domain-specific reasoning system into a general-purpose model meant choosing between a brittle tool-use prompt and a proprietary chat UI. MCP made the tool surface a first-class interface (every tool has a name, a schema, and a contract) and made the same Silo backend addressable from Claude, ChatGPT, or a custom agent without code changes.
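To make the name, schema, and contract concrete: an MCP tool is advertised to clients with a name, a description, and an inputSchema written as JSON Schema. The sketch below shows one such descriptor and a minimal tier and daily-limit gate. The tool name, its parameters, the limit values, and the Gate class are hypothetical and are not taken from Silo's server; the in-memory counter stands in for whatever store a production server would use.

```python
from collections import defaultdict
from datetime import date

# Hypothetical tool descriptor in the shape MCP uses for tool listings.
TOOL = {
    "name": "search_subsumptions",
    "description": "Find rulings that applied an article under a given reading.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "artigo": {"type": "string"},
            "interpretacao": {"type": "string"},
        },
        "required": ["artigo"],
    },
}

# Hypothetical daily call limits per tier.
DAILY_LIMITS = {"free": 20, "pro": 500}


def missing_arguments(tool, arguments):
    """List required schema fields that a call did not supply."""
    required = tool["inputSchema"].get("required", [])
    return [name for name in required if name not in arguments]


class Gate:
    """Per-user daily rate limiter keyed by (user, calendar day)."""

    def __init__(self, limits=DAILY_LIMITS):
        self.limits = limits
        self.calls = defaultdict(int)

    def allow(self, user, tier, today=None):
        """Return True and count the call if the user is under today's limit."""
        key = (user, today or date.today())
        if self.calls[key] >= self.limits[tier]:
            return False
        self.calls[key] += 1
        return True
```

Because the descriptor is plain data, the same listing can be served to any client that speaks the protocol, which is the portability the paragraph above describes.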
The ingestion pipeline
Pipeline v2.3 is the offline batch system that builds the graph from raw legal corpora. It has seven phases — F0 through F6 — and each is designed to be independently rerunnable, validated against the previous phase, and stamped with provenance.
F0 ingests, validates, and deduplicates raw decisions and resolves tribunal aliases. F1 segments and filters by stage (monocratic versus collegiate). F2 through F4 run LLM extraction across three stages on groq/openai/gpt-oss-120b: facts in stage A, criteria and theses in stage B, dispositivos and structured outcomes in stage C. F5 canonicalizes (F5.1), embeds with Legal-BERTimbau (F5.2), and runs the cross-tribunal entity resolution from the previous section (F5.3). F6 writes the final node-and-edge load into Neo4j Aura with SCHEMA_VERSION = "v2.3" stamped on every node.
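The phase layout can be written down as data, which is one way to make "independently rerunnable, validated against the previous phase" mechanical. The phase descriptions paraphrase the text; the one-to-one mapping of stages A, B, and C onto F2, F3, and F4 is inferred from the ordering and not stated outright, and the helper names are illustrative.

```python
# Phase ids and summaries paraphrased from the text above.
PHASES = [
    ("F0", "ingest, validate, deduplicate; resolve tribunal aliases"),
    ("F1", "segment and filter by stage (monocratic versus collegiate)"),
    ("F2", "LLM extraction, stage A: facts"),                    # mapping inferred
    ("F3", "LLM extraction, stage B: criteria and theses"),      # mapping inferred
    ("F4", "LLM extraction, stage C: dispositivos and outcomes"),  # mapping inferred
    ("F5", "canonicalize (F5.1), embed (F5.2), entity resolution (F5.3)"),
    ("F6", "load nodes and edges into Neo4j Aura"),
]


def previous_phase(phase_id):
    """Return the id of the phase that precedes `phase_id`, or None for F0."""
    ids = [pid for pid, _ in PHASES]
    index = ids.index(phase_id)
    return ids[index - 1] if index > 0 else None


def rerun_plan(phase_id):
    """Describe a single-phase rerun: the phase to run and the earlier phase
    whose output it would be validated against."""
    return {"rerun": phase_id, "validate_against": previous_phase(phase_id)}
```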
Two design choices keep the pipeline honest. First, every node carries a schema_version, a load_run_label, and timestamps, so any batch can be deleted surgically by run label if a re-extraction is needed. (And re-extractions happen. The v2.1 → v2.2 upgrade in March was exactly this: 1,563 STJ decisions reprocessed under run label gpt_oss_120b_full, dropping 8,668 nodes from the older Grok-3-mini batch and replacing them with 26,138 nodes from the GPT-OSS-120B run, in a single coordinated swap.) Second, every stage transition has a calibrated cross-stage validator with structured tolerances (check_1 at 15%, check_2 at 10%, check_4 at 12%), with blocking_checks as a list rather than a binary pass/fail, so data loss between stages beyond those tolerances cannot pass silently.
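Both design choices can be sketched in a few lines. The property names schema_version and load_run_label, the check names, and the tolerance values come from the text. What each check actually measures is not described here, so the sketch assumes every check compares an item count before and after a stage transition and blocks when the relative loss exceeds its tolerance. The loaded_at property name and the function names are also assumptions.

```python
from datetime import datetime, timezone

SCHEMA_VERSION = "v2.3"
TOLERANCES = {"check_1": 0.15, "check_2": 0.10, "check_4": 0.12}


def stamp(node, run_label):
    """Return a copy of `node` carrying provenance properties."""
    return {
        **node,
        "schema_version": SCHEMA_VERSION,
        "load_run_label": run_label,
        "loaded_at": datetime.now(timezone.utc).isoformat(),  # assumed name
    }


def delete_run(nodes, run_label):
    """Drop every node written by one run, leaving other batches untouched."""
    return [n for n in nodes if n.get("load_run_label") != run_label]


def validate_transition(counts_before, counts_after, tolerances=TOLERANCES):
    """Compare per-check counts across a stage transition (assumed semantics).

    Returns a report whose blocking_checks lists every check whose relative
    loss exceeds its tolerance, so a failure names the checks that tripped.
    """
    blocking, details = [], {}
    for check, tolerance in tolerances.items():
        before, after = counts_before[check], counts_after[check]
        loss = (before - after) / before if before else 0.0
        details[check] = {"loss": loss, "tolerance": tolerance}
        if loss > tolerance:
            blocking.append(check)
    return {"blocking_checks": blocking, "details": details}
```

Returning a list of blocking checks, and not a single boolean, means a failed transition reports which tolerance was exceeded and by how much, which is what makes a rerun targeted.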
What this document does not cover
- The exact prompt text, few-shot examples, and prompt-versioning history used in the structuring and analysis stages
- Vendor specifics: Groq endpoints, the Neo4j Aura instance, OpenRouter routing
- Current infrastructure layout, deployment topology, and secrets management
- Internal benchmark numbers from the prompt evaluation harness
- The Elasticsearch 8.x migration in progress for the search engine layer
Diego is happy to walk an interested reviewer through any of these in a working session.