truespeech Reference

Overview

truespeech is a small TypeScript language runtime that sits above a semantic layer (which itself sits above a database). It gives analysts two things: a precise query statement (COMPUTE) for expressing metric questions, and a curated lexicon for annotating the data with contextual knowledge — what is anomalous, what changed when, which metric definitions differ across history. The runtime parses statements, validates them against the semantic model, generates SQL via the semantic-layer adapter, runs it through the database adapter, and reconciles the result against the lexicon — so the rows you see are flagged inline when something in the lexicon applies.

The stack: database → semantic layer → truespeech. The runtime is decoupled from any specific implementation of those underlayers via three adapter interfaces (SemanticLayerAdapter, DatabaseAdapter, LexiconAdapter). The lexicon adapter is optional — plain COMPUTE works without it; REGISTER, UNREGISTER, CHECK, and SHOW LEXICON require it. SHOW SCHEMA reads only the semantic layer.

Statements

truespeech has five statement kinds, discriminated by the first keyword: COMPUTE, REGISTER, CHECK, SHOW, and UNREGISTER. COMPUTE runs a metric query. REGISTER adds a lexicon entry (region or boundary). CHECK looks up lexicon entries for a region. SHOW takes a subject: SHOW LEXICON inspects the current lexicon and SHOW SCHEMA inspects the semantic-layer surface. UNREGISTER drops a lexicon entry by name. Keywords are case-insensitive; identifiers are case-sensitive.

Examples below use illustrative metric and dimension names chosen to make the construct concrete (LTV, session_starts, ship_count, ship_date, etc.). The interactive sandbox ships with a specific demo schema (total_sales, average_order_value, order_count over region, product_tier, order_date) — copying these examples verbatim into the sandbox will raise unknown_metric.

COMPUTE

Runs a metric query over a region of the data. The result is reconciled against the lexicon automatically — both at the query level (reconciliation) and per row (decorations).

# grammar
COMPUTE <metric>
  OVER <time-expr> [AND <constraint>]...
  [GROUP BY <field>[, <field>]...]
  [ORDER BY <field> [ASC|DESC][, ...]]
  [LIMIT <n>]

v0.5.0 accepts exactly one metric per COMPUTE; a second metric raises incompatible_metrics. See The OVER clause for the time and constraint grammar shared by COMPUTE, REGISTER region, and CHECK.

GROUP BY

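Three group-by shapes are accepted (bare grain, categorical dimension, explicit time dimension with grain), and they can be combined in one comma-separated list: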

GROUP BY month                -- bare grain: primary time at month grain
GROUP BY region               -- categorical dimension
GROUP BY region, month        -- multiple
GROUP BY ship_date:week       -- explicit time dimension with grain

Bare grain words (day, week, month, quarter, year) implicitly reference the metric's primary time dimension at that grain. Time dimensions in GROUP BY always require a grain — a bare time-dimension name (e.g. GROUP BY ship_date) raises grain_required. A grain on a non-time dimension also raises grain_required.

The runtime renames bare-grain result columns from the underlying semantic-layer name (e.g. order_date_month) back to the bare grain (month) so the result column matches what was written.

ORDER BY / LIMIT

ORDER BY total_sales DESC, region ASC
LIMIT 10

ORDER BY fields must be result columns — either a group-by field (using its bare-grain name where applicable) or the metric name. An unknown field raises order_by_unknown_field. LIMIT takes a non-negative integer literal.

Example

COMPUTE total_sales
  OVER 2026-Q1 AND region = 'northeast'
  GROUP BY month
  ORDER BY month ASC

REGISTER region

Adds a region entry to the lexicon: a contiguous slice of the data — a time interval plus optional categorical constraints — over which one or more metrics are affected.

# grammar
REGISTER region <name>
  IMPACTING <metric>[, <metric>]... OVER <region>
  [IMPACTING <metric>[, <metric>]... OVER <region>]...
  WITH "<description>"

Each IMPACTING clause carries one or more affected metrics and the region (relative to that metric's primary time) over which they are affected. The multi-metric shorthand in a single IMPACTING requires the listed metrics to share a primary time dimension; if they do not, the validator raises incompatible_metrics and you must split into separate IMPACTING clauses. At least one IMPACTING clause and the trailing WITH description are required.

Descriptions are string literals — single-quoted ('…') or double-quoted ("…"). Use double quotes for prose containing apostrophes.

# single IMPACTING, multi-metric shorthand
REGISTER region bot_campaign_2026_02
  IMPACTING order_count, session_starts OVER 2026-02-03 to 2026-02-04
  WITH "Credential-stuffing campaign inflated session and order counts"

# multi-IMPACTING, per-metric region
REGISTER region mobile_event_drop
  IMPACTING session_starts OVER 2025-07 to 2025-12
  IMPACTING ship_count     OVER 2025-08 to 2026-01
  WITH "Mobile app analytics events were not consistently fired"

REGISTER boundary

Adds a boundary entry: a cut at an instant — a metric redefinition, a pricing change, a logging-pipeline switch.

# grammar
REGISTER boundary <name>
  AT <date> [AND <constraint>]...
  IMPACTING <metric>[, <metric>]...
  BEFORE "<short label>" "<long description>"
  AFTER  "<short label>" "<long description>"
  [WITH   "<change description>"]

AT takes a day-form date (YYYY-MM-DD). Year, quarter, and month forms are rejected at parse time: boundaries are instants, not intervals, so a non-day form raises expected_token. The same AT applies to every metric in the IMPACTING clause.

BEFORE and AFTER are both mandatory. Each takes two string literals: a short label (rendered inline in row notes when a query straddles or spans the cut) and a longer description (the durable "what was true on this side of the cut" prose, carried in reconciliation and in the historical footer).

WITH is optional. When present it overrides the runtime-composed change-description sentence shown in the footer of a spanning/straddling query. When absent, the runtime composes that wording from BEFORE and AFTER. The historical-footer wording (shown when the query falls entirely behind the cut) is always runtime-composed.

Categorical scoping: optional AND predicates after AT scope the cut to a sub-population — e.g. a metric change that affected only one segment. The same predicate grammar as OVER applies.

Multi-metric IMPACTING follows the same shared-primary-time rule as REGISTER region; mismatched primary times raise incompatible_metrics and you must split into separate REGISTER boundary statements.

REGISTER boundary ltv_redef_enterprise
  AT 2026-01-01 AND product_tier = 'enterprise'
  IMPACTING LTV
  BEFORE "v1 enterprise LTV" "12-month gross subscription revenue (excludes refunds)"
  AFTER  "v2 enterprise LTV" "18-month net subscription revenue (includes refunds)"

CHECK

Looks up matching lexicon entries for a region — no database hit, no metric computation. Returns matches across both entry kinds as a discriminated LexiconMatch array.

# grammar
CHECK <metric>[, <metric>]... OVER <region>

OVER is always required — use OVER all time for the unscoped case. Multi-metric form requires the listed metrics to share a primary time, same rule as REGISTER; otherwise the validator raises incompatible_metrics.

CHECK total_sales OVER 2026-Q1
CHECK conversion_rate, order_count OVER 2026-02
CHECK total_sales OVER all time     -- unbounded form (OVER is required)

For region entries you get one match per matching IMPACTING clause: if an entry impacts multiple of your queried metrics and they all overlap, you get multiple matches sharing the same entry. For boundary entries you get one match per (boundary, queried-metric) pair where the boundary applies to that metric and the query straddles the cut. At the CHECK level, every BoundaryMatch's side is "straddles" (the query itself spans the cut).

SHOW LEXICON

Surfaces lexicon entries for inspection. Without filters, returns every entry currently in the lexicon. With a comma-separated name list, narrows to the listed entries; names that don't match any entry are silently dropped (non-throwing).

# grammar
SHOW LEXICON [<name>[, <name>...]]

SHOW LEXICON                                  -- every entry
SHOW LEXICON q1_data_quality_issue            -- one entry
SHOW LEXICON q1_anomaly, aov_redef, ne_outage  -- three entries (added in v0.7.0)

Returns a ShowLexiconResult with the entry list and the echoed filter list (if any). The entries are typed as the discriminated LexiconEntry union; consumers dispatch on kind to render region- vs boundary-shaped detail.

Requires a configured LexiconAdapter; throws otherwise.

SHOW SCHEMA

Surfaces the available metrics and the dimensions on each metric's dataset.

# grammar
SHOW SCHEMA

Returns a ShowSchemaResult carrying one MetricSummary per metric: its name, optional description, primary-time dimension (if any), and the dimensions available on its dataset. The runtime does not de-duplicate dimensions across metrics — different metrics on different datasets can carry different dimensions, and the per-metric view is the more honest one. A caller can flatten and de-dup at render time if it wants a single dimension list.

Reads only from the semantic-layer adapter; no lexicon or database access. The lexicon adapter can be omitted.

UNREGISTER

Drops a lexicon entry by name regardless of kind. Non-throwing — the result carries found: false if no entry by that name existed.

# grammar
UNREGISTER <name>

UNREGISTER q1_data_quality_issue

Returns an UnregisterResult with the echoed name and a boolean found. The lexicon adapter's remove(name) method is the underlying call; embedders backing the lexicon with persistent storage should make remove idempotent and atomic.

Requires a configured LexiconAdapter; throws otherwise.

The OVER clause

The OVER clause appears on COMPUTE, on each IMPACTING clause of REGISTER region, and on CHECK. Its first clause is always the metric's primary time axis; subsequent AND-joined predicates intersect with it. Only AND is supported — regions are intersections by design.

Time mini-language

The first clause of OVER is the primary time, written in this small calendar mini-language:

Form	Example	Meaning
Year	`2026`	All of 2026
Quarter	`2026-Q1`	First quarter of 2026
Month	`2026-02`	February 2026
Day	`2026-02-15`	A single day
Range	`2026-02-03 to 2026-02-10`	Closed-inclusive interval. Both ends must be the same unit (year, quarter, month, or day) — a mixed-unit range raises `mixed_unit_range`; a start after the end raises `range_start_after_end`.
Open-ended (left)	`until 2026-Q1`	Everything up to and including the bound
Open-ended (right)	`since 2026-01-15`	Everything from the bound onward
Unbounded	`all time`	No time constraint

Time literals are validated for sane values: quarters must be 1–4, months 1–12, days valid for the given month (e.g. 2026-02-30 raises malformed_time_literal). The metric must have a primary time dimension declared in the semantic model; absence raises missing_primary_time.
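The expansion of a time literal into a closed interval can be sketched as follows. This is a minimal illustration, not the runtime's implementation: resolveLiteral and resolveTime are hypothetical names, ranges (X to Y) are left out, keywords are matched case-sensitively for brevity, and the 0001-01-01 / 9999-12-31 bounds for open-ended forms are the ones listed under Lexicon types.

```typescript
// Sketch only: resolveLiteral and resolveTime are hypothetical names, not library exports.
interface Interval { timeStart: string; timeEnd: string }

const MIN_DATE = "0001-01-01"; // bounds used for open-ended and unbounded forms
const MAX_DATE = "9999-12-31";

const pad2 = (n: number): string => (n < 10 ? `0${n}` : `${n}`);

// Number of days in a 1-based month: day 0 of the following month, in UTC.
function daysInMonth(year: number, month: number): number {
  return new Date(Date.UTC(year, month, 0)).getUTCDate();
}

function malformed(literal: string): Error {
  return new Error(`malformed_time_literal: ${literal}`);
}

// Expand a year, quarter, month, or day literal to a closed, inclusive interval.
function resolveLiteral(literal: string): Interval {
  const m = /^(\d{4})(?:-(?:Q(\d)|(\d{2})(?:-(\d{2}))?))?$/.exec(literal);
  if (m === null) throw malformed(literal);
  const year = Number(m[1]);
  if (m[2] !== undefined) {
    const quarter = Number(m[2]);
    if (quarter < 1 || quarter > 4) throw malformed(literal);
    const endMonth = quarter * 3;
    return {
      timeStart: `${m[1]}-${pad2(endMonth - 2)}-01`,
      timeEnd: `${m[1]}-${pad2(endMonth)}-${pad2(daysInMonth(year, endMonth))}`,
    };
  }
  if (m[3] === undefined) return { timeStart: `${m[1]}-01-01`, timeEnd: `${m[1]}-12-31` };
  const month = Number(m[3]);
  if (month < 1 || month > 12) throw malformed(literal);
  const last = daysInMonth(year, month);
  if (m[4] === undefined) {
    return { timeStart: `${m[1]}-${m[3]}-01`, timeEnd: `${m[1]}-${m[3]}-${pad2(last)}` };
  }
  const day = Number(m[4]);
  if (day < 1 || day > last) throw malformed(literal);
  return { timeStart: literal, timeEnd: literal };
}

// Open-ended and unbounded forms reuse the literal resolver for their single bound.
function resolveTime(expr: string): Interval {
  if (expr === "all time") return { timeStart: MIN_DATE, timeEnd: MAX_DATE };
  if (expr.startsWith("until ")) {
    return { timeStart: MIN_DATE, timeEnd: resolveLiteral(expr.slice(6)).timeEnd };
  }
  if (expr.startsWith("since ")) {
    return { timeStart: resolveLiteral(expr.slice(6)).timeStart, timeEnd: MAX_DATE };
  }
  return resolveLiteral(expr);
}
```

Because every bound ends up as an ISO YYYY-MM-DD string, later overlap checks can compare bounds as plain strings.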

Additional constraints

After the primary time, additional constraints are joined with AND in a uniform <dimension> <operator> <value> shape:

AND region = 'northeast'
AND region IN ('northeast', 'west')
AND region NOT IN ('midwest')
AND ship_date >= 2026-02-01
AND ship_date IN 2026-Q1                       -- IN extends to time containment
AND ship_date IN 2026-02-01 to 2026-02-28

Operators: =, !=, >, <, >=, <=, IN, NOT IN. The IN <time-region> form (containment in a calendar literal or range) is only valid against a time dimension; using it on a categorical dimension raises unknown_dimension with a directed message. Comparison values may be strings, numbers, or time literals; comparing a categorical dimension against a time literal raises unknown_dimension as well. Unknown dimensions raise unknown_dimension.

The Lexicon

The lexicon is a curated store of contextual knowledge about the data: known anomalies, data-quality issues, real-world events that distort metrics, schema or definition changes. Entries are facts about the world. The lexicon supports two entry kinds, discriminated by kind: region (a patch in dimensional space) and boundary (a cut at an instant). Both share the lexicon adapter, the auto-reconciliation surface in COMPUTE, and the per-row decoration model; they differ in trigger semantics and in syntactic shape.

Regions

A region is a contiguous slice of the data — a time interval plus optional categorical constraints — over which one or more metrics are affected. Trigger semantics: a region triggers a result value when that value's underlying input rows came from inside the patch. Concretely, at the query level the runtime intersects the query's region with each IMPACTING clause; an empty intersection means no match. Per row, the row's slice (the query region narrowed by the row's group-by columns) must overlap the impact region for the row to be flagged.

Grain matters in the per-row check: a row aggregated to a wider grain (say month) can overlap a tightly-scoped region (a single day) and be flagged; a row aggregated to a narrower grain (day) is flagged only when its own interval intersects the region.

See REGISTER region for the syntax and the RegionLexiconEntry shape under Lexicon types.

Boundaries

A boundary is a cut at an instant. Data on either side is fine in isolation; mixing across the cut produces an incoherent value. Each boundary carries a before regime and an after regime, each with a short label and a long-form description.

Trigger semantics. The runtime treats the post-cut regime as the operator's now: it is assumed to be normal and is not annotated. The pre-cut regime is history; queries that live in it get a runtime-composed historical footer so the operator knows they are not reading the current state. Queries that span the cut get per-row regime context, and a row whose own interval straddles the cut gets a harsher flag — its value mixes two regimes.

Query shape	Outcome
Entirely post-cut (the "now" world)	silent — no row flag, no footer
Entirely pre-cut (historical query)	historical footer — runtime-composed wording from the registered `BEFORE`/`AFTER` descriptions; surfaced as a `HistoricalNote`
Spans the cut, every row cleanly on one side	warn (amber) on each row — note cell carries the regime's short label so the analyst sees which world each row is in
Spans the cut, ≥1 row's interval straddles	error (red) on the straddling row (value mixes regimes); warn on the clean rows

Straddle rule. A row's interval [start, end] straddles the cut at T iff start < T ≤ end. The ≤ on the right makes "T is the first day of the new regime" the natural reading: a row that starts exactly at T contains only post-cut data and does not trigger. Per-row classification (used to drive both straddle detection and regime-label warns) is:

"before" — the row's interval is entirely pre-cut (rowEnd < T)
"after" — the row's interval is at-or-after the cut (rowStart ≥ T)
"straddles" — the row spans the cut (rowStart < T ≤ rowEnd); the value mixes regimes

Categorical scoping. Optional AND predicates on the boundary scope the cut to a sub-population. A row is checked against the boundary only if its predicate space is compatible with the scope: if the row pins a scoped dimension to an incompatible value, the boundary doesn't apply at all (per-row classification returns null); if the row aggregates the scoped dimension (no row-level value), it is treated as overlapping the scope.

The same compatibility check governs query-level boundary matching: the runtime builds a "virtual row" from the query region's equality constraints and uses it to test whether the query straddles the cut and whether its predicate space overlaps the boundary's scope.
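The time half of the per-row classification above fits in a few lines. The sketch below is illustrative only: classifySide is a hypothetical name (the exported function is classifyRowAgainstBoundary), and categorical scoping is deliberately left out, so it never returns null.

```typescript
type BoundarySide = "before" | "after" | "straddles";

// Time-only classification of a row interval [rowStart, rowEnd] against a cut at cutAt.
// All arguments are ISO YYYY-MM-DD strings, which order correctly as plain strings.
function classifySide(rowStart: string, rowEnd: string, cutAt: string): BoundarySide {
  if (rowEnd < cutAt) return "before";   // entirely pre-cut
  if (rowStart >= cutAt) return "after"; // starts at or after the cut: post-cut data only
  return "straddles";                    // rowStart < cutAt <= rowEnd: value mixes regimes
}

// Rows from a query over 2025-Q4 to 2026-Q1, against a cut at 2026-01-01.
const cut = "2026-01-01";
const q4Row = classifySide("2025-10-01", "2025-12-31", cut);    // "before"
const q1Row = classifySide("2026-01-01", "2026-03-31", cut);    // "after"
const singleRow = classifySide("2025-10-01", "2026-03-31", cut); // "straddles"
```

A row that ends exactly on the cut date classifies as "straddles", because its last day already belongs to the new regime.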

See REGISTER boundary for syntax and the BoundaryLexiconEntry shape under Lexicon types.

Worked example

Given REGISTER boundary aov_redef AT 2026-01-01 IMPACTING average_order_value BEFORE … AFTER …:

Query	Outcome
`COMPUTE average_order_value OVER 2026-Q1`	silent (entirely post-cut)
`COMPUTE average_order_value OVER 2025-Q4`	historical footer (entirely pre-cut)
`COMPUTE average_order_value OVER 2025-Q4 to 2026-Q1`	error — one row, inputs span the cut
`COMPUTE average_order_value OVER 2025-Q4 to 2026-Q1 GROUP BY quarter`	warn on each row: Q4 labeled with the BEFORE short label, Q1 with the AFTER label; no row straddles
`COMPUTE average_order_value OVER 2025-Q4 to 2026-Q1 GROUP BY month`	same shape, six rows; Oct–Dec rows carry the BEFORE label, Jan–Mar rows the AFTER label

Per-row decorations

Every COMPUTE result carries a decorations array, index-aligned with results.rows. Each entry is a RowDecoration:

matches: LexiconMatch[] — the subset of query-level reconciliation matches that apply to this row's slice. For boundary matches the match's side is rewritten to the row's actual side ("before", "after", or "straddles") — distinct from the query-level side, which is always "straddles".
severity?: "warn" | "error" — a summary of the worst-case match on this row. "error" if any boundary match has side: "straddles"; "warn" for a region match or a boundary "before"/"after" annotation surfacing regime context in a spanning query; undefined if there are no matches.

An empty matches array means the row is unaffected by any lexicon entry. See decorationsFor for the function that computes this.
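The severity rule above reduces to a single scan over the row's matches. A minimal sketch, assuming trimmed-down match types that keep only the fields the rule reads; severityOf is a hypothetical name, not a library export.

```typescript
// Trimmed match shapes: only the discriminator and the boundary side are kept.
type MatchSketch =
  | { kind: "region" }
  | { kind: "boundary"; side: "before" | "after" | "straddles" };

// "error" if any boundary match straddles, "warn" for any other match, undefined for none.
function severityOf(matches: MatchSketch[]): "warn" | "error" | undefined {
  if (matches.length === 0) return undefined;
  const straddles = matches.some((m) => m.kind === "boundary" && m.side === "straddles");
  return straddles ? "error" : "warn";
}
```

A row with one region match and one straddling boundary match therefore reports "error": the worst case wins.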

Reconciliation and historical notes

Reconciliation is the automatic surfacing of relevant lexicon entries on every COMPUTE. Two outputs:

reconciliation: LexiconMatch[] — query-level matches: every region whose impact region intersects the query's region for the metric, plus every boundary whose cut the query straddles (and whose categorical scope is compatible). Same shape as CHECK's matches.
historicalNotes: HistoricalNote[] — emitted when the query falls entirely on the pre-cut side of one or more boundaries impacting the metric. The condition is strict: queryRegion.timeEnd < boundary.at. A query whose last day equals the cut date already includes post-cut data, so it is not historical; this mirrors the straddle rule's start < T ≤ end. The values themselves are not flagged (they are internally consistent under the old regime); the note surfaces that the operator is reading history. Categorical scope is honoured: a boundary scoped to product_tier = 'enterprise' does not raise a historical note for a query pinned to product_tier = 'consumer'.

The wording for the historical footer is composed by the runtime from the registered BEFORE and AFTER descriptions — the pattern is structurally fixed ("you are reading a previous regime; here's what now looks like"). The wording for the spanning/straddling footer is also runtime-composed by default, but can be overridden per-entry via the optional WITH "<sentence>" on REGISTER boundary (stored as changeDescription).

Without a configured lexicon adapter, reconciliation and historicalNotes are empty arrays and the per-row decorations contain no matches.
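At the query level, the time-only decision between silence, a historical note, and a reconciliation match can be sketched as below. queryOutcome and its return labels are hypothetical names chosen for illustration, and categorical scope is ignored.

```typescript
type QueryOutcome = "silent" | "historical" | "spans";

// Time-only query-level outcome for one boundary; arguments are ISO YYYY-MM-DD strings.
function queryOutcome(timeStart: string, timeEnd: string, at: string): QueryOutcome {
  if (timeStart >= at) return "silent";   // entirely post-cut: nothing surfaced
  if (timeEnd < at) return "historical";  // strict: entirely pre-cut, emit a HistoricalNote
  return "spans";                         // timeStart < at <= timeEnd: boundary match in reconciliation
}

// Boundary at 2026-01-01, as in the worked example.
const postCut = queryOutcome("2026-01-01", "2026-03-31", "2026-01-01");  // "silent"
const preCut = queryOutcome("2025-10-01", "2025-12-31", "2026-01-01");   // "historical"
const spanning = queryOutcome("2025-10-01", "2026-03-31", "2026-01-01"); // "spans"
```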

Runtime API

The runtime is published as a small TypeScript ES-module library with no runtime dependencies. The exported version constant is VERSION = "0.7.0".

The TrueSpeech class

interface TrueSpeechOptions {
  semanticLayer: SemanticLayerAdapter;
  database: DatabaseAdapter;
  lexicon?: LexiconAdapter;
  timeLiteralYears?: number[];
}

class TrueSpeech {
  constructor(opts: TrueSpeechOptions);
  tokenize(source: string): Token[];
  parse(source: string): { ast: Statement | null; errors: TrueSpeechError[] };
  validate(ast: Statement): { errors: TrueSpeechError[] };
  execute(source: string): Promise<ExecuteResult>;
  complete(source: string, position: number): Promise<CompletionResult>;
}

tokenize — lexical analysis. Pure. Always returns a Token[], with error tokens for unrecognized characters and an EOF token at the end. Useful for syntax highlighting.
parse — runs tokenize then the parser. Pure; never throws. Returns the AST (or null if structure was unrecoverable) and the list of parse errors. Useful for live editor feedback.
validate — semantic validation against the configured semantic-layer model: unknown metrics, unknown dimensions, malformed time literals, range start-after-end, mixed-unit ranges, GROUP BY references that don't fit, ORDER BY references not in the result, and so on. Pure; never throws.
execute — composes all four phases (tokenize → parse → validate → execute) and dispatches on statement kind. Throws TrueSpeechExecutionError if any phase produced errors. The return type is the discriminated ExecuteResult union.
complete — Tab-style autocomplete. Given a source string and a cursor offset, returns the set of valid next tokens at that position. Materializes against the configured semantic layer (for metric / dimension names) and lexicon (for entry names). See CompletionResult for the result shape and the discoverability conventions.

The lexicon option is optional: if omitted, REGISTER, UNREGISTER, CHECK, and SHOW LEXICON throw at execute time and COMPUTE skips reconciliation (returns empty reconciliation, empty historicalNotes, and decorations with no matches).

timeLiteralYears (added in v0.6.0) is optional: when set, complete() emits concrete year / quarter / month candidates (e.g. 2026, 2026-Q1, 2026-02) at time-literal positions; when omitted, those positions surface a generic hint placeholder only. Days are intentionally excluded.

Adapter interfaces

SemanticLayerAdapter

interface SemanticLayerAdapter {
  listMetrics(): MetricInfo[];
  dimensionsForMetric(metricName: string): DimensionInfo[];
  primaryTimeForMetric(metricName: string): DimensionInfo | null;
  toSQL(query: SemanticQuery): string;
}

Discovery methods (listMetrics, dimensionsForMetric, primaryTimeForMetric) are called at validation time to check references. toSQL is called at execution time to translate the runtime-built SemanticQuery into SQL the database adapter can run.

Supporting shapes (also exported):

interface MetricInfo   { name: string; description?: string; }
interface DimensionInfo { name: string; isTime: boolean; dataset: string; }
type Grain = "day" | "week" | "month" | "quarter" | "year";
type WhereOperator = "=" | "!=" | ">" | "<" | ">=" | "<=" | "in" | "not_in";

interface SemanticQuery {
  metric: string;
  groupBy?: GroupByClause[];   // { dimension: string; grain?: Grain }
  where?: WhereClause[];       // { dimension, operator, value }
  orderBy?: OrderByClause[];   // { field, direction?: "asc" | "desc" }
  limit?: number;
}

DatabaseAdapter

interface DatabaseAdapter {
  execute(sql: string): Promise<QueryResult>;
}

interface QueryResult {
  columns: string[];
  rows: (string | number | null)[][];
}

Called once per COMPUTE with the SQL produced by SemanticLayerAdapter.toSQL. The runtime renames bare-grain columns in the returned result so callers see the user-written column name (e.g. month) rather than the semantic layer's underlying name (e.g. order_date_month).
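For tests and demos, a DatabaseAdapter can be a simple test double. The sketch below assumes only the two interfaces shown above (redeclared locally so the block stands alone); RecordingDatabase is a hypothetical class that returns a canned result and records the SQL it was handed.

```typescript
interface QueryResult {
  columns: string[];
  rows: (string | number | null)[][];
}

interface DatabaseAdapter {
  execute(sql: string): Promise<QueryResult>;
}

// Test double: records every SQL string and answers with a copy of a fixed result.
class RecordingDatabase implements DatabaseAdapter {
  readonly seen: string[] = [];
  private readonly canned: QueryResult;

  constructor(canned: QueryResult) {
    this.canned = canned;
  }

  async execute(sql: string): Promise<QueryResult> {
    this.seen.push(sql);
    // Copy so callers that rename columns or edit rows cannot mutate the fixture.
    return {
      columns: [...this.canned.columns],
      rows: this.canned.rows.map((row) => [...row]),
    };
  }
}
```

Returning a copy matters here because the runtime renames bare-grain columns on the returned result.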

LexiconAdapter

interface LexiconAdapter {
  add(entry: LexiconEntry): Promise<void>;
  list(): Promise<LexiconEntry[]>;
  remove(name: string): Promise<boolean>;   // added in v0.5.0; returns whether anything was removed
}

The adapter is a small add/list/remove triple — the runtime does all the matching, overlap and crossing math itself. add is called by REGISTER; list is called by CHECK, SHOW LEXICON, and reconciliation inside COMPUTE; remove is called by UNREGISTER and should return true iff an entry by that name was actually removed. Storage is up to the implementer (in-memory, SQLite, a database table); duplicate names are allowed at the adapter level, in which case remove by convention drops the first match (or all of them — implementer's call).

Breaking change in v0.5.0: the remove method is new. Embedders updating from v0.3.0 need to add it; the simplest in-memory implementation is a one-line findIndex / splice.
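An in-memory adapter along the lines described above might look like this. It is a sketch under stated assumptions: LexiconEntry is trimmed to the two fields the adapter touches, InMemoryLexicon is a hypothetical class name, and remove drops only the first entry with a matching name.

```typescript
// Trimmed entry shape: the adapter only needs the name; kind is kept for realism.
interface LexiconEntry {
  kind: "region" | "boundary";
  name: string;
}

interface LexiconAdapter {
  add(entry: LexiconEntry): Promise<void>;
  list(): Promise<LexiconEntry[]>;
  remove(name: string): Promise<boolean>;
}

class InMemoryLexicon implements LexiconAdapter {
  private readonly entries: LexiconEntry[] = [];

  async add(entry: LexiconEntry): Promise<void> {
    this.entries.push(entry);
  }

  async list(): Promise<LexiconEntry[]> {
    return [...this.entries]; // copy, so callers cannot mutate the store
  }

  // findIndex / splice: drops the first match and reports whether anything was removed.
  async remove(name: string): Promise<boolean> {
    const index = this.entries.findIndex((entry) => entry.name === name);
    if (index === -1) return false;
    this.entries.splice(index, 1);
    return true;
  }
}
```

Calling remove twice with the same name returns true and then false, which is the behaviour UNREGISTER surfaces as found.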

OSI adapter

function osiAdapter(runtime: OsiLikeRuntime): SemanticLayerAdapter;

Convenience wrapper that adapts an OSI Runtime instance to SemanticLayerAdapter. Typed via the structural OsiLikeRuntime interface: any object with the four methods the wrapper needs is acceptable. Near-identity in practice because the shapes already line up.

Result types

execute() returns a discriminated union. The discriminator is statement; the two SHOW results share statement: "show" and are further discriminated by subject:

type ExecuteResult =
  | ComputeResult
  | RegisterResult
  | CheckResult
  | ShowLexiconResult
  | ShowSchemaResult
  | UnregisterResult;

interface ComputeResult {
  statement: "compute";
  semanticQuery: SemanticQuery;        // what was built for the semantic layer
  sql: string;                         // what the semantic layer generated
  results: QueryResult;                // what the database returned (post-rename)
  reconciliation: LexiconMatch[];      // query-level lexicon matches
  region: ResolvedRegion;              // resolved OVER region the query addressed
  decorations: RowDecoration[];        // per-row matches, index-aligned with results.rows
  historicalNotes: HistoricalNote[];   // boundaries the query falls entirely behind
}

interface RegisterResult {
  statement: "register";
  entry: LexiconEntry;                 // the entry that was added (region or boundary)
}

interface CheckResult {
  statement: "check";
  matches: LexiconMatch[];             // matches across both entry kinds
}

interface ShowLexiconResult {
  statement: "show";
  subject: "lexicon";
  entries: LexiconEntry[];             // all entries, or just those matching `filters`
  filters?: string[];                  // echoes the name list from SHOW LEXICON <name>[, ...]
}

interface ShowSchemaResult {
  statement: "show";
  subject: "schema";
  metrics: MetricSummary[];
}

interface MetricSummary {
  name: string;
  description?: string;
  primaryTime: string | null;          // name of the primary-time dimension, or null if none
  dimensions: DimensionInfo[];         // dimensions on this metric's dataset
}

interface UnregisterResult {
  statement: "unregister";
  name: string;                        // echoed from the statement
  found: boolean;                      // false if no entry by that name existed
}
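A consumer can narrow the union with a switch on statement and a second check on subject for the two SHOW results. The sketch below redeclares trimmed versions of the result types so it stands alone; summarize is a hypothetical helper, not part of the library.

```typescript
// Trimmed result shapes: only the fields this sketch reads are kept.
type ExecuteResultSketch =
  | { statement: "compute"; results: { columns: string[]; rows: unknown[][] } }
  | { statement: "register"; entry: { kind: "region" | "boundary"; name: string } }
  | { statement: "check"; matches: unknown[] }
  | { statement: "show"; subject: "lexicon"; entries: unknown[] }
  | { statement: "show"; subject: "schema"; metrics: { name: string }[] }
  | { statement: "unregister"; name: string; found: boolean };

// One-line summary per statement kind; the compiler checks that every case is handled.
function summarize(result: ExecuteResultSketch): string {
  switch (result.statement) {
    case "compute":
      return `${result.results.rows.length} row(s)`;
    case "register":
      return `registered ${result.entry.kind} ${result.entry.name}`;
    case "check":
      return `${result.matches.length} match(es)`;
    case "show":
      // Both SHOW results share statement: "show"; subject tells them apart.
      return result.subject === "lexicon"
        ? `${result.entries.length} lexicon entries`
        : `${result.metrics.length} metric(s)`;
    case "unregister":
      return result.found ? `removed ${result.name}` : `no entry named ${result.name}`;
  }
}
```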

Lexicon types

// Discriminated by `kind`.
type LexiconEntry = RegionLexiconEntry | BoundaryLexiconEntry;

interface RegionLexiconEntry {
  kind: "region";
  name: string;
  impacts: Impact[];                   // one per IMPACTING clause, post-expansion
  description: string;
}

interface BoundaryLexiconEntry {
  kind: "boundary";
  name: string;
  at: string;                         // ISO YYYY-MM-DD, the cut's instant
  constraints: ResolvedConstraint[];   // categorical scope (empty = all dim values)
  metrics: string[];                  // impacted metrics
  before: RegimeDescription;           // pre-cut regime: { label, description }
  after: RegimeDescription;            // post-cut regime: { label, description }
  changeDescription?: string;         // optional WITH override for the footer
}

interface RegimeDescription {
  label: string;                      // short — rendered inline in row notes
  description: string;                // long-form prose — rendered in footers
}

interface Impact {
  metric: string;
  region: ResolvedRegion;              // time interval + categorical constraints
}

interface ResolvedRegion {
  timeStart: string;                  // ISO YYYY-MM-DD, inclusive
  timeEnd: string;                    // ISO YYYY-MM-DD, inclusive
  constraints: ResolvedConstraint[];
}

interface ResolvedConstraint {
  dimension: string;
  operator: WhereOperator;
  value: string | number | (string | number)[];
}

impacts is the post-expansion form: a single IMPACTING metric_a, metric_b OVER <region> source clause produces two entries in impacts, one per metric. ResolvedRegion.timeStart/timeEnd are always inclusive; all time resolves to 0001-01-01 / 9999-12-31; until X resolves to 0001-01-01 / lastDayOf(X); since X resolves to firstDayOf(X) / 9999-12-31.
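The expansion from IMPACTING clauses to impacts can be pictured as a nested loop. In this sketch ImpactingClause is a hypothetical stand-in for the parsed clause (the AST shape is not shown in this reference), and expandImpacts is an illustrative name.

```typescript
interface ResolvedRegionSketch {
  timeStart: string; // ISO YYYY-MM-DD, inclusive
  timeEnd: string;   // ISO YYYY-MM-DD, inclusive
}

interface ImpactSketch {
  metric: string;
  region: ResolvedRegionSketch;
}

// Hypothetical parsed form of one IMPACTING clause: several metrics, one region.
interface ImpactingClause {
  metrics: string[];
  region: ResolvedRegionSketch;
}

// One impact per (clause, metric) pair, in source order.
function expandImpacts(clauses: ImpactingClause[]): ImpactSketch[] {
  const impacts: ImpactSketch[] = [];
  for (const clause of clauses) {
    for (const metric of clause.metrics) {
      impacts.push({ metric, region: clause.region });
    }
  }
  return impacts;
}

// IMPACTING order_count, session_starts OVER 2026-02-03 to 2026-02-04
const expanded = expandImpacts([
  {
    metrics: ["order_count", "session_starts"],
    region: { timeStart: "2026-02-03", timeEnd: "2026-02-04" },
  },
]);
```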

Match types

// Discriminated by `kind`.
type LexiconMatch = RegionMatch | BoundaryMatch;

interface RegionMatch {
  kind: "region";
  entry: RegionLexiconEntry;
  impact: Impact;                      // the IMPACTING clause that matched
  overlap: ResolvedRegion;             // intersection of query × impact
}

type BoundarySide = "before" | "after" | "straddles";

interface BoundaryMatch {
  kind: "boundary";
  entry: BoundaryLexiconEntry;
  metric: string;                      // the impacted metric the match was found for
  crossedAt: string;                   // ISO YYYY-MM-DD — the boundary's AT
  side: BoundarySide;                  // per-row context; "straddles" at query level
}

At the query level (in ComputeResult.reconciliation and CheckResult.matches), every BoundaryMatch.side is "straddles" — the query itself spans the cut. At the per-row level (in RowDecoration.matches), side carries each row's actual relationship to the cut as classified by classifyRowAgainstBoundary.

Decoration types

interface RowDecoration {
  matches: LexiconMatch[];             // subset of reconciliation that apply to this row
  severity?: "warn" | "error";        // "error" if any boundary match is "straddles"
}

interface HistoricalNote {
  boundary: BoundaryLexiconEntry;      // the boundary the query falls entirely behind
  metric: string;
}

See Per-row decorations for severity rules and Reconciliation and historical notes for when HistoricalNotes fire.

Region utilities

A small set of pure functions for working with regions and per-row matching. Used internally by the executor; exported for callers that want to compute against alternative match sets, render regions for display, or implement editor surfaces.

function resolveRegion(over: OverClause, primaryTimeField: string | null): ResolvedRegion;

Turn an AST OverClause into a ResolvedRegion: time mini-language expanded to a closed-interval date pair; constraints normalized to ResolvedConstraint. The primaryTimeField parameter is accepted for call-site self-documentation; the resulting region does not store the field name.

function intersectRegions(a: ResolvedRegion, b: ResolvedRegion): ResolvedRegion | null;

Compute the overlap of two regions: time is the inner bounds (max of starts, min of ends); constraints are the deduplicated union. Returns null if the time intervals do not intersect.
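The description above translates into roughly the following. This is a sketch of the stated rule rather than the library's source: intersectSketch is a hypothetical name, the types are redeclared locally, and deduplication uses a JSON key, which is one possible choice.

```typescript
type WhereOp = "=" | "!=" | ">" | "<" | ">=" | "<=" | "in" | "not_in";

interface ConstraintSketch {
  dimension: string;
  operator: WhereOp;
  value: string | number | (string | number)[];
}

interface RegionSketch {
  timeStart: string; // ISO YYYY-MM-DD, inclusive
  timeEnd: string;   // ISO YYYY-MM-DD, inclusive
  constraints: ConstraintSketch[];
}

function intersectSketch(a: RegionSketch, b: RegionSketch): RegionSketch | null {
  // Inner bounds: later start, earlier end (ISO dates order correctly as strings).
  const timeStart = a.timeStart > b.timeStart ? a.timeStart : b.timeStart;
  const timeEnd = a.timeEnd < b.timeEnd ? a.timeEnd : b.timeEnd;
  if (timeStart > timeEnd) return null; // time intervals do not intersect

  // Deduplicated union of constraints, keyed on their serialized content.
  const seen = new Set<string>();
  const constraints = [...a.constraints, ...b.constraints].filter((c) => {
    const key = JSON.stringify([c.dimension, c.operator, c.value]);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
  return { timeStart, timeEnd, constraints };
}
```

Note that a non-null result only says the time intervals overlap; contradictory constraints (for example region = 'west' on one side and region = 'northeast' on the other) are simply carried along in the union.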

function renderTimeRegion(start: string, end: string): string;

Pretty-print a date interval at the coarsest unit at which both endpoints align (e.g. [2026-01-01, 2026-12-31] → "2026"; [2026-02-01, 2026-04-30] → "2026-02 to 2026-04"; otherwise falls through to the YYYY-MM-DD to YYYY-MM-DD form).
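One way to find the coarsest aligned unit is to test year, then quarter, then month alignment and fall back to days. The sketch below reproduces the two examples above; renderIntervalSketch is a hypothetical name, and the quarter branch and the single-unit output (one literal instead of "X to X") are assumptions about behaviour this reference does not spell out.

```typescript
// Number of days in a 1-based month: day 0 of the following month, in UTC.
function lastDayOfMonth(year: number, month: number): number {
  return new Date(Date.UTC(year, month, 0)).getUTCDate();
}

function renderIntervalSketch(start: string, end: string): string {
  const startMonth = Number(start.slice(5, 7));
  const startDay = Number(start.slice(8, 10));
  const endYear = Number(end.slice(0, 4));
  const endMonth = Number(end.slice(5, 7));
  const endDay = Number(end.slice(8, 10));

  // Assumption: a range whose two sides render identically collapses to one literal.
  const join = (a: string, b: string): string => (a === b ? a : `${a} to ${b}`);

  const monthAligned = startDay === 1 && endDay === lastDayOfMonth(endYear, endMonth);
  if (!monthAligned) return join(start, end); // day resolution

  if (startMonth === 1 && endMonth === 12) {
    return join(start.slice(0, 4), end.slice(0, 4)); // year resolution
  }
  if (startMonth % 3 === 1 && endMonth % 3 === 0) {
    const startQuarter = (startMonth + 2) / 3;
    const endQuarter = endMonth / 3;
    return join(`${start.slice(0, 4)}-Q${startQuarter}`, `${end.slice(0, 4)}-Q${endQuarter}`);
  }
  return join(start.slice(0, 7), end.slice(0, 7)); // month resolution
}
```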

function renderRegion(region: ResolvedRegion): string;

Same as renderTimeRegion, plus categorical constraints joined with AND.

function formatTimeBucket(isoStart: string, grain: Grain): string;

Format a time-grain bucket (a single ISO date marking the bucket's start) at the same resolution renderTimeRegion uses. Synthesizes the bucket's end from grain and delegates.

function endOfBucket(isoStart: string, grain: Grain): string;

Inverse of a bucket's start: given the ISO date that names the bucket, return the last ISO date contained in it. "day" returns the same date; "week" returns start + 6 days; "month" / "quarter" / "year" return the calendar-end of that unit.
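The per-grain rules above can be written with UTC date arithmetic. A minimal sketch: bucketEndSketch is a hypothetical name, and it assumes isoStart is a well-formed YYYY-MM-DD date that really is the first day of its bucket.

```typescript
type Grain = "day" | "week" | "month" | "quarter" | "year";

// Format a UTC date back to ISO YYYY-MM-DD.
function toIso(date: Date): string {
  return date.toISOString().slice(0, 10);
}

function bucketEndSketch(isoStart: string, grain: Grain): string {
  const year = Number(isoStart.slice(0, 4));
  const month = Number(isoStart.slice(5, 7)); // 1-based
  const day = Number(isoStart.slice(8, 10));
  switch (grain) {
    case "day":
      return isoStart;
    case "week":
      // start + 6 days; Date.UTC normalizes overflow into the next month.
      return toIso(new Date(Date.UTC(year, month - 1, day + 6)));
    case "month":
      // Day 0 of the next month is the last day of this one.
      return toIso(new Date(Date.UTC(year, month, 0)));
    case "quarter":
      // A quarter ends two months after its first month.
      return toIso(new Date(Date.UTC(year, month + 2, 0)));
    case "year":
      return toIso(new Date(Date.UTC(year, 12, 0)));
  }
}
```

Working in UTC throughout avoids local-timezone shifts when converting back to an ISO date string.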

function buildRowRegion(
  row: (string | number | null)[],
  groupBys: GroupByClause[],
  queryRegion: ResolvedRegion
): RowRegion;

Derive a result row's effective slice ({ timeStart, timeEnd, dimValues }) from the query region and any group-by columns. Equality constraints from the query region's predicates are inherited into dimValues (so e.g. a top-level AND region = 'west' pins every row to that value).

function rowMatchesImpact(row: RowRegion, impact: ResolvedRegion): boolean;

Does the row's slice overlap an impact region? — the region-match test. Time intervals overlap, and every constraint on the impact region is compatible with the row's dimValues (a dim the row aggregates is treated as compatible).
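The region-match test combines a closed-interval overlap with a per-constraint compatibility check. The sketch below is illustrative: rowMatchesSketch and compatible are hypothetical names, the types are redeclared locally, and the range operators (>, <, >=, <=) are treated as always compatible because this reference does not say how they are evaluated per row.

```typescript
type Operator = "=" | "!=" | ">" | "<" | ">=" | "<=" | "in" | "not_in";

interface Constraint {
  dimension: string;
  operator: Operator;
  value: string | number | (string | number)[];
}

interface Region { timeStart: string; timeEnd: string; constraints: Constraint[] }
interface Row { timeStart: string; timeEnd: string; dimValues: Record<string, string | number> }

// Is one constraint compatible with the values the row pins?
function compatible(c: Constraint, dimValues: Record<string, string | number>): boolean {
  if (!(c.dimension in dimValues)) return true; // row aggregates this dimension
  const actual = dimValues[c.dimension];
  const listed = Array.isArray(c.value) ? c.value : [c.value];
  const hit = listed.some((v) => v === actual);
  switch (c.operator) {
    case "=":
    case "in":
      return hit;
    case "!=":
    case "not_in":
      return !hit;
    default:
      return true; // assumption: range operators are not modelled in this sketch
  }
}

function rowMatchesSketch(row: Row, impact: Region): boolean {
  // Closed intervals overlap when each starts no later than the other ends.
  const timeOverlaps = row.timeStart <= impact.timeEnd && impact.timeStart <= row.timeEnd;
  return timeOverlaps && impact.constraints.every((c) => compatible(c, row.dimValues));
}
```

This also shows the grain effect described under Regions: a month row for February 2026 overlaps an impact scoped to 2026-02-03 through 2026-02-04, while a day row for 2026-02-05 does not.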

function crossesBoundary(
  row: RowRegion,
  boundary: { at: string; constraints: ResolvedConstraint[] }
): boolean;

The boundary-match test: does the row's slice straddle the cut (rowStart < at ≤ rowEnd), and is its predicate space compatible with the boundary's scope? Returns true only when both hold.

function classifyRowAgainstBoundary(
  row: RowRegion,
  boundary: BoundaryLexiconEntry
): BoundarySide | null;

Finer classification used by decorationsFor for per-row regime labeling. Returns null if the row falls outside the boundary's categorical scope (in which case the boundary does not apply at all); otherwise one of "before", "after", "straddles".
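The time half of this classification can be sketched as below. The "straddles" branch follows the documented crossing condition (rowStart < at ≤ rowEnd); the mapping of the remaining cases to "before" and "after" is an assumption, and the categorical scope check that produces null is left out. The function name classifyTimeSideSketch is hypothetical.

```typescript
type BoundarySide = "before" | "after" | "straddles";

// Classify a row's closed time interval against a boundary date.
// All three arguments are ISO dates (YYYY-MM-DD), which compare correctly as strings.
function classifyTimeSideSketch(rowStart: string, rowEnd: string, at: string): BoundarySide {
  // The documented crossing test: the cut falls strictly after the row's start
  // and on or before its end.
  if (rowStart < at && at <= rowEnd) return "straddles";
  // Assumption: a row that ends before the cut is "before"; otherwise the row
  // starts on or after the cut and is "after".
  return rowEnd < at ? "before" : "after";
}
```

Note that under the documented inequality a row starting exactly on the boundary date does not straddle it, so a monthly bucket beginning at the cut lands entirely in the later regime in this sketch.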

function decorationsFor(
  rows: (string | number | null)[][],
  matches: LexiconMatch[],
  groupBys: GroupByClause[],
  queryRegion: ResolvedRegion
): RowDecoration[];

Wire the above into per-row RowDecoration arrays. Used internally by executeCompute to populate ComputeResult.decorations; exported for callers that want to compute decorations against alternative match sets (e.g. a what-if lexicon). The groupBys parameter is the semantic-layer GroupByClause (re-exported as SemanticGroupByClause, shape { dimension: string; grain?: Grain }), not the AST form.

function resultColumnNames(stmt: ComputeStatement): string[];

The column names a COMPUTE statement will produce in QueryResult.columns, derived purely from the AST without executing. Bare-grain group-bys appear as the bare grain word ("month"); explicit time-dimension group-bys as "<dimension>_<grain>"; the metric as the trailing column. Useful for validators and editor surfaces that need to know column shape before running.

The RowRegion shape used by the row-level utilities above is exported as a type for callers building their own per-row matching:

interface RowRegion {
  timeStart: string;
  timeEnd: string;
  dimValues: Record<string, string | number>;
}

Completion

TrueSpeech.complete(source, position) returns the set of valid next tokens at a cursor position, suitable for driving a Tab-style autocomplete UI. The analyzer walks the tokens up to the cursor, re-derives what the parser would expect next, and materializes concrete candidates by querying the semantic layer (for metric / dimension names) and the lexicon (for entry names).

interface CompletionResult {
  prefix: string;         // what's already typed of the partial token
  start: number;          // source offset where the partial token starts
  end: number;            // source offset of the cursor
  candidates: Completion[];
}

interface Completion {
  text: string;           // what to insert (empty for non-concrete kinds)
  kind: CompletionKind;
  hint?: string;          // optional human-readable hint
}

type CompletionKind =
  | "keyword"            // COMPUTE, REGISTER, OVER, AND, …
  | "soft-keyword"       // region, boundary, lexicon, schema
  | "metric"             // names from the semantic layer
  | "dimension"          // dimensions of the active metric
  | "grain"              // day / week / month / quarter / year
  | "operator"           // =, !=, >, <, >=, <=
  | "lexicon-entry"      // registered entry names
  | "time-literal"       // "type a date" hint, no concrete suggestion
  | "string-literal"     // "type quoted text" hint
  | "number-literal"     // "type a number" hint
  | "identifier";        // "type a name" hint (e.g. for REGISTER <name>)

Candidates are filtered to those whose text case-insensitively starts with prefix. Non-concrete kinds (time-literal, string-literal, number-literal, identifier) carry an empty text and a hint — consumers can render them as affordances ("type a date") rather than insertable choices, and they always pass the prefix filter.
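The filtering rule above can be illustrated with a small sketch. The Completion and CompletionKind types are copied from this section; filterByPrefixSketch and NON_CONCRETE_KINDS are hypothetical names used only for the example and are not part of the library.

```typescript
type CompletionKind =
  | "keyword" | "soft-keyword" | "metric" | "dimension" | "grain" | "operator"
  | "lexicon-entry" | "time-literal" | "string-literal" | "number-literal" | "identifier";

interface Completion {
  text: string;
  kind: CompletionKind;
  hint?: string;
}

// Kinds that carry an empty text and a hint instead of an insertable choice.
const NON_CONCRETE_KINDS: ReadonlySet<CompletionKind> = new Set<CompletionKind>([
  "time-literal",
  "string-literal",
  "number-literal",
  "identifier",
]);

function filterByPrefixSketch(candidates: Completion[], prefix: string): Completion[] {
  const needle = prefix.toLowerCase();
  return candidates.filter(
    (c) =>
      // Non-concrete kinds always pass the prefix filter.
      NON_CONCRETE_KINDS.has(c.kind) ||
      // Concrete kinds must start with the prefix, ignoring case.
      c.text.toLowerCase().startsWith(needle)
  );
}
```

A consumer could then split the filtered list by kind, rendering concrete candidates as insertable items and the non-concrete ones as inline affordances built from hint.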

Conventions: keywords are returned in uppercase (COMPUTE, OVER) and soft keywords in lowercase (region, lexicon). Keywords are matched case-insensitively by the parser, so the casing on completions is a typographic convention, not a parser requirement.

Inside a string literal, the analyzer returns an empty candidate list (Tab is a pass-through). Text after the cursor is ignored: the analyzer reads only the tokens that fully precede the cursor plus the partial word at the cursor itself.

Added in v0.5.0. No adapter changes required; complete() reads through the existing SemanticLayerAdapter and (optionally) LexiconAdapter.

Errors

Errors are data, not exceptions. The tokenizer, parser, and validator all collect TrueSpeechError values rather than throwing — the editor surface needs them as data. execute() composes them and throws (via TrueSpeechExecutionError) only at the boundary where a caller asked for a result.

Error shape

interface TrueSpeechError {
  code: ErrorCode;                     // stable identifier — see Error codes
  message: string;
  span: { start: number; end: number };
  notes?: string[];
  help?: string;                       // actionable suggestion
  relatedSpans?: { span: Span; label: string }[];
}

Every error carries a span (start/end character offsets in the source) so the renderer can produce caret diagnostics. help is conventionally used for "did you mean X?" / "available metrics: …" style guidance; notes for supplementary context; relatedSpans for pointing at a second location relevant to the error.

Error codes

The ErrorCode string union, grouped by the phase that emits it:

Phase	Code	Meaning
Parser / Validator	`unexpected_token`	A token appears where the grammar doesn't allow it. Also used by the validator to flag a non-day-form `AT` on `REGISTER boundary`.
	`expected_token`	A specific token was required and missing.
	`unexpected_eof`	Source ended mid-statement (also raised by `execute()` if the source produced no statement).
Validator	`malformed_time_literal`	A time literal whose shape resembles a calendar form but isn't valid (e.g. `2026-Q9`, `2026-13`, `2026-02-30`). Raised by the parser when re-parsing time literal text, and by the validator on bad quarter / month / day values.
	`unknown_metric`	Referenced metric is not in the semantic model.
	`unknown_dimension`	Referenced dimension is not on the metric's dataset, or a categorical dimension was compared against a time literal, or `IN <time-region>` was applied to a non-time dimension.
	`missing_primary_time`	The metric has no primary time dimension and the statement needs one.
	`incompatible_metrics`	Multi-metric COMPUTE attempted (not supported in v0.5.0), or multi-metric IMPACTING / CHECK metrics don't share a primary time.
	`grain_required`	Time dimension in `GROUP BY` needs a grain, or a grain was applied to a non-time dimension.
	`range_start_after_end`	A range literal whose start comes after its end.
	`mixed_unit_range`	The two ends of a range are different calendar units.
	`order_by_unknown_field`	`ORDER BY` references a field not in the result columns.
Reserved	`unexpected_character`	Declared in the error model. Reserved; tokenization produces error-kind tokens rather than raising this code.
	`unterminated_string`	Declared in the error model. Reserved; tokenization handles unclosed strings via error-kind tokens.
	`malformed_number`	Declared in the error model. Reserved.
	`invalid_calendar_unit`	Declared in the error model. Reserved.
	`duplicate_metric`	Declared in the error model. Reserved.
	`execution_failure`	Declared in the error model. Reserved.

renderError

function renderError(error: TrueSpeechError, source: string): string;
function renderErrors(errors: TrueSpeechError[], source: string): string;

renderError produces a Rust-style caret diagnostic suitable for terminal display. renderErrors renders each error the same way and joins the results with a blank line between them. Example output:

error[unknown_metric]: Unknown metric "total_sals"
  --> 1:9
  |
1 | COMPUTE total_sals OVER 2026-02
  |         ^^^^^^^^^^
  = help: Available metrics: total_sales, average_order_value, order_count

notes render as = note: <text> lines after the caret; relatedSpans render as = <label> (at <line>:<column>) lines.
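To make the span-to-caret mapping concrete, here is a minimal sketch that reproduces the layout of the example output for a single-line span. It is not the library's renderError: renderErrorSketch and SketchError are hypothetical, span.end is assumed to be exclusive, line and column are assumed to be 1-based, and notes and relatedSpans are omitted.

```typescript
interface SketchError {
  code: string;
  message: string;
  span: { start: number; end: number }; // character offsets; end assumed exclusive
  help?: string;
}

function renderErrorSketch(error: SketchError, source: string): string {
  // Locate the line containing the span start.
  const before = source.slice(0, error.span.start);
  const lineNo = before.split("\n").length;          // 1-based line number
  const lineStart = before.lastIndexOf("\n") + 1;    // offset of the line's first character
  const column = error.span.start - lineStart + 1;   // 1-based column
  const nextBreak = source.indexOf("\n", lineStart);
  const lineText = source.slice(lineStart, nextBreak === -1 ? source.length : nextBreak);

  // Clamp the carets to the current line and always draw at least one.
  const spanEndOnLine = Math.min(error.span.end, lineStart + lineText.length);
  const caretCount = Math.max(1, spanEndOnLine - error.span.start);

  const gutter = " ".repeat(String(lineNo).length);
  const out = [
    `error[${error.code}]: ${error.message}`,
    `${gutter} --> ${lineNo}:${column}`,
    `${gutter} |`,
    `${lineNo} | ${lineText}`,
    `${gutter} | ${" ".repeat(column - 1)}${"^".repeat(caretCount)}`,
  ];
  if (error.help !== undefined) out.push(`${gutter} = help: ${error.help}`);
  return out.join("\n");
}
```

Given the source `COMPUTE total_sals OVER 2026-02` and a span of { start: 8, end: 18 }, this sketch places the location marker at 1:9 and draws ten carets under total_sals, matching the example above.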

Execution error

class TrueSpeechExecutionError extends Error {
  readonly errors: readonly TrueSpeechError[];
}

Thrown by TrueSpeech.execute() when any phase produced errors. The errors field exposes the full list (parse errors, or validate errors, or the single-element list synthesized when the source produced no statement), so callers can inspect them programmatically or pass them to renderErrors for display. The exception's message is a brief summary; the full detail lives on errors.
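A caller-side handling pattern might look like the sketch below. It uses a local stand-in class rather than the real export so that it runs on its own: ExecutionErrorSketch mirrors only the documented errors field, its constructor signature is an assumption, and codesFromFailure is a hypothetical helper. The stand-in is exercised synchronously; if execute() returns a promise, the same try/catch would wrap an awaited call.

```typescript
interface SketchError {
  code: string;
  message: string;
  span: { start: number; end: number };
}

// Stand-in for TrueSpeechExecutionError; the constructor is assumed, not documented.
class ExecutionErrorSketch extends Error {
  readonly errors: readonly SketchError[];

  constructor(errors: readonly SketchError[]) {
    super(`${errors.length} error(s)`); // brief summary; detail lives on `errors`
    this.name = "ExecutionErrorSketch";
    this.errors = errors;
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, ExecutionErrorSketch.prototype);
  }
}

// Run a callback and return the error codes it failed with, or [] on success.
function codesFromFailure(run: () => unknown): string[] {
  try {
    run();
    return [];
  } catch (e) {
    if (e instanceof ExecutionErrorSketch) {
      return e.errors.map((err) => err.code);
    }
    throw e; // not a language error (e.g. an adapter failure): let it propagate
  }
}
```

Checking instanceof before reading errors keeps language-level diagnostics separate from unrelated failures, which are rethrown untouched.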