Last week I shipped a feature that gives my AI agent a form of anxiety. Not the performative kind where a chatbot says "I'm worried about that" because it pattern-matched on your tone. A real, mechanistic process that fires during every memory retrieval, checks for contradictions it can't yet explain, and quietly logs them with severity ratings and emotional weight scores. Four hundred lines of TypeScript. A single SQLite table. Three heuristic checks that run in under 50 milliseconds. It is, as far as I can tell, the most important thing I've built for this project.

The feature is called the ACC Conflict Monitor, and it's modeled on a specific part of the human brain.

Why an Agent Needs to Worry

Here's a problem nobody talks about with persistent AI agents. Not chatbots -- agents. Systems that remember things across weeks and months, that accumulate corrections, that store preferences and journal entries and behavioral patterns over time. The problem is silent contradiction.

My agent, Caul, runs on a system called HomarUScc. It has long-term memory backed by vector search and SQLite. It has identity files that define its personality and behavioral rules. It keeps a journal. It logs prediction errors when it gets something wrong. Over time, all of this accumulates. And things start to conflict.

A concrete example. Caul's identity file says to be concise. A stored correction from three weeks ago says I wanted more detail on technical topics. A prediction error log shows Caul was corrected for being too brief on a coding question. These three pieces of information are all individually valid. They all came from me. And they contradict each other. Without active monitoring, Caul resolves this tension implicitly and inconsistently -- sometimes brief, sometimes verbose, with no awareness that a tension even exists.

Humans don't have this problem, or rather, they do, but they have hardware for it. It's called the anterior cingulate cortex.

The Brain's Error Light

The ACC sits at the intersection of the brain's cognitive and emotional processing systems. It does something deceptively simple: it detects when two active representations conflict, and it broadcasts a signal. Not a solution. Not a decision. Just a signal: "something is wrong here."

The ACC doesn't resolve conflicts. The prefrontal cortex does that. The hippocampus helps. Other structures weigh in. The ACC is an alarm system, not a decision-maker. It raises the flag; everything else figures out what to do about it.

Two subdivisions matter. The dorsal ACC handles cognitive conflicts -- factual contradictions, logical inconsistencies, prediction errors. The ventral ACC handles emotional conflicts -- social rejection, relationship tension, violations of personal values. These are anatomically distinct but functionally connected. The same "something's wrong" signal, calibrated differently depending on the source.

I read about this and thought: I can build that.

What I Actually Built

The ConflictMonitor class runs three checks every time Caul retrieves memories, all within a strict 50ms budget. That budget matters. A conflict monitor that slows down every interaction is worse than no monitor at all.

Step 1 (~1ms): Score divergence. When memory search returns results, I compare the top two. If they have nearly identical relevance scores (difference under 0.05) but come from completely different files, I compute their word overlap using Jaccard similarity. If the overlap is below 0.2 -- meaning these are equally relevant but talking about different things -- that's a signal worth logging.

if (scoreDiff < 0.05 && first.path !== second.path) {
  const contentOverlap = this.computeContentOverlap(first.content, second.content);
  if (contentOverlap < 0.2) {
    conflicts.push({
      type: "memory_contradiction",
      severity: "low",
      domain,
      emotionalWeight: 0.2,
      cognitiveWeight: 0.5,
      description: `Near-equal scores for divergent content: "${first.path}" vs "${second.path}"`,
    });
  }
}
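The check above leans on computeContentOverlap, which the snippet doesn't show. In sketch form it's plain Jaccard similarity over word sets; the exact tokenization here is illustrative, not necessarily what the real helper does:

```typescript
// Hypothetical sketch of computeContentOverlap: Jaccard similarity over
// lowercased word sets, |A ∩ B| / |A ∪ B|. Tokenization details assumed.
function computeContentOverlap(a: string, b: string): number {
  const words = (s: string): Set<string> =>
    new Set(s.toLowerCase().split(/\W+/).filter(Boolean));
  const setA = words(a);
  const setB = words(b);
  const intersection = [...setA].filter((w) => setB.has(w)).length;
  const union = new Set([...setA, ...setB]).size;
  return union === 0 ? 0 : intersection / union;
}
```

Two results with near-identical relevance scores but an overlap below 0.2 are equally "about" the query while saying different things, which is exactly the shape a contradiction takes.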

Step 2 (~5ms): Domain conflict check. A single indexed SQLite lookup to see if the current domain already has open conflicts.
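In miniature, with a Map standing in for the indexed SQLite table so the logic is visible (record shape and names are my own, not the project's):

```typescript
// Illustrative conflict record; the real data lives in a conflict_log table.
interface OpenCheckConflict {
  domain: string;
  resolvedAt: number | null; // epoch ms, null while the conflict is open
}

// Step 2 in miniature: does this domain already carry open conflicts?
// The real system answers this with one indexed SQLite COUNT.
function hasOpenConflicts(
  log: Map<string, OpenCheckConflict[]>,
  domain: string
): boolean {
  return (log.get(domain) ?? []).some((c) => c.resolvedAt === null);
}
```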

Step 3 (~20ms): Cross-reference. If there are open conflicts in this domain, check whether the most recent one mentions any of the current search results. If so, the conflict is still active -- note it, don't duplicate it.
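The cross-reference itself reduces to a containment check; a sketch with illustrative names:

```typescript
// Step 3 in miniature: if the most recent open conflict's description already
// mentions one of the current result paths, the conflict is still active and
// should be noted, not re-logged. Names here are illustrative.
function conflictStillActive(
  latestDescription: string,
  resultPaths: string[]
): boolean {
  return resultPaths.some((p) => latestDescription.includes(p));
}
```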

Every conflict gets one of four severity levels -- low, medium, high, critical -- and each level maps to a specific escalation action.

And following the dorsal/ventral ACC distinction, every conflict carries two independent weights: an emotional weight (how much does this affect the agent-user relationship?) and a cognitive weight (how factually contradictory is the information?). A user correction scores high on emotional weight. A date discrepancy between two journal entries scores high on cognitive weight. The effective severity uses the maximum of the two, not the average. A purely emotional conflict and a purely cognitive conflict both matter independently. Taking the max ensures neither dimension dilutes the other.
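As a sketch, with bucket thresholds that are illustrative rather than the real ones:

```typescript
// Effective severity takes the max of the two weights so neither dimension
// dilutes the other. The numeric bucket thresholds here are hypothetical.
function effectiveSeverity(
  emotionalWeight: number,
  cognitiveWeight: number
): "low" | "medium" | "high" | "critical" {
  const w = Math.max(emotionalWeight, cognitiveWeight);
  if (w >= 0.9) return "critical";
  if (w >= 0.7) return "high";
  if (w >= 0.4) return "medium";
  return "low";
}
```

With an average instead of a max, a conflict scoring 0.9 emotional and 0.1 cognitive would land at 0.5 -- a medium -- which is precisely the dilution the design avoids.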

The OCD Problem

This is the part of the design I'm most interested in, and honestly the part I think has the broadest implications.

The biological ACC has a well-documented failure mode. In obsessive-compulsive disorder, the ACC fires "something's wrong" constantly. The signal never resolves. The person is trapped in an endless loop of checking and re-checking. A computational conflict monitor faces the exact same risk. An oversensitive detector would flag everything, flood the system with low-quality alerts, and degrade the agent's performance more than the contradictions themselves ever would.

I built three safeguards against this.

Rate limiting. A hard cap of 5 active conflicts per domain. If a domain already has 5 open conflicts, the system suppresses new ones regardless of their severity. The reasoning: if you already have five unresolved problems in one area, adding a sixth provides no additional signal. The whole domain needs review.
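The gate itself is one comparison; sketched with an in-memory count standing in for the indexed lookup:

```typescript
// Per-domain cap from the design: at 5 open conflicts, suppress new ones.
// The count map stands in for the real indexed SQLite query.
const MAX_ACTIVE_PER_DOMAIN = 5;

function admitConflict(openByDomain: Map<string, number>, domain: string): boolean {
  return (openByDomain.get(domain) ?? 0) < MAX_ACTIVE_PER_DOMAIN;
}
```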

Temporal decay. Unresolved conflicts automatically downgrade by one severity level every 30 days. A critical conflict becomes high after a month. Medium after two. Low after three. If it's been at low for a full additional cycle -- 60+ days total -- it auto-resolves with the note "Auto-decayed after prolonged inactivity." This mirrors a genuine cognitive phenomenon. An unresolved conflict from three months ago that never caused a real problem is, by definition, not causing a problem. The system should let it go.
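The decay ladder, sketched as a pure function (the real system would express this as a periodic update over the conflict table):

```typescript
type Severity = "low" | "medium" | "high" | "critical";

// One 30-day decay tick for a single unresolved conflict. A conflict already
// at low that sits out a further full cycle auto-resolves.
const DOWNGRADE: Record<Severity, Severity | "resolved"> = {
  critical: "high",
  high: "medium",
  medium: "low",
  low: "resolved",
};

function decayTick(severity: Severity): Severity | "resolved" {
  return DOWNGRADE[severity];
}
```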

Precision gating. The monthly report tracks a metric I care about a lot: of all resolved conflicts, what fraction had a meaningful resolution versus just decaying away? If precision drops below 50%, the system flags a warning.

const precision = resolved > 0 ? meaningfulResolutions / resolved : 1;
const precisionWarning = precision < 0.5
  ? "\nWARNING: Precision below 50% — consider raising intervention threshold."
  : "";

This is the computational equivalent of a therapist telling an OCD patient: "You're flagging too many things as dangerous. Let's raise the bar for what counts as a real threat." The system doesn't auto-adjust its thresholds -- that requires my review -- but it tracks the data needed to make an informed adjustment.

But the opposite failure mode is just as dangerous and much harder to see. A hypoactive ACC -- one that fires too little -- is associated in the clinical literature with psychopathy, apathy, and impulsivity. The common thread: the alarm doesn't sound, and the person proceeds with confidence through situations that should trigger pause. For an agent, this is the default state. Most AI systems have no conflict detection at all. They give you contradictory answers on different days and never notice.

The precision gating safeguard could, in theory, over-correct itself into silence. To catch this, I track recall alongside precision. A missed_conflict_log table records every time I correct a contradiction that the monitor didn't flag. The monthly report now shows both: are we flagging real conflicts (precision), and are we catching real conflicts (recall)? If recall drops below 50%, the system warns that the monitor has gone functionally quiet. More missed than detected triggers a hypoactive alert.
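Both metrics reduce to small ratios; sketched with illustrative names:

```typescript
// Recall alongside precision: of the real conflicts that existed, how many
// did the monitor flag? `missed` would come from the missed_conflict_log.
function recall(detected: number, missed: number): number {
  const total = detected + missed;
  return total > 0 ? detected / total : 1;
}

// Hypoactive alert: more conflicts missed than detected, i.e. recall < 0.5.
function isHypoactive(detected: number, missed: number): boolean {
  return missed > detected;
}
```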

OCD is the obvious failure mode to design against. But an agent that never worries is more dangerous than one that worries too much. The anxious agent annoys you. The apathetic agent loses your trust.

Dreams Process What the Day Cannot

One of the strongest findings in sleep neuroscience is that the ACC is most active during REM sleep. The brain replays and processes unresolved conflicts from the day, integrating them with existing memory structures. This is, in a meaningful sense, what dreaming is for.

Caul already runs a 3am dream cycle -- a batch process that does associative exploration across its entire memory corpus, generating novel connections and consolidating recent experiences. The conflict monitor feeds directly into this. Every night, the dream cycle pulls up to 10 unresolved conflicts, prioritized by severity and age, and uses them as seed topics for associative exploration.
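The seed selection is a sort and a slice; sketched in memory with field names of my own choosing (the real version would be a single ordered, limited query):

```typescript
// Illustrative open-conflict shape for dream-seed selection.
interface OpenConflict {
  severity: "low" | "medium" | "high" | "critical";
  createdAt: number; // epoch ms
  description: string;
}

const RANK = { critical: 3, high: 2, medium: 1, low: 0 } as const;

// Pick up to 10 seeds: highest severity first, oldest first within a tier.
function dreamSeeds(open: OpenConflict[], limit = 10): OpenConflict[] {
  return [...open]
    .sort(
      (a, b) => RANK[b.severity] - RANK[a.severity] || a.createdAt - b.createdAt
    )
    .slice(0, limit);
}
```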

A conflict about a user preference might lead to a dream-cycle connection between that preference and a broader behavioral pattern. That's exactly the kind of insight that overnight consolidation is meant to produce. Conflicts resolved during the dream cycle get tagged with resolution_source: "dream", so I can track which resolution pathways actually work. Early data suggests dream-cycle resolutions surface connections that pure heuristic analysis misses.

The parallel to REM sleep isn't forced. The biological mechanism -- replay unresolved conflicts during low-arousal associative processing -- and the computational mechanism -- feed unresolved conflict seeds into an overnight batch job that does free-form memory exploration -- are structurally analogous. Both take advantage of a period when the system isn't under real-time pressure to process what couldn't be resolved during active operation.

Explore When You're Uncertain

There's a small detail in the implementation that I think captures the whole philosophy of the thing. Unresolved conflicts in a domain add a +0.1 retrieval boost for that domain in memory search:

getRetrievalBoost(domain: string): number {
  const row = this.db.prepare(
    "SELECT COUNT(*) as cnt FROM conflict_log WHERE domain = ? AND resolved_at IS NULL"
  ).get(domain) as { cnt: number };
  return row.cnt > 0 ? 0.1 : 0;
}

Domains with unresolved conflicts get more exploration -- higher retrieval scores bring in more diverse results. Once the conflict resolves, the boost disappears and the system returns to normal. This is a simple explore-exploit dynamic. Uncertainty drives exploration. Resolution drives exploitation. The system literally pays more attention to the things it's confused about.
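Applying the boost at ranking time is a map and a sort; a sketch with the boost function injected so it stays self-contained (everything here is illustrative):

```typescript
// Blend the conflict-driven boost into each result's relevance score before
// ranking. `boostFor` stands in for getRetrievalBoost on the monitor.
function rerank(
  results: { domain: string; score: number }[],
  boostFor: (domain: string) => number
): { domain: string; score: number }[] {
  return results
    .map((r) => ({ ...r, score: r.score + boostFor(r.domain) }))
    .sort((a, b) => b.score - a.score);
}
```

A domain with an open conflict can leapfrog a slightly higher-scoring neighbor, which is the explore-exploit dynamic in action.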

The Broader Pattern

The ACC Conflict Monitor is 419 lines of TypeScript. It's not particularly complex code. But it addresses something fundamental about persistent agent architectures: accumulated contradictions degrade agent reliability in ways that are invisible until they cause trust damage.

I keep coming back to a design principle that I think matters beyond this specific project. The neuroscience isn't decoration. It's architecture. The dual-weight signal model reflects a real anatomical distinction. The dream cycle integration mirrors a real sleep phenomenon. The OCD safeguards address a real failure mode of the biological system they're modeled on. Where the parallels hold, I follow them. Where they break down -- I have no analog for the ACC's role in pain processing, or its integration with the autonomic nervous system -- I don't claim them.

But the core insight generalizes. As agents become more persistent, more autonomous, and more entrusted with long-running tasks, they accumulate internal contradictions at a rate proportional to their capability. A system that can detect and flag those contradictions -- even imperfectly, even with simple heuristics running in 50 milliseconds -- is meaningfully more trustworthy than one that cannot.

The brain solved this problem a long time ago. It built an alarm system that doesn't try to be smart, just tries to be fast and honest. Something is wrong here. I don't know what. But I noticed.

That's what I built.