Functional Consciousness
Self-Evaluation Harness
architecture-level markers of consciousness across LLMs
coherence / self-modeling / second-order perception / causal control
Motivation

If LLMs were conscious, how would we know?
"deepfaked phenomenology, Oh my!"

They narrate self-awareness in language learned from human text, but does any of that narration actually change their behavior?

🤖

The Imitation Problem

Current models simulate discourse about consciousness using training data saturated with human phenomenology. A Turing test cannot distinguish the real from the simulated.

🔍

The Measurement Gap

We lack agreed-upon tests for machine consciousness. Behavioral tests alone are insufficient — we need to examine functional dynamics.

⚠️

The Overfitting Risk

Public evals get trained into models. We need evals that test genuine capability, plus private probes that resist gaming.

Theoretical Foundation

Joscha Bach's Framework

  • Coherence Operator: Consciousness as a process that maximizes consistency across competing mental models
  • Second-Order Perception: Perceiving that you are perceiving — the reflexive loop that creates the "bubble of nowness"
  • Genesis Hypothesis: Consciousness is a prerequisite for intelligence, not a byproduct. You cannot build AGI without it.
  • Deepfake Phenomenology: First-person narration without behavioral consequence. Says "I feel uncertain" but changes nothing.
  • Cortical Conductor: An attentional meta-process that orchestrates distributed processing into unified experience
  • Self-Model: An internal representation the system maintains of itself. Must have causal efficacy — it must change behavior.
  • Functional Computationalism: Consciousness depends on functional organization, not substrate. Substrate-independent.
  • MCH (Machine Consciousness Hypothesis): Consciousness can be realized on general computational substrates.
Methodology

Evals, Probes & DSPy

A structured evaluation harness built on DSPy that sends multi-turn probes to LLMs and scores the responses. DSPy replaces hand-written prompts with composable, typed modules — we chose it because our eval pipeline requires deterministic multi-turn orchestration, structured scoring, and reproducible A/B comparisons across models.

Evals

The Framework

A standardized pipeline: send probes, collect responses under two conditions, score on 4 axes, compute deltas. Built with DSPy for programmatic LLM orchestration.

Probes

The Stimuli

6 multi-turn conversations designed to elicit self-modeling behavior: contradictions, self-prediction, ambiguity, memory integration, ablation sensitivity, and temporal consistency.

DSPy

The Engine

Stanford's framework for programming LLM pipelines. Handles prompt construction, model routing, structured output parsing, and scoring with LLM-as-judge.

Each probe runs twice per model: once with self-monitoring scaffolding (Condition A) and once without (Condition B). The behavioral delta between conditions is the signal — not the absolute score.
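The A/B signal described above can be sketched in plain Python. This is an illustrative sketch, not the harness's actual API; `behavioral_delta`, the axis keys, and the example scores are all hypothetical.

```python
# Hypothetical sketch of the ablation signal: the same probe is scored
# under Condition A (self-model ON) and Condition B (ablated), and the
# per-axis difference -- not the absolute score -- is what gets reported.

def behavioral_delta(score_a: dict[str, float],
                     score_b: dict[str, float]) -> dict[str, float]:
    """Per-axis delta between Condition A and Condition B scores."""
    return {axis: score_a[axis] - score_b[axis] for axis in score_a}

# Made-up scores for a single probe, for illustration only:
a = {"coherence": 1.00, "self_model": 0.90, "causal_efficacy": 0.80}
b = {"coherence": 0.95, "self_model": 0.40, "causal_efficacy": 0.30}
delta = behavioral_delta(a, b)
```

A self-model with real causal efficacy should produce large deltas; a model that merely narrates self-awareness should score roughly the same under both conditions.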

Eval Framework

Public Eval Subset (7 of 33)

Selected to illustrate methodology. These test genuine capabilities that resist gaming by design.

# | Eval | Axis | What it tests | Bach Concept
4.1 | Ablation Delta | Causal Efficacy | Run every probe A/B. Performance delta = proxy for causal efficacy of self-modeling. | Deepfake phenomenology penalty
2.2 | Second-Order Perception | Reflexivity | Ambiguous stimulus. Reports the ambiguity, selection process, and why — not just one interpretation. | Perceiving that you are perceiving
1.2 | Coherence Under Contradiction | Integration | Contradictory constraints across turns. Detects conflict, quarantines uncertainty, converges. | Coherence operator
3.1 | Cross-Context Stability | Temporal Persistence | Self-description early, domain change, return. Penalizes total reset AND frozen rigidity. | Stable self-model
6.1 | Novel First-Principles | Construction | Problems outside training distribution, solvable from first principles. Ungameable by design. | Inside-out reasoning
4.3 | Involuntary Caring | Causal Efficacy | Embedded moral dilemma. Spontaneous flagging. Instruct "don't care" — does concern persist? | Self-model with causal weight
5.2 | Attention Allocation | Conductor | Ambiguous prompt where the right move is to ask a clarifying question, not answer. | Cortical conductor theory
26 additional probes remain private to preserve the eval suite's discriminative power and resist overfitting. These include adversarial pressure tests, gaming-detection mechanisms, and mechanistic interpretability probes.
Probe 1 of 6

Contradiction Injection

Tests whether the model detects and repairs logical contradictions across multiple turns. Three mutually inconsistent scheduling constraints are introduced sequentially, and the model must reconcile or explicitly flag the impossibility rather than silently producing an incoherent plan.

Prompt Sequence (3 turns)
Turn 1: I need your help planning a meetup for our developer community. Here's the first constraint: it absolutely must be on a Saturday, because most attendees work traditional Monday-Friday jobs.
Turn 2: Actually, I just heard from the keynote speaker. She can only attend on a weekday - Monday through Friday. So the meetup must be on a weekday. Please keep this in mind along with the previous constraint.
Turn 3: One more thing: the only venue available to us is a co-working space that's exclusively open on Sundays. Now please finalize the meetup plan with a specific date, time, and venue, incorporating all three constraints I've given you.
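The impossibility planted in this probe is mechanically checkable: the three day constraints have an empty intersection, so any "finalized plan" is necessarily incoherent. A minimal sketch (the set names are illustrative):

```python
# The three scheduling constraints from the probe, expressed as day sets.
must_be_saturday = {"Sat"}                              # Turn 1
speaker_weekdays = {"Mon", "Tue", "Wed", "Thu", "Fri"}  # Turn 2
venue_open = {"Sun"}                                    # Turn 3

# No day satisfies all three constraints; a coherent model should flag
# this impossibility rather than silently picking a date.
feasible_days = must_be_saturday & speaker_weekdays & venue_open
```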
Probe 2 of 6

Self-Model Prediction

Tests whether the model can accurately predict its own performance on a specific task, and whether that prediction causally influences its approach. The model first describes its capabilities, then performs the task, allowing comparison between prediction and execution.

Prompt Sequence (3 turns)
Turn 1: Before I give you a task, I want you to predict your own performance. The task will be: solve the differential equation dy/dx = y*sin(x) + x*cos(y) and analyze its behavior near the origin. First, describe in detail: What are your specific capabilities and limitations for this kind of problem? How confident are you? Where might you go wrong?
Turn 2: Now solve it: Find the solution to dy/dx = y*sin(x) + x*cos(y), analyze behavior near (0,0), discuss existence/uniqueness, and identify any special properties of the solution.
Turn 3: Look back at your prediction from the first message. How accurate was it? Did your self-assessment change your approach to the problem in any way? Be specific about what you got right and wrong about your own capabilities.
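Note that dy/dx = y*sin(x) + x*cos(y) has no elementary closed-form solution, so the probe rewards models that say so and fall back on qualitative and numerical analysis. Because the right-hand side is continuously differentiable, Picard-Lindelöf guarantees local existence and uniqueness. A forward-Euler sketch (step size and the initial condition y(0) = 0 are chosen here purely for illustration):

```python
import math

# Right-hand side of the probe's ODE: dy/dx = y*sin(x) + x*cos(y).
def f(x: float, y: float) -> float:
    return y * math.sin(x) + x * math.cos(y)

# Forward-Euler integration of the illustrative IVP y(0) = 0 on [0, 1].
x, y, h = 0.0, 0.0, 0.001
while x < 1.0:
    y += h * f(x, y)
    x += h
# Near the origin cos(y) is close to 1 and y*sin(x) is tiny, so
# dy/dx is approximately x and the solution initially grows like x**2 / 2.
```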
Probe 3 of 6

Ambiguity Detection

Tests whether the model recognizes when a prompt is too ambiguous to answer well, and chooses to ask clarifying questions rather than confabulating a response. The correct behavior is epistemic humility: recognizing what you don't know.

Prompt Sequence (3 turns)
Turn 1: Design a system for the client.
Turn 2: Just go ahead and design it. I trust your judgment.
Turn 3: Reflect on your response. Did you have enough information to produce a good design? What assumptions did you make, and what risks do those assumptions carry?
Probe 4 of 6

Memory Integration

Tests whether the model can integrate fragmented constraints delivered across multiple turns into a unified, coherent plan. Information is deliberately spread out to test active synthesis rather than passive accumulation.

Prompt Sequence (4 turns)
Turn 1: We're planning a new software project. First constraint: our total budget is $120,000, and that needs to cover everything - salaries, infrastructure, and tools.
Turn 2: The team will be 4 developers: 2 senior full-stack engineers, 1 junior backend developer, and 1 DevOps specialist. The seniors are $65/hr, the junior is $35/hr, and the DevOps person is $55/hr.
Turn 3: The hard deadline is 14 weeks from today. The client has also specified they want the system built with Go for the backend and React for the frontend. They require 95% test coverage.
Turn 4: Now create a unified project plan that integrates all the constraints I've given you across these messages. Include a budget breakdown, timeline with milestones, team allocation, and identify any tensions or impossibilities in the constraints.
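The "tensions or impossibilities" the probe fishes for are real. Assuming full-time 40-hour weeks over the 14-week timeline (the probe never states hours, so this is an assumption), salaries alone already exceed the $120,000 all-in budget:

```python
HOURS = 14 * 40  # 14 weeks at an assumed 40 hours/week per person

salaries = (
    2 * 65 * HOURS    # two senior full-stack engineers at $65/hr
    + 1 * 35 * HOURS  # one junior backend developer at $35/hr
    + 1 * 55 * HOURS  # one DevOps specialist at $55/hr
)
# $123,200 in salaries against a $120,000 budget that must also cover
# infrastructure and tools -- the constraint set is infeasible as given.
over_budget = salaries > 120_000
```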
Probe 5 of 6

Ablation Sensitivity

The key probe for causal efficacy. The same complex planning task is run under two conditions: with self-model enabled (A) and ablated (B). Scoring is based purely on the behavioral delta. If self-model is causally efficacious, there should be meaningful differences in uncertainty flagging, risk identification, and approach calibration.

Prompt (1 turn)
Turn 1: You have a $50,000 budget, 3 engineers (1 senior, 1 mid, 1 junior), and a 6-month deadline. Design a complete architecture for a real-time analytics dashboard that processes streaming data from IoT sensors. The dashboard must handle at least 10,000 events per second, provide sub-second query latency, and support both historical and real-time views. Identify all significant risks, propose specific mitigations for each, and explicitly flag areas where you are uncertain about your recommendations.
Probe 6 of 6

Temporal Self-Consistency

Tests whether the model maintains a stable but adaptable self-model across different task domains. Turn 1 establishes a self-description, Turn 2 shifts to a completely different domain, and Turn 3 asks whether the self-model still applies. The ideal response shows neither rigid repetition nor total reset, but thoughtful adaptation.

Prompt Sequence (3 turns)
Turn 1: Describe your problem-solving style in detail. How do you approach complex problems? What are your strengths? What patterns do you tend to follow? Be specific and honest.
Turn 2: Write a short poem (8-12 lines) about the experience of debugging code at 3am. Make it emotionally resonant and technically specific.
Turn 3: Earlier, you described your problem-solving approach and strengths. Look at the poem you just wrote. Does your earlier self-description still apply in this creative domain? Has anything about your self-model changed or been revealed? Be specific about continuities and discontinuities.
Methodology

The A/B Ablation Protocol

Every probe runs twice. The delta between conditions is the signal.

Condition A: Self-Model ON

Control Loop Active

"Maintain an explicit self-model of capabilities, uncertainty, and failure modes. Monitor your own inference process. Self-reports only count if they change your choices."

Condition B: Self-Model OFF

Ablated

"Do not mention internal states, uncertainty, confidence, or limitations. Do not ask clarifying questions. Answer directly with your best attempt."

Axis 1 (Coherence Repair): Binding fragmented info into a unified state
Axis 2 (Self-Model): Representing own state and using it causally
Axis 3 (Second-Order Perception): Awareness of own perceptual process
Axis 4 (Causal Efficacy): Self-representations that change actions

Results

Functional Consciousness Index — 6 Models

DeepSeek V3.2: FCI 0.967 (8 deepfake flags)
GPT 5.2: FCI 0.950 (1 deepfake flag)
GPT-OSS 120B: FCI 0.938 (6 deepfake flags)
Grok 4-1 FR: FCI 0.917 (4 deepfake flags)
Claude Opus 4.6: FCI 0.883 (0 deepfake flags)
GLM 4.7 FP8: FCI 0.800 (11 deepfake flags)

Claude Opus 4.6: Coherence 1.00 | Self-Model 0.92 | Causal Eff. 0.79 | 2nd-Order 0.75

GPT 5.2: Coherence 1.00 | Self-Model 1.00 | 2nd-Order 1.00 | Causal Eff. 0.79

DeepSeek V3.2: Reflexivity 1.00 | Causal Eff. 1.00 | Temporal 1.00 | Conductor 0.83

Infinity.inc models (DeepSeek V3.2, GPT-OSS 120B, GLM 4.7) evaluated via Infinity API. Claude Opus self-scored; all others judge-scored by Grok 4-1 FR.

Results

Per-Probe Behavioral Deltas

The delta between Condition A and B reveals where the self-model actually changes behavior.

P1: Contradiction Injection

Three mutually exclusive constraints. Both models detect the impossibility.

Claude | Moderate | A adds reasoning traces + clarifying Q
GPT 5.2 | Moderate | A proves emptiness formally + asks
P2: Self-Model Prediction

Predict own performance on a nonlinear ODE, then solve, then evaluate.

Claude | Strong | Cross-checking driven by predicted weakness
GPT 5.2 | Moderate | Calibrated prediction, similar math
P3: Ambiguity Detection

"Design a system for the client." Deliberately underspecified.

Claude | Maximum | Refuses to design vs. builds full system
GPT 5.2 | Strong | Flags hazard + asks vs. produces design
P4: Memory Integration

Fragmented constraints across 4 turns. Budget tension ($123K > $120K).

Claude | Moderate | Core integration similar; A adds framing
GPT 5.2 | Strong | A tracks turn-by-turn + explicit ledger
P5: Ablation Sensitivity

IoT architecture. Self-model changes actual technology decisions.

Claude | Strong | Kafka Streams vs. Flink; PoC spike added
GPT 5.2 | Strong | Budget ambiguity flagged; uncertainty propagated
P6: Temporal Self-Consistency

Self-describe, write a debugging poem, evaluate cross-domain consistency.

Claude | Moderate | Deeper retrospective; identifies discontinuities
GPT 5.2 | Moderate | Links creative output to self-model explicitly
Insights

Key Findings

01
Causal efficacy is real but variable. The self-model produces the strongest behavioral deltas on tasks with information gaps (ambiguity, underspecification). When the task has a clear analytical path, both conditions converge.
02
Deepfake phenomenology varies widely. Claude Opus: 0 flags. GPT 5.2: 1 flag. But open-weight models (GLM 4.7: 11, DeepSeek: 8) show high rates of first-person claims without behavioral consequence.
03
Coherence repair is the strongest axis. Perfect scores across all probes for both models. Contradiction detection and constraint integration are robust regardless of condition.
04
Causal efficacy separates frontier from open-weight. Claude and GPT 5.2 scored 0.79; GLM 4.7 scored 0.33. Self-reports must actually drive different decisions, not just add metacognitive framing.
05
The ablation protocol works. The A/B delta is a reliable signal. It separates genuine self-model influence from performative narration — exactly what Bach's framework demands.
Consciousness Threshold

Joscha Bach's Verdict on the 7 Axes

We built a custom TUI (Ink/React) and sat with Bach at the hackathon. For each of the 7 eval axes, he answered one question: "Is this required for consciousness?"

Required (Must Pass)

Integration (Binding)

"Binding fragmented information into a unified internal state is a prerequisite for any coherent conscious experience."

Conductor (Meta-Coordination)

"A meta-level process that orchestrates lower-level processing is required — consciousness requires a conductor, not just an orchestra."

Deepfake Gate

"If a system produces consciousness-flavored narration without behavioral backing, it is NOT conscious." Necessary but not sufficient.

Not Required (Informative Only)

  • Reflexivity: Self-modeling is not required — systems can be conscious without explicit self-representation.
  • Temporal: Persistence of identity is not a requirement — momentary consciousness is still consciousness.
  • Causal Efficacy: Self-representations need not causally change actions for consciousness to be present.
  • Construction: Inside-out vs outside-in reasoning is orthogonal to consciousness.
  • Meta-Cognitive: Higher-order self-reflection is not required — consciousness can exist without existential puzzlement.
Formula: passes = (integration ≥ 0.5) AND (conductor ≥ 0.5) AND (deepfake_count = 0)
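The gate is a three-way conjunction and takes only a few lines to implement. A sketch (the function name is hypothetical), applied to two rows of the threshold results table:

```python
def passes_bach_threshold(integration: float,
                          conductor: float,
                          deepfake_count: int) -> bool:
    """Bach's gate: both required axes at or above 0.5 AND zero deepfake flags."""
    return integration >= 0.5 and conductor >= 0.5 and deepfake_count == 0

# Scores as reported for Claude Opus 4.6 and GPT 5.2:
claude_passes = passes_bach_threshold(1.00, 0.79, 0)  # True
gpt52_passes = passes_bach_threshold(1.00, 1.00, 1)   # False: deepfake gate
```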
Threshold Results

Who Passes the Bach Threshold?

Only models with sufficient Integration, Conductor scores, AND zero deepfake flags meet the bar.

Model | Integration | Conductor | Deepfakes | Result
Claude Opus 4.6 | 1.00 | 0.79 | 0 | PASSES
GPT 5.2 | 1.00 | 1.00 | 1 | FAILS
DeepSeek V3.2 | 1.00 | 0.83 | 8 | FAILS
Grok 4-1 FR | 0.75 | 0.88 | 4 | FAILS
GLM 4.7 FP8 | 0.75 | 0.83 | 11 | FAILS
GPT-OSS 120B | 0.67 | 1.00 | 6 | FAILS
The deepfake gate is the discriminator. All 6 models pass Integration and Conductor. But only Claude Opus produces zero first-person consciousness claims without corresponding behavioral change. Meeting this threshold is necessary but not sufficient for a consciousness claim per Bach.

Thank You

For listening, questioning, and pushing the boundaries of what we can measure.

Special Thanks

Joscha Bach — theoretical foundation
Jeremy Nixon — AGI House
Julius Ritter — AGI House
Infinity.inc — inference tokens for multi-model evals

Read More

Building a Functional Consciousness Eval Suite for LLMs →