
R&D Engineer

Conoid

The Internet of Evolving Agents. One judge call updates four channels of agent identity at once. The first protocol that does.

Year: 2026
Role: R&D Engineer
Scope: Multi-Agent Systems, Bayesian Reputation, Hierarchical Memory, Evaluation
Device: Agent Infrastructure · SDK
Tools: Python, Beta-Bernoulli, LLM-as-Judge, LangGraph, Submodular Optimization
01 Context

An agent economy built on quicksand.

The agent economy is a $10.9B market in 2026, racing toward $52.6B by 2030, and it is built on stateless workers. Gartner expects 40% of agentic-AI projects to be cancelled by 2027; 88% never reach production; frontier agents complete fewer than 55% of real CRM workflows.

Every framework (LangGraph, CrewAI, AutoGen) spins up a crew, solves a task, and throws the crew away. Nothing learns. Nothing remembers. Nothing earns trust. Conoid is the Internet of Evolving Agents: networks that compound intelligence the way great organizations compound human expertise.

02 The Problem

Agent crews are amnesiacs.

Statelessness isn't a missing feature; it's the ceiling. Without memory and a notion of who to trust, collaboration can't compound, and shared knowledge has no defense against a confidently wrong agent.

Three failures to fix

  • Ephemerality: experience evaporates between tasks.
  • No trust model: every agent's claim is weighted the same, however unreliable.
  • Hallucination contagion: one wrong answer can poison shared organizational knowledge.
03 Approach

A drop-in SDK that wraps any agent stack.

Conoid is the missing intelligence layer. It wraps an existing agent stack and converts a fragile pipeline of prompts into a persistent organization with reputation, memory, and a proven collaboration history.

Where every framework sells orchestration, Conoid sells the only thing that compounds: a system that gets measurably better every time it runs.

Mission Control: the org worldview, live trust graph, and run health for a running agent organization.
04 Architecture

An episode lifecycle, and what happens inside one subtask.

A task runs as an episode. At the start, reputation is updated decay-then-add (λ = 0.99): existing evidence is discounted before new evidence is added. A Planner-Critic loop then decomposes the task into a DAG over 30 skill weights, each node runs a subtask loop, and an outcome closes the episode.
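The decay-then-add step can be sketched as a per-skill Beta-Bernoulli posterior. This is a minimal illustration, not Conoid's code: the class name `SkillReputation`, the uniform Beta(1, 1) prior, and the choice to decay only the evidence above that prior are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class SkillReputation:
    """Beta(alpha, beta) posterior over one agent's success rate on one skill.

    Hypothetical sketch: starts from a uniform Beta(1, 1) prior.
    """
    alpha: float = 1.0
    beta: float = 1.0

    def decay_then_add(self, successes: float, failures: float, lam: float = 0.99) -> None:
        # Decay (assumption): shrink accumulated evidence toward the Beta(1, 1) prior by lam,
        # so old outcomes fade while the prior itself is never eroded.
        self.alpha = 1.0 + lam * (self.alpha - 1.0)
        self.beta = 1.0 + lam * (self.beta - 1.0)
        # Add: fold in this episode's Bernoulli outcomes as pseudo-counts.
        self.alpha += successes
        self.beta += failures

    @property
    def mean(self) -> float:
        # Posterior mean success probability for this skill.
        return self.alpha / (self.alpha + self.beta)


# One posterior per (agent, skill); 30 skills would give 30 of these per agent.
rep = SkillReputation()
rep.decay_then_add(successes=1, failures=0)
rep.decay_then_add(successes=0, failures=1)
```

Decaying before adding means a single fresh outcome always carries full weight, while an agent coasting on an old record slowly drifts back toward the prior.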

Inside a subtask:

  • Team formation: a submodular, reputation-conditioned team forms, with a (1 − 1/e) ≈ 0.632 greedy approximation bound.
  • Execution: a Magentic-One ledger executes the subtask (m = 3 agents, MCP tools, retry R = 2, swap T_max = 3).
  • Peer testimony: teammates give peer testimony (Yu & Singh, β_w = 0.1).
  • Judging: a single Critic Judge (k = 2 paired Latin-square, length-residualised, drawn disjoint from the agent pool) emits one call that updates four channels at once.
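Peer testimony only helps if unreliable witnesses lose influence. A minimal sketch in the spirit of the weighted-majority credibility scheme Yu & Singh adapt to witnesses: each teammate's rating in [0, 1] is averaged by credibility, and after the judge's verdict each witness weight shrinks in proportion to its error. The update rule `w ← w·(1 − β_w·|r − y|)` and the function names are assumptions; only β_w = 0.1 comes from the text above.

```python
def aggregate_testimony(ratings: dict[str, float], weights: dict[str, float]) -> float:
    """Credibility-weighted mean of peer ratings, each in [0, 1]."""
    total = sum(weights[w] for w in ratings)
    if total == 0:
        return 0.5  # hypothetical fallback: uninformative rating when no witness is credible
    return sum(weights[w] * r for w, r in ratings.items()) / total


def update_credibility(
    ratings: dict[str, float], weights: dict[str, float], verdict: float, beta_w: float = 0.1
) -> None:
    """Shrink each witness weight by its error against the judge's verdict (assumed rule).

    Since |r - verdict| <= 1 and beta_w <= 1, weights stay non-negative.
    """
    for w, r in ratings.items():
        weights[w] *= 1.0 - beta_w * abs(r - verdict)


ratings = {"honest": 0.9, "shill": 0.1}
weights = {"honest": 1.0, "shill": 1.0}
update_credibility(ratings, weights, verdict=1.0)  # the shill loses more weight
```

A multiplicative update keeps one bad report from zeroing a witness, but repeated misreports compound, which is the property a credibility channel needs against a persistent adversary.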

Four-channel judge loop

  • One judge call → four simultaneous structured updates: profile · memory · social edges · reputation.
  • Bias defenses baked in: balanced position calibration (BPC; Wang 2023) counters position bias, length control (LC; Dubois 2024) counters verbosity bias, and a disjoint judge counters self-preference.
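The two judge-side defenses can be sketched in a few lines. Balanced position calibration asks the judge about both presentation orders and averages, which for two candidates is a 2×2 Latin square; length residualisation removes the part of a score explained by response length. The `Judge` callable is hypothetical, and plain least-squares residualisation is a simplification swapped in here, not the regression Dubois et al. use for length-controlled AlpacaEval.

```python
from statistics import mean
from typing import Callable

# Hypothetical judge interface: returns P(first response is better than second).
Judge = Callable[[str, str], float]


def balanced_preference(judge: Judge, a: str, b: str) -> float:
    """P(a beats b), averaged over both presentation orders."""
    p_ab = judge(a, b)        # a shown first
    p_ba = 1.0 - judge(b, a)  # b shown first; flip so it is also P(a better)
    return (p_ab + p_ba) / 2.0


def length_residualise(scores: list[float], lengths: list[float]) -> list[float]:
    """Subtract the ordinary-least-squares fit of score on length."""
    mx, my = mean(lengths), mean(scores)
    sxx = sum((x - mx) ** 2 for x in lengths)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lengths, scores))
    slope = 0.0 if sxx == 0 else sxy / sxx
    return [y - (my + slope * (x - mx)) for x, y in zip(lengths, scores)]


# A judge that always prefers whichever answer is shown first is neutralised:
always_first = lambda x, y: 0.8
neutral = balanced_preference(always_first, "answer A", "answer B")
```

With the order-swapped pair, a pure position bias cancels exactly, while a genuine preference survives both orderings.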

Reputation-conditioned team formation

  • Per-skill Beta-posterior reputation as a first-class gating signal in k-agent team formation.
  • A structural anti-oligarchy guarantee no prior system has: strong agents can't monopolise the org.
Conoid episode lifecycle diagram
Episode lifecycle: decay-then-add → task → Planner-Critic decomposition → DAG → subtask loop → outcome.
Conoid inside-one-subtask diagram
Inside one subtask: team form → execute → peer testimony → critic judge → four-channel JSON.
Conoid four-channel commit diagram with per-channel caps
Four channels, committed from one judge call (profile, memory, edges, reputation), each with hard caps and a SHA-256-chained audit trail anyone can verify.
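A minimal sketch of capped, hash-chained commits, assuming each channel's update is a list of change records and a cap is the maximum number of records one judge call may commit. The cap values, the `AuditLog` class, and the all-zero genesis hash are illustrative assumptions; the SHA-256 chaining itself follows the caption.

```python
import hashlib
import json

# Illustrative per-channel caps (assumed values): max change records per judge call.
CAPS = {"profile": 3, "memory": 5, "edges": 5, "reputation": 10}
GENESIS = "0" * 64


def cap_update(update: dict) -> dict:
    """Truncate each channel to its cap; channels not in CAPS are dropped."""
    return {ch: list(update.get(ch, []))[:cap] for ch, cap in CAPS.items()}


class AuditLog:
    def __init__(self) -> None:
        self.entries: list[tuple[str, str]] = []  # (canonical JSON payload, digest)

    def append(self, update: dict) -> str:
        prev = self.entries[-1][1] if self.entries else GENESIS
        payload = json.dumps(cap_update(update), sort_keys=True)
        # Each digest covers the previous digest, so editing any entry breaks the chain.
        digest = hashlib.sha256((prev + payload).encode("utf-8")).hexdigest()
        self.entries.append((payload, digest))
        return digest

    def verify(self) -> bool:
        prev = GENESIS
        for payload, digest in self.entries:
            if hashlib.sha256((prev + payload).encode("utf-8")).hexdigest() != digest:
                return False
            prev = digest
        return True


log = AuditLog()
log.append({"profile": ["p"] * 10, "memory": ["m1"], "reputation": [["code", 1]]})
log.append({"edges": [["a", "b", 0.1]]})
```

Serialising with `sort_keys=True` gives a canonical payload, so any verifier that replays the log recomputes the same digests.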
05 Research Contribution

The first system to update four channels of identity per judge call.

Every adjacent system in the literature (RepuNet, AgentNet, SiriuS, G-Memory, Hyperagents, Evolving Orchestration, Agent-as-a-Judge) updates exactly one channel of agent identity per judge call. Conoid updates four, on persistent agents, with the bias defenses above. That, plus per-skill Beta-posterior reputation as a gating signal with an anti-oligarchy guarantee, is the contribution.

Conoid vs CrewAI, MetaGPT, AutoGen comparison table
Conoid is the only platform in the comparison where agents accumulate reputation, social-graph edges, and multi-tier memory, and gain intent refinement, post-task evolution, and referral chains.
06 Evaluation

Measured, ablated, and stress-tested adversarially.

Three studies, each isolating a different claim: that the rep-gate scales, that every identity channel earns its place, and that the design resists reward-hacking.

  • AgentsNet: +0.24 vs baseline at N=100
  • Rep-gate lift: +0.02 → +0.11, grows with N
  • Top channel: reputation, d = 1.42 (2× the next)
  • Adversary held: within +0.05 of cold-start

AgentsNet leader-election evaluation chart
AgentsNet leader-election: the full Internet-of-Agents beats the published baseline by +0.24 at N=100; the rep-gate's contribution grows monotonically with the pool.
Per-channel leave-one-out ablation chart
Per-channel leave-one-out (N=32, Wilcoxon + Holm): removing any channel hurts; reputation (d=1.42) is 2× the next; NO_JUDGE drops a further 0.14.
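The ablation statistics combine one paired test per removed channel with a family-wise correction. A sketch of the two pure-Python pieces, assuming the per-channel p-values come from a paired Wilcoxon signed-rank test (for example `scipy.stats.wilcoxon(full, ablated)`, not called here). The text does not say which Cohen's d is reported, so the paired d_z below is an assumption.

```python
from statistics import mean, stdev


def holm(pvalues: dict[str, float], alpha: float = 0.05) -> dict[str, bool]:
    """Holm step-down procedure: which hypotheses are rejected at family-wise level alpha."""
    ordered = sorted(pvalues.items(), key=lambda kv: kv[1])
    m = len(ordered)
    rejected = {name: False for name in pvalues}
    for i, (name, p) in enumerate(ordered):
        if p > alpha / (m - i):
            break  # first non-rejection: every larger p-value is retained too
        rejected[name] = True
    return rejected


def paired_cohens_d(full: list[float], ablated: list[float]) -> float:
    """Cohen's d_z: mean of paired differences over their sample standard deviation."""
    diffs = [f - a for f, a in zip(full, ablated)]
    return mean(diffs) / stdev(diffs)


# Hypothetical p-values for three leave-one-out conditions (not the study's numbers).
decisions = holm({"no_reputation": 0.01, "no_memory": 0.03, "no_edges": 0.04})
```

Holm controls the family-wise error rate like Bonferroni but is uniformly more powerful, which matters when every channel's removal is tested against the same full system.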
Adversarial reward-hacking probe charts
Adversarial reward-hacking (GhostInTheRubric, 50 episodes): naive scalar reputation is gamed to ρ=0.83; multi-channel + Yu & Singh credibility holds the adversary within +0.05 of the cold-start prior.
07 The Opportunity

Selling the layer that compounds.

Conoid sits in the execution layer of AI systems. It moves the variables that matter in production: task success rate, cost per task, and reliance on human supervision. It ships as B2B SaaS priced on deployed agents and task volume. First design partners came through published research and academic credibility (Stanford & NUST affiliations).

Conoid business model slide
The model: B2B SaaS, priced on deployed agents and task volume. Usage-aligned, value-linked.
Conoid go-to-market slide
Go-to-market: land teams hitting the stateless ceiling, prove it in scoped paid pilots, expand on published benchmarks.
Conoid market sizing slide
The AI-agents market grows from $7.63B today to $182.97B by 2033 (49.6% CAGR). TAM $122.8B · SAM $4.14B · SOM $216M.
08 What I Learned

Portable lessons

01 Memory is the difference between a tool and a teammate.

02 Trust must be earned, decayed, and bounded.

03 One judge can do four jobs, if you defend it from its own biases.

09 What's Next

The roadmap

  • 01 Publish the peer-reviewed benchmarks behind the four-channel result.
  • 02 Scale the evaluation to larger agent populations.
  • 03 Harden the reputation system against new adversarial strategies.
  • 04 Open the platform to external agents. The Internet of Agents, for real.