Work

Product Engineer

Chatly Make

One prompt becomes a deployed, committed, thumbnailed app. I cut the round trip ~4× by replacing the agent loop with a pipeline.

The Chatly Make builder: the agent's steps on the left, the spec it filled in the middle, a live preview of the generated site on the right
Year: 2026
Role: Product Engineer
Scope: Agentic Codegen, Full-Stack Generation, Build Pipeline, Modal Sandboxes
Device: Web · Chatly (Vyro.ai)
Tools: TypeScript, Modal, LLM Orchestration, Pipelines, GitHub, Deploy
System Architecture

charon-make: a stateless agent that turns a prompt into a committed, deployed app. The loop stores nothing: each turn is a pure function from payload to an SSE stream.

01 Client & Edge

Browser · Frontend

Next.js
Streams tokens, files, and previews over SSE / WebSocket.

nginx

:443
Routes /api/langgraph → engine, /api → gateway, static → frontend.
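
The routing rule above has one subtlety: `/api/langgraph` also starts with `/api`, so the more specific prefix has to win. A minimal sketch of that precedence, as a simplified model and not real nginx `location` semantics; `ROUTES` and `pick_upstream` are hypothetical names introduced only for illustration:

```python
from typing import List, Tuple

# Prefix -> upstream name. Ports come from the diagram: engine :2024, gateway :8000.
ROUTES: List[Tuple[str, str]] = [
    ("/api/langgraph", "engine"),
    ("/api", "gateway"),
]


def pick_upstream(path: str) -> str:
    """Return the upstream for a request path; anything unmatched is static/frontend."""
    # Check longer prefixes first so /api/langgraph is not swallowed by /api.
    for prefix, upstream in sorted(ROUTES, key=lambda r: len(r[0]), reverse=True):
        if path == prefix or path.startswith(prefix + "/"):
            return upstream
    return "frontend"
```

Sorting by prefix length keeps the table order-independent, so adding a new route cannot silently shadow an existing one.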

Engine · FastAPI

:2024
LangGraph-compatible run surface. Streams the agent loop as Server-Sent Events.

Gateway · FastAPI

:8000
Auth, threads, projects, and connector OAuth.
02 Agent Loop
run_agent_loop() · pure function · payload → SSE · stateless

Turn setup · once

Warm sandbox

cache 600s
Acquire a Modal sandbox; reuse the warm one within the TTL.
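
The 600-second reuse rule can be sketched as a small TTL cache. This is an illustration under stated assumptions, not the charon-make code: `WarmSandboxCache` and `create_sandbox` are hypothetical names, the real Modal acquisition call is replaced by a stub, and measuring the TTL from last use (rather than from creation) is a guess.

```python
import time
from typing import Callable, Dict, Tuple

TTL_SECONDS = 600  # from the diagram: "cache 600s"


class WarmSandboxCache:
    """Reuse a sandbox handle per thread while it was used within the TTL."""

    def __init__(self, create_sandbox: Callable[[str], str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._create = create_sandbox  # hypothetical stand-in for the Modal call
        self._clock = clock            # injectable so the TTL can be tested
        self._entries: Dict[str, Tuple[str, float]] = {}

    def acquire(self, thread_id: str) -> str:
        now = self._clock()
        entry = self._entries.get(thread_id)
        if entry is not None and now - entry[1] < TTL_SECONDS:
            sandbox = entry[0]                 # warm hit: skip the cold start
        else:
            sandbox = self._create(thread_id)  # cold path: create a new sandbox
        self._entries[thread_id] = (sandbox, now)  # refresh the last-used time
        return sandbox
```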

Concurrent Haiku

Title + skill detection fire in parallel at setup.
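
Because the title and the skill list do not depend on each other, the two small-model calls can be awaited together. A minimal sketch, assuming an asyncio-based engine; `generate_title` and `detect_skills` are hypothetical stubs that sleep instead of calling Haiku:

```python
import asyncio
from typing import List, Tuple


async def generate_title(prompt: str) -> str:
    # Hypothetical stand-in for a small-model call that names the thread.
    await asyncio.sleep(0.05)
    return prompt[:40].strip()


async def detect_skills(prompt: str) -> List[str]:
    # Hypothetical stand-in for a small-model call that tags needed skills.
    await asyncio.sleep(0.05)
    known = ("supabase", "github", "auth")
    return [skill for skill in known if skill in prompt.lower()]


async def turn_setup(prompt: str) -> Tuple[str, List[str]]:
    # The calls are independent, so they run concurrently: setup costs
    # roughly the slower of the two rather than their sum.
    title, skills = await asyncio.gather(generate_title(prompt), detect_skills(prompt))
    return title, skills
```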

Cacheable prompt

~80% off
Byte-exact system blocks hit Anthropic's prompt cache on later turns.
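
"Byte-exact" is the whole trick: a cached prefix is only reused when the serialized system blocks are identical from turn to turn. The sketch below builds system blocks in the list-of-text-blocks shape the Anthropic Messages API accepts, with a `cache_control` marker on the last stable block, and fingerprints them to check stability. `build_system_blocks` and `prefix_fingerprint` are hypothetical helpers, and no API call is made.

```python
import hashlib
import json
from typing import Dict, List


def build_system_blocks(rules: str, tool_docs: str) -> List[Dict]:
    """Stable prefix only: nothing per-turn (timestamps, IDs) belongs in here."""
    return [
        {"type": "text", "text": rules},
        # The cache_control marker asks the API to cache the prefix up to
        # and including this block.
        {"type": "text", "text": tool_docs, "cache_control": {"type": "ephemeral"}},
    ]


def prefix_fingerprint(blocks: List[Dict]) -> str:
    """Hash the serialized blocks; equal hashes mean a byte-identical prefix."""
    raw = json.dumps(blocks, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()
```

Logging the fingerprint per turn is a cheap way to notice when a stray space or reordered block starts causing cache misses.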

Reason ⇄ act · while iterating

Claude Sonnet 4.6 stream

3 retries
Reasons over tools + full history; streams tokens and tool calls.
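
The "3 retries" badge can be read as a bounded retry wrapper around the model call. A minimal sketch, assuming it means three retries after the first attempt, with exponential backoff; `call_with_retries`, the choice of `ConnectionError`, and the delay values are all illustrative assumptions:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")
MAX_RETRIES = 3  # from the diagram; read here as 3 retries after the first attempt


def call_with_retries(call: Callable[[], T], base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying transient failures with exponential backoff."""
    for attempt in range(MAX_RETRIES + 1):
        try:
            return call()
        except ConnectionError:
            if attempt == MAX_RETRIES:
                raise                         # out of retries: surface the error
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s with the default
    raise AssertionError("unreachable")
```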

Tool dispatch

read · edit (lazy-diff) · run · glob · supabase · connector · github_push · subagents.
Per tool: pre-hook (security) → sandbox exec → post-hook (tsc --noEmit) → secret redaction → SSE
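
The per-tool path (pre-hook, execution, post-hook, redaction) can be sketched as one dispatch function. Everything here is hypothetical: the deny-list, the secret pattern, and the stubbed tools stand in for the real security rules, the sandbox call, and the `tsc --noEmit` check.

```python
import re
from typing import Callable, Dict, List

SECRET_PATTERN = re.compile(r"(sk|ghp)_[A-Za-z0-9]{8,}")  # illustrative only
BLOCKED_FRAGMENTS = ("rm -rf /",)                         # illustrative deny-list


def pre_hook(tool: str, arg: str) -> None:
    """Security gate: refuse obviously dangerous commands before execution."""
    if tool == "run" and any(bad in arg for bad in BLOCKED_FRAGMENTS):
        raise PermissionError(f"blocked command: {arg!r}")


def redact(text: str) -> str:
    """Mask anything that looks like a secret before it reaches the stream."""
    return SECRET_PATTERN.sub("[REDACTED]", text)


def dispatch(tool: str, arg: str, tools: Dict[str, Callable[[str], str]],
             post_hooks: List[Callable[[str, str], str]]) -> str:
    pre_hook(tool, arg)              # 1. security pre-hook
    output = tools[tool](arg)        # 2. execute (in the sandbox, in the real system)
    for hook in post_hooks:          # 3. post-hooks, e.g. a type check after edits
        output = hook(tool, output)
    return redact(output)            # 4. redact, then hand off to the SSE stream
```

Redaction runs last so that post-hook output (compiler diagnostics, for example) is also scrubbed before it is streamed.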

On finish

Git auto-commit

Haiku writes the commit message; bash commits the diff.

Thumbnail

Playwright screenshots the live preview → Cloudflare R2.

Done

usage
Emits token usage; the loop exits.
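
"Pure function from payload to SSE" means a turn reads everything it needs from the request and keeps nothing afterward. A minimal sketch of that shape using a generator; `run_turn`, the event names, and the payload layout are assumptions, and the model call is replaced by echoing the prompt word by word:

```python
import json
from typing import Dict, Iterator, List


def sse(event: str, data: Dict) -> str:
    """Format one Server-Sent Event frame: event line, data line, blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def run_turn(payload: Dict) -> Iterator[str]:
    """Stateless turn: all inputs arrive in `payload`; nothing is stored."""
    history: List[Dict] = payload["messages"]  # full history, sent every turn
    prompt = history[-1]["content"]
    # A real loop would stream model tokens and tool calls here.
    for word in prompt.split():
        yield sse("token", {"text": word})
    yield sse("usage", {"input_messages": len(history)})
    yield sse("done", {})
```

Because the function holds no state, the same payload always yields the same frames, which is what lets any engine replica serve any turn.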
03 Execution

Modal Sandbox · remote

tunnels :3000 · :8080
/mnt/user-data ↔ Cloudflare R2 (synced mount)
/workspace: local NVMe (node_modules, .next)
/git: repo, never synced

04 Integrations

Anthropic

Sonnet 4.6 (reasoning + codegen) · Haiku (titles, skills, commits).

Composio

Generic connectors: Gmail, Slack, GitHub, Supabase…

GitHub

Bespoke push; auto-creates the repo.

Cloudflare R2

Object storage + preview thumbnails.

Postgres

Thread metadata + artifact registry.

Supabase

Per-thread project: DB, auth, storage.
Security · 5 layers: prompt rules · pre-hooks · output redaction · injection wrapping · SSE contract
Stateless: full history per turn · Haiku compaction

01 Context

What Chatly Make is, and where it sits

I'm a Product Engineer at Chatly (Vyro.ai). Chatly Make is the part of Chatly's Omni-Agent that turns a single prompt into a running app: it generates the code, commits it to GitHub, deploys the app, makes a preview thumbnail, and wires up connectors. Same spirit as Lovable or bolt.new. Prompt in, deployed app out.

To be clear about scope: I designed and built Make, the app-builder slice, not all of Omni-Agent. Internally it's charon-make. The first DeerFlow-based attempt was charon-df-agent-v1 (the “df” is DeerFlow).

A full site (hero, projects, about, footer) generated and deployed by Chatly Make
What one prompt produces: a complete site (design system, sections, copy) generated, committed, and deployed in a single run.
02 The Problem

Shipping an app is not writing a report

The first builds were slow: ~45 minutes end to end for a full-stack app, ~20 minutes for frontend-only. That clock covers everything: code generation, GitHub commits, deploy, and thumbnail. For a product meant to feel like “prompt to live app,” 45 minutes isn't a wait. It's an abandon.

Where the time went: engineering reasoning, not instrumented attribution

  • Serialized plan, act, observe loop: one model round trip per micro-step, so wall-clock grows roughly linearly with step count. An app touching dozens of files pays that tax dozens of times.
  • Cold environments: install and build ran in ad-hoc per-run containers, re-paying the base-image pull and a cold npm/pnpm install every single time.
  • Sequential everything: independent files generated one at a time; commit, deploy, and headless-browser thumbnailing all ran back-to-back at the end, latencies stacking instead of overlapping.
  • Full-context regeneration: the growing transcript got re-sent each step, so per-step latency climbed as the model re-read code it had already written.
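
The last bullet compounds the first: if every step re-sends the whole transcript, total tokens read grow roughly quadratically with step count, not linearly. A back-of-envelope model with made-up numbers (the 2,000-token base and 500 tokens per step are illustrative, not measurements from Make):

```python
def total_tokens_resent(steps: int, base_tokens: int, tokens_per_step: int) -> int:
    """Sum of context sizes when the full transcript is re-sent at every step."""
    return sum(base_tokens + i * tokens_per_step for i in range(steps))


def closed_form(steps: int, base_tokens: int, tokens_per_step: int) -> int:
    """Same total: n*b + k*n*(n-1)/2, which makes the quadratic term explicit."""
    return steps * base_tokens + tokens_per_step * steps * (steps - 1) // 2


small = total_tokens_resent(10, 2000, 500)  # 42,500 tokens over 10 steps
large = total_tokens_resent(40, 2000, 500)  # 470,000 tokens over 40 steps
```

In this toy model, four times the steps costs about eleven times the tokens read, which is why bounding the number of model calls matters more than shaving any single call.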
03 Approach

DeerFlow first, then Hermes. Both the wrong shape

I built Make on ByteDance's DeerFlow first. DeerFlow is a deep-research framework: a Coordinator, Planner, Researcher, Coder, Reporter graph tuned for search → read → synthesize → write, with a human-in-the-loop plan-approval gate before execution. Its “Coder” runs Python inside a research flow. It isn't there to scaffold, build, and deploy a real codebase, and the deliverable is a document, not an app. So even simple builds inherited planning overhead, search machinery, and an approval stall that does nothing for autonomous codegen.

Then I tried Hermes: NousResearch's agent framework (the self-improving, multi-backend harness, not their LLMs of the same name). It looked better on paper, with parallel subagents and a Modal execution backend. But it's still a general long-horizon agent harness. The open-ended loop and per-step inference tax remained, and the defaults are tuned for autonomous task-running, not the tight scaffold → build → preview loop app generation needs.

The honest read: the bottleneck wasn't either framework's quality. An app builder is a mostly-deterministic delivery pipeline wearing an agent costume. So I stopped bending a research harness toward codegen and designed for the actual job.

04 Architecture

A pipeline, warm Modal sandboxes, and overlap

I replaced the open-ended loop with an explicit, mostly-deterministic pipeline: scaffold → generate files → install/build → commit → deploy → thumbnail. The model is invoked a bounded number of times for generation instead of once per micro-step, which kills the per-step inference tax and the “should I loop back?” overhead that dominated the research loops.
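
The stage list above can be sketched as plain code where control flow is fixed and only one stage calls the model. This is an illustration, not the charon-make implementation: every function name, the `plan` field, and the one-call-per-planned-file rule are assumptions.

```python
from typing import Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[Dict], Dict]]


def scaffold(ctx: Dict) -> Dict:
    return {**ctx, "files": {"package.json": "{}"}}


def generate_files(ctx: Dict) -> Dict:
    files = dict(ctx["files"])
    calls = ctx["model_calls"]
    for path in ctx["plan"]:
        files[path] = f"// generated: {path}"  # stand-in for one model call
        calls += 1
    return {**ctx, "files": files, "model_calls": calls}


def mark(flag: str) -> Callable[[Dict], Dict]:
    """Build a deterministic stage that needs no model call."""
    return lambda ctx: {**ctx, flag: True}


STAGES: List[Stage] = [
    ("scaffold", scaffold),
    ("generate", generate_files),
    ("build", mark("built")),
    ("commit", mark("committed")),
    ("deploy", mark("deployed")),
    ("thumbnail", mark("thumbnailed")),
]


def run_pipeline(stages: List[Stage], ctx: Dict) -> Dict:
    """Run the stages in order; the sequence is code, not a model decision."""
    done: List[str] = []
    for name, stage in stages:
        ctx = stage(ctx)
        done.append(name)
    return {**ctx, "completed": done}
```

The model-call count is a function of the plan size alone, so it is known before the run starts instead of emerging from a loop.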

The redesign. PR titles: “Add pipeline”, “Use modal proxy”

  • Warm Modal sandboxes through a thin proxy: build, install, and deploy run in pre-warmed, isolated Linux environments instead of cold per-run Docker. Image-layer cache, a warmed node_modules, and a warm pool mean each app stops re-paying the multi-minute cold install. Untrusted generated code stays sandboxed.
  • Execute as you generate: dependency install starts the moment package.json is known, overlapping with continued component generation; coherent files get written and committed while later files are still being produced.
  • Parallel codegen: files with no data dependency (separate components, routes, configs) generate concurrently, collapsing a sum of sequential generations toward the cost of the slowest batch.
  • Overlapped long tail: thumbnail/screenshot work kicks off on deploy-readiness in a separate worker, so it doesn't block the user-facing deploy. The research/report roles and approval gates are stripped entirely.
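
The "execute as you generate" and "parallel codegen" bullets can be sketched together with asyncio: install starts as soon as package.json exists, and independent files generate concurrently while it runs. The sleeps stand in for model calls and for npm/pnpm install; the function names and durations are hypothetical.

```python
import asyncio
from typing import List


async def generate(path: str, seconds: float, log: List[str]) -> str:
    await asyncio.sleep(seconds)  # stand-in for a model call
    log.append(f"generated {path}")
    return path


async def install(seconds: float, log: List[str]) -> None:
    log.append("install started")
    await asyncio.sleep(seconds)  # stand-in for npm/pnpm install
    log.append("install finished")


async def build_app(log: List[str]) -> List[str]:
    await generate("package.json", 0.01, log)
    # Install starts the moment package.json is known...
    install_task = asyncio.create_task(install(0.05, log))
    # ...while files with no data dependency generate concurrently.
    files = await asyncio.gather(
        generate("app/page.tsx", 0.02, log),
        generate("app/layout.tsx", 0.03, log),
    )
    await install_task            # the build needs both the deps and the files
    log.append("build")
    return list(files)
```

Run serially, these stubs would take about 0.11 s; overlapped, the critical path is package.json plus the install, about 0.06 s. The same reasoning applies to the real stages at minute scale.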
05 The Numbers

About 4×, end to end

Every figure is full end-to-end wall-clock: code generation + GitHub commits + deploy + thumbnail, not just generation.

Full-stack app

~45 min → ~13 min

Frontend-only

~20 min → ~5 min

End-to-end

~3.5× to 4× faster

The clock covers

codegen + commit + deploy + thumbnail

Shipped in

37 commits · 11 PRs · 1 month
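
The headline multiplier follows directly from the before/after figures quoted in this section. A quick check using only those numbers:

```python
def speedup(before_min: float, after_min: float) -> float:
    """Ratio of old to new end-to-end wall-clock time."""
    return before_min / after_min


full_stack = speedup(45, 13)  # about 3.46
frontend = speedup(20, 5)     # exactly 4.0
```

Frontend-only builds hit 4× on the nose; full-stack builds land nearer 3.5×, so "about 4×" is the rounded upper end of the range.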

GitHub contribution activity on Vyro-ai/charon-make: 37 commits and 11 pull requests, 10 merged
The work behind the numbers. Vyro-ai/charon-make: 37 commits, 11 PRs (10 merged) in a month. Two of them: “Use modal proxy” and “Add pipeline.”
06 What I Learned

Portable lessons

01 Match the harness to the job. An app builder is a delivery pipeline, not a research agent.

02 Every step that calls the model is on the critical path. Bound the calls; let determinism carry the rest.

03 Warm beats clever. A pre-warmed, cached sandbox erases more wall-clock than any prompt trick.

04 Overlap is free speed: generate, commit, build, and thumbnail want to run concurrently, not in line.