The problem
The operational layer of e-commerce — listing optimization, ad placement, support, payments, fulfillment — currently requires humans because the existing tools were built when humans were the only option. AI assistants are bolted on, but the merchant is still in the loop. The platform suggests; the merchant approves. Suggestion-and-approval feels like progress. It isn't. It's faster typing.
Our angle
We believe the next architecture is not "AI assistants for merchants" but "AI agents that run operations." The merchant defines the strategy: what to sell, who to sell to, what brand position to defend. Everything below that is operational and can run autonomously.
We named six. Atlas keeps the catalog upright. Apollo writes and schedules campaigns. Pheme places paid acquisition. Iris answers customer support. Plutus reconciles payments and routes refunds. Mercury coordinates fulfillment across logistics networks. Each one is a separately measurable, separately reliability-budgeted system. Each one speaks a shared protocol so they can hand work to each other without the merchant in the middle.
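To make the handoff idea concrete, here is a minimal sketch of what an agent-to-agent message could look like. The envelope fields, agent names in the payload, and the refund-risk scenario are illustrative assumptions, not the actual protocol:

```python
from dataclasses import dataclass, field
import uuid

# Hypothetical handoff envelope; field names are illustrative,
# not the real shared protocol.
@dataclass
class Handoff:
    from_agent: str   # sender, e.g. "plutus"
    to_agent: str     # receiver, e.g. "atlas"
    intent: str       # what the receiver is asked to do
    payload: dict     # task-specific data
    task_id: str = field(default_factory=lambda: uuid.uuid4().hex)

# Plutus flags a SKU for refund risk; Atlas receives a hold request
# instead of the merchant relaying it by hand.
msg = Handoff(
    from_agent="plutus",
    to_agent="atlas",
    intent="hold_relisting",
    payload={"sku": "TSHIRT-RED-M", "reason": "refund_rate_above_5pct"},
)
```

The point of the envelope is that every handoff carries a machine-readable intent and a task id, so a decision can later be traced back to the agent that initiated it.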
What we're exploring
Reliability under load. Multi-agent coordination when two agents disagree (does Atlas re-list a SKU that Plutus flagged for refund risk? does Apollo schedule a campaign Mercury can't fulfill?). The right human-in-the-loop boundary for high-stakes actions — refunds above a threshold, ad spend above a daily limit, listings that touch regulated categories. Observability for non-technical merchants: when an agent makes a decision, the merchant should be able to ask why and get an answer they understand.
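The high-stakes boundary described above can be sketched as a simple policy gate. The thresholds, dollar amounts, and category list here are made-up placeholders for illustration, not our actual policy:

```python
# Illustrative human-in-the-loop gate; all thresholds and categories
# below are assumptions for the sketch.
REFUND_APPROVAL_THRESHOLD = 200.00   # dollars
DAILY_AD_SPEND_LIMIT = 500.00        # dollars per day
REGULATED_CATEGORIES = {"supplements", "alcohol", "cbd"}

def needs_human_approval(action: dict) -> bool:
    """Return True if the agent must pause and ask the merchant."""
    kind = action["kind"]
    if kind == "refund":
        return action["amount"] > REFUND_APPROVAL_THRESHOLD
    if kind == "ad_spend":
        # Would this spend push today's total over the limit?
        return action["daily_total"] + action["amount"] > DAILY_AD_SPEND_LIMIT
    if kind == "listing":
        return action["category"] in REGULATED_CATEGORIES
    return False  # everything else runs autonomously

needs_human_approval({"kind": "refund", "amount": 350.0})        # True
needs_human_approval({"kind": "listing", "category": "apparel"}) # False
```

A gate like this also doubles as an observability anchor: every True result is a natural place to record the inputs that triggered it, which is exactly the "why" a merchant would ask about.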
We're also exploring the eval problem. LLM-based agents can pass a spot check and fail under traffic. Our internal harness runs every agent against a regression suite of merchant scenarios on every release. We'll publish the numbers in a public benchmarks file.
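The shape of such a harness can be sketched in a few lines. The scenario format, the pass criterion, and the toy agent below are assumptions for illustration, not the internal harness:

```python
# Minimal sketch of a scenario regression harness. The scenario schema
# (name/input/expected) and the pass-rate gate are assumed, not real.
def run_regression(agent, scenarios, min_pass_rate=0.95):
    """Run every scenario; return (pass_rate, release_ok, failed_names)."""
    failures = []
    for scenario in scenarios:
        got = agent(scenario["input"])
        if got != scenario["expected"]:
            failures.append(scenario["name"])
    rate = 1 - len(failures) / len(scenarios)
    return rate, rate >= min_pass_rate, failures

# A toy support "agent" and two merchant scenarios, just to show the shape.
toy_agent = lambda text: "refund" if "broken" in text else "answer"
scenarios = [
    {"name": "damaged item",    "input": "item arrived broken", "expected": "refund"},
    {"name": "sizing question", "input": "does it run small?",  "expected": "answer"},
]
rate, release_ok, failures = run_regression(toy_agent, scenarios, min_pass_rate=1.0)
# rate == 1.0, release_ok is True, failures == []
```

Running the full suite on every release, rather than spot-checking, is what catches the pass-once-fail-under-traffic failure mode the paragraph above describes.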
Status
Active prototype within fastart.co. Customer-facing in early-access mode. The six agents run real stores today, with merchant override on every decision.
An invitation
If you've worked on agent reliability, multi-agent evaluation, or LLM observability — and you have notes on where commerce agents fail under real operations — we'd like to compare notes. The same goes if you're a researcher with a strong opinion about the right boundary between autonomous decision and human approval in commerce. research@fastart.tech.