
Mobile-on-Mobile Agent Loop

A simple phone-GUI workflow where a phone-side Codex session asks desktop Codex to operate the same Android phone through USB ADB, without using any dedicated mobile GUI agent.

Date: May 26, 2026
Status: Labs note
Figure 1 (cover diagram): the phone sends intent, the desktop executes, USB ADB operates the same phone, and screenshots close the loop.

TL;DR

Experiment: OpenCodexLabs/open-codex-mobile · Mobile-on-Mobile Agent Loop.

This is not a product launch, and it is not a full mobile GUI agent. It is a simple phone-GUI workflow built from Codex on the phone, Codex on the desktop, USB ADB, screenshots, and human-visible verification. The useful observation is that the phone can be both the agent control surface and the target GUI surface.

The user starts from a phone-side Codex session. Desktop Codex receives the task. The desktop uses USB ADB to operate the same Android phone. The phone app returns a real GUI result. Screenshots and UI state provide evidence. The answer comes back to the phone-side conversation.

In one sentence: mobile control does not have to stop at supervising a desktop agent; it can trigger a desktop agent to act on the mobile GUI itself.

The odd moment

Most mobile-agent products treat the phone as a remote control. You check status, approve commands, add context, and read the final answer. That already changes the agent workflow, because the human no longer has to sit in front of the desktop while a long task is running.

This experiment adds one more turn. The same phone that sends the instruction also becomes the thing being operated. The desktop remains the trusted execution machine, but the action surface is the Android phone GUI: a real app, a real screen, real buttons, real state.

That distinction matters. Many useful tasks live behind mobile apps, weak APIs, local permissions, or app-only interfaces. If an agent can safely operate a phone GUI with evidence, those tasks can enter an agent workflow before a clean API exists.

The loop

The implementation was deliberately simple. No special mobile automation framework was needed for the first proof. USB ADB was enough to open an app, capture screenshots, inspect UI state, tap, type, and verify the result.
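
As a rough illustration of how little is needed, the five operations above map onto standard `adb` invocations. This is a minimal sketch, not the code used in the experiment: the function names, the optional `serial` parameter, and the remote dump path are assumptions made for the example.

```python
import subprocess
from typing import List, Optional


def adb_prefix(serial: Optional[str] = None) -> List[str]:
    """Base argv; -s pins one device when several are attached."""
    return ["adb", "-s", serial] if serial else ["adb"]


def open_app_cmd(package: str, serial: Optional[str] = None) -> List[str]:
    # monkey with the LAUNCHER category and an event count of 1 starts the app.
    return adb_prefix(serial) + [
        "shell", "monkey", "-p", package,
        "-c", "android.intent.category.LAUNCHER", "1",
    ]


def screenshot_cmd(serial: Optional[str] = None) -> List[str]:
    # exec-out streams the raw PNG bytes to stdout without line-ending changes.
    return adb_prefix(serial) + ["exec-out", "screencap", "-p"]


def dump_ui_cmd(remote_path: str = "/sdcard/window_dump.xml",
                serial: Optional[str] = None) -> List[str]:
    # Writes the current view hierarchy as XML to a file on the device.
    return adb_prefix(serial) + ["shell", "uiautomator", "dump", remote_path]


def tap_cmd(x: int, y: int, serial: Optional[str] = None) -> List[str]:
    return adb_prefix(serial) + ["shell", "input", "tap", str(x), str(y)]


def type_text_cmd(text: str, serial: Optional[str] = None) -> List[str]:
    # `input text` reads %s as a space. This sketch handles spaces only and
    # does not escape other shell metacharacters.
    return adb_prefix(serial) + ["shell", "input", "text", text.replace(" ", "%s")]


def run(cmd: List[str]) -> bytes:
    """Run one command on the desktop and return its stdout."""
    return subprocess.run(cmd, capture_output=True, check=True).stdout
```

With a phone attached over USB, `run(screenshot_cmd())` would return PNG bytes that can be saved as evidence. Building the argument lists separately from running them keeps each step easy to log and review.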

Figure 2: the minimal mobile-on-mobile loop (phone Codex to desktop Codex to USB ADB to the same phone GUI).

| Role | What it does | Why it matters |
| --- | --- | --- |
| Phone-side Codex | Captures user intent and receives the final result. | The phone stays the lightweight control surface. |
| Desktop Codex | Runs local tools, ADB, screenshots, and verification. | The trusted machine remains the executor. |
| USB ADB | Bridges actions into the Android GUI. | The experiment avoids network complexity and keeps the loop reproducible. |
| Same phone GUI | Provides the real app state and visual result. | The agent acts on the environment the user actually cares about. |
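
On the desktop side, inspecting UI state can be as simple as reading the XML that `uiautomator dump` produces and turning a node's `bounds` attribute into a tap point. The sketch below assumes that dump format; `find_tap_point` and the sample hierarchy are hypothetical and exist only for illustration.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# uiautomator encodes bounds as "[left,top][right,bottom]".
BOUNDS_RE = re.compile(r"\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]")


def find_tap_point(dump_xml: str, label: str) -> Optional[Tuple[int, int]]:
    """Return the center of the first node whose text or content-desc equals label."""
    root = ET.fromstring(dump_xml)
    for node in root.iter("node"):
        if node.get("text") == label or node.get("content-desc") == label:
            match = BOUNDS_RE.fullmatch(node.get("bounds", ""))
            if match:
                left, top, right, bottom = map(int, match.groups())
                return ((left + right) // 2, (top + bottom) // 2)
    return None


# Made-up hierarchy for illustration; a real dump carries many more attributes.
SAMPLE_DUMP = """<hierarchy rotation="0">
  <node text="" content-desc="" bounds="[0,0][1080,2400]">
    <node text="Search" content-desc="" bounds="[40,120][1040,240]" />
  </node>
</hierarchy>"""

print(find_tap_point(SAMPLE_DUMP, "Search"))  # (540, 180)
```

Resolving coordinates from the dump on every step, rather than hard-coding them, keeps the loop tied to what is actually on screen.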

The demo

I used a deliberately ordinary task: open a map app on the connected phone, search for a destination, read a route or ride estimate, and stop before any irreversible action. The public screenshots keep the map flow visible; only local infrastructure details in the control screenshot are masked.

The point of the demo was not the specific route. The point was that the agent operated a real phone app and came back with evidence from the same device that initiated the request.

Figure 3: evidence from the live run (screenshots of phone Codex, the map search result, the route estimate, and the ride estimate). The map screenshots keep the real app flow visible, while the control screenshot hides local infrastructure details.

What this changes

The important result is not that ADB can click a phone. That has been true for years. The interesting result is the product shape: a mobile agent session can delegate execution to a desktop agent, which can then operate the same mobile GUI and report back.

That suggests a broader division of labor. The phone is good at intent, supervision, quick approval, and final reading. The desktop is good at trusted execution, local tools, files, logs, and long-running work. The phone GUI is good at representing app-only tasks that do not yet have a clean API.

Once these pieces are connected, the mobile app stops being only a chat box. It becomes a real action surface in the agent workflow.

Why I would not start with MCP

It is tempting to immediately turn this into an MCP server, a reusable skill, or a polished mobile-control product. I would not start there.

The first reusable unit is the workflow, not the protocol. The agent needs a safe rhythm: take a screenshot, inspect UI state, perform one small action, take another screenshot, stop at irreversible boundaries, and summarize the evidence. Without that rhythm, a nicer tool interface only makes unsafe automation easier.
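
That rhythm can be written down as a small loop before any protocol exists. The following is a sketch under stated assumptions: the `Device` interface, the `irreversible` flag, and `FakeDevice` are all hypothetical, and a real run would back `Device` with ADB calls and a human deciding what counts as irreversible.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Protocol


@dataclass
class Action:
    description: str
    irreversible: bool = False  # hypothetical flag: pay, send, confirm an order


class Device(Protocol):
    def screenshot(self) -> bytes: ...
    def perform(self, action: Action) -> None: ...


@dataclass
class Evidence:
    description: str
    before: bytes
    after: bytes


@dataclass
class RunReport:
    evidence: List[Evidence] = field(default_factory=list)
    stopped_at: Optional[str] = None


def run_steps(device: Device, actions: List[Action]) -> RunReport:
    """Screenshot, one small action, screenshot; halt at the first irreversible step."""
    report = RunReport()
    for action in actions:
        if action.irreversible:
            # Stop before acting and hand the decision back to the user.
            report.stopped_at = action.description
            break
        before = device.screenshot()
        device.perform(action)
        after = device.screenshot()
        report.evidence.append(Evidence(action.description, before, after))
    return report


class FakeDevice:
    """Stand-in device so the loop can be exercised without a phone."""

    def __init__(self) -> None:
        self.frame = 0
        self.performed: List[str] = []

    def screenshot(self) -> bytes:
        self.frame += 1
        return f"frame-{self.frame}".encode()

    def perform(self, action: Action) -> None:
        self.performed.append(action.description)
```

Given a plan such as open the map app, search a destination, then confirm a ride, the loop performs the first two steps, records a before and after frame for each, and reports `stopped_at="confirm ride"` without touching the device again.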

| Layer | When it is enough | When to promote it |
| --- | --- | --- |
| Raw ADB workflow | One phone, one experiment, human-supervised tasks. | When the steps repeat and need guardrails. |
| Skill | Codify safe operating habits and stop conditions. | When multiple agents should reuse the same procedure. |
| MCP | Expose stable tools such as screenshot, dump UI, tap, type, open app. | When several clients or devices need the same interface. |
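
To make "stable tools" concrete without committing to a protocol, the tool surface can first be pinned down as a plain registry of named, typed operations. This sketch deliberately does not use any MCP SDK; `ToolSpec`, `ToolRegistry`, and the example handler are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ToolSpec:
    name: str
    params: Tuple[Tuple[str, type], ...]  # (argument name, expected type) pairs
    handler: Callable[..., Any]


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: Dict[str, ToolSpec] = {}

    def register(self, spec: ToolSpec) -> None:
        if spec.name in self._tools:
            raise ValueError(f"duplicate tool: {spec.name}")
        self._tools[spec.name] = spec

    def names(self) -> List[str]:
        return sorted(self._tools)

    def call(self, name: str, **kwargs: Any) -> Any:
        spec = self._tools.get(name)
        if spec is None:
            raise KeyError(f"unknown tool: {name}")
        expected = dict(spec.params)
        # Reject missing or extra arguments before anything touches the phone.
        if set(kwargs) != set(expected):
            raise TypeError(f"{name} expects arguments {sorted(expected)}")
        for key, expected_type in expected.items():
            if not isinstance(kwargs[key], expected_type):
                raise TypeError(f"{name}.{key} must be {expected_type.__name__}")
        return spec.handler(**kwargs)


registry = ToolRegistry()
registry.register(ToolSpec(
    name="tap",
    params=(("x", int), ("y", int)),
    handler=lambda x, y: f"tap {x},{y}",  # placeholder; a real handler would call ADB
))
```

If the same names and argument shapes hold up across several experiments, wrapping them in a protocol later is mostly mechanical.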

Safety boundaries

This kind of loop should stay conservative. It is fine to open an app, search, read a result, and collect screenshots. It is not fine to silently place orders, pay, send messages, change account settings, grant new permissions, or handle sensitive content without explicit user confirmation.
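
One way to encode this boundary is an allowlist: only the read-only kinds of action proceed on their own, and everything else, including anything unrecognized, waits for the user. The action kind names below are assumptions chosen to mirror the examples in the paragraph above, not an established taxonomy.

```python
from enum import Enum


class Risk(Enum):
    READ_ONLY = "read_only"
    NEEDS_CONFIRMATION = "needs_confirmation"


# Hypothetical labels for actions that are fine to run unattended.
READ_ONLY_KINDS = frozenset({"open_app", "search", "read_result", "screenshot"})


def classify(kind: str) -> Risk:
    # Anything outside the allowlist (pay, place_order, send_message,
    # grant_permission, or an unknown kind) is treated as needing confirmation.
    return Risk.READ_ONLY if kind in READ_ONLY_KINDS else Risk.NEEDS_CONFIRMATION


def may_proceed(kind: str, user_confirmed: bool = False) -> bool:
    """True when the action is read-only or the user explicitly confirmed it."""
    return classify(kind) is Risk.READ_ONLY or user_confirmed
```

Defaulting unknown kinds to confirmation is the conservative choice: a new capability has to be added to the allowlist deliberately before it can run without a prompt.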

The same applies to publishing traces. Phone screenshots often contain more personal data than expected: location, account state, notification icons, map context, local IPs, nearby places, and historical app state. The public artifact should be reviewed deliberately, and infrastructure details should be masked when they are not part of the point.
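
For the textual part of a trace, such as logs or captions, one narrow piece of that review can be automated. The sketch below masks private IPv4 addresses using the standard `ipaddress` module; the function name and placeholder string are assumptions, and it does nothing for screenshots, which still need manual review and image-level masking.

```python
import ipaddress
import re

# Candidate dotted quads; validity is decided by the ipaddress module.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def mask_private_ips(text: str, placeholder: str = "[masked-ip]") -> str:
    """Replace private or loopback IPv4 addresses, leaving public ones untouched."""

    def replace(match: "re.Match[str]") -> str:
        token = match.group(0)
        try:
            address = ipaddress.ip_address(token)
        except ValueError:
            return token  # not a valid address, e.g. 999.1.1.1
        return placeholder if address.is_private else token

    return IPV4_CANDIDATE.sub(replace, text)
```

For example, `mask_private_ips("desktop 192.168.1.20 reached 8.8.8.8")` masks the first address and leaves the second as written.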

The story in one sentence

Mobile-on-Mobile Agent Loop is a small experiment, but it points to a useful future primitive: the phone can be the place where intent starts, the desktop can be the place where execution happens, and the same phone GUI can be the real-world surface the agent operates.

The phone is not only a remote control for agents. Sometimes it is the environment.
