Fix replay rendering for exec_command and update_plan #215

Open
kokoro-aya wants to merge 4 commits into zed-industries:main from kokoro-aya:kokoro-aya/fix-replay-for-exec_command-and-update_plan

@kokoro-aya

Context

I was using Codex ACP in Zed for various ongoing tasks. I could reopen threads from previous sessions or projects and see structured output for CLI commands and for the plans drafted during those threads.

After one thread in which a prompt included messy input (cross-references to local files and previous threads), rendering in the Codex CLI panel degraded: all CLI commands and plans rendered as generic exec_command and update_plan entries, which made threads difficult to read.

I cloned this repo and fixed the rendering issue locally in my fork, which lets me keep working on my projects without a degraded UX.

Summary

This PR fixes two history replay parity issues in codex-acp:

  • replayed FunctionCall(name="exec_command") entries were shown as generic exec_command tool calls instead of meaningful command titles such as Read ..., Search ..., or List ...
  • replayed FunctionCall(name="update_plan") entries were shown as generic tool calls instead of restoring the ACP plan UI

Live rendering already handled both cases better; the replay path did not.

Problem

When sessions were restored from history, replay did not fully reconstruct the richer ACP presentation used during live execution.

In practice this caused two visible regressions in old threads:

  1. shell-backed commands stored as FunctionCall(name="exec_command") were replayed as generic tool calls
  2. plan updates stored as FunctionCall(name="update_plan") were replayed as generic tool calls instead of plan updates

This made restored history less useful and less consistent with live session behavior.

Root cause

The replay path in thread.rs handled only a subset of historical tool-call shapes as structured replay events.

exec_command

Replay already special-cased a few shell-like function names:

  • shell
  • container.exec
  • shell_command

But historical FunctionCall(name="exec_command") fell through to the generic fallback path, so replay lost semantic tool metadata.

update_plan

Live plan updates already had a dedicated path:

  • PlanUpdate events were translated into SessionUpdate::Plan

But during replay, historical FunctionCall(name="update_plan") entries were not translated back into plan updates and instead fell through to the generic function-call fallback.

What changed

exec_command

Replay now treats exec_command as a shell-like function call.

It parses:

  • cmd
  • workdir

and reuses the existing command parsing logic to recover structured tool-call metadata during replay.

If parsing fails, replay still falls back to the previous generic behavior.
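
The dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the code in thread.rs: `ReplayedCall`, `is_shell_like`, `title_for`, and `replay_function_call` are hypothetical names, `title_for` is only a stand-in for the existing command parsing logic, and `cmd` / `workdir` are assumed to have already been extracted from the stored arguments (`None` for `cmd` models a parse failure).

```rust
/// Outcome of replaying one stored function call (hypothetical type).
#[derive(Debug, PartialEq)]
enum ReplayedCall {
    /// Structured shell call with a human-readable title.
    Shell { title: String, workdir: Option<String> },
    /// Generic fallback: the raw function name is shown as-is.
    Generic { name: String },
}

/// Function names that replay treats as shell-like.
/// `exec_command` is the name this PR adds to the list.
fn is_shell_like(name: &str) -> bool {
    matches!(name, "shell" | "container.exec" | "shell_command" | "exec_command")
}

/// Stand-in for the existing command parsing logic (an assumption):
/// maps a few well-known programs to a verb, otherwise keeps the command.
fn title_for(cmd: &str) -> Option<String> {
    let cmd = cmd.trim();
    let (program, rest) = match cmd.split_once(char::is_whitespace) {
        Some((program, rest)) => (program, rest.trim()),
        None => (cmd, ""),
    };
    if program.is_empty() {
        return None;
    }
    let verb = match program {
        "cat" => "Read",
        "rg" | "grep" => "Search",
        "ls" => "List",
        _ => return Some(cmd.to_string()),
    };
    Some(format!("{verb} {rest}").trim_end().to_string())
}

/// Replays one function call: structured when the name is shell-like and
/// the command parses, generic otherwise.
fn replay_function_call(name: &str, cmd: Option<&str>, workdir: Option<&str>) -> ReplayedCall {
    if is_shell_like(name) {
        if let Some(title) = cmd.and_then(title_for) {
            return ReplayedCall::Shell {
                title,
                workdir: workdir.map(str::to_string),
            };
        }
    }
    ReplayedCall::Generic { name: name.to_string() }
}
```

In this sketch the generic path is the single fall-through at the end, so any name that is not shell-like, or any call whose command cannot be parsed, keeps the previous behavior.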

update_plan

Replay now special-cases FunctionCall(name="update_plan").

It:

  • parses the stored plan arguments
  • emits SessionUpdate::Plan
  • tracks the corresponding call_id only during the current replay pass
  • suppresses the matching FunctionCallOutput during that same replay pass so replay does not emit a stray generic tool update afterward

This replay bookkeeping is local to the replay pass and does not become persistent session state.
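
The bookkeeping can be illustrated with a simplified model. This is a sketch under assumptions, not the actual implementation: the `ResponseItem` and `ReplayEvent` enums here are reduced stand-ins for the real types, `replay_history` is a hypothetical name, and `parse_plan` is a placeholder that splits on `;` where the real code would deserialize the stored plan arguments.

```rust
use std::collections::HashSet;

/// Simplified stand-in for stored history items.
#[derive(Debug, Clone)]
enum ResponseItem {
    FunctionCall { name: String, call_id: String, arguments: String },
    FunctionCallOutput { call_id: String, output: String },
}

/// Simplified stand-in for what replay emits.
#[derive(Debug, PartialEq)]
enum ReplayEvent {
    Plan(Vec<String>),
    GenericToolCall { call_id: String, name: String },
    GenericToolOutput { call_id: String },
}

/// Placeholder parser (an assumption): treats the arguments as
/// `;`-separated plan steps and fails when no step is present.
fn parse_plan(arguments: &str) -> Option<Vec<String>> {
    let steps: Vec<String> = arguments
        .split(';')
        .map(str::trim)
        .filter(|step| !step.is_empty())
        .map(str::to_string)
        .collect();
    if steps.is_empty() { None } else { Some(steps) }
}

/// Replays a history slice. The set of plan call ids lives only for the
/// duration of this call, so nothing leaks into session state.
fn replay_history(items: &[ResponseItem]) -> Vec<ReplayEvent> {
    let mut plan_call_ids: HashSet<String> = HashSet::new();
    let mut events = Vec::new();
    for item in items {
        match item {
            ResponseItem::FunctionCall { name, call_id, arguments } => {
                if name.as_str() == "update_plan" {
                    if let Some(steps) = parse_plan(arguments) {
                        // Remember the id so the matching output is skipped.
                        plan_call_ids.insert(call_id.clone());
                        events.push(ReplayEvent::Plan(steps));
                        continue;
                    }
                }
                // Unrecognized or unparsable calls keep the generic path.
                events.push(ReplayEvent::GenericToolCall {
                    call_id: call_id.clone(),
                    name: name.clone(),
                });
            }
            ResponseItem::FunctionCallOutput { call_id, .. } => {
                if plan_call_ids.contains(call_id) {
                    continue; // output of a replayed plan update: suppress
                }
                events.push(ReplayEvent::GenericToolOutput { call_id: call_id.clone() });
            }
        }
    }
    events
}
```

Because the set is a local variable, two replay passes over the same history cannot influence each other, which is the property the paragraph above relies on.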

Result

After rebuilding and reopening older threads in Zed:

  • replayed shell-backed tool calls render with meaningful titles again
  • replayed plan updates render as proper plan UI again

Examples observed after the fix include:

  • Read Foo.scala
  • Read Bar.scala
  • Read Baz.scala
  • List /Users/irony/Developer/some-project/src/some-module

All these commands were previously shown as exec_command.

Scope

This PR only changes history replay behavior.

It does not change:

  • live tool-call rendering
  • live plan update handling
  • stored rollout / session history data
  • authentication, session lifecycle, or prompt submission behavior

Risk

  • the changes are limited to replay
  • existing generic fallback behavior remains in place for unrecognized or unparsable function calls
  • no historical data is rewritten
  • the additional replay bookkeeping for update_plan is local to a single replay pass

Testing

Automated

  • Ran cargo test

Added 2 tests:

  • replaying FunctionCall(name="exec_command") produces a structured tool call instead of a generic one:

    cargo test test_replay_exec_command_function_call_is_structured

  • replaying FunctionCall(name="update_plan") produces a plan update and suppresses the matching generic function-call output:

    cargo test test_replay_update_plan_function_call_emits_plan_update

  • Ran cargo fmt --check

Manual

  • Built a release artifact with cargo build --release
  • Added it as a custom agent in Zed
  • Opened older threads in Zed containing replayed exec_command calls
  • Opened older threads in Zed containing replayed update_plan entries
  • Compared the rendering of this custom agent with that of Codex CLI

Observed that:

  • command history now renders with meaningful titles
  • plan history now renders as plan UI

Issue relevance

This branch addresses the following replay downgrade problems:

  • generic replay of historical exec_command
  • generic replay of historical update_plan

It does not solve every rendering anomaly found during investigation, especially those related to unusual embedded transcript/context content.

I used Codex to help me investigate the issue and to draft the code and the PR.

Previously, only the following cases were supported:
- `shell`
- `container.exec`
- `shell_command`

This commit adds `exec_command` to the same shell-like function call path.

- Commit drafted by Codex.
See test case `test_replay_exec_command_function_call_is_structured`.

- Commit drafted by Codex.
This commit switches the replay of `update_plan` from a generic tool call to a structured "plan update", matching how it was displayed while the thread was running live.

What has been changed:

- A local `HashSet<String>` was added to the `handle_replay_history` function
  - This local state is used only during the current replay, to record the `call_id`s associated with plan updates
  - The signature of `replay_response_item` was adjusted accordingly
- Updated the `FunctionCall` and `FunctionCallOutput` branches of `ResponseItem`
  - in `ResponseItem::FunctionCall`, we first try to reconstruct the plan from the generic tool call; if that succeeds, we record the `call_id` and emit the plan
  - in `ResponseItem::FunctionCallOutput`, we omit the outputs whose `call_id` was recorded for a plan update

- Commit drafted by Codex.
See test case `test_replay_update_plan_function_call_emits_plan_update`.

- Commit drafted by Codex.