Fix replay rendering for exec_command and update_plan#215
Open
kokoro-aya wants to merge 4 commits intozed-industries:mainfrom
Open
Fix replay rendering for exec_command and update_plan#215kokoro-aya wants to merge 4 commits intozed-industries:mainfrom
exec_command and update_plan#215kokoro-aya wants to merge 4 commits intozed-industries:mainfrom
Conversation
Previously, only the following cases are supported: - `shell` - `container.exec` - `shell_command` This commit adds `exec_command` as well into this path of `shell-like function call` kinds of commands. - Commit drafted by Codex.
See test case `test_replay_exec_command_function_call_is_structured`. - Commit drafted by Codex.
This commit switches the replay of `update_plan` from generic tool call to a structured "plan update" as it displays while thread was performing. What has been changed: - A local `HashSet<String>` was added in `handle_replay_history` function - This local state is only used for current replay for indexing `call_id` associated with updated plans - Also adjusted signature of `replay_response_item` - Updated `FunctionCall` and `FunctionCallOutput` branches of `ResponseItem` - in `ResponseItem::FunctionCall`, we first try to reconstruct the plan from generic tool call, if it passes, we note down the `call_id` and use this plan - in `ResponseItem::FunctionCallOutput`, we omit the outputs of these generic tool updates - Commit drafted by Codex.
See test case `test_replay_update_plan_function_call_emits_plan_update`. - Commit drafted by Codex.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Context
I was using Codex ACP in Zed for various ongoing tasks. I was able to reenter threads of previous sessions or previous projects and seeing structured output for CLI commands and the plan drafted during these threads.
After a thread where a prompt messed with some dirty inputs (cross-referencing local files/previous threads), the Codex CLI panel's rendering degraded and all CLI commands/plans rendered as generic
exec_commandandupdate_plan, which made the reading of threads difficult.I cloned this repo and worked locally and solved this rendering issue in my fork, which allows me to continue to work on my projects without downgrading my UX.
Summary
This PR fixes two history replay parity issues in
codex-acp:FunctionCall(name="exec_command")entries were shown as genericexec_commandtool calls instead of meaningful command titles such asRead ...,Search ..., orList ...FunctionCall(name="update_plan")entries were shown as generic tool calls instead of restoring the ACP plan UILive rendering already handled both cases better; the replay path did not.
Problem
When sessions were restored from history, replay did not fully reconstruct the richer ACP presentation used during live execution.
In practice this caused two visible regressions in old threads:
FunctionCall(name="exec_command")were replayed as generic tool callsFunctionCall(name="update_plan")were replayed as generic tool calls instead of plan updatesThis made restored history less useful and less consistent with live session behavior.
Root cause
The replay path in
thread.rshandled only a subset of historical tool-call shapes as structured replay events.exec_commandReplay already special-cased a few shell-like function names:
shellcontainer.execshell_commandBut historical
FunctionCall(name="exec_command")fell through to the generic fallback path, so replay lost semantic tool metadata.update_planLive plan updates already had a dedicated path:
PlanUpdateevents were translated intoSessionUpdate::PlanBut during replay, historical
FunctionCall(name="update_plan")entries were not translated back into plan updates and instead fell through to the generic function-call fallback.What changed
exec_commandReplay now treats
exec_commandas a shell-like function call.It parses:
cmdworkdirand reuses the existing command parsing logic to recover structured tool-call metadata during replay.
If parsing fails, replay still falls back to the previous generic behavior.
update_planReplay now special-cases
FunctionCall(name="update_plan").It:
SessionUpdate::Plancall_idonly during the current replay passFunctionCallOutputduring that same replay pass so replay does not emit a stray generic tool update afterwardThis replay bookkeeping is local to the replay pass and does not become persistent session state.
Result
After rebuilding and reopening older threads in Zed:
Examples observed after the fix include:
Read Foo.scalaRead Bar.scalaRead Baz.scalaList /Users/irony/Developer/some-project/src/some-moduleAll these commands were previously shown as
exec_command.Scope
This PR only changes history replay behavior.
It does not change:
Risk
update_planis local to a single replay passTesting
Automated
cargo testAdded 2 tests:
replaying
FunctionCall(name="exec_command")produces a structured tool call instead of a generic one:cargo test test_replay_exec_command_function_call_is_structuredreplaying
FunctionCall(name="update_plan")produces a plan update and suppresses the matching generic function-call output:cargo test test_replay_update_plan_function_call_emits_plan_updateRan
cargo fmt --checkManual
cargo build --releaseexec_commandcallsupdate_planentriesObserved that:
Issue relevance
This branch addresses the following replay downgrade problems:
exec_commandupdate_planIt does not solve every rendering anomaly found during investigation, especially related to unusual embedded transcript/context content.
I used Codex to help me investigate the issue, drafted the code and the PR.