fix: clear error state on disabled-transitively cells when ancestor recovers#8784

Open
VishakBaddur wants to merge 2 commits into marimo-team:main from VishakBaddur:fix/disabled-cell-error-state-not-cleared

@VishakBaddur
Contributor

Fixes #8072

Root Cause

When a disabled-transitively cell's ancestor had an error and then recovered, the disabled cell permanently showed the ancestor's error state.

run_stale_cells() in runtime.py only re-queues non-disabled cells:

if cell_impl.stale and not self.graph.is_disabled(cid):
    cells_to_run.add(cid)

So disabled-transitively cells never got re-queued and never had a chance to reset their run_result_status from "exception" to "disabled".
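The effect of that filter can be reproduced with a toy model. `ToyCell` and `cells_to_requeue` are hypothetical stand-ins, not marimo's `CellImpl` or `Kernel`; the sketch only assumes that a disabled-transitively cell keeps whatever status it last had until something re-queues it.

```python
from dataclasses import dataclass


@dataclass
class ToyCell:
    """Hypothetical stand-in for a cell; not marimo's CellImpl."""

    stale: bool = False
    disabled_transitively: bool = False
    run_result_status: str = "success"


def cells_to_requeue(cells: dict[str, ToyCell]) -> set[str]:
    # Mirrors the filter quoted above:
    #   if cell_impl.stale and not self.graph.is_disabled(cid)
    return {
        cid
        for cid, cell in cells.items()
        if cell.stale and not cell.disabled_transitively
    }


cells = {
    # Ancestor that was just fixed and is now stale.
    "a": ToyCell(stale=True),
    # Downstream cell still showing the ancestor's old error.
    "b": ToyCell(
        stale=True,
        disabled_transitively=True,
        run_result_status="exception",
    ),
}

# Only "a" is re-queued; nothing on this path touches "b",
# so its status stays "exception".
queued = cells_to_requeue(cells)
```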

Fix

  • Added is_any_ancestor_errored() to DirectedGraph
  • In run_stale_cells(), after building cells_to_run, reset run_result_status to "disabled" for any disabled-transitively cell whose ancestor no longer has an error
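A minimal sketch of both bullets on a hypothetical toy graph. `ToyGraph`, `ToyCell`, and `clear_recovered_errors` are illustrative names (marimo's `DirectedGraph` stores its topology differently); ancestors are found by walking parent edges, and the reset loop checks the same four conditions as the PR's change to `run_stale_cells()`.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCell:
    parents: set[str] = field(default_factory=set)
    disabled_by_config: bool = False
    run_result_status: str = "idle"


class ToyGraph:
    """Hypothetical stand-in for DirectedGraph."""

    def __init__(self, cells: dict[str, ToyCell]) -> None:
        self.cells = cells

    def ancestors(self, cell_id: str) -> set[str]:
        # Iterative DFS over parent edges; excludes cell_id itself.
        seen: set[str] = set()
        stack = list(self.cells[cell_id].parents)
        while stack:
            cid = stack.pop()
            if cid not in seen:
                seen.add(cid)
                stack.extend(self.cells[cid].parents)
        return seen

    def is_disabled(self, cell_id: str) -> bool:
        # Disabled if the cell itself or any ancestor is disabled in config.
        return self.cells[cell_id].disabled_by_config or any(
            self.cells[a].disabled_by_config for a in self.ancestors(cell_id)
        )

    def is_any_ancestor_errored(self, cell_id: str) -> bool:
        return any(
            self.cells[a].run_result_status == "exception"
            for a in self.ancestors(cell_id)
        )


def clear_recovered_errors(graph: ToyGraph) -> list[str]:
    """Reset disabled-transitively cells whose ancestors no longer error."""
    cleared = []
    for cid, cell in graph.cells.items():
        if (
            graph.is_disabled(cid)
            and not cell.disabled_by_config
            and cell.run_result_status == "exception"
            and not graph.is_any_ancestor_errored(cid)
        ):
            cell.run_result_status = "disabled"
            cleared.append(cid)
    return cleared
```

With a chain `a -> b -> c` where `b` is disabled in its config, `clear_recovered_errors` leaves `c` alone while `a` is still in "exception" and resets `c` to "disabled" once `a` succeeds.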

Testing

Added test_is_any_ancestor_errored to tests/_runtime/test_dataflow.py, verifying that the new graph method detects an errored ancestor and stops reporting one once that ancestor's status is reset.

…ecovers

Fixes marimo-team#8072

When a disabled-transitively cell's ancestor had an error and then
recovered, the disabled cell permanently showed the ancestor's error
state. This happened because run_stale_cells() only re-queues non-disabled
cells, so disabled-transitively cells never got a chance to reset their
run_result_status from 'exception' to 'disabled'.

Fix:
- Add is_any_ancestor_errored() to DirectedGraph
- In run_stale_cells(), after building cells_to_run, reset run_result_status
  to 'disabled' for any disabled-transitively cell whose ancestor no longer
  has an error
@vercel

vercel bot commented Mar 20, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: marimo-docs | Deployment: Ready | Actions: Preview, Comment | Updated (UTC): Mar 20, 2026 1:26am

@mscolnick mscolnick added the bug Something isn't working label Mar 20, 2026
@mscolnick mscolnick requested a review from Copilot March 20, 2026 17:44
Contributor

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

def is_any_ancestor_errored(self, cell_id: CellId_t) -> bool:
    """Check if any ancestor of a cell has an error."""
    return any(
        self.topology.cells[cid].run_result_status == "exception"

Copilot AI Mar 30, 2026

is_any_ancestor_errored() only treats run_result_status == "exception" as an error, but the runtime also uses other error-like statuses (e.g. "marimo-error" is set for semantic/registration errors). The method name/docstring says “has an error”, so this narrow check is likely to be reused incorrectly and can cause false negatives when an ancestor is still in an error state.

Consider either (a) broadening the predicate to include all statuses that should be treated as “errored” (at least "exception" and "marimo-error", possibly "interrupted" depending on intent), or (b) renaming/docstring to make it explicit that it only checks for raised exceptions.

Suggested change
self.topology.cells[cid].run_result_status == "exception"
self.topology.cells[cid].run_result_status in ("exception", "marimo-error")
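Option (a) could look like the sketch below, which keeps the set of error-like statuses in one place so every caller agrees on what "errored" means. `ERRORED_STATUSES`, `is_errored_status`, and `any_ancestor_errored` are hypothetical names; whether "interrupted" belongs in the set is the open question raised above, so it is left out here.

```python
from typing import Optional

# Hypothetical: statuses treated as "errored" for ancestor checks.
# "interrupted" is deliberately excluded; add it if that is the intent.
ERRORED_STATUSES = frozenset({"exception", "marimo-error"})


def is_errored_status(status: Optional[str]) -> bool:
    # None (cell never ran) is not treated as an error.
    return status in ERRORED_STATUSES


def any_ancestor_errored(
    statuses: dict[str, Optional[str]], ancestors: set[str]
) -> bool:
    # `statuses` maps cell id -> run_result_status; missing ids count as None.
    return any(is_errored_status(statuses.get(cid)) for cid in ancestors)
```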

Copilot uses AI. Check for mistakes.
Comment on lines +1788 to +1797
# Clear stale error state from disabled-transitively cells whose
# ancestor has recovered from an error. Without this, the disabled
# cell permanently shows the ancestor error even after it is fixed.
for cid, cell_impl in self.graph.cells.items():
    if (
        self.graph.is_disabled(cid)
        and not cell_impl.config.disabled
        and cell_impl.run_result_status == "exception"
        and not self.graph.is_any_ancestor_errored(cid)
    ):

Copilot AI Mar 30, 2026

This loop calls self.graph.is_disabled(cid) for every cell on every run_stale_cells() invocation. DirectedGraph.is_disabled() walks parents (BFS) and can become a noticeable hot path for large notebooks.

Since this block only targets disabled-transitively cells, consider using the already-tracked runtime state (cell_impl.runtime_state == "disabled-transitively" / cell_impl.disabled_transitively) instead of recomputing is_disabled() each time, or precomputing a disabled set once and reusing it in both loops.
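The precompute option can be done with one multi-source BFS from the explicitly disabled cells along child edges, which visits each cell and edge at most once instead of walking parents once per cell. The adjacency map and `compute_disabled` are hypothetical; the sketch assumes, as `is_disabled()` does, that a cell is disabled when it or any ancestor is disabled in its config.

```python
from collections import deque


def compute_disabled(
    children: dict[str, set[str]], disabled_by_config: set[str]
) -> set[str]:
    """Return every cell disabled directly or through an ancestor."""
    disabled = set(disabled_by_config)
    queue = deque(disabled_by_config)
    while queue:
        cid = queue.popleft()
        for child in children.get(cid, set()):
            if child not in disabled:
                disabled.add(child)
                queue.append(child)
    return disabled


children = {"a": {"b"}, "b": {"c"}, "c": set(), "d": set()}
disabled = compute_disabled(children, disabled_by_config={"b"})
# Only the cells disabled through an ancestor, i.e. the ones the
# reset loop targets.
transitively_disabled = disabled - {"b"}
```

Computing `disabled` once per `run_stale_cells()` call would let both loops test set membership instead of calling `is_disabled()` for each cell.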

Comment on lines +1788 to +1799
# Clear stale error state from disabled-transitively cells whose
# ancestor has recovered from an error. Without this, the disabled
# cell permanently shows the ancestor error even after it is fixed.
for cid, cell_impl in self.graph.cells.items():
    if (
        self.graph.is_disabled(cid)
        and not cell_impl.config.disabled
        and cell_impl.run_result_status == "exception"
        and not self.graph.is_any_ancestor_errored(cid)
    ):
        cell_impl.set_run_result_status("disabled")

Copilot AI Mar 30, 2026

This block updates cell_impl.run_result_status but does not emit any CellNotification to the frontend. The frontend’s “errored”/error UI is driven by received cell-op messages (especially error outputs), and it doesn’t observe backend run_result_status directly.

If the goal is to clear the user-visible error state for disabled-transitively cells, this likely also needs an explicit UI update (e.g., clearing/replacing the error output and/or sending a status transition that resets the frontend’s errored flag). An alternative is to include these cells in the normal _run_cells queue so they go through the runner’s standard status transitions, plus explicitly clearing their error output when they’re skipped as disabled.
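A sketch of pairing the status reset with an explicit UI update. The `notify` callback and the message dict are hypothetical placeholders for whatever `CellNotification` the runtime actually sends; the point is only that the backend status and the output the frontend displays are cleared together.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Hypothetical: (cell_id, message) -> None, standing in for sending a
# CellNotification. Real message fields may differ.
Notify = Callable[[str, dict[str, Any]], None]


@dataclass
class ToyCell:
    run_result_status: str
    # Error payload the UI is currently showing, if any.
    output: Optional[Any] = None


def clear_error_and_notify(cell_id: str, cell: ToyCell, notify: Notify) -> None:
    cell.run_result_status = "disabled"
    cell.output = None
    # Per the review comment, the frontend only reacts to messages,
    # so the reset is also sent out explicitly.
    notify(cell_id, {"status": "disabled", "output": None})


sent: list[tuple[str, dict[str, Any]]] = []
cell = ToyCell("exception", output={"error": "ancestor raised"})
clear_error_and_notify("c", cell, lambda cid, msg: sent.append((cid, msg)))
```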

Comment on lines +1594 to +1623
def test_is_any_ancestor_errored() -> None:
    """Test that is_any_ancestor_errored correctly detects ancestor errors."""
    graph = dataflow.DirectedGraph()
    # Create a chain: 0 -> 1 -> 2
    code = "x = 0"
    first_cell = parse_cell(code)
    graph.register_cell("0", first_cell)
    code = "y = x"
    second_cell = parse_cell(code)
    graph.register_cell("1", second_cell)
    code = "z = y"
    third_cell = parse_cell(code)
    graph.register_cell("2", third_cell)

    # No errors initially
    assert not graph.is_any_ancestor_errored("0")
    assert not graph.is_any_ancestor_errored("1")
    assert not graph.is_any_ancestor_errored("2")

    # Set cell 0 to exception state
    graph.cells["0"].set_run_result_status("exception")
    assert not graph.is_any_ancestor_errored("0")  # no ancestors
    assert graph.is_any_ancestor_errored("1")  # parent 0 has error
    assert graph.is_any_ancestor_errored("2")  # grandparent 0 has error

    # Fix cell 0 - clear the error
    graph.cells["0"].set_run_result_status("success")
    assert not graph.is_any_ancestor_errored("0")
    assert not graph.is_any_ancestor_errored("1")
    assert not graph.is_any_ancestor_errored("2")

Copilot AI Mar 30, 2026

This test validates the new DirectedGraph.is_any_ancestor_errored() helper, but the PR’s user-facing behavior change is in Kernel.run_stale_cells() (clearing disabled-transitively cells’ stale error state when an ancestor recovers). Consider adding an integration-style runtime test that reproduces #8072 end-to-end (ancestor errors → downstream disabled-transitively cell shows error → ancestor fixed + run_stale_cells() → downstream cell no longer shows error/exception state). This would help ensure the run_stale_cells() logic stays correct as execution/notification behavior evolves.
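The suggested end-to-end shape can be sketched against a hypothetical `ToyKernel` that models only statuses; a real test would drive marimo's `Kernel`, whose API is not shown here. The toy assumes that an error marks every descendant not disabled in its own config as "exception", and that a success marks those descendants stale.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCell:
    parents: set[str] = field(default_factory=set)
    disabled_by_config: bool = False
    status: str = "idle"
    stale: bool = False


class ToyKernel:
    """Hypothetical stand-in for marimo's Kernel that only tracks statuses."""

    def __init__(self, cells: dict[str, ToyCell]) -> None:
        self.cells = cells

    def ancestors(self, cid: str) -> set[str]:
        seen: set[str] = set()
        stack = list(self.cells[cid].parents)
        while stack:
            p = stack.pop()
            if p not in seen:
                seen.add(p)
                stack.extend(self.cells[p].parents)
        return seen

    def is_disabled(self, cid: str) -> bool:
        return any(
            self.cells[c].disabled_by_config for c in self.ancestors(cid) | {cid}
        )

    def run_cell(self, cid: str, raises: bool) -> None:
        self.cells[cid].status = "exception" if raises else "success"
        for other, cell in self.cells.items():
            if cid in self.ancestors(other) and not cell.disabled_by_config:
                # Toy assumption: descendants show the ancestor's error,
                # or become stale once the ancestor succeeds.
                if raises:
                    cell.status = "exception"
                else:
                    cell.stale = True

    def run_stale_cells(self) -> None:
        # Same filter as the PR: disabled cells are never re-run.
        for cid, cell in self.cells.items():
            if cell.stale and not self.is_disabled(cid):
                self.run_cell(cid, raises=False)
                cell.stale = False
        # The fix: reset disabled-transitively cells whose ancestors recovered.
        for cid, cell in self.cells.items():
            ancestor_errored = any(
                self.cells[a].status == "exception" for a in self.ancestors(cid)
            )
            if (
                self.is_disabled(cid)
                and not cell.disabled_by_config
                and cell.status == "exception"
                and not ancestor_errored
            ):
                cell.status = "disabled"


def test_disabled_descendant_recovers() -> None:
    kernel = ToyKernel({
        "a": ToyCell(),
        "b": ToyCell(parents={"a"}, disabled_by_config=True, status="disabled"),
        "c": ToyCell(parents={"b"}),
    })
    kernel.run_cell("a", raises=True)
    assert kernel.cells["c"].status == "exception"  # shows ancestor error
    kernel.run_cell("a", raises=False)  # ancestor fixed
    kernel.run_stale_cells()
    assert kernel.cells["c"].status == "disabled"  # error cleared
```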


Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Disabled cells' ancestor errors cannot be cleared

3 participants