
improvement: inline audio and video in HTML export#8931

Open
mscolnick wants to merge 6 commits into main from ms/embed-audio

mscolnick (Contributor) commented Mar 30, 2026

Closes #8801

## Summary

When exporting a marimo notebook to HTML, `<audio>` and `<video>` elements with virtual file URLs (`./@file/...`) were not embedded as data URIs; only `<img>` tags were. As a result, audio clips were broken in exported HTML files.

Added `"audio"` and `"video"` to `VIRTUAL_FILE_ALLOWED_TAGS` so they are inlined just like images. Also added a 10 MB size cap (`MAX_VIRTUAL_FILE_INLINE_BYTES`): files exceeding this limit are not inlined and should be served from the `public/` folder instead.
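The core of the change is an allowlist check plus a size check before encoding. A minimal sketch of that decision, assuming the exporter base64-encodes the file bytes and picks the MIME type with `mimetypes.guess_type`; `to_data_uri` and `maybe_inline` are hypothetical helpers, not marimo's actual functions:

```python
import base64
import mimetypes
from typing import Optional

VIRTUAL_FILE_ALLOWED_TAGS = {"img", "audio", "video"}
MAX_VIRTUAL_FILE_INLINE_BYTES = 10 * 1024 * 1024  # 10 MB


def to_data_uri(filename: str, data: bytes) -> str:
    """Encode bytes as a base64 data URI, guessing the MIME type from the name."""
    mime, _encoding = mimetypes.guess_type(filename)
    mime = mime or "application/octet-stream"
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{payload}"


def maybe_inline(
    tag: str,
    filename: str,
    data: bytes,
    max_inline_bytes: int = MAX_VIRTUAL_FILE_INLINE_BYTES,
) -> Optional[str]:
    """Return a data URI for allowed tags under the cap, else None (not inlined)."""
    if tag not in VIRTUAL_FILE_ALLOWED_TAGS:
        return None
    if len(data) > max_inline_bytes:
        return None
    return to_data_uri(filename, data)
```

Returning `None` keeps the caller in charge of what to emit for a skipped file.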
mscolnick requested a review from dmadisetti as a code owner March 30, 2026 18:51
Copilot AI review requested due to automatic review settings March 30, 2026 18:51
vercel bot commented Mar 30, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| marimo-docs | Ready | Preview, Comment | Mar 31, 2026 8:42pm |


mscolnick added the enhancement (New feature or request) label Mar 30, 2026
mscolnick changed the title from "fix: inline audio and video in HTML export" to "improvement: inline audio and video in HTML export" Mar 30, 2026
Copilot AI left a comment

Pull request overview

This PR updates the HTML export pipeline to inline virtual-file media for `<audio>` and `<video>` elements (in addition to `<img>`), and introduces a maximum inline size threshold intended to prevent oversized assets from being embedded.

Changes:

  • Expand HTML virtual-file inlining to include audio and video tags (previously img only).
  • Add a max_inline_bytes limit to virtual-file → data-URI replacement during DOM traversal and HTML export.
  • Add/extend tests covering audio/video replacement and size-limit skipping behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `marimo/_server/export/exporter.py` | Allows audio/video inlining; adds a 10 MB inline cap and threads it through export inlining and virtual-files map generation. |
| `marimo/_convert/common/dom_traversal.py` | Adds `max_inline_bytes` support to `replace_virtual_files_with_data_uris()` to skip oversized virtual files. |
| `tests/_server/export/test_exporter.py` | Adds export-level tests for audio inlining and oversized virtual-file behavior. |
| `tests/_convert/common/test_dom_traversal.py` | Adds DOM traversal tests for audio/video replacement and max-inline-bytes skipping. |
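To illustrate what a size-capped DOM traversal of this kind does, here is a minimal sketch over a hypothetical `Node` tree. It assumes virtual-file URLs have the form `./@file/<byte_length>-<name>`, so the size check can run before any file bytes are read. `Node`, `parse_byte_length`, `replace_virtual_files`, and the `to_data_uri` callback are illustrative names, not marimo's API:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Tags whose src may be inlined (same set as VIRTUAL_FILE_ALLOWED_TAGS).
ALLOWED_TAGS = {"img", "audio", "video"}
VIRTUAL_FILE_PREFIX = "./@file/"


@dataclass
class Node:
    """Hypothetical minimal DOM node: a tag, its attributes, and children."""
    tag: str
    attrs: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


def parse_byte_length(url: str) -> Optional[int]:
    """Read <byte_length> from an assumed ./@file/<byte_length>-<name> URL."""
    if not url.startswith(VIRTUAL_FILE_PREFIX):
        return None
    length_str, sep, _name = url[len(VIRTUAL_FILE_PREFIX):].partition("-")
    if not sep:
        return None
    try:
        return int(length_str)
    except ValueError:
        return None


def replace_virtual_files(
    node: Node,
    to_data_uri: Callable[[str], str],
    max_inline_bytes: int,
) -> None:
    """Recursively swap virtual-file src values for data URIs, in place."""
    src = node.attrs.get("src")
    if node.tag in ALLOWED_TAGS and src is not None:
        size = parse_byte_length(src)
        # Skip non-virtual URLs and files over the cap without reading them.
        if size is not None and size <= max_inline_bytes:
            node.attrs["src"] = to_data_uri(src)
    for child in node.children:
        replace_virtual_files(child, to_data_uri, max_inline_bytes)
```

Reading the size from the URL means an oversized file is never loaded into memory just to be rejected.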

Comment on lines +62 to +65
```python
VIRTUAL_FILE_ALLOWED_TAGS = {"img", "audio", "video"}
# Maximum file size to inline as a data URI in exported HTML (10 MB).
# Files exceeding this are skipped and should be served from the public/ folder.
MAX_VIRTUAL_FILE_INLINE_BYTES = 10 * 1024 * 1024
```

Copilot AI Mar 30, 2026

The comment says files over the limit "should be served from the public/ folder", but the exporter currently just skips inlining and leaves the original ./@file/... URL in the HTML. In a static/offline HTML export there’s no handler for ./@file/ URLs, so skipped media will still be broken unless the URL is rewritten (e.g., to ./public/) or the export surfaces a user-visible warning/error when skipping.

Copilot uses AI. Check for mistakes.
```python
    byte_length = int(byte_length_str)
except Exception as e:
    LOGGER.warning(
        "File not found in export: %s. Error: %s", file_url, e
```

Copilot AI Mar 30, 2026

The warning logged in the URL parsing block uses "File not found in export" even when the failure is actually due to an invalid virtual-file URL format or non-integer byte length. Consider changing the message to reflect parsing/validation failure (and keep the "not found" wording for actual read_virtual_file errors) to make logs actionable.

Suggested change

```diff
-    "File not found in export: %s. Error: %s", file_url, e
+    "Invalid virtual file URL in export (parsing/validation failed): %s. Error: %s",
+    file_url,
+    e,
```
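To make the suggested split concrete, here is a minimal sketch that separates URL parsing from file reading so each failure gets its own log message. It assumes the same `./@file/<byte_length>-<name>` URL shape; `load_virtual_file` is hypothetical, and the two-argument `read_virtual_file` callback is a stand-in whose signature is an assumption, not marimo's real function:

```python
import logging
from typing import Callable, Optional

LOGGER = logging.getLogger(__name__)


def load_virtual_file(
    file_url: str,
    read_virtual_file: Callable[[str, int], bytes],
) -> Optional[bytes]:
    """Return the file's bytes, or None after logging why it was skipped."""
    # Step 1: parse <byte_length> from the assumed ./@file/<byte_length>-<name> URL.
    try:
        last_segment = file_url.rsplit("/", 1)[-1]
        byte_length_str, sep, _ = last_segment.partition("-")
        if not sep:
            raise ValueError("missing '-' separator")
        byte_length = int(byte_length_str)
    except ValueError as e:
        LOGGER.warning(
            "Invalid virtual file URL in export (parsing/validation failed): %s. Error: %s",
            file_url,
            e,
        )
        return None
    # Step 2: only a failed read is reported as "not found".
    try:
        return read_virtual_file(file_url, byte_length)
    except OSError as e:
        LOGGER.warning("File not found in export: %s. Error: %s", file_url, e)
        return None
```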

```python
file_manager = AppFileManager.from_app(InternalApp(app))
cell_ids = list(file_manager.app.cell_manager.cell_ids())

# 10_000_000 bytes exceeds the 5 MB limit
```

Copilot AI Mar 30, 2026

The inline comment here is inconsistent with the test data and the new 10MB cap: it mentions "10_000_000 bytes" and a "5 MB limit", but the URL uses 20_000_000 bytes and MAX_VIRTUAL_FILE_INLINE_BYTES is 10MB. Please update the comment (and/or the byte length used) so the test documents the real behavior.

Suggested change

```diff
- # 10_000_000 bytes exceeds the 5 MB limit
+ # 20_000_000 bytes (20 MB) exceeds the 10 MB inline limit
```


```python
# The large file should NOT be inlined in the HTML output
assert "data:audio/x-wav;base64," not in html
# It should fall through to the virtual_files dict instead
```

Copilot AI Mar 30, 2026

This assertion comment says the oversized virtual file should "fall through to the virtual_files dict", but exporter._build_virtual_files_dict now receives the same max_inline_bytes cap and will skip adding oversized entries there as well. Either adjust the test expectation/docs, or change the implementation if the dict fallback is still intended for some asset types.

Suggested change

```diff
- # It should fall through to the virtual_files dict instead
+ # The HTML should keep a reference to the original virtual file path
```

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

- Fix misleading "File not found" log message for URL parsing errors
- Fix incorrect test comments about size limits and fallback behavior
…port

Instead of leaving a broken ./@file/ URL in static HTML when a file
exceeds the inline size limit, emit a text/plain data URI with a
human-readable message explaining why the file was not inlined.
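A placeholder of the kind this commit describes could be built as below. The message wording, the inclusion of the original URL, and the choice of base64 over percent-encoding are all assumptions; the PR text does not show the exact placeholder, and `oversized_placeholder_uri` is a hypothetical helper:

```python
import base64


def oversized_placeholder_uri(file_url: str, byte_length: int, max_inline_bytes: int) -> str:
    """Build a text/plain data URI explaining why a file was not inlined."""
    # Hypothetical wording; only the text/plain data URI idea comes from the commit.
    message = (
        f"{file_url} was not inlined because it is {byte_length} bytes, "
        f"over the {max_inline_bytes}-byte inline limit. "
        "Serve it from the public/ folder instead."
    )
    encoded = base64.b64encode(message.encode("utf-8")).decode("ascii")
    return f"data:text/plain;base64,{encoded}"
```

Unlike a leftover `./@file/` URL, this always resolves in a static file, so a reader who opens the link sees an explanation rather than a network error.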
… files

The test expected oversized virtual files to keep their original ./@file/
URL, but commit d39e7d3 changed the behavior to emit a text/plain
placeholder data URI instead. Update assertions accordingly.
mimetypes.guess_type returns audio/x-wav on macOS/Linux but audio/wav
on Windows, causing test failures on Windows CI.
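One way to keep such assertions platform-independent is to accept either spelling of the WAV MIME type. This is a sketch; `WAV_MIME_TYPES` and `inlined_wav_present` are hypothetical helpers, not the test code from this PR:

```python
# Both spellings of the WAV MIME type that mimetypes may report, per platform.
WAV_MIME_TYPES = {"audio/wav", "audio/x-wav"}


def inlined_wav_present(html: str) -> bool:
    """True if the HTML contains a base64 WAV data URI under either MIME name."""
    return any(f"data:{mime};base64," in html for mime in WAV_MIME_TYPES)
```

Alternatively, a test can compute the expected prefix from `mimetypes.guess_type("clip.wav")` at run time, so it agrees with whatever the exporter sees on the same machine, assuming the exporter also uses `mimetypes`.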
… tests

The test_proxy_static_file test was flaky on macOS CI because time.sleep(1)
was insufficient for the subprocess server to be ready. Poll the socket
instead to wait for actual readiness.
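Polling for readiness usually means retrying a TCP connect until it succeeds or a deadline passes. A minimal sketch, assuming the server listens on a known host and port; `wait_for_port` is a hypothetical helper, not the code from the fix:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0, interval: float = 0.1) -> None:
    """Block until a TCP connection to (host, port) succeeds, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means the server is accepting connections.
            with socket.create_connection((host, port), timeout=interval):
                return
        except OSError:
            # Refused or timed out: the server is not ready yet.
            if time.monotonic() >= deadline:
                raise TimeoutError(f"{host}:{port} not ready after {timeout} s") from None
            time.sleep(interval)
```

Compared with a fixed `time.sleep(1)`, this returns as soon as the server is up and only fails after a generous deadline, which removes the race on slow CI runners.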

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

option to configure exporting audio in HTML

2 participants