improvement: inline audio and video in HTML export #8931
Closes #8801

## Summary

When exporting a marimo notebook to HTML, `<audio>` and `<video>` elements with virtual file URLs (`./@file/...`) were not embedded as data URIs; only `<img>` tags were. This meant audio clips broke in exported HTML files.

Added `"audio"` and `"video"` to `VIRTUAL_FILE_ALLOWED_TAGS` so they are inlined just like images. Also added a 10MB size cap (`MAX_VIRTUAL_FILE_INLINE_BYTES`): files exceeding this limit are skipped and should be served from the `public/` folder instead.
Pull request overview
This PR updates the HTML export pipeline to inline virtual-file media for <audio> and <video> elements (in addition to <img>), and introduces a maximum inline size threshold intended to prevent oversized assets from being embedded.
Changes:
- Expand HTML virtual-file inlining to include `audio` and `video` tags (previously `img` only).
- Add a `max_inline_bytes` limit to virtual-file → data-URI replacement during DOM traversal and HTML export.
- Add/extend tests covering audio/video replacement and size-limit skipping behavior.
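
The behavior summarized in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in `dom_traversal.py`: the helper name `to_data_uri`, the `read_file` callback, and the assumed URL shape `./@file/<byte_length>-<filename>` are all hypothetical and chosen only to show the allowed-tag check and the size cap together.

```python
import base64
import mimetypes
from typing import Callable, Optional

VIRTUAL_FILE_ALLOWED_TAGS = {"img", "audio", "video"}
MAX_VIRTUAL_FILE_INLINE_BYTES = 10 * 1024 * 1024  # 10 MB


def to_data_uri(
    tag: str,
    src: str,
    read_file: Callable[[str], bytes],  # hypothetical reader callback
    max_inline_bytes: Optional[int] = MAX_VIRTUAL_FILE_INLINE_BYTES,
) -> str:
    """Return a data URI for src, or src unchanged when it is not inlined."""
    prefix = "./@file/"
    if tag not in VIRTUAL_FILE_ALLOWED_TAGS or not src.startswith(prefix):
        return src
    # Assumed URL shape: ./@file/<byte_length>-<filename>
    byte_length_str, _, filename = src[len(prefix):].partition("-")
    byte_length = int(byte_length_str)
    if max_inline_bytes is not None and byte_length > max_inline_bytes:
        return src  # oversized: skip inlining
    mime, _ = mimetypes.guess_type(filename)
    payload = base64.b64encode(read_file(filename)).decode("ascii")
    return f"data:{mime or 'application/octet-stream'};base64,{payload}"
```

In this sketch an oversized file simply keeps its original URL, which matches the first revision of the PR; later commits replace that with a placeholder.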
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| marimo/_server/export/exporter.py | Allows audio/video inlining; adds a 10MB inline cap and threads it through export inlining + virtual-files map generation. |
| marimo/_convert/common/dom_traversal.py | Adds max_inline_bytes support to replace_virtual_files_with_data_uris() to skip oversized virtual files. |
| tests/_server/export/test_exporter.py | Adds export-level tests for audio inlining and oversized virtual-file behavior. |
| tests/_convert/common/test_dom_traversal.py | Adds DOM traversal tests for audio/video replacement and max-inline-bytes skipping. |

    VIRTUAL_FILE_ALLOWED_TAGS = {"img", "audio", "video"}
    # Maximum file size to inline as a data URI in exported HTML (10 MB).
    # Files exceeding this are skipped and should be served from the public/ folder.
    MAX_VIRTUAL_FILE_INLINE_BYTES = 10 * 1024 * 1024

The comment says files over the limit "should be served from the public/ folder", but the exporter currently just skips inlining and leaves the original ./@file/... URL in the HTML. In a static/offline HTML export there’s no handler for ./@file/ URLs, so skipped media will still be broken unless the URL is rewritten (e.g., to ./public/) or the export surfaces a user-visible warning/error when skipping.
marimo/_server/export/exporter.py
Outdated

            byte_length = int(byte_length_str)
        except Exception as e:
            LOGGER.warning(
                "File not found in export: %s. Error: %s", file_url, e

The warning logged in the URL parsing block uses "File not found in export" even when the failure is actually due to an invalid virtual-file URL format or non-integer byte length. Consider changing the message to reflect parsing/validation failure (and keep the "not found" wording for actual read_virtual_file errors) to make logs actionable.

    - "File not found in export: %s. Error: %s", file_url, e
    + "Invalid virtual file URL in export (parsing/validation failed): %s. Error: %s",
    + file_url,
    + e,

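
One way to keep the two failure modes apart, as the comment above suggests, is to parse the URL and read the file in separate `try` blocks. The sketch below is illustrative only: `load_virtual_file` and its `read_file` callback are hypothetical names, and the URL shape `.../@file/<byte_length>-<filename>` is an assumption.

```python
import logging
from typing import Callable, Optional

LOGGER = logging.getLogger(__name__)


def load_virtual_file(
    file_url: str,
    read_file: Callable[[str, int], bytes],  # hypothetical reader callback
) -> Optional[bytes]:
    """Return the file bytes, or None after logging why loading failed."""
    try:
        # Assumed URL shape: .../@file/<byte_length>-<filename>
        rest = file_url.split("/@file/", 1)[1]
        byte_length_str, filename = rest.split("-", 1)
        byte_length = int(byte_length_str)
    except (IndexError, ValueError) as e:
        # Malformed URL or non-integer byte length: a parsing problem.
        LOGGER.warning(
            "Invalid virtual file URL in export (parsing/validation failed): "
            "%s. Error: %s",
            file_url,
            e,
        )
        return None
    try:
        return read_file(filename, byte_length)
    except Exception as e:
        # The URL parsed fine, but the file itself could not be read.
        LOGGER.warning("File not found in export: %s. Error: %s", file_url, e)
        return None
```

With this split, a log line saying "File not found" always means the read failed, never that the URL was malformed.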

    file_manager = AppFileManager.from_app(InternalApp(app))
    cell_ids = list(file_manager.app.cell_manager.cell_ids())

    # 10_000_000 bytes exceeds the 5 MB limit

The inline comment here is inconsistent with the test data and the new 10MB cap: it mentions "10_000_000 bytes" and a "5 MB limit", but the URL uses 20_000_000 bytes and MAX_VIRTUAL_FILE_INLINE_BYTES is 10MB. Please update the comment (and/or the byte length used) so the test documents the real behavior.

    - # 10_000_000 bytes exceeds the 5 MB limit
    + # 20_000_000 bytes (20 MB) exceeds the 10 MB inline limit


    # The large file should NOT be inlined in the HTML output
    assert "data:audio/x-wav;base64," not in html
    # It should fall through to the virtual_files dict instead

This assertion comment says the oversized virtual file should "fall through to the virtual_files dict", but exporter._build_virtual_files_dict now receives the same max_inline_bytes cap and will skip adding oversized entries there as well. Either adjust the test expectation/docs, or change the implementation if the dict fallback is still intended for some asset types.

    - # It should fall through to the virtual_files dict instead
    + # The HTML should keep a reference to the original virtual file path

- Fix misleading "File not found" log message for URL parsing errors
- Fix incorrect test comments about size limits and fallback behavior
…port Instead of leaving a broken ./@file/ URL in static HTML when a file exceeds the inline size limit, emit a text/plain data URI with a human-readable message explaining why the file was not inlined.
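
A placeholder of this kind could look roughly like the sketch below. The function name `oversized_placeholder`, the message wording, and the choice of base64 encoding are assumptions for illustration; the commit only states that a `text/plain` data URI with a human-readable message is emitted.

```python
import base64


def oversized_placeholder(
    filename: str, byte_length: int, max_inline_bytes: int
) -> str:
    """Build a text/plain data URI explaining why a file was not inlined."""
    # Hypothetical wording; the real message may differ.
    message = (
        f"File '{filename}' ({byte_length} bytes) exceeds the inline limit "
        f"of {max_inline_bytes} bytes and was not embedded in this export."
    )
    encoded = base64.b64encode(message.encode("utf-8")).decode("ascii")
    return f"data:text/plain;base64,{encoded}"
```

The resulting `src` will not play as media, but opening it shows the explanation instead of a dead `./@file/` link.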
… files The test expected oversized virtual files to keep their original ./@file/ URL, but commit d39e7d3 changed the behavior to emit a text/plain placeholder data URI instead. Update assertions accordingly.
mimetypes.guess_type returns audio/x-wav on macOS/Linux but audio/wav on Windows, causing test failures on Windows CI.
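
A test can stay platform-independent by accepting either MIME type rather than hard-coding one. The helper below is a hypothetical sketch of that idea, not the assertion used in the PR's tests.

```python
# Both spellings are accepted because the guessed type varies by platform.
WAV_MIME_TYPES = ("audio/wav", "audio/x-wav")


def has_inlined_wav(html: str) -> bool:
    """Return True if html contains a WAV data URI under either MIME type."""
    return any(f"data:{mime};base64," in html for mime in WAV_MIME_TYPES)
```

A test would then assert `has_inlined_wav(html)` (or its negation) instead of checking for the literal `data:audio/x-wav;base64,` prefix.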
… tests The test_proxy_static_file test was flaky on macOS CI because time.sleep(1) was insufficient for the subprocess server to be ready. Poll the socket instead to wait for actual readiness.
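
Polling for readiness can be sketched with the standard library as follows. `wait_for_port` and its default timeout values are hypothetical; the commit only says the test polls the socket instead of sleeping for a fixed second.

```python
import socket
import time


def wait_for_port(
    host: str, port: int, timeout: float = 10.0, interval: float = 0.1
) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or time runs out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the server is accepting connections.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

Compared with `time.sleep(1)`, this returns as soon as the subprocess server is listening and fails explicitly if it never comes up.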