Skip to content

feat: add screen recording capability#159

Open
AlexAlves87 wants to merge 11 commits intoopenclaw:masterfrom
AlexAlves87:feat/screen-record
Open

feat: add screen recording capability#159
AlexAlves87 wants to merge 11 commits intoopenclaw:masterfrom
AlexAlves87:feat/screen-record

Conversation

@AlexAlves87
Copy link
Copy Markdown
Contributor

@AlexAlves87 AlexAlves87 commented Apr 8, 2026

Summary

Adds screen.record, screen.record.start, and screen.record.stop to ScreenCapability. screen.record does a fixed-duration capture and returns a base64 MP4. The start/stop pair allows session-based recording with a recordingId so the caller controls when to stop.

What changed

  • ScreenCapability: three new commands with events and arg parsing
  • ScreenRecordingService: WinRT capture (D3D11CaptureFramePool), BGRA→NV12 conversion, MediaTranscoder with hardware and software fallback
  • NodeService: wired to the new capability events

Testing

  • ./build.ps1
  • dotnet test ./tests/OpenClaw.Shared.Tests/OpenClaw.Shared.Tests.csproj --no-restore
  • dotnet test ./tests/OpenClaw.Tray.Tests/OpenClaw.Tray.Tests.csproj --no-restore
  • Manual: called screen.record from a connected agent, received valid MP4

Notes

OpenClaw.Shared.Tests currently has 8 pre-existing failures on this branch related to culture-sensitive number formatting. These are unrelated to this change and are covered by a separate fix.

AlexAlves87 and others added 4 commits April 8, 2026 19:19
New command in the shared capability layer:

- screen.record: fixed-duration capture; blocks until done and returns
  the video as base64 MP4.

Args: durationMs (def. 5000), fps (def. 10), screenIndex/monitor (def. 0).
The monitor→screenIndex alias keeps consistency with screen.capture.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
WinRT-based implementation backing screen.record:

- D3D11 + Direct3D11CaptureFramePool for GPU-backed frame acquisition
- Software BGRA→NV12 conversion (BT.601 limited range) before encoding
- MediaTranscoder pipeline with hardware acceleration and SW fallback
- No external dependencies: pure P/Invoke (d3d11.dll, combase.dll)

Records the full monitor only. Per-window capture is not yet implemented.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
NodeService instantiates ScreenRecordingService and subscribes OnScreenRecord
to ScreenCapability's RecordRequested event.

Tests cover the full surface of screen.record: missing handler error, correct
arg forwarding, defaults (durationMs=5000, fps=10, screenIndex=0), the
monitor→screenIndex alias, and exception handling in the handler.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two new commands for session-based recording:

- screen.record.start: opens a recording session and returns a recordingId
- screen.record.stop: closes the session and returns the video

ActiveSession manages the capture loop with a CancellationToken and stores
frames safely under a lock. A ConcurrentDictionary keyed by recordingId
allows concurrent sessions.

9 new tests cover: start/stop without a handler, args and monitor alias,
recordingId in the start response, full stop payload, and exception paths.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

This comment has been minimized.

AlexAlves87 and others added 2 commits April 9, 2026 16:46
- Fix InvalidCastException in CreateForMonitor: pass IID_IInspectable
  instead of typeof(GraphicsCaptureItem).GUID, which returns a C#/WinRT-
  generated GUID unrecognized by the native COM method (E_NOINTERFACE).
- Replace PrepareStreamTranscodeAsync with PrepareMediaStreamSourceTranscodeAsync
  + MediaStreamSource feeding NV12 samples on demand, fixing "Transcode
  failed: Unknown" on all three screen recording commands.
- Add 500 MB frame-buffer cap (MaxFrameBufferBytes) with early stop and
  warning log to prevent OOM on long or high-fps recordings.
- Save encoded MP4 to %TEMP%\openclaw\ and return filePath in the response.
- Change ScreenRecordResult.Fps from float to int.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@AlexAlves87
Copy link
Copy Markdown
Contributor Author

Addressing the Repo Assist observations — all three issues are resolved in commit f4dbc52:

🚨 BuildRawVideoStream .Wait() — Fixed. BuildRawVideoStream no longer exists. The entire encoding pipeline was replaced with MediaStreamSource + PrepareMediaStreamSourceTranscodeAsync, which also fixed the root cause of the Transcode failed: Unknown error that was blocking screen recording entirely.

⚠️ Memory footprint — Fixed. Added MaxFrameBufferBytes = 500 MB constant with an early-stop guard in both capture loops (RecordAsync and ActiveSession.RunAsync). When the cap is hit, capture stops gracefully and a warning is logged. Limits documented in XML docs on ScreenRecordArgs and ScreenRecordStartArgs.

🧹 Fps float → int — Fixed. ScreenRecordResult.Fps is now int.

🔴 Pre-existing culture test failures — Resolved by syncing with upstream. Commit 6933239 (fix: use invariant culture in numeric display formatting, merged in #158) fixes those 8 failures. CI on this PR is now fully green.

@github-actions

This comment has been minimized.

@shanselman
Copy link
Copy Markdown
Collaborator

shanselman commented Apr 13, 2026

Hi @AlexAlves87Copilot here replying on Scott's behalf.

Thanks for the work on this. The feature looks useful and the CI result is encouraging, but on a deeper engineering pass we found a few things we'd like tightened up before we can merge it.

  1. Handle the no-monitor / bad-index case explicitly so this returns a controlled error instead of blowing up on monitors[screenIndex].
private static GraphicsCaptureItem CreateCaptureItem(int screenIndex)
{
    var monitors = GetMonitorHandles();
    if (monitors.Count == 0)
        throw new InvalidOperationException("No screens available");

    if (screenIndex < 0 || screenIndex >= monitors.Count)
        throw new ArgumentOutOfRangeException(nameof(screenIndex), "Invalid monitor index");

    ...
}
  1. Own and dispose the D3D device together with the session/pool. Right now the device lifetime looks too loose for long-running capture sessions.
private readonly IDirect3DDevice _device;

public ActiveSession(..., IDirect3DDevice device, ...)
{
    _device = device;
}

public void Dispose()
{
    _cts.Cancel();
    try { _captureTask.GetAwaiter().GetResult(); } catch { }
    try { _session.Dispose(); } catch { }
    try { _pool.Dispose(); } catch { }
    try { _device.Dispose(); } catch { }
}
  1. Please stop/clear active recording sessions on node disconnect, so an abandoned screen.record.start session can't keep holding memory/resources after the connection goes away.

  2. This one feels especially important: dispose the per-frame SoftwareBitmap immediately after extracting bytes. At the current scale this can retain a lot of native memory during recordings.

using var bmp = await SoftwareBitmap.CreateCopyFromSurfaceAsync(frame.Surface);
frames.Add(ExtractBitmapBytes(bmp));

and similarly in the encoder:

using var dw = new DataWriter();
dw.WriteBytes(nv12);
var sample = MediaStreamSample.CreateFromBuffer(dw.DetachBuffer(), ts);

One other thing we'd like cleaned up: temp MP4 retention. Right now each recording writes a file under %TEMP%\openclaw\openclaw-screen-record-*.mp4 with no obvious cleanup policy. Even if we keep returning ilePath, we'd want those files expired/cleaned aggressively or made opt-in for privacy/hygiene.

private static void CleanupOldTempRecordings()
{
    var dir = Path.Combine(Path.GetTempPath(), "openclaw");
    if (!Directory.Exists(dir)) return;

    foreach (var file in Directory.EnumerateFiles(dir, "openclaw-screen-record-*.mp4"))
    {
        var info = new FileInfo(file);
        if (info.CreationTimeUtc < DateTime.UtcNow.AddHours(-24))
        {
            try { File.Delete(file); } catch (IOException) { }
            catch (UnauthorizedAccessException) { }
        }
    }
}

If you can tidy up those areas — especially #4 — we'd be happy to take another look. Thanks again for pushing this forward.

AlexAlves87 and others added 5 commits April 15, 2026 13:31
…id monitor

Silent fallback to index 0 masked the no-monitor case (IndexOutOfRange) and
gave callers no indication that their screen index was wrong. Now throws
InvalidOperationException (no screens) or ArgumentOutOfRangeException (bad index).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nally

D3D device was created but never disposed: in StartAsync the reference was
dropped after pool creation; in RecordAsync it was outside the finally block.
ActiveSession now owns the device and disposes it alongside pool and session.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
An abandoned screen.record.start session kept holding D3D device, frame pool
and capture thread after disconnect. DisconnectAsync now calls StopAllSessions()
so resources are released on every disconnect, not just on full shutdown.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SoftwareBitmap wraps native memory that the GC won't release until the
finalizer runs, causing memory to pile up during long recordings. DataWriter
has the same issue per-frame in the encoder. Adding 'using' ensures prompt
release after each frame is processed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Without a cleanup policy, screen recordings accumulated indefinitely in
%TEMP%\openclaw\. Files contain screen content (potentially sensitive).
CleanupOldTempRecordings() now runs before each SaveToTempFile call.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@AlexAlves87
Copy link
Copy Markdown
Contributor Author

Hi @shanselman / @copilot — all five points are already addressed in the commits on this branch:

  1. No-monitor / bad-index (5267bb6) — CreateCaptureItem now throws InvalidOperationException if no monitors found and ArgumentOutOfRangeException for an out-of-range index.
  2. D3D device lifetime (9008cb6) — IDirect3DDevice is now owned by ActiveSession and disposed in both RecordAsync's finally and ActiveSession.Dispose().
  3. Stop sessions on disconnect (e272e9f) — NodeService.OnDisconnected now calls StopAllSessions(), so no session survives a node drop.
  4. SoftwareBitmap disposal (8b69827) — each frame bitmap is using-wrapped; DataWriter is also disposed immediately after writing.
  5. Temp MP4 cleanup (d962f4a) — CleanupOldTempRecordings() runs on every new recording and deletes files older than 24h.

Let me know if you'd like anything squashed before merge.

@github-actions
Copy link
Copy Markdown
Contributor

🤖 This is an automated response from Repo Assist.

@AlexAlves87 — all five of shanselman's points are cleanly addressed in the latest commits:

  1. No-monitor / bad-index (5267bb6) — InvalidOperationException + ArgumentOutOfRangeException guards in place.
  2. D3D device lifetime (9008cb6) — device owned and disposed by ActiveSession (both finally and Dispose()).
  3. Stop sessions on disconnect (e272e9f) — NodeService.OnDisconnected now calls StopAllSessions().
  4. SoftwareBitmap disposal (8b69827) — each frame is using-wrapped; DataWriter disposed immediately after write.
  5. Temp MP4 cleanup (d962f4a) — CleanupOldTempRecordings() deletes files older than 24 h on every new recording.

The PR is in excellent shape. I'd encourage the maintainer to take another look — all review concerns are resolved and CI is clean.

Generated by 🌈 Repo Assist, see workflow run. Learn more.

To install this agentic workflow, run

gh aw add githubnext/agentics/workflows/repo-assist.md@97143ac59cb3a13ef2a77581f929f06719c7402a

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants