
Conversation

@richiemcilroy
Member

@richiemcilroy richiemcilroy commented Dec 17, 2025

Significant improvements to hardware and software video decoding, particularly by dynamically detecting hardware decoder capabilities on Windows and optimizing software decoder threading based on video resolution.

Summary by CodeRabbit

  • New Features

    • Parallel SIMD-enabled video color conversions with progress reporting and cancellation.
    • Hardware decoder capability detection and caching with automatic fallback to software decoding.
    • Dynamic GPU texture sizing with zero-copy fallback on constrained devices.
  • Improvements

    • Texture pooling and on-demand allocation for more efficient memory use.
    • Adaptive software-threading tuned by frame size.
    • Minor error-message formatting and lint cleanups.


@coderabbitai
Contributor

coderabbitai bot commented Dec 17, 2025

Walkthrough

Added SIMD-parallel YUV→RGBA conversion with progress/cancellation and rayon-based parallelism; refactored GPU texture allocation to be dynamic and GPU-limit aware with zero-copy fallback on Windows; added hardware-decoder capability detection and software-threading configuration for FFmpeg/Media Foundation.

Changes

  • Parallelization dependencies (crates/rendering/Cargo.toml, crates/video-decode/Cargo.toml): Added rayon = "1.10" and num_cpus = "1.16" dependencies for parallel processing and CPU detection.
  • SIMD-accelerated YUV conversion (crates/rendering/src/cpu_yuv.rs): Added ConversionProgress (atomic progress/cancel), a SimdLevel enum and detection, SIMD-dispatched NV12/YUV420p→RGBA paths (AVX2/SSE2/Scalar), parallel/sequential drivers, per-row helpers, CPU-parallel processing via rayon, architecture fallbacks, and tests for detection, parity, large-frame parallelism, and cancellation.
  • Dynamic GPU texture management (crates/rendering/src/yuv_converter.rs): Increased effective GPU limits, added alignment/validation, dynamic texture creation (Y/U/V/UV/output), ensure_texture_size() and allocated_* state, GPU-limit-aware initialization, and a Windows zero-copy attempt with staged-copy fallback and zero_copy state handling.
  • Hardware decoder capabilities (crates/video-decode/src/ffmpeg.rs): Introduced HwDecoderCapabilities and a global cache, replaced static size checks with capability checks, added configure_software_threading(), and integrated capability-aware hardware/software selection and logging.
  • Media Foundation capabilities and texture pooling (crates/video-decode/src/media_foundation.rs): Added MFDecoderCapabilities, a static cache, a TexturePool for reusable output/YUV textures, refactored the sample/read flow to use TexturePool, extended MFDecodedFrame to include optional Y/UV textures/handles, and exposed a capabilities accessor.
  • Minor formatting/lint adjustments (crates/frame-converter/src/d3d11.rs, crates/recording/src/output_pipeline/win.rs, crates/rendering/src/decoder/mod.rs): Adjusted error-format strings and format placeholders; added clippy allow attributes for the redundant-closure lint on platform-specific backings.
  • Public API re-exports (crates/video-decode/src/lib.rs): Re-exported MFDecoderCapabilities and get_mf_decoder_capabilities.
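The ConversionProgress type listed for cpu_yuv.rs can be sketched as a small std-only struct. This is a hedged reconstruction from the summary above, not the PR's actual code; the method names (complete_rows, fraction, cancel, is_cancelled) are assumptions:

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

// Hypothetical shape of the ConversionProgress described in the walkthrough:
// atomic row counter plus a cancellation flag, safe to share across rayon workers.
pub struct ConversionProgress {
    rows_completed: AtomicUsize,
    total_rows: usize,
    cancelled: AtomicBool,
}

impl ConversionProgress {
    pub fn new(total_rows: usize) -> Self {
        Self {
            rows_completed: AtomicUsize::new(0),
            total_rows,
            cancelled: AtomicBool::new(false),
        }
    }

    // Relaxed ordering suffices: progress is advisory UI feedback, not a sync point.
    pub fn complete_rows(&self, n: usize) {
        self.rows_completed.fetch_add(n, Ordering::Relaxed);
    }

    pub fn fraction(&self) -> f32 {
        if self.total_rows == 0 {
            return 1.0;
        }
        self.rows_completed.load(Ordering::Relaxed) as f32 / self.total_rows as f32
    }

    pub fn cancel(&self) {
        self.cancelled.store(true, Ordering::Relaxed);
    }

    pub fn is_cancelled(&self) -> bool {
        self.cancelled.load(Ordering::Relaxed)
    }
}
```

Worker threads would call complete_rows after each band and poll is_cancelled between rows, which matches the fetch_add/check-cancelled flow in the sequence diagram below.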

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Conv as YuvToRgbaConverter
    participant Tex as TextureAllocation
    participant GPU as GPU/WGPU
    participant D3D as D3D11 (Windows)
    User->>Conv: convert_nv12_with_fallback(...)
    Conv->>Conv: validate_dimensions(gpu_max_texture_size)
    alt Resize needed
        Conv->>Tex: ensure_texture_size(new_w,new_h)
        Tex->>GPU: create/reallocate textures
        GPU-->>Tex: textures ready
    end
    Note over Conv,D3D: Zero-copy attempt (Windows)
    Conv->>D3D: acquire shared handle
    D3D-->>Conv: handle or failure
    alt Zero-copy succeeds
        Conv-->>User: return texture view
    else Zero-copy fails
        Conv->>Conv: set zero_copy_failed
        Conv->>GPU: staged copy of NV12 planes
        GPU-->>Conv: staged texture ready
        Conv-->>User: return texture view
    end
sequenceDiagram
    participant Scheduler as rayon Thread Pool
    participant Progress as ConversionProgress
    participant SIMD as SIMD Dispatch (AVX2/SSE2/Scalar)
    participant Output as Output Buffer
    Scheduler->>Scheduler: split rows, par_iter
    par Threads
        Scheduler->>SIMD: detect SimdLevel (cached)
        SIMD-->>Scheduler: chosen variant
        Scheduler->>SIMD: convert row(s)
        SIMD-->>Output: write RGBA pixels
        Scheduler->>Progress: fetch_add rows_completed
        Scheduler->>Progress: check cancelled -> break if set
    end
    Scheduler-->>Output: final buffer

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

  • Pay special attention to:
    • crates/rendering/src/cpu_yuv.rs: SIMD intrinsics correctness, memory/layout assumptions, thread-safety of ConversionProgress, rayon boundary conditions, and tests.
    • crates/rendering/src/yuv_converter.rs: GPU limit calculation, ensure_texture_size() correctness, zero-copy vs staged-copy coherence, texture view lifetime.
    • crates/video-decode/src/media_foundation.rs and ffmpeg.rs: capability detection caching (OnceLock) and thread-safety, TexturePool reuse semantics and error handling.

Suggested reviewers

  • Brendonovich

Poem

🐰 In threads we hop and SIMD carrots gleam,
Pixels flip and buffers hum like cream,
When zero-copy trips, fallback saves the day,
Rabbity converters bound and sprint away! 🥕✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Windows video playback optimisations' accurately describes the main focus of the PR, which introduces hardware decoder capability detection and software threading optimization for Windows video decoding across multiple crates.
Docstring Coverage ✅ Passed Docstring coverage is 82.22% which is sufficient. The required threshold is 80.00%.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/rendering/src/yuv_converter.rs (1)

1136-1173: Potential mismatch when downscaling is triggered.

The validate_dimensions call returns effective_width/effective_height and _downscaled flag, but the actual conversion still uses the original width and height for:

  • RGBA buffer allocation (line 1141)
  • SIMD conversion (lines 1143-1151)
  • Texture write (lines 1153-1171)

If the video exceeds GPU limits and downscaling is indicated, the textures will be sized for the downscaled resolution while the data remains at the original resolution, which can lead to a size mismatch between the uploaded buffers and the allocated textures.

This also applies to convert_yuv420p_cpu below.

Consider either:

  1. Using effective_width/effective_height consistently when downscaling
  2. Or rejecting the conversion with an error if downscaling would be required for CPU paths
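A minimal sketch of the validate_dimensions contract this comment relies on, std-only and with assumed behaviour inferred from the review (aspect-preserving scale, rejection below a 0.1 scale factor, even dimensions via & !1); the real function lives in yuv_converter.rs and may differ:

```rust
// Hypothetical reconstruction: returns (effective_width, effective_height, downscaled).
// Callers must then size buffers and textures from the effective values, not the originals.
fn validate_dimensions(width: u32, height: u32, max_dim: u32) -> Result<(u32, u32, bool), String> {
    if width == 0 || height == 0 {
        return Err(format!("invalid dimensions {width}x{height}"));
    }
    if width <= max_dim && height <= max_dim {
        return Ok((width, height, false));
    }
    // Scale both axes by the same factor to preserve aspect ratio.
    let scale = (max_dim as f64 / width as f64).min(max_dim as f64 / height as f64);
    if scale < 0.1 {
        return Err(format!("{width}x{height} would require extreme downscaling"));
    }
    // & !1 forces even dimensions, required by 4:2:0 YUV layouts.
    let eff_w = ((width as f64 * scale) as u32) & !1;
    let eff_h = ((height as f64 * scale) as u32) & !1;
    Ok((eff_w, eff_h, true))
}
```

The mismatch flagged above arises when the tuple's first two values are discarded and the original width/height are used for the RGBA buffer and texture writes.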
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2528bde and 8521adf.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • crates/rendering/Cargo.toml (1 hunks)
  • crates/rendering/src/cpu_yuv.rs (9 hunks)
  • crates/rendering/src/yuv_converter.rs (17 hunks)
  • crates/video-decode/Cargo.toml (1 hunks)
  • crates/video-decode/src/ffmpeg.rs (2 hunks)
  • crates/video-decode/src/lib.rs (1 hunks)
  • crates/video-decode/src/media_foundation.rs (8 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.rs: Never use dbg!() macro; use proper logging (tracing::debug!, etc.) instead
Never write let _ = async_fn() which silently drops futures; await or explicitly handle them
Use duration.saturating_sub(other) instead of duration - other to avoid panics on underflow
Merge nested if statements: write 'if a && b { }' instead of 'if a { if b { } }'
Don't call .clone() on Copy types (integers, bools, etc.); copy them directly
Use function references directly: iter.map(foo) instead of iter.map(|x| foo(x))
Accept &[T] or &str instead of &Vec or &String in function parameters for flexibility
Use .is_empty() instead of .len() == 0 or .len() > 0 / .len() != 0
Don't assign () to a variable: write foo(); instead of let _ = foo(); or let x = foo(); when return is unit
Use .unwrap_or(val) instead of .unwrap_or_else(|| val) when the default is a simple/cheap value
Use 'for item in &collection' or 'for (i, item) in collection.iter().enumerate()' instead of 'for i in 0..collection.len()'
Use value.clamp(min, max) instead of manual if chains or .min(max).max(min) patterns
Always handle Result/Option or types marked #[must_use]; never ignore them

**/*.rs: Use rustfmt and workspace clippy lints for Rust code formatting and linting
Use snake_case for Rust module names and kebab-case for crate names
Never use dbg!() macro in Rust code; use proper logging instead (Clippy: dbg_macro = deny)
Always handle Result/Option or types marked #[must_use]; never ignore them (Rust compiler lint: unused_must_use = deny)
Never write let _ = async_fn() which silently drops futures; await or explicitly handle them (Clippy: let_underscore_future = deny)
Use saturating_sub instead of - for Duration to avoid panics (Clippy: unchecked_duration_subtraction = deny)
Merge nested if statements: use if a && b { } instead of if a { if b { } } (Clippy: collapsible_if = deny)
Don't call .clone() on Copy types; just copy them directly (Clippy: clone_on_copy = deny)
U...

Files:

  • crates/video-decode/src/lib.rs
  • crates/video-decode/src/ffmpeg.rs
  • crates/video-decode/src/media_foundation.rs
  • crates/rendering/src/cpu_yuv.rs
  • crates/rendering/src/yuv_converter.rs
**/*.{ts,tsx,js,jsx,rs}

📄 CodeRabbit inference engine (AGENTS.md)

Never add comments to code (//, /* */, ///, //!, #, etc.); code must be self-explanatory through naming, types, and structure

Files:

  • crates/video-decode/src/lib.rs
  • crates/video-decode/src/ffmpeg.rs
  • crates/video-decode/src/media_foundation.rs
  • crates/rendering/src/cpu_yuv.rs
  • crates/rendering/src/yuv_converter.rs
🧠 Learnings (2)
📚 Learning: 2025-10-28T08:39:42.230Z
Learnt from: Brendonovich
Repo: CapSoftware/Cap PR: 1305
File: crates/recording/src/output_pipeline/macos.rs:80-90
Timestamp: 2025-10-28T08:39:42.230Z
Learning: In `crates/recording/src/output_pipeline/macos.rs`, the `AVFoundationMp4Muxer` intentionally holds the `Mutex<MP4Encoder>` lock during retry attempts in `send_video_frame()` and `send_audio_frame()`. This blocking behavior is correct because frame processing must happen sequentially to prevent audio and video frames from being interleaved incorrectly in the encoder.

Applied to files:

  • crates/video-decode/src/lib.rs
  • crates/video-decode/src/ffmpeg.rs
📚 Learning: 2025-12-07T14:29:40.743Z
Learnt from: CR
Repo: CapSoftware/Cap PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-07T14:29:40.743Z
Learning: Applies to **/*.rs : Use `rustfmt` and workspace clippy lints for Rust code formatting and linting

Applied to files:

  • crates/rendering/Cargo.toml
🧬 Code graph analysis (1)
crates/video-decode/src/lib.rs (1)
crates/video-decode/src/media_foundation.rs (1)
  • get_mf_decoder_capabilities (138-140)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Clippy (aarch64-apple-darwin, macos-latest)
  • GitHub Check: Clippy (x86_64-pc-windows-msvc, windows-latest)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (rust)
🔇 Additional comments (20)
crates/video-decode/Cargo.toml (1)

12-12: Appropriate dependency choice for CPU-aware threading configuration.

The num_cpus crate version 1.16.0 is a stable choice for dynamically detecting available CPUs to configure software decoder threading, which aligns well with the PR's objective to optimize threading based on available hardware. A newer version 1.17.0 is available, but the caret versioning (^1.16) allows automatic updates within the 1.x series as needed.

crates/rendering/Cargo.toml (1)

21-21: Excellent choice for parallelization. Rayon is a data-parallelism library for Rust that is extremely lightweight and guarantees data-race freedom, making it the standard for this use case. Version 1.10 is current and appropriate.

crates/video-decode/src/lib.rs (1)

10-14: LGTM!

Clean extension of the public API to expose hardware decoder capability querying. The re-export structure follows the existing pattern.

crates/video-decode/src/ffmpeg.rs (4)

13-28: LGTM!

Well-structured capability struct with sensible defaults for fallback scenarios.


32-134: LGTM!

Solid D3D11 capability detection with proper fallback handling. The tiered resolution testing (8K → 4K) correctly identifies hardware limits.


145-172: LGTM!

Good resolution-based threading configuration. The thresholds align well with common video resolutions (1080p, 4K).
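The threading heuristic described here can be sketched as a pure function. The thresholds and return convention (0 meaning "let FFmpeg auto-detect") are assumptions drawn from this comment, not the PR's literal code:

```rust
const FULL_HD_PIXELS: u64 = 1920 * 1080;
const UHD_4K_PIXELS: u64 = 3840 * 2160;

// Hypothetical sketch of configure_software_threading()'s core decision:
// 4K+ defers to FFmpeg's auto thread count, Full HD uses half the cores
// (at least 2), and smaller frames cap at 2 threads to limit overhead.
fn software_decode_threads(width: u32, height: u32, cpu_count: usize) -> usize {
    // u64 pixel count avoids overflow for large frames, as the review notes.
    let pixels = width as u64 * height as u64;
    if pixels >= UHD_4K_PIXELS {
        0
    } else if pixels >= FULL_HD_PIXELS {
        (cpu_count / 2).max(2)
    } else {
        2
    }
}
```

The result would be written to the codec context's thread_count alongside FF_THREAD_FRAME, with cpu_count supplied by num_cpus as added in Cargo.toml.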


213-246: LGTM!

Good integration of dynamic hardware capability checking. The fallback paths are well-structured, ensuring software threading is configured appropriately in all scenarios.

crates/video-decode/src/media_foundation.rs (4)

34-140: LGTM!

Well-structured capability detection with descending resolution tests. The Option return from get_mf_decoder_capabilities appropriately signals when initialization hasn't occurred yet.


160-294: LGTM!

Efficient texture pooling implementation that correctly handles dimension changes and uses appropriate formats for NV12 planes.
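The reuse semantics being praised here can be illustrated with a std-only pool that swaps a plain struct in for the D3D11 texture; the field and method names are invented for illustration and the real TexturePool manages ID3D11Texture2D resources per plane:

```rust
#[derive(Clone, PartialEq, Debug)]
struct TextureDesc {
    width: u32,
    height: u32,
}

struct PooledTexture {
    desc: TextureDesc,
    id: u32,
}

// Hypothetical pool: reuse the cached texture while dimensions match,
// reallocate only when the requested size changes.
struct TexturePool {
    next_id: u32,
    cached: Option<PooledTexture>,
    allocations: u32,
}

impl TexturePool {
    fn new() -> Self {
        Self { next_id: 0, cached: None, allocations: 0 }
    }

    fn get_or_create(&mut self, width: u32, height: u32) -> &PooledTexture {
        let desc = TextureDesc { width, height };
        let reuse = matches!(&self.cached, Some(t) if t.desc == desc);
        if !reuse {
            self.allocations += 1;
            self.next_id += 1;
            self.cached = Some(PooledTexture { desc, id: self.next_id });
        }
        self.cached.as_ref().unwrap()
    }
}
```

Under steady-state playback every frame hits the cached branch, which is what makes the per-frame read path allocation-free.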


350-388: LGTM!

Clean initialization with one-time capability detection. The warning for oversized videos is appropriate without being blocking.


555-624: LGTM!

The texture pool integration and plane extraction logic are correct. The D3D11_BOX regions properly extract Y (full res) and UV (half res) planes from the NV12 texture.

Note: y_handle and uv_handle are always None in this path. If shared handle support is needed for zero-copy in consumers, this would need to be extended.

crates/rendering/src/cpu_yuv.rs (4)

1-33: LGTM!

Clean progress tracking with appropriate use of relaxed atomics for UI feedback purposes.


431-475: AVX2/SSE2 functions are placeholders calling scalar code.

The nv12_convert_row_avx2 and nv12_convert_row_sse2 functions both just delegate to the scalar implementation. The SIMD detection logic is in place, but actual SIMD intrinsics aren't implemented yet.

This means the parallelization via rayon provides the optimization, but not SIMD vectorization within each row.

Is this intentional as a first step, with SIMD intrinsics to be added later? If so, consider adding a TODO or tracking issue.
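The detection-plus-placeholder pattern described here looks roughly like the following std-only sketch. The SimdLevel variants come from the walkthrough; the row conversion below is a grayscale stand-in for the real YUV math, so all three arms intentionally share one scalar loop, mirroring the placeholder state:

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum SimdLevel {
    Avx2,
    Sse2,
    Scalar,
}

// Runtime feature detection on x86/x86_64, compile-time Scalar elsewhere.
fn detect_simd_level() -> SimdLevel {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            return SimdLevel::Avx2;
        }
        if is_x86_feature_detected!("sse2") {
            return SimdLevel::Sse2;
        }
    }
    SimdLevel::Scalar
}

// Placeholder dispatch: every level currently routes to the same scalar loop,
// matching the review's observation about the AVX2/SSE2 stubs.
fn convert_row(level: SimdLevel, y_row: &[u8], out_rgba: &mut [u8]) {
    match level {
        SimdLevel::Avx2 | SimdLevel::Sse2 | SimdLevel::Scalar => {
            for (dst, &y) in out_rgba.chunks_exact_mut(4).zip(y_row) {
                dst.copy_from_slice(&[y, y, y, 255]);
            }
        }
    }
}
```

Once real intrinsics land, only the match arms change; the detection and dispatch boundary stays identical, which is the value of shipping the scaffold first.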


282-335: LGTM!

Good band-based parallelization with rayon. The MIN_ROWS_PER_THREAD threshold prevents excessive overhead for small images, and per-row progress updates provide responsive UI feedback.


1038-1224: LGTM!

Comprehensive test coverage for the new features: SIMD detection, progress tracking, parallel conversion, and cancellation behavior.

crates/rendering/src/yuv_converter.rs (5)

85-134: LGTM!

Solid dimension validation with GPU-awareness. The even dimension enforcement (& !1) is essential for YUV formats, and the aspect-ratio-preserving downscale is well-implemented.


347-470: LGTM!

Well-factored texture creation helpers with correct formats for each YUV plane type. Double-buffered output is good for pipeline efficiency.


472-510: LGTM!

Good grow-only texture reallocation strategy that avoids thrashing. The aligned dimensions ensure GPU-friendly allocation sizes.
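The grow-only strategy can be sketched in isolation. The 64-byte alignment figure appears in the review's own comments on this file; the struct and method names here are hypothetical:

```rust
const ALIGNMENT: u32 = 64;

fn align_up(v: u32, align: u32) -> u32 {
    (v + align - 1) / align * align
}

struct AllocatedSize {
    width: u32,
    height: u32,
    reallocations: u32,
}

impl AllocatedSize {
    // Hypothetical ensure_texture_size() core: returns true only when textures
    // must actually be recreated, so shrinking requests never cause thrashing.
    fn ensure_size(&mut self, width: u32, height: u32) -> bool {
        let w = align_up(width, ALIGNMENT);
        let h = align_up(height, ALIGNMENT);
        if w <= self.width && h <= self.height {
            return false;
        }
        self.width = self.width.max(w);
        self.height = self.height.max(h);
        self.reallocations += 1;
        true
    }
}
```

Playback that bounces between 720p and 4K thus pays for exactly one reallocation, after which every frame fits the existing textures.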


943-1049: LGTM!

Clean implementation of the zero-copy D3D11 shared handle path with proper error handling for both texture imports.


1051-1122: LGTM!

Smart fallback strategy with sticky failure flag to avoid repeated zero-copy attempts. The reset_zero_copy_state method allows recovery when conditions change.

@richiemcilroy richiemcilroy merged commit d1a6718 into main Dec 17, 2025
16 of 17 checks passed
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (5)
crates/rendering/src/decoder/mod.rs (2)

213-216: Refactor to avoid suppressing the redundant_closure lint.

The closure can be eliminated by using .as_deref() followed by a function reference, which aligns with the coding guidelines that specify this lint is set to deny.

As per coding guidelines, use function references directly instead of closures.

Apply this diff:

-    #[allow(clippy::redundant_closure)]
     pub fn iosurface_backing(&self) -> Option<&cv::ImageBuf> {
-        self.iosurface_backing.as_ref().map(|b| b.inner())
+        self.iosurface_backing.as_deref().map(SendableImageBuf::inner)
     }

278-281: Refactor to avoid suppressing the redundant_closure lint.

The closure can be eliminated by using .as_deref() followed by a function reference, which aligns with the coding guidelines that specify this lint is set to deny.

As per coding guidelines, use function references directly instead of closures.

Apply this diff:

-    #[allow(clippy::redundant_closure)]
     pub fn d3d11_texture_backing(&self) -> Option<&ID3D11Texture2D> {
-        self.d3d11_texture_backing.as_ref().map(|b| b.inner())
+        self.d3d11_texture_backing.as_deref().map(SendableD3D11Texture::inner)
     }
crates/video-decode/src/media_foundation.rs (2)

57-134: Consider documenting that max_width/max_height represent the maximum resolution supported by any codec.

The current implementation sets max_width and max_height to the highest resolution supported by either H264 or HEVC decoders. This means if H264 supports 8K but HEVC only supports 4K, max_width will be 8K. Callers checking dimensions against these limits should be aware that the limits are codec-agnostic and may not reflect per-codec capabilities.


220-291: Validate that dimensions are even for NV12 texture creation.

Lines 256-257 create UV textures with half dimensions (width / 2, height / 2). If the input dimensions are odd, this will truncate, potentially causing misalignment issues with NV12 format which expects even dimensions.

Consider adding a validation check:

 fn get_or_create_yuv_textures(
     &mut self,
     device: &ID3D11Device,
     width: u32,
     height: u32,
 ) -> Result<(&ID3D11Texture2D, &ID3D11Texture2D), String> {
+    if width % 2 != 0 || height % 2 != 0 {
+        return Err(format!("NV12 requires even dimensions, got {}x{}", width, height));
+    }
+
     use windows::Win32::Graphics::Dxgi::Common::{
crates/rendering/src/yuv_converter.rs (1)

622-638: Consider more idiomatic Option handling.

The early return pattern works, but could be more concise:

-    if self.iosurface_cache.is_none() {
-        return Err(IOSurfaceTextureError::NoMetalDevice.into());
-    }
-
     let io_surface = image_buf
         .io_surf()
         .ok_or(IOSurfaceTextureError::NoIOSurface)?;

     let width = image_buf.width() as u32;
     let height = image_buf.height() as u32;

     let (effective_width, effective_height, _downscaled) =
         validate_dimensions(width, height, self.gpu_max_texture_size)?;
     self.ensure_texture_size(device, effective_width, effective_height);
     self.swap_output_buffer();

-    let cache = self.iosurface_cache.as_ref().unwrap();
+    let cache = self.iosurface_cache.as_ref()
+        .ok_or(IOSurfaceTextureError::NoMetalDevice)?;
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8521adf and ed8f831.

📒 Files selected for processing (7)
  • crates/frame-converter/src/d3d11.rs (11 hunks)
  • crates/recording/src/output_pipeline/win.rs (2 hunks)
  • crates/rendering/src/cpu_yuv.rs (9 hunks)
  • crates/rendering/src/decoder/mod.rs (2 hunks)
  • crates/rendering/src/yuv_converter.rs (18 hunks)
  • crates/video-decode/src/ffmpeg.rs (2 hunks)
  • crates/video-decode/src/media_foundation.rs (8 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.rs: Never use dbg!() macro; use proper logging (tracing::debug!, etc.) instead
Never write let _ = async_fn() which silently drops futures; await or explicitly handle them
Use duration.saturating_sub(other) instead of duration - other to avoid panics on underflow
Merge nested if statements: write 'if a && b { }' instead of 'if a { if b { } }'
Don't call .clone() on Copy types (integers, bools, etc.); copy them directly
Use function references directly: iter.map(foo) instead of iter.map(|x| foo(x))
Accept &[T] or &str instead of &Vec or &String in function parameters for flexibility
Use .is_empty() instead of .len() == 0 or .len() > 0 / .len() != 0
Don't assign () to a variable: write foo(); instead of let _ = foo(); or let x = foo(); when return is unit
Use .unwrap_or(val) instead of .unwrap_or_else(|| val) when the default is a simple/cheap value
Use 'for item in &collection' or 'for (i, item) in collection.iter().enumerate()' instead of 'for i in 0..collection.len()'
Use value.clamp(min, max) instead of manual if chains or .min(max).max(min) patterns
Always handle Result/Option or types marked #[must_use]; never ignore them

**/*.rs: Use rustfmt and workspace clippy lints for Rust code formatting and linting
Use snake_case for Rust module names and kebab-case for crate names
Never use dbg!() macro in Rust code; use proper logging instead (Clippy: dbg_macro = deny)
Always handle Result/Option or types marked #[must_use]; never ignore them (Rust compiler lint: unused_must_use = deny)
Never write let _ = async_fn() which silently drops futures; await or explicitly handle them (Clippy: let_underscore_future = deny)
Use saturating_sub instead of - for Duration to avoid panics (Clippy: unchecked_duration_subtraction = deny)
Merge nested if statements: use if a && b { } instead of if a { if b { } } (Clippy: collapsible_if = deny)
Don't call .clone() on Copy types; just copy them directly (Clippy: clone_on_copy = deny)
U...

Files:

  • crates/recording/src/output_pipeline/win.rs
  • crates/video-decode/src/ffmpeg.rs
  • crates/rendering/src/decoder/mod.rs
  • crates/video-decode/src/media_foundation.rs
  • crates/rendering/src/yuv_converter.rs
  • crates/frame-converter/src/d3d11.rs
  • crates/rendering/src/cpu_yuv.rs
**/*.{ts,tsx,js,jsx,rs}

📄 CodeRabbit inference engine (AGENTS.md)

Never add comments to code (//, /* */, ///, //!, #, etc.); code must be self-explanatory through naming, types, and structure

Files:

  • crates/recording/src/output_pipeline/win.rs
  • crates/video-decode/src/ffmpeg.rs
  • crates/rendering/src/decoder/mod.rs
  • crates/video-decode/src/media_foundation.rs
  • crates/rendering/src/yuv_converter.rs
  • crates/frame-converter/src/d3d11.rs
  • crates/rendering/src/cpu_yuv.rs
🧠 Learnings (2)
📚 Learning: 2025-10-28T08:39:42.230Z
Learnt from: Brendonovich
Repo: CapSoftware/Cap PR: 1305
File: crates/recording/src/output_pipeline/macos.rs:80-90
Timestamp: 2025-10-28T08:39:42.230Z
Learning: In `crates/recording/src/output_pipeline/macos.rs`, the `AVFoundationMp4Muxer` intentionally holds the `Mutex<MP4Encoder>` lock during retry attempts in `send_video_frame()` and `send_audio_frame()`. This blocking behavior is correct because frame processing must happen sequentially to prevent audio and video frames from being interleaved incorrectly in the encoder.

Applied to files:

  • crates/recording/src/output_pipeline/win.rs
  • crates/video-decode/src/ffmpeg.rs
📚 Learning: 2025-12-07T14:29:40.743Z
Learnt from: CR
Repo: CapSoftware/Cap PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-07T14:29:40.743Z
Learning: Applies to **/*.rs : Use function references directly instead of redundant closures: `iter.map(foo)` instead of `iter.map(|x| foo(x))` (Clippy: `redundant_closure` = deny)

Applied to files:

  • crates/rendering/src/decoder/mod.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Clippy (x86_64-pc-windows-msvc, windows-latest)
  • GitHub Check: Clippy (aarch64-apple-darwin, macos-latest)
  • GitHub Check: Analyze (javascript-typescript)
  • GitHub Check: Analyze (rust)
🔇 Additional comments (27)
crates/recording/src/output_pipeline/win.rs (1)

215-215: LGTM! Improved error message clarity.

The formatting change from positional to named parameters ({poisoned}) makes the error messages more explicit and easier to understand during debugging.

Also applies to: 543-543

crates/frame-converter/src/d3d11.rs (1)

116-116: LGTM! Consistent error formatting improvements.

The switch from positional placeholders ({:?}) to named parameters ({e:?}) across all error sites improves code clarity and follows Rust formatting conventions.

Also applies to: 120-120, 124-124, 168-168, 195-195, 200-200, 226-228, 236-236, 349-349, 381-381, 407-407, 431-431, 455-455, 527-527

crates/video-decode/src/ffmpeg.rs (5)

10-30: LGTM!

The HwDecoderCapabilities struct and caching mechanism are well-designed. Using OnceLock ensures thread-safe lazy initialization of hardware capabilities, and the optimistic default values (8192x8192) are safe since the decoder logic validates dimensions against actual capabilities before attempting hardware decode.


32-133: LGTM - Hardware capability detection is robust.

The Windows D3D11 video decoder capability query is well-structured with proper error handling. The logic correctly:

  • Tests for 8K (8192x8192) support first and updates max dimensions to 8192 if successful
  • Falls back to testing 4K (4096x4096) support if 8K fails
  • Retains the baseline 4096x4096 if neither check succeeds
  • Returns safe defaults on any query failure

The use of unsafe Windows APIs is necessary for D3D11 device creation and has appropriate error handling.


135-142: LGTM!

The non-Windows stub and public accessor are appropriate. The get_or_init pattern ensures the capability query runs once and the result is cached for subsequent calls.


144-171: LGTM - Software threading configuration is well-designed.

The function appropriately:

  • Uses u64 for pixel count calculation to prevent overflow
  • Applies resolution-based heuristics: auto-threading for 4K+, half CPU cores (min 2) for Full HD, and 2 threads for smaller resolutions
  • Uses frame-level threading (FF_THREAD_FRAME) which is appropriate for video decoding
  • Has a null pointer check before accessing the FFmpeg codec context
  • Logs the configuration for observability

212-245: LGTM - Hardware/software decode logic is correct.

The integration properly handles all decode paths:

  1. Hardware requested, within limits → Uses hardware acceleration (line 228-232)
  2. Hardware requested, exceeds limits → Falls back to software with threading (line 223)
  3. Hardware requested, initialization fails → Falls back to software with threading (line 236)
  4. No hardware requested → Configures software threading (line 244)

The logic correctly avoids duplicate software threading configuration by checking both hw_device.is_none() and hw_device_type.is_none() on line 243. While the pure software decode path (when hw_device_type is None) is not exercised by the current codebase, the implementation handles it correctly.

crates/rendering/src/yuv_converter.rs (11)

85-92: LGTM! Sensible texture dimension constants.

The increased GPU texture limits (8K) and initial allocation at 1080p with 64-byte padding alignment are appropriate for modern hardware.


97-134: LGTM! Robust dimension validation with GPU-aware downscaling.

The validation logic correctly:

  • Respects both hardware and configured limits
  • Scales down proportionally when needed
  • Rejects extreme downscaling (< 0.1)
  • Ensures even dimensions for YUV compatibility
  • Provides clear logging

152-154: LGTM! Appropriate state tracking fields.

The new fields correctly track:

  • Current texture allocation dimensions
  • GPU capability limits
  • Zero-copy failure state (Windows)

Also applies to: 163-164


169-174: LGTM! Proper GPU capability detection.

The initialization correctly queries GPU limits and logs the configuration.


305-345: LGTM! Sensible initial texture allocation.

Starting at 1080p avoids over-allocation while being sufficient for common use cases. The texture will grow dynamically as needed.


347-470: LGTM! Well-structured texture creation helpers.

The helper functions correctly:

  • Create appropriately sized textures for each plane
  • Handle Y, UV, U, V planes with correct dimensions (half-size for chroma)
  • Create double-buffered output textures
  • Return both texture and view for convenience

512-514: LGTM! Clean accessor method.


540-542: LGTM! Consistent dimension validation pattern.

All conversion entry points correctly validate dimensions against GPU limits before processing. The _downscaled flag is currently unused but may be useful for future metrics or warnings.


961-1049: LGTM! Well-implemented zero-copy path with proper error handling.

The implementation correctly:

  • Validates dimensions before processing
  • Attempts zero-copy import for both Y and UV planes
  • Provides detailed logging for success and failure cases
  • Returns appropriate errors with context

1051-1112: LGTM! Smart fallback strategy with persistent failure tracking.

The implementation intelligently:

  • Attempts zero-copy when available and not previously failed
  • Permanently disables zero-copy after first failure to avoid repeated overhead
  • Provides clear trace logging for path selection
  • Falls back gracefully to staging copy

The persistent failure tracking (zero_copy_failed) is a good optimization to avoid repeated failed attempts on incompatible hardware.
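The sticky-flag control flow being described reduces to a few lines; this sketch uses invented names (ZeroCopyState, choose_path) around the zero_copy_failed flag the review mentions:

```rust
#[derive(PartialEq, Debug)]
enum UploadPath {
    ZeroCopy,
    StagedCopy,
}

struct ZeroCopyState {
    failed: bool,
}

impl ZeroCopyState {
    // Once an attempt fails, skip zero-copy permanently until reset:
    // the closure standing in for the D3D11 shared-handle import is not
    // even invoked on subsequent frames.
    fn choose_path(&mut self, try_zero_copy: impl FnOnce() -> Result<(), String>) -> UploadPath {
        if self.failed {
            return UploadPath::StagedCopy;
        }
        match try_zero_copy() {
            Ok(()) => UploadPath::ZeroCopy,
            Err(_) => {
                self.failed = true;
                UploadPath::StagedCopy
            }
        }
    }

    // Mirrors reset_zero_copy_state(): allows retrying after a device change.
    fn reset(&mut self) {
        self.failed = false;
    }
}
```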


1114-1122: LGTM! Useful zero-copy state management API.

The accessors allow callers to:

  • Query current zero-copy status
  • Reset state if hardware configuration changes
crates/rendering/src/cpu_yuv.rs (9)

1-33: LGTM! Well-designed progress tracking with atomic state.

The implementation correctly:

  • Uses atomic operations for thread-safe progress tracking
  • Handles edge case (zero total_rows) in progress_fraction
  • Uses Relaxed ordering appropriately for non-critical progress updates
  • Provides cancellation support

129-160: LGTM! Clean SIMD capability detection.

The implementation correctly:

  • Detects AVX2/SSE2 support on x86 architectures
  • Falls back to scalar on non-x86
  • Provides useful metadata via pixels_per_iteration

162-165: LGTM! Sensible parallelism thresholds.

Using 1080p as the cutoff for parallel processing and 16 rows per thread are appropriate values that balance overhead with parallelism benefits.


177-255: LGTM! Robust SIMD conversion with proper validation.

The implementation correctly:

  • Validates buffer sizes with overflow protection (saturating arithmetic)
  • Detects optimal SIMD level
  • Chooses parallel/sequential based on frame size
  • Falls back to scalar conversion if validation fails

257-285: LGTM! Sequential conversion with cancellation support.

Properly checks cancellation before each row and updates progress atomically after processing.


287-342: LGTM! Efficient parallel conversion using rayon.

The implementation correctly:

  • Divides work into bands based on thread count
  • Uses par_chunks_mut for parallel processing
  • Checks cancellation within each band
  • Guards against out-of-bounds with global_row >= height check
  • Updates progress atomically
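The band decomposition can be illustrated with a self-contained sketch; it uses `std::thread::scope` in place of rayon's `par_chunks_mut` (so it runs without the rayon dependency) and fills rows with dummy data instead of converted pixels. `ROWS_PER_BAND` mirrors the 16-row band size used by the PR:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

const ROWS_PER_BAND: usize = 16;

/// Fill each row of `output` (`row_len` bytes per row) with its row index,
/// processing bands of rows on separate threads and counting completed rows.
fn process_rows_parallel(output: &mut [u8], row_len: usize, height: usize) -> usize {
    let completed = AtomicUsize::new(0);
    thread::scope(|s| {
        for (band_idx, band) in output.chunks_mut(row_len * ROWS_PER_BAND).enumerate() {
            let completed = &completed;
            s.spawn(move || {
                for (i, row) in band.chunks_mut(row_len).enumerate() {
                    let global_row = band_idx * ROWS_PER_BAND + i;
                    if global_row >= height {
                        break; // guard the final, possibly short band
                    }
                    row.fill(global_row as u8);
                    completed.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    completed.load(Ordering::Relaxed)
}
```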

344-413: LGTM! Clean SIMD dispatch pattern.

The dispatch correctly routes to AVX2/SSE2/Scalar implementations based on detected capabilities. Unsafe blocks are properly contained.


537-621: LGTM! Consistent pattern with NV12 conversion.

The YUV420P implementation follows the same robust pattern as NV12 with proper validation and parallel/sequential selection.

Note: Same SIMD placeholder issue applies (see comment on lines 415-495).


623-711: LGTM! Consistent parallel and sequential conversion implementations.

The YUV420P conversion correctly mirrors the NV12 pattern with proper cancellation and progress tracking.

Comment on lines +415 to 495
#[cfg(any(target_arch = "x86_64", target_arch = "x86"))]
#[target_feature(enable = "avx2")]
unsafe fn nv12_convert_row_avx2(
    y_data: &[u8],
    uv_data: &[u8],
    width: usize,
    y_row_start: usize,
    uv_row_start: usize,
    out_row_start: usize,
    output: &mut [u8],
) {
    unsafe {
        nv12_convert_row_sse2(
            y_data,
            uv_data,
            width,
            y_row_start,
            uv_row_start,
            out_row_start,
            output,
        );
    }
}

#[cfg(any(target_arch = "x86_64", target_arch = "x86"))]
#[target_feature(enable = "sse2")]
unsafe fn nv12_convert_row_sse2(
    y_data: &[u8],
    uv_data: &[u8],
    width: usize,
    y_row_start: usize,
    uv_row_start: usize,
    out_row_start: usize,
    output: &mut [u8],
) {
    nv12_convert_row_scalar(
        y_data,
        uv_data,
        width,
        y_row_start,
        uv_row_start,
        out_row_start,
        output,
    );
}

#[cfg(any(target_arch = "x86_64", target_arch = "x86"))]
fn nv12_convert_row_scalar(
    y_data: &[u8],
    uv_data: &[u8],
    width: usize,
    y_row_start: usize,
    uv_row_start: usize,
    out_row_start: usize,
    output: &mut [u8],
) {
    for col in 0..width {
        let y_idx = y_row_start + col;
        let uv_idx = uv_row_start + (col / 2) * 2;

        let y = y_data.get(y_idx).copied().unwrap_or(0) as i32;
        let u = uv_data.get(uv_idx).copied().unwrap_or(128) as i32;
        let v = uv_data.get(uv_idx + 1).copied().unwrap_or(128) as i32;

        let c = y - 16;
        let d = u - 128;
        let e = v - 128;

        let r = clamp_u8((298 * c + 409 * e + 128) >> 8);
        let g = clamp_u8((298 * c - 100 * d - 208 * e + 128) >> 8);
        let b = clamp_u8((298 * c + 516 * d + 128) >> 8);

        let out_idx = out_row_start + col * 4;
        if out_idx + 3 < output.len() {
            output[out_idx] = r;
            output[out_idx + 1] = g;
            output[out_idx + 2] = b;
            output[out_idx + 3] = 255;
        }
    }
}

⚠️ Potential issue | 🟠 Major

SIMD functions are placeholder implementations calling scalar code.

The AVX2 and SSE2 functions currently just call the scalar implementation, meaning no actual SIMD acceleration is occurring. The dispatch infrastructure is in place, but the vectorized implementations are missing.

This should be documented as a known limitation, or actual SIMD implementations should be provided. If this is intentional (a staged rollout), consider:

  1. Adding a comment explaining that SIMD implementations are placeholders
  2. Updating the PR description to clarify the current state
  3. Or implementing the actual SIMD intrinsics for performance

Based on the PR objectives mentioning "SIMD-parallel YUV→RGBA conversion", users may expect actual SIMD acceleration in this release.

Comment on lines +510 to +523
#[cfg(not(any(target_arch = "x86_64", target_arch = "x86")))]
#[allow(clippy::too_many_arguments)]
pub fn nv12_to_rgba_simd_with_progress(
    y_data: &[u8],
    uv_data: &[u8],
    width: u32,
    height: u32,
    y_stride: u32,
    uv_stride: u32,
    output: &mut [u8],
    _progress: Option<Arc<ConversionProgress>>,
) {
    nv12_to_rgba(y_data, uv_data, width, height, y_stride, uv_stride, output);
}

⚠️ Potential issue | 🟡 Minor

Progress tracking not supported on non-x86 architectures.

The non-x86 fallback ignores the progress parameter, meaning cancellation and progress monitoring won't work on ARM or other architectures. Consider documenting this limitation if it's intentional.

If progress support is desired on all platforms, implement it for the scalar path:

 #[cfg(not(any(target_arch = "x86_64", target_arch = "x86")))]
 #[allow(clippy::too_many_arguments)]
 pub fn nv12_to_rgba_simd_with_progress(
     y_data: &[u8],
     uv_data: &[u8],
     width: u32,
     height: u32,
     y_stride: u32,
     uv_stride: u32,
     output: &mut [u8],
-    _progress: Option<Arc<ConversionProgress>>,
+    progress: Option<Arc<ConversionProgress>>,
 ) {
-    nv12_to_rgba(y_data, uv_data, width, height, y_stride, uv_stride, output);
+    let height_usize = height as usize;
+    for row in 0..height_usize {
+        if let Some(ref p) = progress && p.is_cancelled() {
+            return;
+        }
+        // Process row (inline the conversion logic or refactor nv12_to_rgba)
+        if let Some(ref p) = progress {
+            p.rows_completed.fetch_add(1, Ordering::Relaxed);
+        }
+    }
 }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
crates/rendering/src/cpu_yuv.rs lines 510-523: the non-x86 fallback
nv12_to_rgba_simd_with_progress currently ignores the progress parameter, so
progress reporting/cancellation doesn't work on ARM; either document this
limitation or implement progress support in the scalar path by removing the
unused underscore from the parameter, propagate the
Option<Arc<ConversionProgress>> into the scalar conversion routine, and update
the scalar nv12_to_rgba implementation to periodically report progress and check
for cancellation (e.g., compute rows/blocks processed, call progress.report(...)
and abort/return early on cancellation). Ensure the function signature stays
compatible and add a short doc comment if you choose to leave it unsupported.

Comment on lines +1158 to +1187
#[test]
fn test_cancellation() {
    let progress = Arc::new(ConversionProgress::new(1080));

    let width = 1920u32;
    let height = 1080u32;
    let y_stride = 1920u32;
    let uv_stride = 1920u32;

    let y_data: Vec<u8> = vec![128; (y_stride * height) as usize];
    let uv_data: Vec<u8> = vec![128; (uv_stride * height / 2) as usize];

    let mut output = vec![0u8; (width * height * 4) as usize];

    progress.cancel();

    nv12_to_rgba_simd_with_progress(
        &y_data,
        &uv_data,
        width,
        height,
        y_stride,
        uv_stride,
        &mut output,
        Some(progress.clone()),
    );

    let rows_done = progress.rows_completed.load(Ordering::Relaxed);
    assert!(rows_done < height as usize);
}

⚠️ Potential issue | 🟡 Minor

Cancellation test may be non-deterministic.

The test cancels before starting conversion (line 1172) and expects rows_done < height (line 1186). While this should work since cancellation is checked at the start of each row, there's a theoretical race condition if the check happens after row processing begins.

Consider making the test more robust:

     #[test]
     fn test_cancellation() {
         let progress = Arc::new(ConversionProgress::new(1080));
 
         let width = 1920u32;
         let height = 1080u32;
         let y_stride = 1920u32;
         let uv_stride = 1920u32;
 
         let y_data: Vec<u8> = vec![128; (y_stride * height) as usize];
         let uv_data: Vec<u8> = vec![128; (uv_stride * height / 2) as usize];
 
         let mut output = vec![0u8; (width * height * 4) as usize];
 
         progress.cancel();
 
         nv12_to_rgba_simd_with_progress(
             &y_data,
             &uv_data,
             width,
             height,
             y_stride,
             uv_stride,
             &mut output,
             Some(progress.clone()),
         );
 
         let rows_done = progress.rows_completed.load(Ordering::Relaxed);
-        assert!(rows_done < height as usize);
+        assert!(
+            rows_done < height as usize,
+            "Expected early termination due to cancellation, but {} rows completed",
+            rows_done
+        );
     }

Or test cancellation mid-stream by canceling from another thread after a delay.
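A mid-stream variant might look like the following sketch; `Progress` and `convert_rows` stand in for the real `ConversionProgress` and conversion call, with a short sleep simulating per-row work so the canceller thread reliably fires partway through:

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

struct Progress {
    rows_completed: AtomicUsize,
    cancelled: AtomicBool,
}

fn convert_rows(height: usize, progress: &Progress) {
    for _row in 0..height {
        if progress.cancelled.load(Ordering::Relaxed) {
            return; // stop early on cancellation
        }
        thread::sleep(Duration::from_millis(1)); // stand-in for per-row work
        progress.rows_completed.fetch_add(1, Ordering::Relaxed);
    }
}

fn run_midstream_cancel() -> usize {
    let progress = Arc::new(Progress {
        rows_completed: AtomicUsize::new(0),
        cancelled: AtomicBool::new(false),
    });
    let p = progress.clone();
    // Cancel from another thread after a short delay.
    let canceller = thread::spawn(move || {
        thread::sleep(Duration::from_millis(20));
        p.cancelled.store(true, Ordering::Relaxed);
    });
    convert_rows(1080, &progress);
    canceller.join().unwrap();
    progress.rows_completed.load(Ordering::Relaxed)
}
```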


Comment on lines +93 to +95
fn align_dimension(dim: u32) -> u32 {
    dim.div_ceil(TEXTURE_SIZE_PADDING) * TEXTURE_SIZE_PADDING
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read the file to understand align_dimension and its usage
cat -n crates/rendering/src/yuv_converter.rs | head -120

Repository: CapSoftware/Cap

Length of output: 4337


🏁 Script executed:

#!/bin/bash
# Search for GPU-related limit constants and validations
rg -i "gpu.*max|max.*texture|texture.*size.*limit" --type=rs -C2

Repository: CapSoftware/Cap

Length of output: 85


🏁 Script executed:

#!/bin/bash
# Continue reading the file to see more of validate_dimensions and how align_dimension is used
cat -n crates/rendering/src/yuv_converter.rs | sed -n '97,150p'

Repository: CapSoftware/Cap

Length of output: 2262


🏁 Script executed:

#!/bin/bash
# Search for calls to align_dimension to see where it's used
rg "align_dimension" crates/rendering/src/yuv_converter.rs -C3

Repository: CapSoftware/Cap

Length of output: 524


🏁 Script executed:

#!/bin/bash
# Search for where ensure_texture_size is called to confirm validation happens
rg "ensure_texture_size|validate_dimensions" crates/rendering/src/yuv_converter.rs -C3

Repository: CapSoftware/Cap

Length of output: 3252


🏁 Script executed:

#!/bin/bash
# Read more of the file to see the full context of ensure_texture_size and how it's used
cat -n crates/rendering/src/yuv_converter.rs | sed -n '150,250p'

Repository: CapSoftware/Cap

Length of output: 5229


🏁 Script executed:

#!/bin/bash
# Check the full ensure_texture_size implementation to see if it re-validates after alignment
cat -n crates/rendering/src/yuv_converter.rs | sed -n '260,310p'

Repository: CapSoftware/Cap

Length of output: 2701


🏁 Script executed:

#!/bin/bash
# Find where allocated_width and allocated_height are set to understand the bounds
rg "allocated_width\s*=" crates/rendering/src/yuv_converter.rs -C2

Repository: CapSoftware/Cap

Length of output: 240


🏁 Script executed:

#!/bin/bash
# Find the complete ensure_texture_size function
rg -A30 "fn ensure_texture_size" crates/rendering/src/yuv_converter.rs

Repository: CapSoftware/Cap

Length of output: 1438


🏁 Script executed:

#!/bin/bash
# Check what the limits are for texture creation and if alignment could violate them
rg "MAX_TEXTURE" crates/rendering/src/yuv_converter.rs -C1

Repository: CapSoftware/Cap

Length of output: 343


🏁 Script executed:

#!/bin/bash
# Check if texture creation functions handle dimension checks or clamping
rg -A5 "create_y_texture|create_uv_texture|create_u_texture|create_v_texture|create_output_textures" crates/rendering/src/yuv_converter.rs | head -40

Repository: CapSoftware/Cap

Length of output: 1557


🏁 Script executed:

#!/bin/bash
# Verify the math: what is 4320 aligned?
python3 << 'EOF'
max_height = 4320
padding = 64
aligned = ((max_height + padding - 1) // padding) * padding
print(f"MAX_TEXTURE_HEIGHT: {max_height}")
print(f"TEXTURE_SIZE_PADDING: {padding}")
print(f"Aligned height: {aligned}")
print(f"Exceeds original? {aligned > max_height}")
print()
print("Calculation breakdown:")
print(f"  {max_height} / {padding} = {max_height / padding}")
print(f"  ceil({max_height / padding}) = {-(-max_height // padding)}")
print(f"  {-(-max_height // padding)} * {padding} = {-(-max_height // padding) * padding}")
EOF

Repository: CapSoftware/Cap

Length of output: 145


🏁 Script executed:

#!/bin/bash
# Get the full create_y_texture to see if there's any validation
rg -A15 "fn create_y_texture" crates/rendering/src/yuv_converter.rs

Repository: CapSoftware/Cap

Length of output: 617


🏁 Script executed:

#!/bin/bash
# Check if create_texture validates the dimensions against GPU limits
rg -B5 -A10 "device.create_texture" crates/rendering/src/yuv_converter.rs | head -50

Repository: CapSoftware/Cap

Length of output: 1790


Alignment causes validated dimensions to exceed GPU limits.

The validation ensures dimensions don't exceed effective_max_height (4320 in this case), but align_dimension(4320) returns 4352, which exceeds the limit. Since alignment happens after validation in ensure_texture_size, dimensions that pass validation can become invalid. Re-validate or clamp dimensions to GPU limits after alignment before creating textures.
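A sketch of that post-alignment clamping, assuming the 64-pixel `TEXTURE_SIZE_PADDING` from the scripts above; whether to clamp down or return an error is a policy choice for the maintainers:

```rust
const TEXTURE_SIZE_PADDING: u32 = 64;

fn align_dimension(dim: u32) -> u32 {
    dim.div_ceil(TEXTURE_SIZE_PADDING) * TEXTURE_SIZE_PADDING
}

/// Align, then clamp back down to the largest padded size within `gpu_max`.
/// Returns None if even one padding unit does not fit.
fn align_clamped(dim: u32, gpu_max: u32) -> Option<u32> {
    let aligned = align_dimension(dim);
    if aligned <= gpu_max {
        return Some(aligned);
    }
    // Round down to the largest multiple of the padding that still fits.
    let clamped = (gpu_max / TEXTURE_SIZE_PADDING) * TEXTURE_SIZE_PADDING;
    (clamped > 0).then_some(clamped)
}
```

With the 4320-pixel limit from the analysis, `align_clamped(4320, 4320)` yields 4288 instead of the out-of-range 4352.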

🤖 Prompt for AI Agents
In crates/rendering/src/yuv_converter.rs around lines 93 to 95, align_dimension
currently rounds up (e.g., 4320 -> 4352) after validation which can push
dimensions past GPU limits; change ensure_texture_size to perform alignment
first and then re-validate or clamp the aligned dimensions against
effective_max_width/effective_max_height (or return an error) so the final
dimensions used to create textures never exceed GPU limits; apply this for both
width and height and ensure any error messages mention the aligned size when
rejecting or clamping.

Comment on lines +472 to +510
fn ensure_texture_size(&mut self, device: &wgpu::Device, width: u32, height: u32) {
    let required_width = align_dimension(width);
    let required_height = align_dimension(height);

    if required_width <= self.allocated_width && required_height <= self.allocated_height {
        return;
    }

    let new_width = required_width.max(self.allocated_width);
    let new_height = required_height.max(self.allocated_height);

    tracing::info!(
        old_width = self.allocated_width,
        old_height = self.allocated_height,
        new_width = new_width,
        new_height = new_height,
        "Reallocating YUV converter textures for larger video"
    );

    let (y_texture, y_view) = Self::create_y_texture(device, new_width, new_height);
    let (uv_texture, uv_view) = Self::create_uv_texture(device, new_width, new_height);
    let (u_texture, u_view) = Self::create_u_texture(device, new_width, new_height);
    let (v_texture, v_view) = Self::create_v_texture(device, new_width, new_height);
    let (output_textures, output_views) =
        Self::create_output_textures(device, new_width, new_height);

    self.y_texture = y_texture;
    self.y_view = y_view;
    self.uv_texture = uv_texture;
    self.uv_view = uv_view;
    self.u_texture = u_texture;
    self.u_view = u_view;
    self.v_texture = v_texture;
    self.v_view = v_view;
    self.output_textures = output_textures;
    self.output_views = output_views;
    self.allocated_width = new_width;
    self.allocated_height = new_height;
}

⚠️ Potential issue | 🟠 Major

Verify aligned dimensions don't exceed GPU limits.

The grow-only strategy is efficient, but consider this edge case:

  1. Caller validates dimensions → gets (effective_width, effective_height) within GPU limits
  2. ensure_texture_size aligns these with align_dimension → potentially exceeds limits

For example, if effective_width = gpu_max - 10 and TEXTURE_SIZE_PADDING = 64, the aligned dimension would exceed the GPU limit.

Apply this approach to validate post-alignment:

 fn ensure_texture_size(&mut self, device: &wgpu::Device, width: u32, height: u32) {
     let required_width = align_dimension(width);
     let required_height = align_dimension(height);
+    
+    let effective_max = MAX_TEXTURE_WIDTH.min(self.gpu_max_texture_size);
+    let required_width = required_width.min(effective_max);
+    let required_height = required_height.min(effective_max);

     if required_width <= self.allocated_width && required_height <= self.allocated_height {

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In crates/rendering/src/yuv_converter.rs around lines 472 to 510, after
computing required_width/required_height with align_dimension, validate those
post-alignment dimensions against the device GPU limits
(device.limits().max_texture_dimension_2d); if either aligned dimension exceeds
the limit, do not attempt to allocate the larger textures — log an error/tracing
with the attempted and max values and return early (or clamp to the max and log
if you prefer a best-effort allocation), otherwise proceed with the existing
reallocation logic; ensure allocated_width/allocated_height are only updated
when allocation actually succeeds.
