Would like to optimize this for scale #147
Replies: 1 comment 1 reply
By running in parallel, do you mean spawning multiple instances of DVR-Scan? Regarding benchmarks, I have a 3090 and got roughly a 2x speedup for some videos, less for others. GPU utilization is currently quite low, so the potential for higher performance is there somewhere. However, a pure Python application has a lot of moving parts that make this difficult. That being said, I welcome any optimizations folks might find for the current implementation. Ideally, video decoding could also be offloaded to the GPU to further improve performance (less data to send to the GPU to begin with), with everything done there. That's probably not going to be possible without rewriting the core in C++ or Rust, though. Thanks for the tip about WSL2; I haven't actually thought to run DVR-Scan under it yet (I've been testing on a VM). I'm curious why the UI doesn't open, since I thought X applications were supported now.
Great program, thanks. I have been looking for a program to learn GPU optimization with CUDA, as I am painfully weak at coding for GPUs. I've hacked together bash scripts to make this run pretty fast: e.g. on an i9-12900K at 100% load I get about 750 FPS running 4 instances in parallel. That pretty much pegs the CPU. I haven't tried OpenCV compiled with CUDA yet, as I'm having compile problems.
Current average across 4 threads running simultaneously (using parallel):
[DVR-Scan] DVR-Scan 1.6
[DVR-Scan] Initializing scan context...
[DVR-Scan] Opened video REC_220557_100_00.mp4 (1280 x 720 at 29.750 FPS).
[DVR-Scan] Limiting detection to 1 region.
[DVR-Scan] Using subtractor MOG2 with kernel_size = 3 (auto)
[DVR-Scan] Scanning input video for motion events...
[DVR-Scan] Processed 11961 frames read in 14.5 secs (avg 822.5 FPS).
[DVR-Scan] No motion events detected in input.
Detected: 0 | Progress: 11961 frames [00:14, 822.81 frames/s]
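The parallel setup described above was done with bash scripts and `parallel`; a rough Python equivalent is sketched below, in case it helps anyone reproduce the numbers. It assumes `dvr-scan` is on the PATH and uses `-i` for the input and `-so` for scan-only mode; `build_command` and `scan_all` are hypothetical helper names, and region options would need to be added to the command as appropriate.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(video):
    # Assumption: `dvr-scan` is on PATH; `-i` selects the input video and
    # `-so` (scan only) skips writing any output clips.
    return ["dvr-scan", "-i", str(video), "-so"]


def scan_all(videos, workers=4, make_command=build_command):
    """Run one process per video, at most `workers` at a time.

    Returns a dict mapping each video to its process exit code.
    """
    def run(video):
        result = subprocess.run(make_command(video), capture_output=True, text=True)
        return video, result.returncode

    # The threads only wait on child processes, so the GIL is not a bottleneck.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(run, videos))
```

For example, `scan_all(sorted(pathlib.Path("recordings").glob("*.mp4")))` would scan every `.mp4` in a (hypothetical) `recordings` folder four at a time. This only spreads the work across CPU cores; it does not make any single scan faster.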
It's not scalable enough when I have a million-plus videos, so I'm still trying to tweak it further without just adding more compute or threads. Plus, this assumes the scene hasn't changed. In videos where the camera is moving, I either scan the entire frame or manually select regions again. Narrowing down regions gives a huge speedup, so it's worth the work even though it's laborious. WSL2 doesn't work with your tool as far as opening the UI to select regions (nothing opens), or at least my setup on Ubuntu 22.04 doesn't.
Looks like the only route using OpenCV is cv2.UMat() with CUDA enabled to move the tensors. Using anything other than OpenCV for object detection is probably overkill (like using a Transformers zero-shot object detection model).
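Before benchmarking, it may be worth confirming that the installed OpenCV build can see a CUDA device at all. As far as I know, `cv2.UMat` goes through OpenCV's OpenCL path, while CUDA builds expose a separate `cv2.cuda` module, so checking `cv2.cuda.getCudaEnabledDeviceCount()` is a quick sanity test. The sketch below is only that check; `cuda_device_count` is a hypothetical helper name, and it deliberately returns 0 when OpenCV is missing or built without CUDA.

```python
def cuda_device_count():
    """Return how many CUDA devices this OpenCV build can use (0 if none)."""
    try:
        import cv2
    except ImportError:
        return 0  # OpenCV is not installed at all

    cuda = getattr(cv2, "cuda", None)
    if cuda is None or not hasattr(cuda, "getCudaEnabledDeviceCount"):
        return 0  # this build has no CUDA bindings

    try:
        # Returns 0 for builds without CUDA support and may return -1
        # when the driver is missing or incompatible; clamp to 0.
        return max(0, int(cuda.getCudaEnabledDeviceCount()))
    except cv2.error:
        return 0


if __name__ == "__main__":
    print("CUDA devices visible to OpenCV:", cuda_device_count())
```

If this prints 0 with the stock `opencv-python` wheel, that is expected, since those wheels are not built with CUDA; a custom build is needed before any GPU comparison means anything.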
Any other ideas? I'd really like to see benchmarks from people who already have it compiled, if anyone has some, especially on high-performance GPUs like the 4090 and on multi-GPU setups. If it's only marginally faster, then it's probably not worth working on.
Thanks