Would like to optimize this for scale #147

pinballelectronica · 2023-10-18T12:51:03Z

Great program, thanks. I have been looking for a program to learn how to optimize using CUDA/GPU, as I am painfully weak at coding for the GPU. I've hacked together bash scripts to make this run pretty fast: on an i9-12900K at 100% load I get about 750 FPS running 4 instances in parallel. That pretty much pegs the CPU. I haven't messed with OpenCV compiled with CUDA (yet), as I'm having compile problems.

Current average across 4 instances running simultaneously (using parallel):

[DVR-Scan] DVR-Scan 1.6
[DVR-Scan] Initializing scan context...
[DVR-Scan] Opened video REC_220557_100_00.mp4 (1280 x 720 at 29.750 FPS).
[DVR-Scan] Limiting detection to 1 region.
[DVR-Scan] Using subtractor MOG2 with kernel_size = 3 (auto)
[DVR-Scan] Scanning input video for motion events...
[DVR-Scan] Processed 11961 frames read in 14.5 secs (avg 822.5 FPS).
[DVR-Scan] No motion events detected in input.
Detected: 0 | Progress: 11961 frames [00:14, 822.81 frames/s]
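For anyone who wants to reproduce the "4 at a time" setup without bash, here is a minimal Python sketch of the same idea. It assumes `dvr-scan` is on the PATH and only relies on the basic `dvr-scan -i <video>` invocation. The helper names `build_command` and `scan_all` are made up for illustration, and any other flags have to be supplied through `extra_args`.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(video, extra_args=()):
    # `dvr-scan -i <video>` is the basic invocation; anything in extra_args
    # (region flags, output options, ...) is passed through untouched.
    return ["dvr-scan", "-i", str(video), *extra_args]


def scan_all(videos, workers=4, extra_args=(), run=subprocess.run):
    # The threads only wait on child processes. The real work happens in the
    # separate dvr-scan processes, so the Python GIL is not a bottleneck.
    def scan_one(video):
        result = run(build_command(video, extra_args))
        return str(video), result.returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Maps each video path to the exit code of its dvr-scan process.
        return dict(pool.map(scan_one, videos))
```

Calling `scan_all(sorted(pathlib.Path(".").glob("*.mp4")), workers=4)` would keep four instances busy until the list is exhausted. The `run` parameter is only there so the scheduling can be dry-run with a stub.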

It's not scalable (enough) when I have a million-plus videos, so I'm still trying to tweak it further without just adding more compute or threads. Plus, this assumes the scene hasn't changed. In videos where the camera is moving, I either scan the entire frame or manually select regions again. Narrowing down the regions is a huge speedup, so it's worth the work, although it's laborious. WSL2 doesn't work with your tool as far as opening the UI to select regions (nothing opens). Or at least my setup doesn't, using Ubuntu 22.04.
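To put a rough number on "not scalable enough", here is a back-of-the-envelope estimate. It assumes every video looks like the sample in the log (about 14.5 s of scan time each) and that each of the 4 parallel instances sustains that rate. Both are simplifications, and `days_to_scan` is just an illustrative helper.

```python
def days_to_scan(num_videos, seconds_per_video, parallel_instances):
    # Total wall-clock time if the work is spread evenly across instances.
    total_seconds = num_videos * seconds_per_video / parallel_instances
    return total_seconds / 86_400  # seconds per day


# 1,000,000 videos, 14.5 s each (from the log), 4 instances at once.
print(round(days_to_scan(1_000_000, 14.5, 4), 1))  # → 42.0
```

So under those assumptions a single machine needs on the order of six weeks, which is why even a 2x speedup per video matters at this scale.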

Looks like the only route using OpenCV is cv2.UMat() with CUDA enabled to move the tensors. Using anything other than OpenCV for object detection is probably overkill (like using a Transformers zero-shot object detection model).
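One caveat on the UMat route: as far as I know, `cv2.UMat` goes through OpenCV's OpenCL path (the transparent API), while CUDA builds expose a separate `cv2.cuda` module with `cv2.cuda_GpuMat`. A minimal sketch of the UMat variant with MOG2 follows. It is not how DVR-Scan does it internally, `motion_masks` is a hypothetical helper, and the import is guarded only so the snippet still loads where OpenCV is missing.

```python
try:
    import cv2
except ImportError:  # OpenCV not installed; motion_masks() will refuse to run
    cv2 = None


def motion_masks(frames, use_umat=True):
    """Run MOG2 over BGR frames and return one foreground mask per frame."""
    if cv2 is None:
        raise RuntimeError("opencv-python is required for this sketch")
    subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)
    masks = []
    for frame in frames:
        # Wrapping the frame in a UMat lets OpenCV dispatch to OpenCL when a
        # device is available; otherwise the same call runs on the CPU.
        src = cv2.UMat(frame) if use_umat else frame
        mask = subtractor.apply(src)
        # UMat results live on the device side; .get() copies them back.
        masks.append(mask.get() if use_umat else mask)
    return masks
```

`cv2.ocl.haveOpenCL()` reports whether OpenCV can see an OpenCL device at all. Timing `motion_masks(frames, use_umat=True)` against `use_umat=False` on the same frames would show whether the upload and download overhead eats the gain.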

Any other ideas? I'd really like to see benchmarks from people who already have it compiled, if anyone has some, especially on high-performance GPUs like the 4090 and on multi-GPU setups. If it's only marginally faster, then it's probably not worth working on.

Thanks

Breakthrough · 2023-10-19T01:19:32Z

Breakthrough
Oct 19, 2023
Maintainer

By running parallel, do you mean spawning multiple instances of DVR-Scan? Regarding benchmarks, I have a 3090 and got roughly a 2x speedup for some videos, less for others. GPU utilization is currently quite low, so the potential for higher performance is there somewhere. However, there are a lot of moving parts for a pure Python application, which makes it difficult.

That being said I also welcome any optimizations folks might find with the current implementation. Ideally video decoding could also be offloaded to the GPU to further improve performance (less data to send to the GPU to begin with), and do everything there. That's probably not going to be possible without rewriting the core in C++ or Rust though.

Thanks for the tip about WSL2, I haven't actually thought to run DVR-Scan under it yet (I've been testing on a VM). Curious why that might be since I thought X stuff was supported now.

1 reply

pinballelectronica Oct 19, 2023
Author

I agree. Pure Python, nah. Why waste all that good work already done?

Yeah, the program is GNU Parallel for Linux. It handles the job scheduling, and it's a great tool for quick-and-dirty scaling out of apps that don't do multithreading. I need to understand OpenCV better for sure before I do anything, lol. I was arguing with Code Llama about using Transformers over OpenCV, and it was arguing that it's overkill, haha.

Do you know the absolute lowest resolution this program will function well at? I know that for other fairly similar ML use cases I can get away with 512x512.

Speaking of multithreading!

If your program supports ffmpeg for decoding, then we should try offloading the decoding to the GPU with CUDA, e.g. -hwaccel cuda or -c:v h264_cuvid on the input (h264_nvenc is the matching encoder, not a decoder). I ask about quality because I've heard the hardware encoder can produce inferior results. I suspect it's not inferior enough to mess up quality object detection, though. Worst case, you can pipe it to nvenc (proper). ffmpeg also supports -threads, and it supports I/B/P frame slicing, which I presume would use CUDA. I'll try it out.
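Before wiring anything into DVR-Scan, a quick way to see whether GPU decoding helps at all is to time a decode-only ffmpeg run. On the input side, the CUDA/NVDEC path is selected with `-hwaccel cuda`, since `h264_nvenc` is an encoder. This sketch only builds the command line. `decode_benchmark_cmd` is a made-up helper, and it assumes an ffmpeg build with CUDA support.

```python
def decode_benchmark_cmd(video, use_gpu=True):
    # -benchmark prints timing at the end; "-f null -" discards the decoded
    # frames, so only demuxing and decoding are measured.
    cmd = ["ffmpeg", "-hide_banner", "-benchmark"]
    if use_gpu:
        # Input option: must come before the -i it applies to.
        cmd += ["-hwaccel", "cuda"]
    cmd += ["-i", str(video), "-an", "-f", "null", "-"]
    return cmd
```

Running `subprocess.run(decode_benchmark_cmd("REC_220557_100_00.mp4"))` once with `use_gpu=True` and once with `use_gpu=False` gives a direct CPU vs GPU decode comparison for one file.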

(That got me thinking: would it make sense to have a pipeline that temporarily converts the video to all I-frames before doing object detection?)
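For what it's worth, an all-I-frame copy can be produced by forcing a GOP size of 1. A minimal sketch, assuming libx264 is available in the ffmpeg build, with `intra_only_cmd` as an illustrative name:

```python
def intra_only_cmd(src, dst):
    # "-g 1" sets the keyframe interval to 1, so every frame is an I-frame.
    # Audio is dropped with -an because motion detection does not need it.
    return ["ffmpeg", "-hide_banner", "-i", str(src),
            "-an", "-c:v", "libx264", "-g", "1", str(dst)]
```

The catch is that producing the copy already requires one full decode of the original plus an encode, so it probably only pays off if the same video gets scanned several times.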

DVR-Scan works great under WSL2. For anyone having nightmares trying to install OpenCV from source with CUDA in WSL2: do NOT use a Windows drive (e.g. /mnt) to install it, it will not work. Use /home.

You need to build ffmpeg from source with CUDA, or you can download binaries here: https://www.gyan.dev/ffmpeg/builds/
