Ripbot264 profile
2/7/2024

GPU cores (shaders, stream processors, EUs, or whatever you want to call them) are slower than CPU cores, but typically there are many times more of them available. The GPU needs to complete offloaded tasks faster than the CPU would, so that other tasks that depend on the result don't have to wait for the first task to complete. Traditional "GPU computing" involves offloading tasks to the GPU. We even make the framework that some semiconductor companies use as their OpenCL (or similar heterogeneous computing) developer's API. GPU computing (heterogeneous computing) was the core competency that MulticoreWare was founded on. Yes, and we wrote the GPU acceleration for x264. You can of course do any pre-filtering on the GPU, if required, but that's really outside of the actual encoding process anyway.

So overall, what benwaggoner said: modern codecs are too complex for that. x264 got some OpenCL features, but from what I remember they barely help speed at all, and a more complex codec like HEVC will likely benefit even less. FPGA seems like the most viable alternative to the CPU; I don't think GPUs are going to be viable in the foreseeable future.

MPEG-2 on GPU was pretty trivial, H.264 could fall short in quality at lower bitrates, and HEVC on GPU simply has never demonstrated high quality with high compression efficiency. The latest Intel CPUs are really incredible devices for encoding HEVC. If anything, codecs are becoming less well suited to GPUs versus CPUs as they become more complex. One very fast core with very low memory latency can do a lot more than a single GPU "core." And while GPUs are good at processing a whole lot of bytes at once, AVX2 and AVX-512 offer similar capabilities in CPUs. Lots of choices impact other choices in other parts of the same frame, and just the round-trip latency from main memory to the GPU and back again slows things down far too much.
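The round-trip latency argument above can be sketched with a back-of-envelope model. This is purely illustrative: the latency and per-task timing numbers below are assumptions chosen to show the shape of the problem, not measurements of any real encoder or GPU.

```python
# Toy model (assumed numbers, not measurements): a chain of dependent
# encoding decisions, where each GPU dispatch pays a host<->device
# round-trip cost that a CPU working out of its L1 cache does not.

PCIE_ROUND_TRIP_US = 10.0  # assumed host->GPU->host latency per dispatch
GPU_TASK_US = 0.5          # assumed GPU compute time for one small task
CPU_TASK_US = 2.0          # assumed CPU time for the same task (slower core)

def serial_chain_time(n_tasks: int, per_task_us: float,
                      overhead_us: float = 0.0) -> float:
    """Total time for n dependent tasks; each one pays a fixed overhead
    before the next can start, because each depends on the last result."""
    return n_tasks * (per_task_us + overhead_us)

n = 1000  # e.g. mode decisions in one frame that depend on earlier results
cpu_total = serial_chain_time(n, CPU_TASK_US)
gpu_total = serial_chain_time(n, GPU_TASK_US, PCIE_ROUND_TRIP_US)

print(f"CPU (serial, in-cache):        {cpu_total:.0f} us")
print(f"GPU (serial, with round trips): {gpu_total:.0f} us")
```

With these assumed numbers the GPU loses badly even though its per-task compute is 4x faster, because the fixed dispatch overhead dominates the tight, serial feedback loop. The picture only reverses when many independent tasks can be batched per dispatch, which is exactly what dependent mode decisions prevent.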
Encoding, particularly in advanced codecs like HEVC, has a whole lot of tight feedback loops where having the data in the L1 cache is great (unless random memory reads would kill the performance). Absolutely, preprocessing filters are largely linear stuff (the output of one filter is the input to the next), and are very well suited for the GPU. I'm wondering if it would be too hard to reserve 2-3 GB of video card memory and fill it up with video frames, then do colorspace conversions, motion estimation, and the like on multiple frames simultaneously (e.g. instead of splitting one frame over hundreds of "mini-cores", preload the card with 100 frames and have a few cores working on each frame simultaneously).
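The "preload 100 frames, convert each independently" idea works because per-frame colorspace conversion has no cross-frame dependencies. A minimal CPU-side sketch of that structure, using threads as stand-ins for GPU core groups: the BT.601 luma weights are real, but the frame data, sizes, and pool size are made up for illustration.

```python
# Sketch: batch of independent frames, one worker per frame.
# Threads stand in for the "few GPU cores per frame" in the post.
from concurrent.futures import ThreadPoolExecutor

def rgb_to_luma(frame):
    """BT.601 luma (Y = 0.299R + 0.587G + 0.114B) for one frame of
    (R, G, B) pixel tuples."""
    return [0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in frame]

# Pretend these 100 tiny "frames" were preloaded into card memory.
frames = [[(i % 256, (i * 2) % 256, (i * 3) % 256) for i in range(16)]
          for _ in range(100)]

# Frames are independent, so they convert in parallel with no feedback loop,
# unlike the serial mode decisions inside the actual encode.
with ThreadPoolExecutor(max_workers=4) as pool:
    luma_frames = list(pool.map(rgb_to_luma, frames))
```

The design point is the contrast with the encoding loop above: a filter like this touches each frame exactly once with no result feeding back, so batching frames across many cores is easy, whereas in-loop encoding decisions are not batchable this way.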