GPU Issues
Troubleshooting CUDA, cuDNN, ONNX Runtime, and GPU memory problems.
GPU issues are the most common category of problems reported by Recaster users. This page covers diagnosis and solutions for NVIDIA GPU acceleration, driver compatibility, and platform-specific GPU limitations.
GPU Not Detected
If Recaster does not detect your GPU and falls back to CPU processing:
- Update NVIDIA drivers — Download the latest drivers from nvidia.com/drivers. Driver version 535 or later is recommended.
- Check VRAM — Ensure your GPU has at least 2 GB VRAM for face swapping and 4 GB for training.
- Verify NVIDIA GPU — Recaster only supports NVIDIA GPUs for CUDA acceleration. AMD and Intel GPUs are not supported (CPU fallback is used).
- Restart after driver update — A full system restart is required after installing new GPU drivers.
Quick Verification
Run nvidia-smi in your terminal to verify your GPU is recognized by the NVIDIA driver. If this command fails, the driver is not installed correctly.
CUDA Errors
CUDA_ERROR_INVALID_PTX
This error indicates a GPU/CUDA version mismatch. The compiled PTX code is not compatible with your GPU architecture.
- RTX 5000 series: This is expected. RTX 5000 GPUs require CUDA 12.8+ with compute capability 12.0, but current TensorFlow wheels only support up to compute capability 9.0. See the RTX 5000 Workarounds section below.
- Other GPUs: Update your NVIDIA driver to the latest version. This usually resolves the PTX mismatch for supported GPU architectures.
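To see which architecture your GPU actually reports, you can ask the driver directly. The sketch below is a hedged example, not part of Recaster: it assumes the `compute_cap` query field supported by recent nvidia-smi releases, and the 9.0 threshold mirrors the TensorFlow wheel limit described above.

```python
import shutil
import subprocess

def parse_compute_caps(csv_text: str) -> list[tuple[str, float]]:
    """Parse 'name, compute_cap' CSV rows from nvidia-smi into (name, capability)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # rsplit guards against commas inside the GPU name itself.
        name, cap = (field.strip() for field in line.rsplit(",", 1))
        gpus.append((name, float(cap)))
    return gpus

# Query the driver only when nvidia-smi is actually installed.
if shutil.which("nvidia-smi"):
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,compute_cap", "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    for name, cap in parse_compute_caps(out.stdout):
        flag = " (above current TensorFlow wheel support)" if cap > 9.0 else ""
        print(f"{name}: compute capability {cap}{flag}")
```

On an RTX 5000 series card this reports compute capability 12.0, which is the mismatch behind the PTX error.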
CUDA Out of Memory
If you see CUDA out-of-memory errors during processing:
- Close other GPU-intensive applications (games, other AI tools, video editors).
- Reduce tile size in upscaling settings from 512 to 256 or 128.
- Use smaller model variants (InSwapper 128-FP16 instead of Ghost 3).
- Lower the GPU memory limit in advanced settings.
- Process lower-resolution input files.
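Why the tile-size change helps the most: per-step memory grows with the square of the tile edge, so halving the tile cuts working memory to roughly a quarter. The estimate below is illustrative only; the 4x activation-overhead factor and 4x scale are assumptions for the sketch, not measured Recaster values.

```python
def tile_vram_mb(tile: int, channels: int = 3, bytes_per_value: int = 4,
                 scale: int = 4, overhead: float = 4.0) -> float:
    """Rough VRAM (MB) for one upscaling tile: input plus upscaled output,
    multiplied by an assumed overhead factor for intermediate feature maps."""
    in_bytes = tile * tile * channels * bytes_per_value
    out_bytes = (tile * scale) ** 2 * channels * bytes_per_value
    return (in_bytes + out_bytes) * overhead / 1024**2

for t in (512, 256, 128):
    print(f"tile {t}: ~{tile_vram_mb(t):.0f} MB per step")
```

Since every term scales with tile², dropping from 512 to 256 divides the per-step footprint by exactly four under this model.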
cuFuncGetName Errors
This error typically appears on remote instances and indicates an ONNX Runtime or cuDNN version mismatch. The fix is to reinstall the correct package versions:
pip install --force-reinstall onnxruntime-gpu==1.19.2
pip install --force-reinstall "numpy>=1.23.0,<2.0.0"
pip install nvidia-cudnn-cu12==9.1.0.70
cuDNN Installation
cuDNN (CUDA Deep Neural Network library) is required for GPU acceleration on remote instances. If you see "libcudnn.so.9 not found" errors:
- Install cuDNN 9 via pip:
pip install nvidia-cudnn-cu12==9.1.0.70
- Find the cuDNN library path:
python -c "import nvidia.cudnn; print(nvidia.cudnn.__path__[0] + '/lib')"
- Add the path to LD_LIBRARY_PATH:
export LD_LIBRARY_PATH=$(python -c "import nvidia.cudnn; print(nvidia.cudnn.__path__[0] + '/lib')"):$LD_LIBRARY_PATH
- Verify the CUDA provider is available:
python -c "import onnxruntime as ort; print(ort.get_available_providers())"
You should see CUDAExecutionProvider in the output. If only CPUExecutionProvider appears, the cuDNN setup is not correct.
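If the providers look right but behavior is still wrong, also confirm that the version pins from the pip commands above actually took effect. This is a stdlib-only sketch; the ranges simply mirror the pins used in this guide.

```python
from importlib.metadata import PackageNotFoundError, version

def in_range(ver: str, lo: tuple, hi: tuple) -> bool:
    """True if lo <= ver < hi, comparing the leading numeric dotted components."""
    parts = tuple(int(p) for p in ver.split(".")[:len(hi)] if p.isdigit())
    return lo <= parts < hi

# Pins from this guide: numpy >=1.23,<2.0 and onnxruntime-gpu ==1.19.2.
checks = {
    "numpy": lambda v: in_range(v, (1, 23), (2, 0)),
    "onnxruntime-gpu": lambda v: v == "1.19.2",
}
for pkg, ok in checks.items():
    try:
        installed = version(pkg)
        print(f"{pkg} {installed}: {'OK' if ok(installed) else 'WRONG VERSION'}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```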
Automatic Fallback
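ONNX Runtime tries providers in the order you list them and silently falls back to the next one when a provider cannot initialize, which is why a broken cuDNN setup shows up as slow CPU-only processing rather than an error. A hedged sketch of an explicit priority list (`model.onnx` is a placeholder path; `gpu_mem_limit` is a real CUDAExecutionProvider option, expressed in bytes):

```python
# Providers are tried in order; CPU is the last-resort fallback.
providers = [
    ("CUDAExecutionProvider", {"gpu_mem_limit": 4 * 1024**3}),  # optional 4 GB VRAM cap
    "CPUExecutionProvider",
]

try:
    import onnxruntime as ort
    # Placeholder model path for illustration.
    session = ort.InferenceSession("model.onnx", providers=providers)
    # Shows which providers were actually applied after fallback.
    print(session.get_providers())
except Exception as exc:  # onnxruntime missing, or model not found
    print(f"session not created: {exc}")
```

If `get_providers()` prints only `['CPUExecutionProvider']`, CUDA initialization failed and the cuDNN steps above need revisiting.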
ONNX Runtime Compatibility
Critical Version Pinning
If face processing hangs, shows 0% GPU utilization, or produces garbage output, you likely have an incompatible ONNX Runtime version. Force-reinstall the correct versions:
pip install --force-reinstall onnxruntime-gpu==1.19.2
pip install --force-reinstall "numpy>=1.23.0,<2.0.0"
pip uninstall -y opencv-python-headless
pip install --no-deps "opencv-python>=4.5.0,<4.11.0"
NumPy Version Mismatch
NumPy 2.0+ breaks ONNX Runtime 1.19.2. If you see import errors or unexpected behavior, downgrade NumPy:
pip install --force-reinstall "numpy>=1.23.0,<2.0.0"
OpenCV Headless Conflict
Some packages install opencv-python-headless which conflicts with opencv-python. Recaster requires the non-headless version:
pip uninstall -y opencv-python-headless
pip install --no-deps "opencv-python>=4.5.0,<4.11.0"
RTX 5000 Series Workarounds
Blackwell Architecture Limitations
Available workarounds for RTX 5000 series users:
1. Remote Training (Recommended)
Use Studio tier cloud GPUs via Vast.ai. Works immediately with no local GPU constraints. Cloud instances use RTX 3090/4090/A100 GPUs that have full CUDA support.
2. WSL2 on Windows
Use Windows Subsystem for Linux 2 with a community fork of DeepFaceLab that supports Blackwell architecture. This runs natively on your GPU through WSL2's GPU passthrough.
3. ONNX Runtime Operations
Quick Recast (face swapping/enhancement) and video upscaling use ONNX Runtime instead of TensorFlow. These operations may work on RTX 5000 GPUs since ONNX Runtime updates CUDA support more frequently.
SwinIR CoreML Issues (macOS)
On macOS, SwinIR models cannot use CoreML acceleration due to Apple's framework not supporting SwinIR's dynamic input shapes. You may see errors like:
- "CoreML does not support shapes with dimension values of 0"
- "runtime shape has zero elements"
This is a limitation of Apple's CoreML framework, not a Recaster bug. SwinIR automatically falls back to CPU processing on macOS, which is slower (2-5 FPS vs 10-20 FPS with CoreML).
macOS Recommendation
GPU Memory Management
If you are running low on GPU memory, try these optimizations:
- Reduce tile size: In upscaling settings, lower the tile size from 512 to 256 or 128. Smaller tiles use less VRAM per processing step.
- Lower batch size: If processing multiple faces, smaller batch sizes reduce peak VRAM usage.
- Close other GPU applications: Games, video editors, other AI tools, and even some web browsers use GPU memory.
- Set GPU memory limit: In advanced upscaling settings, set a GPU memory limit (1/2/4/6/8 GB) to prevent ONNX Runtime from consuming all available VRAM.
- Use FP16 models: Half-precision models (e.g. InSwapper 128-FP16) use approximately half the VRAM of full-precision equivalents.
Monitoring GPU Utilization
To monitor GPU usage during processing:
# Watch GPU usage in real-time (updates every 1 second)
nvidia-smi -l 1
# Or use watch for a cleaner display
watch -n 1 nvidia-smi
If GPU utilization shows 0% during processing, the GPU provider is not being used. Check the ONNX Runtime provider configuration and cuDNN installation as described above.
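The same check can be scripted to flag the 0%-utilization case automatically. The parser below is a hedged sketch around the standard `utilization.gpu` query field; the polling loop only runs when nvidia-smi is present.

```python
import shutil
import subprocess
import time

def parse_utilization(csv_text: str) -> list[int]:
    """Parse 'utilization.gpu' percentages from nvidia-smi CSV output, e.g. '87 %'."""
    return [int(line.strip().rstrip("%").strip())
            for line in csv_text.strip().splitlines() if line.strip()]

# Poll once a second while processing runs (requires the NVIDIA driver).
if shutil.which("nvidia-smi"):
    for _ in range(5):
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader"],
            capture_output=True, text=True,
        )
        # A reading stuck at 0 means the GPU provider is idle.
        print(parse_utilization(out.stdout))
        time.sleep(1)
```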