Upscaling Models

Recaster includes two families of AI upscaling models: Real-ESRGAN (GAN-based) and SwinIR (Transformer-based). Each offers different quality and speed trade-offs.

Model Comparison

The table below lists every upscaling model available in Recaster, along with its tier, scale factor, quality rating, and a brief description. Models are downloaded automatically from Hugging Face the first time you use them.

| Name | Tier | Scale | Quality | Description |
| --- | --- | --- | --- | --- |
| Real-ESRGAN 2x | Free | 2x | 9.2 / 10 | Fast GAN-based upscaling. Good balance of speed and quality for quick previews and general use. |
| Real-ESRGAN 4x+ | Free | 4x | 9.2 / 10 | Enhanced GAN upscaling with the Real-ESRGAN-x4plus architecture. Strong detail recovery at 4x scale. |
| SwinIR 2x | Free | 2x | 9.7 / 10 | Transformer-based architecture delivering the highest-quality output. Best choice for professional work. |
| SwinIR 4x | Free | 4x | 9.5 / 10 | Direct 4x upscaling with the SwinIR transformer. Excellent quality in a single pass. |
| Real-ESRGAN 8x | Studio | 8x | 9.0 / 10 | Maximum-resolution upscaling, available only on the Studio tier. Produces extremely large output from low-resolution sources. |
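If you script batch jobs around Recaster, it can help to encode the table as data. The sketch below is a hypothetical helper, not part of Recaster: `UpscaleModel`, `available_models`, and `best_model` are illustrative names, and the sketch assumes that a Studio user can also select every Free model.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical ranking: a higher tier can use every model of a lower tier.
TIER_RANK = {"Free": 0, "Studio": 1}


@dataclass(frozen=True)
class UpscaleModel:
    name: str
    tier: str
    scale: int
    quality: float


# Values copied from the model comparison table.
MODELS = [
    UpscaleModel("Real-ESRGAN 2x", "Free", 2, 9.2),
    UpscaleModel("Real-ESRGAN 4x+", "Free", 4, 9.2),
    UpscaleModel("SwinIR 2x", "Free", 2, 9.7),
    UpscaleModel("SwinIR 4x", "Free", 4, 9.5),
    UpscaleModel("Real-ESRGAN 8x", "Studio", 8, 9.0),
]


def available_models(tier: str) -> list[UpscaleModel]:
    """Models a user on `tier` can select."""
    rank = TIER_RANK[tier]
    return [m for m in MODELS if TIER_RANK[m.tier] <= rank]


def best_model(tier: str, scale: int) -> UpscaleModel | None:
    """Highest-rated model for a scale factor, or None if the tier has none."""
    candidates = [m for m in available_models(tier) if m.scale == scale]
    return max(candidates, key=lambda m: m.quality, default=None)
```

For example, `best_model("Free", 4)` returns SwinIR 4x (9.5 beats Real-ESRGAN 4x+ at 9.2), while `best_model("Free", 8)` returns `None` because the 8x model requires the Studio tier.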

Real-ESRGAN

Real-ESRGAN is a Generative Adversarial Network (GAN) designed for practical image restoration and super-resolution. It excels at handling real-world degradation such as compression artifacts, blur, and noise. The ONNX models used in Recaster are sourced from the qualcomm/Real-ESRGAN-x4plus repository on Hugging Face.

Key Characteristics

  • GAN architecture produces sharp, visually appealing results
  • 67 MB model file for both 2x and 4x variants
  • Supports tile-based processing for large frames
  • CoreML acceleration on macOS for fast local processing
  • 8x variant (70 MB) available exclusively on the Studio tier
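Tile-based processing splits a frame into fixed-size blocks, upscales each block separately, and stitches the results, so peak memory depends on the tile size rather than the frame size. The sketch below only computes the tile grid. It uses the 256-pixel default tile size mentioned under GPU Requirements; the 16-pixel overlap and the function names are assumptions for illustration, since Recaster's actual overlap and stitching strategy are not documented here.

```python
from __future__ import annotations


def tile_starts(length: int, tile: int = 256, overlap: int = 16) -> list[int]:
    """Offsets along one axis so that tiles of size `tile` cover `length` pixels."""
    step = tile - overlap  # distance between neighbouring tile origins
    if step <= 0:
        raise ValueError("tile must be larger than overlap")
    starts = list(range(0, max(length - tile, 0) + 1, step))
    # If the regular grid stops short of the edge, add one tile flush with it.
    if starts[-1] + tile < length:
        starts.append(length - tile)
    return starts


def tile_boxes(width: int, height: int, tile: int = 256,
               overlap: int = 16) -> list[tuple[int, int, int, int]]:
    """(x0, y0, x1, y1) boxes covering a width x height frame, row by row."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]
```

For a 1920 × 1080 frame this yields 8 columns × 5 rows = 40 tiles. The last column and row are shifted back so every tile stays a full 256 pixels and ends exactly at the frame edge. After a 4x pass, each tile becomes 1024 × 1024 pixels before stitching.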

SwinIR

SwinIR uses a Swin Transformer architecture that captures long-range dependencies in images, producing the highest-quality upscaling results available in Recaster. The quality advantage is most noticeable on complex textures, fine hair detail, and skin tones. Models are sourced from the lixinze/swinir repository on Hugging Face.

Key Characteristics

  • Transformer architecture with the highest quality rating in Recaster (9.7/10 at 2x)
  • 67 MB model file for both 2x and 4x variants
  • Best choice for professional output where quality matters most
  • Supports tile-based processing for large frames
  • Requires a Hugging Face access token to download

Hugging Face Authentication

SwinIR models are hosted in a gated repository that requires Hugging Face authentication. You must add a Hugging Face access token to your Recaster settings before downloading SwinIR models.

  1. Visit huggingface.co/settings/tokens and create a new access token (or copy an existing one).
  2. Open your Recaster settings file. On macOS, it is located at ~/Library/Application Support/Recaster/settings.json.
  3. Add a hf_token field with your token value:
{
  "hf_token": "hf_YOUR_TOKEN_HERE",
  ...
}

Restart Recaster after adding the token. SwinIR models will download automatically when you first select them.
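If you prefer to script step 3, here is a minimal Python sketch. It assumes `settings.json` holds a single JSON object (as the example above suggests) and preserves every existing key. `set_hf_token` is a hypothetical helper, not a Recaster tool. The default path is the macOS location from step 2, so pass the path explicitly on other systems. Run it while Recaster is closed, then start Recaster as described above.

```python
import json
from pathlib import Path

# Step 2's macOS location; pass another path on Windows or Linux.
DEFAULT_SETTINGS = (
    Path.home() / "Library" / "Application Support" / "Recaster" / "settings.json"
)


def set_hf_token(token: str, settings_path: Path = DEFAULT_SETTINGS) -> dict:
    """Add or replace `hf_token` in Recaster's settings, keeping all other keys."""
    settings = {}
    if settings_path.exists():
        settings = json.loads(settings_path.read_text(encoding="utf-8"))
        if not isinstance(settings, dict):
            raise ValueError(f"{settings_path} does not contain a JSON object")
    settings["hf_token"] = token
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

For example, call `set_hf_token("hf_YOUR_TOKEN_HERE")` on macOS. Keep the real token out of shell history and version control, because it grants access to your Hugging Face account.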

Real-ESRGAN does not require authentication

Real-ESRGAN models are hosted in a public repository and download without any authentication. If you only plan to use Real-ESRGAN, you do not need a Hugging Face token.

Model Cache Location

Downloaded models are cached locally so they only need to be downloaded once. The cache location depends on your operating system:

| Platform | Cache Path |
| --- | --- |
| macOS | ~/Library/Application Support/Recaster/models/ |
| Windows | %APPDATA%\Recaster\models\ |
| Linux | ~/.config/Recaster/models/ |
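A script that needs to locate or inspect the cache can resolve the path from the table above. This is a hypothetical sketch. The `.onnx` file extension follows from Recaster using ONNX models, but the exact file names inside the cache are not documented here.

```python
from __future__ import annotations

import os
import sys
from pathlib import Path


def model_cache_dir(platform: str = sys.platform) -> Path:
    """Recaster's model cache directory for a given sys.platform value."""
    if platform == "darwin":  # macOS
        return Path.home() / "Library" / "Application Support" / "Recaster" / "models"
    if platform == "win32":  # Windows: %APPDATA%\Recaster\models\
        return Path(os.environ["APPDATA"]) / "Recaster" / "models"
    # Linux; treating other platforms the same way is an assumption.
    return Path.home() / ".config" / "Recaster" / "models"


def cached_models(cache_dir: Path | None = None) -> dict[str, int]:
    """Map each cached .onnx file name to its size in bytes."""
    if cache_dir is None:
        cache_dir = model_cache_dir()
    if not cache_dir.is_dir():
        return {}  # nothing has been downloaded yet
    return {p.name: p.stat().st_size for p in sorted(cache_dir.glob("*.onnx"))}
```

`cached_models()` returns an empty dictionary until the first model has been downloaded.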

macOS Compatibility

On macOS, ONNX Runtime uses Apple's CoreML framework for hardware acceleration. Real-ESRGAN works well with CoreML and processes at approximately 10 to 20 FPS on Apple Silicon. However, SwinIR has a known compatibility issue with CoreML that causes it to fall back to CPU-only processing on macOS.

SwinIR on macOS

Apple's CoreML framework does not support the dynamic input shapes used by SwinIR models. When running on macOS, SwinIR automatically falls back to CPU processing, which is significantly slower (approximately 2 to 5 FPS compared to 10 to 20 FPS with Real-ESRGAN on CoreML). For the best local macOS experience, use Real-ESRGAN. Alternatively, use remote upscaling on a Linux GPU instance where SwinIR runs at full speed with CUDA.
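To see what the FPS gap means in practice, divide the number of frames by the throughput. The sketch assumes the approximate rates quoted above apply uniformly to every frame of a clip; real throughput varies with input resolution, scale factor, and hardware.

```python
def processing_minutes(duration_s: float, source_fps: float, upscale_fps: float) -> float:
    """Minutes needed to upscale every frame of a clip at a given throughput."""
    frames = duration_s * source_fps  # total frames to process
    return frames / upscale_fps / 60


# Approximate rates from this section: SwinIR on the macOS CPU fallback
# versus Real-ESRGAN on CoreML (Apple Silicon).
for label, fps in [("SwinIR, CPU", 2), ("SwinIR, CPU", 5),
                   ("Real-ESRGAN, CoreML", 10), ("Real-ESRGAN, CoreML", 20)]:
    minutes = processing_minutes(60, 30, fps)
    print(f"{label} at {fps} FPS: {minutes:.1f} min for 60 s of 30 fps video")
```

A 60-second clip at 30 fps is 1,800 frames. That takes about 6 to 15 minutes for SwinIR on the CPU fallback (5 to 2 FPS), versus about 1.5 to 3 minutes for Real-ESRGAN on CoreML (20 to 10 FPS).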

Choosing a Model

Use the guidelines below to pick the right model for your project:

Choose Real-ESRGAN when...

  • Speed is more important than maximum quality
  • You are processing locally on macOS
  • You want quick previews before final output
  • Your source has heavy compression artifacts
  • You need 8x upscaling (Studio tier)

Choose SwinIR when...

  • Quality is the top priority
  • You have a dedicated NVIDIA GPU or use remote upscaling
  • You are working on professional or final-delivery output
  • Fine detail such as hair and skin texture matters
  • You are upscaling high-quality source material
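The two checklists can be condensed into a rule of thumb. The function below is an illustrative sketch, not Recaster logic. The parameter names and the order in which the rules are checked are assumptions.

```python
def suggest_family(*, quality_first: bool, nvidia_gpu_or_remote: bool,
                   clean_source: bool = True, needs_8x: bool = False) -> str:
    """Rule of thumb distilled from the two checklists (illustrative only).

    Local processing on macOS counts as nvidia_gpu_or_remote=False, because
    SwinIR falls back to the CPU there.
    """
    if needs_8x:
        return "Real-ESRGAN"  # only Real-ESRGAN offers 8x (Studio tier)
    if quality_first and nvidia_gpu_or_remote and clean_source:
        return "SwinIR"  # best detail when the hardware can run it at full speed
    # Speed, local macOS, quick previews, or heavily compressed sources.
    return "Real-ESRGAN"
```

Treat the result as a starting point. A quick Real-ESRGAN preview followed by a SwinIR final pass also fits the checklists.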

Recommended for most users

Start with the Balanced quality preset, which uses SwinIR 2x with multi-pass upscaling for an excellent balance of quality and speed. If you are on macOS without a dedicated GPU, switch to the Fast preset, which uses Real-ESRGAN 2x.

GPU Requirements

Both model families use ONNX Runtime for inference. On systems with an NVIDIA GPU, the CUDA Execution Provider is used automatically. On macOS, CoreML provides hardware acceleration for compatible models. If no GPU is available, processing falls back to CPU.
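In ONNX Runtime, hardware acceleration is chosen by the ordered `providers` list passed to `onnxruntime.InferenceSession`. The provider names below are ONNX Runtime's standard identifiers. The ordering helper is a hypothetical sketch of the behavior described in this section and under macOS Compatibility (CUDA first, CoreML only for compatible models, CPU as the fallback), not Recaster's source. It is pure Python, so it can be tested without a GPU.

```python
from __future__ import annotations

# ONNX Runtime execution-provider names, in order of preference.
PREFERRED = ["CUDAExecutionProvider", "CoreMLExecutionProvider", "CPUExecutionProvider"]


def providers_for(model_family: str, available: list[str]) -> list[str]:
    """Order execution providers GPU-first, always ending with CPU.

    CoreML is skipped for SwinIR, mirroring the macOS limitation described
    under macOS Compatibility.
    """
    chosen = []
    for name in PREFERRED:
        if name not in available:
            continue
        if name == "CoreMLExecutionProvider" and model_family == "SwinIR":
            continue  # dynamic input shapes: fall back to CPU instead
        chosen.append(name)
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen
```

In a real session you would pass `ort.get_available_providers()` as `available` and hand the result to `ort.InferenceSession(model_path, providers=...)`. `session.get_providers()` then reports which providers were actually enabled.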

GPU memory usage is approximately 2 GB for 4x upscaling with the default tile size of 256 pixels. You can adjust the tile size and GPU memory limit in the pipeline settings to fit your hardware.

Model Sources

All models are hosted on Hugging Face and verified for compatibility with Recaster: