Upscaling Models

Recaster includes two families of AI upscaling models: Real-ESRGAN (GAN-based) and SwinIR (Transformer-based). Each offers different quality and speed trade-offs.

Model Comparison

The table below lists every upscaling model available in Recaster, along with its tier, scale variant, quality rating, and a brief description. Models are automatically downloaded from Hugging Face the first time you use them.

Name	Tier	Variants	Quality	Description
Real-ESRGAN 2x	Free	2x	9.2 / 10	Fast GAN-based upscaling. Good balance of speed and quality for quick previews and general use.
Real-ESRGAN 4x+	Free	4x	9.2 / 10	Enhanced GAN upscaling with the Real-ESRGAN-x4plus architecture. Strong detail recovery at 4x scale.
SwinIR 2x	Free	2x	9.7 / 10	Transformer-based architecture delivering the highest quality output. Best choice for professional work.
SwinIR 4x	Free	4x	9.5 / 10	Direct 4x upscaling with SwinIR transformer. Excellent quality with a single pass.
Real-ESRGAN 8x	Studio	8x	9.0 / 10	Maximum resolution upscaling for Studio tier. Produces extremely large output from low-resolution sources.

Real-ESRGAN

Real-ESRGAN is a Generative Adversarial Network (GAN) designed for practical image restoration and super-resolution. It excels at handling real-world degradation such as compression artifacts, blur, and noise. The ONNX models used in Recaster are sourced from the qualcomm/Real-ESRGAN-x4plus repository on Hugging Face.

Key Characteristics

GAN architecture produces sharp, visually appealing results
67 MB model file for both 2x and 4x variants
Supports tile-based processing for large frames
CoreML acceleration on macOS for fast local processing
8x variant (70 MB) available exclusively on the Studio tier

SwinIR

SwinIR uses a Swin Transformer architecture that captures long-range dependencies in images, producing the highest-quality upscaling results available in Recaster. The quality advantage is most noticeable on complex textures, fine hair detail, and skin tones. Models are sourced from the lixinze/swinir repository on Hugging Face.

Key Characteristics

Transformer architecture with the highest quality rating in Recaster (9.7/10 at 2x)
67 MB model file for both 2x and 4x variants
Best choice for professional output where quality matters most
Supports tile-based processing for large frames
Requires Hugging Face authentication token for download

Hugging Face Authentication

SwinIR models are hosted in a gated repository that requires Hugging Face authentication. You must add a Hugging Face access token to your Recaster settings before downloading SwinIR models.

Visit huggingface.co/settings/tokens and create a new access token (or copy an existing one).
Open your Recaster settings file. On macOS this is located at ~/Library/Application Support/Recaster/settings.json.
Add an hf_token field with your token value:

{
  "hf_token": "hf_YOUR_TOKEN_HERE",
  ...
}

Restart Recaster after adding the token. SwinIR models will download automatically when you first select them.
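
If you prefer to script the settings change described above, the following is a minimal sketch. It assumes settings.json is a plain JSON object with hf_token as a top-level key, as in the snippet above. The helper name set_hf_token is hypothetical and is not part of Recaster.

```python
import json
from pathlib import Path


def set_hf_token(settings_path, token):
    """Add or update the hf_token field while keeping all other settings.

    Hypothetical helper: Recaster does not ship this function.
    """
    path = Path(settings_path).expanduser()
    settings = {}
    if path.exists():
        # Preserve whatever is already stored in the file.
        settings = json.loads(path.read_text(encoding="utf-8"))
    settings["hf_token"] = token
    # Create the settings directory if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return settings
```

On macOS you would call it as set_hf_token("~/Library/Application Support/Recaster/settings.json", "hf_YOUR_TOKEN_HERE") and then restart Recaster. Reading the file before writing keeps any existing settings intact.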

Real-ESRGAN does not require authentication

Real-ESRGAN models are hosted in a public repository and download without any authentication. If you only plan to use Real-ESRGAN, you do not need a Hugging Face token.

Model Cache Location

Downloaded models are cached locally so they only need to be downloaded once. The cache location depends on your operating system:

Platform	Cache Path
macOS	`~/Library/Application Support/Recaster/models/`
Windows	`%APPDATA%\Recaster\models\`
Linux	`~/.config/Recaster/models/`
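
To locate the cache from a script, for example to check disk usage or clear downloaded models, the table above can be expressed as a small lookup. This is a sketch only: the function name model_cache_dir is hypothetical, and treating every platform other than macOS and Windows as Linux is an assumption.

```python
import os
import sys
from pathlib import Path, PurePosixPath, PureWindowsPath


def model_cache_dir(platform=None, home=None, appdata=None):
    """Return the model cache path from the table above (hypothetical helper)."""
    platform = platform or sys.platform
    if platform.startswith("win"):
        # %APPDATA%\Recaster\models\
        appdata = appdata if appdata is not None else os.environ.get("APPDATA", "")
        return PureWindowsPath(appdata) / "Recaster" / "models"
    home = home if home is not None else str(Path.home())
    if platform == "darwin":
        # ~/Library/Application Support/Recaster/models/
        return PurePosixPath(home) / "Library" / "Application Support" / "Recaster" / "models"
    # Assumption: anything else is treated as Linux (~/.config/Recaster/models/).
    return PurePosixPath(home) / ".config" / "Recaster" / "models"
```

Called with no arguments, the function uses the current platform, home directory, and APPDATA value. The explicit parameters exist so the logic can be checked on any machine.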

macOS Compatibility

On macOS, ONNX Runtime uses Apple's CoreML framework for hardware acceleration. Real-ESRGAN works well with CoreML and processes at approximately 10 to 20 FPS on Apple Silicon. However, SwinIR has a known compatibility issue with CoreML that causes it to fall back to CPU-only processing on macOS.

SwinIR on macOS

Apple's CoreML framework does not support the dynamic input shapes used by SwinIR models. When running on macOS, SwinIR automatically falls back to CPU processing, which is significantly slower (approximately 2 to 5 FPS compared to 10 to 20 FPS with Real-ESRGAN on CoreML). For the best local macOS experience, use Real-ESRGAN. Alternatively, use remote upscaling on a Linux GPU instance where SwinIR runs at full speed with CUDA.
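
The fallback behavior described above can be illustrated with ONNX Runtime execution provider names. The sketch below is not Recaster's actual selection code. The function pick_providers and the model family strings are hypothetical, while the provider names themselves are the standard ONNX Runtime identifiers.

```python
def pick_providers(model_family, available):
    """Return an ordered execution provider list for a model family.

    Hypothetical illustration of the fallback described above.
    """
    preferred = ["CUDAExecutionProvider"]
    # Assumption: SwinIR skips CoreML because of the dynamic input
    # shape limitation, so on macOS it ends up on the CPU provider.
    if model_family != "swinir":
        preferred.append("CoreMLExecutionProvider")
    preferred.append("CPUExecutionProvider")
    # Keep only the providers this ONNX Runtime build actually offers.
    return [p for p in preferred if p in available]
```

With the onnxruntime package installed, the available list would come from onnxruntime.get_available_providers(), and the result can be passed as the providers argument of onnxruntime.InferenceSession. On a Mac without CUDA, a "swinir" request yields only the CPU provider, which matches the slower frame rates quoted above.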

Choosing a Model

Use the guidelines below to pick the right model for your project:

Choose Real-ESRGAN when...

Speed is more important than maximum quality
You are processing on macOS locally
You want quick previews before final output
Your source has heavy compression artifacts
You need 8x upscaling (Studio tier)

Choose SwinIR when...

Quality is the top priority
You have a dedicated NVIDIA GPU or use remote upscaling
You are working on professional or final-delivery output
Fine detail such as hair and skin texture matters
You are upscaling high-quality source material

Recommended for most users

Start with the Balanced quality preset, which uses SwinIR 2x with multi-pass upscaling for an excellent balance of quality and speed. If you are processing locally on macOS, where SwinIR falls back to CPU, switch to the Fast preset, which uses Real-ESRGAN 2x.
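
The guidelines in this section can be condensed into a small decision helper. This is a simplified sketch rather than how Recaster picks models internally. The function choose_model and its parameters are hypothetical, and it only considers scale, the quality priority, and local macOS processing.

```python
def choose_model(scale, quality_first=True, local_macos=False):
    """Suggest a model name from the comparison table (hypothetical helper)."""
    if scale not in (2, 4, 8):
        raise ValueError("scale must be 2, 4, or 8")
    if scale == 8:
        # Only Real-ESRGAN offers 8x, and only on the Studio tier.
        return "Real-ESRGAN 8x"
    if quality_first and not local_macos:
        # SwinIR runs at full speed with CUDA or remote upscaling.
        return f"SwinIR {scale}x"
    # Speed matters more, or SwinIR would fall back to CPU on macOS.
    return "Real-ESRGAN 2x" if scale == 2 else "Real-ESRGAN 4x+"
```

For example, choose_model(2) suggests SwinIR 2x, while choose_model(2, local_macos=True) suggests Real-ESRGAN 2x.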

GPU Requirements

Both model families use ONNX Runtime for inference. On systems with an NVIDIA GPU, the CUDA Execution Provider is used automatically. On macOS, CoreML provides hardware acceleration for compatible models. If no GPU is available, processing falls back to CPU.

GPU memory usage is approximately 2 GB for 4x upscaling with the default tile size of 256 pixels. You can adjust the tile size and GPU memory limit in the pipeline settings to fit your hardware.
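
To get a feel for how tile size relates to frame size, the sketch below counts the tiles needed to cover a frame and computes the output resolution. It assumes a simple grid of non-overlapping tiles. Recaster's actual tiling may pad or overlap tiles, and both function names are hypothetical.

```python
import math


def tile_grid(width, height, tile_size=256):
    """Return (columns, rows, total tiles) needed to cover a frame.

    Assumption: non-overlapping tiles, with edge tiles partially filled.
    """
    cols = math.ceil(width / tile_size)
    rows = math.ceil(height / tile_size)
    return cols, rows, cols * rows


def output_size(width, height, scale):
    """Return the upscaled (width, height) for a given scale factor."""
    return width * scale, height * scale


print(tile_grid(1920, 1080))       # (8, 5, 40)
print(output_size(1920, 1080, 4))  # (7680, 4320)
```

A 1920 x 1080 frame at the default tile size is covered by 40 tiles under this assumption. Lowering the tile size reduces the memory needed per tile but increases the number of tiles to process.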

Model Sources

All models are hosted on Hugging Face and verified for compatibility with Recaster:

Real-ESRGAN 2x/4x: qualcomm/Real-ESRGAN-x4plus
SwinIR 2x/4x: lixinze/swinir (requires authentication)
Real-ESRGAN 8x: facefusion/models
