Local Training

Train models on your own GPU with real-time loss monitoring, auto-save protection, and pause/resume support.

Getting Started

Local training runs the DeepFaceLab training process directly on your machine's GPU. This is available to all Recaster users, including the Free tier, and requires no internet connection once you have your dataset prepared.

Before starting training, make sure you have extracted faces from both your source and destination videos, and that your face masks are clean and well-defined.

Starting Training

1. Open the Training panel

Click the Training button in the sidebar or navigate to it from the main menu. The Training widget appears in the central area.

2. Select a model type

Choose from SAEHD, AMP, Quick96, or XSeg in the model type dropdown. Each model has different characteristics suited to different use cases. See the Model Types page for detailed comparisons.

3. Configure training parameters

Set the resolution, batch size, architecture dimensions, and other model-specific options. Hover over any parameter label for a tooltip explaining what it does and recommended values.

4. Click Start

The training process begins. DeepFaceLab initializes the model (this may take 30-60 seconds for the first iteration), then starts processing batches. Loss values and the preview window update in real time.

First-Time Initialization

The first training iteration takes longer than subsequent ones because DeepFaceLab must build and compile the model architecture. Once the model is initialized, iterations run at full speed.

Loss Monitoring

The loss graph is the primary indicator of training progress. It displays two lines that update in real time as training runs:

Source Loss

Measures how well the model reconstructs the source face when given a source face as input. A lower value means the model better understands the source face structure.

Destination Loss

Measures how well the model reconstructs the destination face. This is particularly important because the destination face quality directly affects the final merged output.
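
Conceptually, both losses compare the network's reconstruction with the original input, pixel by pixel, and average the error. The sketch below uses a plain mean-squared-error formulation purely for illustration; DeepFaceLab's actual training loss is more elaborate and combines multiple terms (including structural-similarity components):

```python
import numpy as np

def reconstruction_loss(original: np.ndarray, reconstructed: np.ndarray) -> float:
    """Mean squared error between a face image and its reconstruction.

    Simplified illustration only: the real training loss is not plain MSE.
    """
    return float(np.mean((original - reconstructed) ** 2))
```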

Reading the Loss Graph

  • Healthy training -- Both lines trend steadily downward. The curve is steep at first, then gradually flattens as the model improves.
  • Converged -- Both lines have flattened and show minimal change over thousands of iterations. The model has learned what it can from the current dataset and configuration.
  • Loss spike -- A sudden upward jump in loss. Small spikes are normal and the model usually recovers (the smoothing sketch after this list helps separate this noise from a real trend). Large, sustained spikes may indicate a problem with the dataset or training parameters.
  • Divergence -- Loss increases steadily over time. This is rare but can happen with excessively high learning rates or corrupted datasets. Stop training and check your configuration.
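
Raw per-iteration loss values are noisy, which can make small spikes look worse than they are. One common way to judge the underlying trend is to smooth the series with an exponential moving average, as in this minimal sketch (the function and the alpha value are illustrative assumptions, not part of Recaster's UI):

```python
def smooth(losses, alpha=0.99):
    """Exponential moving average of a loss series.

    Damps per-iteration noise so the underlying trend (downward = healthy,
    flat = converged, upward = diverging) is easier to read.
    """
    ema = losses[0]
    smoothed = []
    for value in losses:
        ema = alpha * ema + (1 - alpha) * value
        smoothed.append(ema)
    return smoothed
```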

Typical Loss Values

Good final loss values depend on the model type and resolution. For SAEHD at 224px, expect source loss around 0.02-0.05 and destination loss around 0.03-0.06. Lower is better, but values below 0.01 may indicate overfitting.

Save and Backup

Recaster provides multiple levels of protection for your training progress:

Auto-Save

Training history is automatically saved at two intervals:

  • Every 30 seconds -- A timer-based auto-save captures the current iteration count, loss values, and configuration snapshot.
  • At milestone iterations -- Every 1,000 iterations, a more comprehensive save is performed including model state metadata.

Auto-save ensures that if the application crashes, the system loses power, or the GPU encounters an error, you will lose at most 30 seconds of training progress data.
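
The sketch below shows one way a timer-based auto-save like this can be structured; the function names and payload are assumptions rather than Recaster's internals. Writing to a temporary file and renaming it keeps the last snapshot intact even if a crash happens mid-write:

```python
import json, os, threading, time

def start_autosave(get_state, path, interval=30.0):
    """Snapshot training metadata every `interval` seconds.

    get_state is assumed to return a JSON-serializable dict holding the
    iteration count, current loss values, and a configuration snapshot.
    """
    def loop():
        while True:
            time.sleep(interval)
            tmp = path + ".tmp"
            with open(tmp, "w") as f:
                json.dump(get_state(), f)
            os.replace(tmp, path)  # atomic rename; no torn writes

    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t
```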

Manual Save

Click the Save button in the training controls to save the current model state immediately. This writes the model weights to disk. Use manual saves before changing any training parameters or before stopping a session you plan to resume later.

Backups

Click the Backup button to create a timestamped copy of the current model files. Backups are stored in the project's model directory with a date suffix. This is useful before making significant configuration changes or before experimenting with different training approaches.

Disk Space

Model files can be large (200 MB-2 GB depending on architecture and resolution). Creating frequent backups can consume significant disk space. Periodically clean up old backups that you no longer need.

Pause and Resume

Recaster supports pausing and resuming training without losing any progress. This is useful when you need to temporarily free up GPU resources for other tasks, or when you want to step away from training overnight and resume the next day.

How Pause Works

On Unix-based systems (macOS and Linux), the pause feature uses process signals to suspend the training process, as sketched after the list below:

  • Pause -- Sends a SIGSTOP signal to the training process, which freezes it in place without terminating it. GPU compute stops while the process is suspended, although VRAM allocated by the suspended process is typically not released until it exits.
  • Resume -- Sends a SIGCONT signal to continue the process from exactly where it was paused. Training picks up at the same iteration without any data loss.
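
In Python, this mechanism boils down to two os.kill calls. The function names below are illustrative, not Recaster's API; note that signal.SIGSTOP and signal.SIGCONT do not exist on Windows:

```python
import os
import signal

def pause_training(pid: int) -> None:
    # SIGSTOP cannot be caught or ignored: the process freezes immediately,
    # keeping all of its state in memory.
    os.kill(pid, signal.SIGSTOP)

def resume_training(pid: int) -> None:
    # SIGCONT resumes execution exactly where the process was stopped.
    os.kill(pid, signal.SIGCONT)
```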

Pause vs Stop

Pausing suspends the process but keeps it in memory. The model state is preserved exactly as-is. Stopping terminates the process completely and requires a fresh model load on the next start. Use Pause for short breaks and Stop when you are done training.

Pause Duration

There is no time limit on how long training can be paused. The process remains suspended indefinitely until you either resume or stop it. The training history tracks pause duration separately from active training time.

Training History

All training sessions are tracked automatically. The history includes:

  • Session start and end timestamps
  • Model type and configuration snapshot
  • Total iterations completed
  • Final loss values (source and destination)
  • Active training duration (excluding pauses)
  • Session status (completed, interrupted, crashed)

Training history is stored in the application settings directory:

  • macOS -- ~/Library/Application Support/Recaster/training_history.json
  • Linux -- ~/.config/Recaster/training_history.json
  • Windows -- %APPDATA%\Recaster\training_history.json
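
To inspect the history outside the application, you can resolve the platform-specific path and load the JSON file. A minimal sketch, assuming the file holds a list of session records with the fields listed above:

```python
import json
import os
import sys
from pathlib import Path

def history_path() -> Path:
    """Resolve Recaster's training_history.json for the current platform."""
    if sys.platform == "darwin":
        base = Path.home() / "Library" / "Application Support"
    elif sys.platform == "win32":
        base = Path(os.environ["APPDATA"])
    else:  # Linux and other Unix: honor XDG_CONFIG_HOME if set
        base = Path(os.environ.get("XDG_CONFIG_HOME", Path.home() / ".config"))
    return base / "Recaster" / "training_history.json"

sessions = json.loads(history_path().read_text())  # assumed: top-level list
print(f"{len(sessions)} recorded sessions")
```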

Crash Recovery

If the application crashes during training, the auto-save system marks the session as "interrupted" in the history. On the next launch, Recaster detects the interrupted session and offers to resume from the last saved checkpoint.

GPU Considerations

Local training performance depends heavily on your GPU. Here are key factors to consider:

VRAM Requirements

  • Quick96 (96px) -- 2 GB minimum, 4 GB recommended
  • SAEHD 128px -- 4 GB minimum, 6 GB recommended
  • SAEHD 224px -- 6 GB minimum, 8 GB recommended
  • SAEHD 320px -- 8 GB minimum, 12 GB recommended
  • AMP 224px -- 6 GB minimum, 8 GB recommended

Out of Memory Errors

If you encounter GPU out-of-memory errors, try reducing the batch size, lowering the resolution, or decreasing the architecture dimensions. The batch size has the most direct impact on VRAM usage.
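
If you drive training from a script rather than the UI, a common defensive pattern is to halve the batch size and retry after an out-of-memory error. This is a generic sketch, not Recaster functionality; train_one_iteration is a hypothetical callable:

```python
def train_with_backoff(train_one_iteration, batch_size: int, min_batch: int = 1):
    """Halve the batch size after an OOM error until training fits in VRAM.

    train_one_iteration is assumed to raise RuntimeError with an
    "out of memory" message on GPU OOM (as e.g. PyTorch does).
    """
    while batch_size >= min_batch:
        try:
            train_one_iteration(batch_size)
            return batch_size  # this batch size fits
        except RuntimeError as e:
            if "out of memory" not in str(e).lower():
                raise  # unrelated error: don't mask it
            batch_size //= 2
    raise RuntimeError("Even the minimum batch size does not fit in VRAM")
```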

Training Speed

Training speed is measured in iterations per second and depends on GPU power, resolution, and batch size. Typical speeds:

  • RTX 3060 (12 GB) -- ~5-10 iter/s at 224px, ~15-20 iter/s at 96px
  • RTX 3090 (24 GB) -- ~10-20 iter/s at 224px, ~30-40 iter/s at 96px
  • RTX 4090 (24 GB) -- ~15-30 iter/s at 224px, ~40-60 iter/s at 96px
  • Apple M1/M2 -- ~3-8 iter/s at 224px (Metal acceleration)
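
As a rough worked example of what these rates mean in practice: at a sustained 10 iter/s, 500,000 iterations take 500,000 ÷ 10 = 50,000 seconds, or roughly 14 hours of active training time.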