Local Training
Train models on your own GPU with real-time loss monitoring, auto-save protection, and pause/resume support.
Getting Started
Local training runs the DeepFaceLab training process directly on your machine's GPU. This is available to all Recaster users, including the Free tier, and requires no internet connection once you have your dataset prepared.
Before starting training, make sure you have extracted faces from both your source and destination videos, and that your face masks are clean and well-defined.
Starting Training
1. Open the Training panel
2. Select a model type
3. Configure training parameters
4. Click Start
First-Time Initialization
Loss Monitoring
The loss graph is the primary indicator of training progress. It displays two lines that update in real time as training runs:
Source Loss
Measures how well the model reconstructs the source face when given a source face as input. A lower value means the model better understands the source face structure.
Destination Loss
Measures how well the model reconstructs the destination face. This is particularly important because the destination face quality directly affects the final merged output.
Reading the Loss Graph
- Healthy training -- Both lines trend steadily downward. The curve is steep at first, then gradually flattens as the model improves.
- Converged -- Both lines have flattened and show minimal change over thousands of iterations. The model has learned what it can from the current dataset and configuration.
- Loss spike -- A sudden upward jump in loss. Small spikes are normal and the model usually recovers. Large, sustained spikes may indicate a problem with the dataset or training parameters.
- Divergence -- Loss increases steadily over time. This is rare but can happen with excessively high learning rates or corrupted datasets. Stop training and check your configuration. A simple smoothing heuristic for spotting spikes and divergence is sketched after this list.
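As an illustration (not Recaster's internal code), one common way to read noisy per-iteration loss values programmatically is to smooth them with an exponential moving average and compare the start and end of a recent window:

```python
# Illustrative sketch: smoothing raw loss values with an exponential
# moving average (EMA) makes spikes and divergence easier to spot than
# the raw, noisy per-iteration numbers.

def ema(values, alpha=0.01):
    """Return an exponentially smoothed copy of a loss series."""
    smoothed = []
    current = values[0]
    for v in values:
        current = alpha * v + (1 - alpha) * current
        smoothed.append(current)
    return smoothed

def looks_divergent(values, window=5000):
    """Heuristic: loss is diverging if the smoothed series ends the
    recent window clearly higher than it started."""
    s = ema(values[-window:])
    return s[-1] > s[0] * 1.1  # sustained ~10% rise over the window
```

The window size and 10% threshold are arbitrary illustrative choices; small transient spikes will not trip this check, but a steady climb will.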
Typical Loss Values
Save and Backup
Recaster provides multiple levels of protection for your training progress:
Auto-Save
Training history is automatically saved at two intervals:
- Every 30 seconds -- A timer-based auto-save captures the current iteration count, loss values, and configuration snapshot.
- At milestone iterations -- Every 1,000 iterations, a more comprehensive save is performed including model state metadata.
Auto-save ensures that if the application crashes, the system loses power, or the GPU encounters an error, you will lose at most 30 seconds of training progress data.
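As a rough illustration of how the two triggers interact, here is a hedged sketch; the two save functions are hypothetical stand-ins, not Recaster's actual API:

```python
import time

AUTOSAVE_INTERVAL_S = 30       # timer-based history snapshot interval
MILESTONE_ITERATIONS = 1_000   # comprehensive milestone save interval

def save_history_snapshot(iteration, loss_src, loss_dst):
    # Hypothetical stand-in for the 30-second snapshot
    print(f"snapshot @ {iteration}: src={loss_src:.3f} dst={loss_dst:.3f}")

def save_model_metadata(iteration):
    # Hypothetical stand-in for the milestone save
    print(f"milestone save @ {iteration}")

last_save = time.monotonic()

def maybe_autosave(iteration, loss_src, loss_dst):
    global last_save
    # Timer-based snapshot: at most ~30 seconds of history is at risk.
    if time.monotonic() - last_save >= AUTOSAVE_INTERVAL_S:
        save_history_snapshot(iteration, loss_src, loss_dst)
        last_save = time.monotonic()
    # Milestone save with model state metadata every 1,000 iterations.
    if iteration > 0 and iteration % MILESTONE_ITERATIONS == 0:
        save_model_metadata(iteration)
```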
Manual Save
Click the Save button in the training controls to save the current model state immediately. This writes the model weights to disk. Use manual saves before changing any training parameters or before stopping a session you plan to resume later.
Backups
Click the Backup button to create a timestamped copy of the current model files. Backups are stored in the project's model directory with a date suffix. This is useful before making significant configuration changes or before experimenting with different training approaches.
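A minimal sketch of what a timestamped backup amounts to, assuming the model files live in a per-project model directory (the layout here is an assumption, not Recaster's documented structure):

```python
import shutil
from datetime import datetime
from pathlib import Path

def backup_model(model_dir: Path) -> Path:
    """Copy the model directory to a timestamped sibling directory."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    backup_dir = model_dir.with_name(f"{model_dir.name}_backup_{stamp}")
    shutil.copytree(model_dir, backup_dir)
    return backup_dir
```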
Disk Space
Pause and Resume
Recaster supports pausing and resuming training without losing any progress. This is useful when you need to temporarily free up GPU resources for other tasks, or when you want to step away from training overnight and resume the next day.
How Pause Works
On Unix-based systems (macOS and Linux), the pause feature uses process signals to suspend the training process, as sketched below:
- Pause -- Sends a `SIGSTOP` signal to the training process, which freezes it in place without terminating it. GPU compute is freed while the process is suspended, though allocated VRAM typically remains reserved until the process exits.
- Resume -- Sends a `SIGCONT` signal to continue the process from exactly where it was paused. Training picks up at the same iteration without any data loss.
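This mechanism maps directly onto two standard library calls. The minimal sketch below assumes the training process's PID is already known; how Recaster tracks it internally is not covered here:

```python
import os
import signal

def pause_training(training_pid: int) -> None:
    os.kill(training_pid, signal.SIGSTOP)  # freeze the process in place

def resume_training(training_pid: int) -> None:
    os.kill(training_pid, signal.SIGCONT)  # continue at the same iteration
```

`SIGSTOP` cannot be caught or ignored, which is why the freeze is reliable without any cooperation from the training code itself.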
Pause vs Stop
Pause Duration
There is no time limit on how long training can be paused. The process remains suspended indefinitely until you either resume or stop it. The training history tracks pause duration separately from active training time.
Training History
All training sessions are tracked automatically. The history includes:
- Session start and end timestamps
- Model type and configuration snapshot
- Total iterations completed
- Final loss values (source and destination)
- Active training duration (excluding pauses)
- Session status (completed, interrupted, crashed)
Training history is stored in the application settings directory:
| Platform | Location |
|---|---|
| macOS | ~/Library/Application Support/Recaster/training_history.json |
| Linux | ~/.config/Recaster/training_history.json |
| Windows | %APPDATA%\Recaster\training_history.json |
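If you want to inspect the history outside of Recaster, the sketch below locates and reads the file on each platform. The field names in the loop are assumptions inferred from the list above, not a documented schema:

```python
import json
import os
import sys
from pathlib import Path

def history_path() -> Path:
    """Resolve the per-platform location of training_history.json."""
    if sys.platform == "darwin":
        return Path.home() / "Library/Application Support/Recaster/training_history.json"
    if sys.platform.startswith("linux"):
        return Path.home() / ".config/Recaster/training_history.json"
    return Path(os.environ["APPDATA"]) / "Recaster/training_history.json"  # Windows

# Assumes the file holds a list of session records (schema assumed)
sessions = json.loads(history_path().read_text())
for s in sessions:
    # "iterations" and "status" are hypothetical field names
    print(s.get("iterations"), s.get("status"))
```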
Crash Recovery
GPU Considerations
Local training performance depends heavily on your GPU. Here are key factors to consider:
VRAM Requirements
| Configuration | Minimum VRAM | Recommended VRAM |
|---|---|---|
| Quick96 (96px) | 2 GB | 4 GB |
| SAEHD 128px | 4 GB | 6 GB |
| SAEHD 224px | 6 GB | 8 GB |
| SAEHD 320px | 8 GB | 12 GB |
| AMP 224px | 6 GB | 8 GB |
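Before choosing a configuration, you can check your card's total VRAM against this table. The following sketch assumes an NVIDIA GPU with nvidia-smi available on the PATH:

```python
import subprocess

# Query total GPU memory in MiB via nvidia-smi
out = subprocess.check_output(
    ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
    text=True,
)
vram_mb = int(out.splitlines()[0])
print(f"Total VRAM: {vram_mb / 1024:.1f} GB")
```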
Out of Memory Errors
Training Speed
Training speed is measured in iterations per second and depends on GPU power, resolution, and batch size. Typical speeds (a quick wall-clock estimate is sketched after this list):
- RTX 3060 (12GB) -- ~5-10 iter/s at 224px, ~15-20 iter/s at 96px
- RTX 3090 (24GB) -- ~10-20 iter/s at 224px, ~30-40 iter/s at 96px
- RTX 4090 (24GB) -- ~15-30 iter/s at 224px, ~40-60 iter/s at 96px
- Apple M1/M2 -- ~3-8 iter/s at 224px (Metal acceleration)
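To turn these figures into rough wall-clock estimates, divide a target iteration count by the speed. A small worked example, using an illustrative 500,000-iteration target (actual requirements vary widely by model and dataset) and the midpoints of the ranges above:

```python
# Rough wall-clock estimates from the iteration speeds listed above.
target_iterations = 500_000  # illustrative target, not a recommendation

for gpu, iters_per_s in [("RTX 3060 @ 224px", 7.5), ("RTX 4090 @ 224px", 22.5)]:
    hours = target_iterations / iters_per_s / 3600
    print(f"{gpu}: ~{hours:.0f} hours")  # ~19 h and ~6 h respectively
```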