Remote Training

Train on cloud GPUs via Vast.ai with live preview streaming, multi-session support, and budget controls.

Studio: Remote Training requires a Studio tier license.

Studio Feature

Remote training is available exclusively to Studio tier users. It allows you to train on powerful cloud GPUs without needing expensive local hardware. If you are on the Free tier, you can upgrade at any time from your account settings or the in-app upgrade dialog.

Overview

Remote training connects Recaster to cloud GPU instances on Vast.ai, a marketplace for renting GPU compute by the hour. This gives you access to high-end GPUs like the RTX 3090, RTX 4090, and A100 at a fraction of the purchase cost.

Recaster handles the full remote training lifecycle: provisioning instances, uploading your dataset, starting training, streaming live previews back to your desktop, and syncing results. You interact with the same Training widget interface -- the only difference is where the computation happens.

Vast.ai Setup

Before using remote training, you need to configure your Vast.ai account and SSH credentials in Recaster.

1. Create a Vast.ai account

Visit vast.ai and sign up for an account. Add credit to your balance using a credit card or cryptocurrency. Most training sessions cost $0.20-$0.80 per hour depending on the GPU.

2. Generate an SSH key

Recaster can generate an SSH key pair for you automatically. Go to Settings and click "Generate SSH Key" in the Remote section. The public key is displayed for you to copy.

3. Add your SSH key to Vast.ai

In your Vast.ai account settings, paste the public SSH key. This allows Recaster to securely connect to your rented instances.

4. Enter your API key

Copy your Vast.ai API key from your account page and paste it into Recaster's Settings under the Remote section. This allows Recaster to browse, create, and manage instances on your behalf.

SSH Key Location

Recaster stores SSH keys in the application settings directory. On macOS, this is ~/Library/Application Support/Recaster/ssh/. The private key must have 600 permissions (read/write for owner only). Recaster sets this automatically when generating keys.
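If you manage keys yourself, you can confirm the 600 requirement by reading the file mode. The following is a minimal sketch using only the Python standard library; the key filename `id_ed25519` in the usage lines is a hypothetical placeholder, not necessarily the name Recaster uses.

```python
import os
import stat
from pathlib import Path


def key_permissions_ok(key_path: Path) -> bool:
    """True if the private key is read/write for its owner only (mode 600)."""
    mode = stat.S_IMODE(os.stat(key_path).st_mode)
    return mode == 0o600


def fix_key_permissions(key_path: Path) -> None:
    """Same effect as running `chmod 600` on the key file."""
    os.chmod(key_path, 0o600)


# Hypothetical key filename inside the macOS settings directory.
key = Path.home() / "Library/Application Support/Recaster/ssh/id_ed25519"
if key.exists() and not key_permissions_ok(key):
    fix_key_permissions(key)
```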

Launching Instances

Once your Vast.ai account is configured, you can browse available GPU offers and launch instances directly from Recaster's Remote panel.

Instance Browser

The instance browser shows available GPU machines sorted by price per hour. Each listing includes:

  • GPU model and VRAM (e.g., RTX 3090 24GB)
  • Price per hour
  • Available disk space
  • Network speed
  • Reliability rating

| GPU | VRAM | Typical Price | Best For |
|---|---|---|---|
| RTX 3090 | 24 GB | $0.20-0.40/hr | General training, great value |
| RTX 4090 | 24 GB | $0.40-0.80/hr | Fast training, high-res models |
| A100 40GB | 40 GB | $0.60-1.20/hr | Large models, high resolution |
| A100 80GB | 80 GB | $1.00-2.00/hr | Maximum resolution and batch size |

Cost-Effective Training

The RTX 3090 offers the best value for most training tasks. At around $0.20-0.40 per hour, a 24-hour SAEHD training run costs approximately $5-10 total. Compare this to the $1,500+ purchase price of a consumer RTX 3090.
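The estimate is simply hours multiplied by the hourly rate. A small helper like the one below (hypothetical, not part of Recaster) makes it easy to rerun the numbers for any offer in the instance browser. It counts GPU rental time only.

```python
def estimate_run_cost(hours: float, rate_low: float, rate_high: float) -> tuple[float, float]:
    """Return the (low, high) cost in USD of renting a GPU for `hours` at the given hourly rates."""
    return round(hours * rate_low, 2), round(hours * rate_high, 2)


low, high = estimate_run_cost(24, 0.20, 0.40)
print(f"24-hour RTX 3090 run: ${low:.2f}-${high:.2f}")  # 24-hour RTX 3090 run: $4.80-$9.60
```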

File Synchronization

Recaster uses rsync over SSH to synchronize your project files between your local machine and the remote instance. This includes uploading face datasets, model files, and configuration, as well as downloading training results.

Sync Workflow

  • Initial upload -- When you start remote training, Recaster uploads your source and destination face datasets to the instance. A progress bar shows the upload status.
  • Incremental sync -- After the initial upload, only changed files are synced. This makes subsequent syncs much faster.
  • Result download -- When training is complete, sync the trained model files back to your local machine for merging.
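Recaster's exact rsync invocation is not documented here, so the following is only a sketch of an upload and a download using standard rsync flags. Because rsync by default skips files whose size and modification time already match on the destination, rerunning the same command performs the incremental sync described above. The host, SSH port, user, and directory layout are hypothetical.

```python
import shlex


def rsync_command(src: str, dest: str, key_path: str, ssh_port: int) -> list[str]:
    """Build an rsync-over-SSH command line (as an argv list) for one sync direction."""
    # rsync splits the -e string on spaces but honours quotes, so quote the key
    # path: the macOS settings directory ("Application Support") contains a space.
    ssh = f"ssh -i {shlex.quote(key_path)} -p {ssh_port}"
    return [
        "rsync",
        "-az",        # archive mode (recursive, keeps times/permissions) plus compression
        "--partial",  # keep partly transferred files so an interrupted upload can resume
        "-e", ssh,    # use SSH as the transport with the instance's key and port
        src,
        dest,
    ]


# Hypothetical host, port, and paths. A trailing slash on the source copies the
# directory's contents rather than the directory itself.
remote = "root@203.0.113.10"
key = "/Users/me/Library/Application Support/Recaster/ssh/id_ed25519"
upload = rsync_command("MyProject/faces_src/", f"{remote}:/workspace/MyProject/faces_src/", key, 40022)
download = rsync_command(f"{remote}:/workspace/MyProject/model/", "MyProject/model/", key, 40022)
```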

Sync Panel

The Sync tab in the Project Panel provides a visual interface for file synchronization. It shows the sync status of each directory (source faces, destination faces, model files) with color-coded indicators:

  • Green -- Fully synced, local and remote files match.
  • Yellow -- Local files are ahead of remote (upload needed).
  • Orange -- Remote files are ahead of local (download needed).
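One plausible way to derive these colors is to compare per-file modification times on each side. This sketch assumes each side is described by a manifest mapping relative paths to timestamps (rsync's archive mode preserves modification times, so synced files compare equal). The `diverged` result for changes on both sides is a hypothetical addition, since the panel documents only three colors.

```python
def sync_status(local: dict[str, float], remote: dict[str, float]) -> str:
    """Classify one directory from {relative_path: modification_time} manifests."""
    if local == remote:
        return "green"  # same files, same timestamps
    # A side is "ahead" if it has a file the other lacks, or a newer copy of a shared file.
    local_ahead = any(p not in remote or local[p] > remote[p] for p in local)
    remote_ahead = any(p not in local or remote[p] > local[p] for p in remote)
    if local_ahead and not remote_ahead:
        return "yellow"  # upload needed
    if remote_ahead and not local_ahead:
        return "orange"  # download needed
    # Both sides changed: not one of the three documented colors (assumption).
    return "diverged"
```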

Upload Speed

The initial dataset upload can take several minutes depending on your internet upload speed and dataset size. A typical face dataset (2,000-5,000 faces) is 200-500 MB. Consider compressing your dataset or using a wired connection for the first upload.
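To estimate the first upload, convert the dataset size to megabits and divide by your upload bandwidth. The 20 Mbps link below is only an example, and the calculation ignores protocol overhead, so treat the result as a rough figure.

```python
def upload_minutes(size_mb: float, upload_mbps: float) -> float:
    """Minutes to send size_mb megabytes over a link of upload_mbps megabits per second."""
    megabits = size_mb * 8  # 1 byte = 8 bits
    return megabits / upload_mbps / 60


for size in (200, 500):
    print(f"{size} MB at 20 Mbps: {upload_minutes(size, 20):.1f} min")
# 200 MB at 20 Mbps: 1.3 min
# 500 MB at 20 Mbps: 3.3 min
```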

Live Preview Streaming

One of the most powerful Studio features is real-time preview streaming from remote training sessions. Instead of waiting for training to complete and downloading the model, you can watch the face swap quality improve in real time.

How Streaming Works

Recaster deploys a lightweight streaming server to the remote instance alongside the training process. Each preview frame passes through these steps:

  1. The server captures the DFL training preview window using X11 screenshots.
  2. The server crops the header and footer to isolate the face grid.
  3. The server encodes the frame as JPEG and sends it via Server-Sent Events (SSE).
  4. An SSH tunnel forwards the stream to your local machine.
  5. Recaster receives the frames and displays them in the preview canvas.
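Server-Sent Events is a text protocol: each event is a few `field: value` lines followed by a blank line, so binary JPEG data has to be text-encoded before it goes on the wire. The sketch below shows one way to package a frame (base64 inside a JSON payload). The `preview` event name and the payload field names are assumptions, not Recaster's actual wire format.

```python
import base64
import json


def sse_frame(jpeg_bytes: bytes, iteration: int, event: str = "preview") -> str:
    """Wrap one JPEG preview frame as a Server-Sent Events message."""
    payload = json.dumps({
        "iteration": iteration,  # hypothetical field name
        "jpeg": base64.b64encode(jpeg_bytes).decode("ascii"),  # binary -> text
    })
    # json.dumps without indent emits a single line, so one data: line suffices.
    # The blank line (second "\n") marks the end of the event.
    return f"event: {event}\ndata: {payload}\n\n"


message = sse_frame(b"\xff\xd8\xff\xe0", iteration=12000)
```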

Streaming adds minimal overhead to the training process. Preview frames update every few seconds, and the stream automatically adapts to network conditions.

Preview Features

  • All 9 preview views available with Space/Shift+Space navigation.
  • 4-column by 2-row grid layout (same as local preview).
  • Loss values and iteration count streamed alongside preview frames.
  • Adaptive quality adjusts JPEG compression based on network latency.
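Adaptive quality amounts to mapping a latency measurement to a compression setting. A tiered lookup is the simplest version; the thresholds and quality values below are hypothetical, not Recaster's actual tuning. The returned number could be passed as the `quality` argument when saving a JPEG with Pillow.

```python
def jpeg_quality_for_latency(latency_ms: float) -> int:
    """Pick a JPEG quality (1-95 scale) from measured round-trip latency.

    The thresholds are illustrative assumptions, not Recaster's actual values.
    """
    tiers = [(100, 85), (250, 70), (500, 55)]  # (latency upper bound in ms, quality)
    for limit_ms, quality in tiers:
        if latency_ms < limit_ms:
            return quality
    return 40  # very slow link: smallest frames
```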

Enable Live Preview

Click the "Live" toggle button in the training controls to start preview streaming. The first frame may take a few seconds to appear while the SSH tunnel is established. The streaming server is deployed automatically on first use.

Multi-Session Training

Studio users can run multiple training sessions concurrently on separate GPU instances. This is useful when you need to train models for different face pairs simultaneously, or when you want to compare different configurations in parallel.

Session Management

The Multi-Session panel provides an overview of all active training sessions:

  • Session cards -- Each active session is displayed as a card showing the project name, model type, current iteration, loss values, and GPU instance details.
  • Quick actions -- Pause, resume, or stop any session from the card controls.
  • Switch preview -- Click a session card to view its live preview in the main canvas.

Port Allocation

Each concurrent session uses a unique port for SSH tunneling and preview streaming. Recaster automatically allocates ports in the range 8765-8769, supporting up to 5 simultaneous sessions.
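The sketch below shows the allocation rule (lowest free port in 8765-8769) and a standard OpenSSH local port forward for one session. It assumes the streaming server listens on the same port number on the instance as on the local end; the host, SSH port, user, and key name are placeholders.

```python
PREVIEW_PORTS = range(8765, 8770)  # 8765-8769 inclusive: five concurrent sessions


def allocate_port(in_use: set[int]) -> int:
    """Return the lowest free preview port, or raise if all five are taken."""
    for port in PREVIEW_PORTS:
        if port not in in_use:
            return port
    raise RuntimeError("all preview ports 8765-8769 are in use; stop a session first")


def tunnel_command(port: int, host: str, ssh_port: int, key_path: str) -> list[str]:
    """SSH local forward: localhost:<port> here -> localhost:<port> on the instance."""
    return [
        "ssh",
        "-N",                              # no remote command, forwarding only
        "-L", f"{port}:localhost:{port}",  # local_port:remote_host:remote_port
        "-i", key_path,
        "-p", str(ssh_port),
        f"root@{host}",                    # hypothetical user and host
    ]


port = allocate_port({8765})  # 8766
cmd = tunnel_command(port, "203.0.113.10", 40022, "id_ed25519")
```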

Cost Tracking

Remote training costs money, so Recaster includes built-in cost tracking and budget management to help you stay within your spending limits.

Budget Configuration

Set a spending budget in the Budget Configuration dialog:

  • Daily budget -- Maximum spend per 24-hour period. Instances are paused when the limit is reached.
  • Monthly budget -- Maximum total spend per month. A warning appears when approaching the limit.
  • Per-session limit -- Cap the cost of any single training session.

Spending Alerts

Recaster provides proactive cost alerts:

  • 80% warning -- A yellow banner appears when you have used 80% of your configured budget.
  • 100% action -- When the budget limit is reached, running instances are automatically paused. You can increase the budget to continue or stop instances to save the remaining balance.
  • Session cost display -- Each session card shows its accumulated cost and per-hour rate.
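The alert thresholds reduce to a ratio check against each configured limit. This sketch (hypothetical function names; the dollar amounts are made up for the example) evaluates daily, monthly, and per-session budgets together and reports the most severe outcome. It does not model how Recaster actually pauses instances.

```python
SEVERITY = {"ok": 0, "warn": 1, "pause": 2}


def budget_action(spent: float, budget: float) -> str:
    """Map spend against one budget to an alert level: ok, warn (>= 80%), pause (>= 100%)."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = spent / budget
    if ratio >= 1.0:
        return "pause"
    if ratio >= 0.8:
        return "warn"
    return "ok"


def overall_action(budgets: dict[str, tuple[float, float]]) -> str:
    """Check each (spent, limit) pair and return the most severe action."""
    actions = [budget_action(spent, limit) for spent, limit in budgets.values()]
    return max(actions, key=SEVERITY.__getitem__)


print(overall_action({
    "daily": (3.10, 10.00),
    "monthly": (86.00, 100.00),
    "session": (1.20, 5.00),
}))  # warn
```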

Instance Billing

Vast.ai charges by the hour for rented instances. Instances continue to accrue charges even when training is paused (the GPU is still reserved). Always stop and destroy instances when you are done training to avoid unexpected charges.

Project Panel Integration

Remote training integrates with the Project Panel through the Local/Remote toggle switch. When Remote mode is enabled, the Project Panel shows three tabs:

Local Tab

Your local project files. Always visible regardless of mode.

Remote Tab

Browse the remote instance filesystem. Navigate directories, view files, and verify that datasets uploaded correctly.

Sync Tab

Upload and download files between local and remote. Sync status indicators show which directories are up to date.

Instance Association

Each Recaster project remembers which remote instance it is associated with. When you toggle to Remote mode:

  • If one instance is running, it is automatically associated.
  • If multiple instances are running, a selection dialog appears.
  • If no instances are running, you are prompted to launch one.
  • The association is saved in the project state and restored on next launch.
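These association rules form a small decision function. In this sketch, instance IDs are plain strings and the action names are hypothetical; checking the saved association first is an assumption based on the note that it is restored on next launch.

```python
from typing import List, Optional, Tuple


def resolve_instance(saved_id: Optional[str], running: List[str]) -> Tuple[str, Optional[str]]:
    """Decide what happens when a project switches to Remote mode.

    Returns (action, instance_id) where action is "use", "choose", or "launch".
    """
    if saved_id is not None and saved_id in running:
        return "use", saved_id    # assumption: a saved association wins if still running
    if len(running) == 1:
        return "use", running[0]  # exactly one instance: associate automatically
    if len(running) > 1:
        return "choose", None     # several instances: show the selection dialog
    return "launch", None         # none running: prompt to launch one
```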

Model Versioning

Studio users have access to model version snapshots, which allow you to save the current state of a training model and restore it later. This is particularly useful for remote training where you may want to:

  • Save a checkpoint before changing training parameters.
  • Compare models from different training stages to find the sweet spot.
  • Roll back to a previous state if training quality degrades.
  • Share model snapshots between local and remote environments.

Snapshot Before Experimenting

Always create a version snapshot before enabling GAN training, changing the learning rate, or making other significant configuration changes. This gives you a safe rollback point if the change negatively impacts quality.
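Recaster's snapshot storage format is not described here. As a rough illustration of the idea only, a snapshot can be as simple as a timestamped copy of the model folder; the directory names and the `label` parameter are hypothetical. Take the copy while training is paused so the files are not being written mid-copy.

```python
import shutil
from datetime import datetime
from pathlib import Path


def create_snapshot(model_dir: Path, snapshots_dir: Path, label: str = "") -> Path:
    """Copy the model directory into a new timestamped snapshot folder."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = snapshots_dir / (f"{stamp}-{label}" if label else stamp)
    # copytree creates missing parent folders and raises FileExistsError if dest
    # already exists, so an existing snapshot is never overwritten.
    shutil.copytree(model_dir, dest)
    return dest


def restore_snapshot(snapshot: Path, model_dir: Path) -> None:
    """Replace the current model files with the contents of a snapshot."""
    if model_dir.exists():
        shutil.rmtree(model_dir)
    shutil.copytree(snapshot, model_dir)
```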

Troubleshooting

SSH connection fails

Verify your SSH key has correct permissions (chmod 600). Confirm the key is added to your Vast.ai account. Test the connection manually in a terminal.
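For the manual test, a non-interactive login attempt shows quickly whether the key is accepted. This sketch wraps standard OpenSSH options in Python; substitute the host and port shown for your instance on Vast.ai (the values used here are placeholders). Running the same arguments directly in a terminal with `-v` added prints the authentication details.

```python
import subprocess


def ssh_test_command(host: str, ssh_port: int, key_path: str) -> list[str]:
    """Build a non-interactive SSH login test for a rented instance."""
    return [
        "ssh",
        "-i", key_path,
        "-p", str(ssh_port),
        "-o", "BatchMode=yes",      # fail immediately instead of prompting for a password
        "-o", "ConnectTimeout=10",  # give up if the host does not answer within 10 seconds
        f"root@{host}",             # hypothetical user and host
        "echo ok",
    ]


def ssh_works(host: str, ssh_port: int, key_path: str) -> bool:
    """Return True if the key is accepted and a trivial remote command succeeds."""
    result = subprocess.run(ssh_test_command(host, ssh_port, key_path),
                            capture_output=True, text=True)
    return result.returncode == 0 and result.stdout.strip() == "ok"
```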

Live preview not showing

Check that the streaming server is running on the remote instance. The server is deployed automatically on first use. If the preview remains blank, the SSH tunnel may have disconnected -- try toggling Live off and on again.

rsync progress not showing on macOS

The built-in macOS rsync is version 2.6.9, which lacks modern progress flags such as --info=progress2. Install the latest version via Homebrew: brew install rsync.
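To confirm which rsync is actually picked up (a Homebrew copy is used only if its bin directory comes before /usr/bin on your PATH), you can check the version programmatically. This is a sketch; the 3.1.0 cutoff reflects the release in which --info=progress2 was introduced.

```python
import re
import subprocess


def parse_rsync_version(version_output: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from the output of `rsync --version`."""
    match = re.search(r"version\s+v?(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("unrecognized rsync --version output")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def supports_progress2(version: tuple[int, int, int]) -> bool:
    """--info=progress2 was added in rsync 3.1.0."""
    return version >= (3, 1, 0)


def installed_rsync_version() -> tuple[int, int, int]:
    """Run the first rsync on PATH and parse its version."""
    out = subprocess.run(["rsync", "--version"], capture_output=True, text=True, check=True).stdout
    return parse_rsync_version(out)
```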

Instance terminated unexpectedly

Vast.ai instances can be interrupted by the provider. Your training history and model auto-saves are preserved. Launch a new instance, sync your project files, and resume training.