Benchmark data
Each benchmark below measures the maximum number of inputs and outputs that an instance can handle. Every benchmark is run in three variants that differ in the ratio of inputs to outputs (1:1, 2:1, and 4:1), so each output renders 1, 2, or 4 inputs using the Tiles component. Table cells are written as inputs / outputs.
You can see the most important benchmarks below. Visit our GitHub repository for more results and the exact implementation of the benchmarks.
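If you want to process the results in a script, each table cell can be split into its two counts. The following is a minimal sketch and not part of Smelter: `parse_cell` is a hypothetical helper, and it assumes a cell is either `inputs / outputs` or `-`, with `-` taken to mean that no result is listed for that configuration.

```python
from typing import Optional, Tuple


def parse_cell(cell: str) -> Optional[Tuple[int, int]]:
    """Split a table cell such as '48 / 12' into (inputs, outputs).

    Assumption: '-' means the table lists no result for that configuration.
    """
    text = cell.strip()
    if text == "-":
        return None
    inputs, outputs = (int(part) for part in text.split("/"))
    return inputs, outputs


print(parse_cell("48 / 12"))  # (48, 12)
print(parse_cell("-"))        # None
```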
Full GPU pipeline
For any meaningful workload, a full GPU pipeline (Vulkan H264 decoding, GPU-based compositing, and Vulkan H264 encoding) is the most cost-efficient way to run Smelter. It avoids CPU/GPU memory transfers and removes CPU encoding as the bottleneck, so a GPU-equipped instance will deliver more throughput per dollar than any CPU-only instance at a comparable price point.
The tables below show all Vulkan results for both g4dn instance types. The results are almost identical because both instance types use the same Nvidia T4 and all of the heavy work runs on the GPU, so the extra vCPUs make little difference.
g4dn.xlarge
CPU: 4vCPU, Memory: 16GB, GPU: Nvidia T4
| Input | Output | Input/output ratio 1:1 | Input/output ratio 2:1 | Input/output ratio 4:1 |
|---|---|---|---|---|
| 720p24fps | 720p24fps | 15 / 15 | 30 / 15 | 48 / 12 |
| 720p24fps | 1080p30fps | 7 / 7 | 14 / 7 | 24 / 6 |
| 1080p30fps | 1080p30fps | 6 / 6 | 12 / 6 | 24 / 6 |
| 1080p30fps | 1440p30fps | 3 / 3 | 8 / 4 | 12 / 3 |
| 1080p30fps | 2160p30fps | 1 / 1 | 2 / 1 | 4 / 1 |
| 2160p30fps | 2160p30fps | 1 / 1 | 2 / 1 | 4 / 1 |
g4dn.2xlarge
CPU: 8vCPU, Memory: 32GB, GPU: Nvidia T4
| Input | Output | Input/output ratio 1:1 | Input/output ratio 2:1 | Input/output ratio 4:1 |
|---|---|---|---|---|
| 720p24fps | 720p24fps | 15 / 15 | 30 / 15 | 56 / 14 |
| 720p24fps | 1080p30fps | 6 / 6 | 14 / 7 | 24 / 6 |
| 1080p30fps | 1080p30fps | 6 / 6 | 12 / 6 | 24 / 6 |
| 1080p30fps | 1440p30fps | 3 / 3 | 6 / 3 | 12 / 3 |
| 1080p30fps | 2160p30fps | 1 / 1 | 2 / 1 | 4 / 1 |
| 2160p30fps | 2160p30fps | 1 / 1 | 2 / 1 | 4 / 1 |
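The per-instance limits above can be turned into a rough capacity estimate. The sketch below is illustrative only: `instances_needed` is a hypothetical helper, the numbers are copied from the g4dn.xlarge table for 1080p30fps input and 1080p30fps output, and it assumes the workload can be split evenly across instances and that each instance runs at its benchmarked maximum with no headroom.

```python
import math

# Max (inputs, outputs) per g4dn.xlarge instance for
# 1080p30fps -> 1080p30fps, copied from the table above.
G4DN_XLARGE_1080P = {"1:1": (6, 6), "2:1": (12, 6), "4:1": (24, 6)}


def instances_needed(target_outputs: int, ratio: str) -> int:
    """Estimate how many instances are needed for `target_outputs`.

    Assumption: outputs are spread evenly and every instance is
    loaded up to the benchmarked maximum.
    """
    _, outputs_per_instance = G4DN_XLARGE_1080P[ratio]
    return math.ceil(target_outputs / outputs_per_instance)


# 20 outputs that each tile 4 inputs: 20 / 6 rounds up to 4 instances.
print(instances_needed(20, "4:1"))  # 4
```

In practice you would likely leave some headroom below the benchmarked maximum, so treat the result as a lower bound.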
CPU vs GPU pipeline on g4dn instances
Even when a GPU is available, you can still run all or part of the pipeline on the CPU. The tables below compare the GPU pipeline (Vulkan H264 decode + encode) against CPU-only pipelines (FFmpeg H264 decode + encode at three preset speeds) on the two most common resolutions.
Vulkan H264 on Nvidia cards should produce quality roughly on par with x264’s medium preset. Of the presets benchmarked here, fast is the closest match. ultrafast and veryfast trade noticeable quality for speed.
g4dn.xlarge
CPU: 4vCPU, Memory: 16GB, GPU: Nvidia T4
720p24fps → 720p24fps
| Decoder / Encoder | Input/output ratio 1:1 | Input/output ratio 2:1 | Input/output ratio 4:1 |
|---|---|---|---|
| Vulkan H264 | 15 / 15 | 30 / 15 | 48 / 12 |
| FFmpeg H264 (ultrafast) | 6 / 6 | 10 / 5 | 12 / 3 |
| FFmpeg H264 (veryfast) | 3 / 3 | 6 / 3 | 8 / 2 |
| FFmpeg H264 (fast) | 2 / 2 | 4 / 2 | 4 / 1 |
1080p30fps → 1080p30fps
| Decoder / Encoder | Input/output ratio 1:1 | Input/output ratio 2:1 | Input/output ratio 4:1 |
|---|---|---|---|
| Vulkan H264 | 6 / 6 | 12 / 6 | 24 / 6 |
| FFmpeg H264 (ultrafast) | 2 / 2 | 4 / 2 | 4 / 1 |
| FFmpeg H264 (veryfast) | 1 / 1 | 2 / 1 | - |
| FFmpeg H264 (fast) | - | 2 / 1 | - |
g4dn.2xlarge
CPU: 8vCPU, Memory: 32GB, GPU: Nvidia T4
720p24fps → 720p24fps
| Decoder / Encoder | Input/output ratio 1:1 | Input/output ratio 2:1 | Input/output ratio 4:1 |
|---|---|---|---|
| Vulkan H264 | 15 / 15 | 30 / 15 | 56 / 14 |
| FFmpeg H264 (ultrafast) | 11 / 11 | 18 / 9 | 20 / 5 |
| FFmpeg H264 (veryfast) | 8 / 8 | 14 / 7 | 20 / 5 |
| FFmpeg H264 (fast) | 4 / 4 | 10 / 5 | 12 / 3 |
1080p30fps → 1080p30fps
| Decoder / Encoder | Input/output ratio 1:1 | Input/output ratio 2:1 | Input/output ratio 4:1 |
|---|---|---|---|
| Vulkan H264 | 6 / 6 | 12 / 6 | 24 / 6 |
| FFmpeg H264 (ultrafast) | 5 / 5 | 8 / 4 | 8 / 2 |
| FFmpeg H264 (veryfast) | 3 / 3 | 6 / 3 | 4 / 1 |
| FFmpeg H264 (fast) | 1 / 1 | 4 / 2 | 4 / 1 |
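To put the gap in numbers, you can divide the Vulkan output count by the output count of each CPU preset. The sketch below does this for the g4dn.2xlarge 1080p30fps table directly above; `vulkan_advantage` is a hypothetical helper, and the comparison looks only at output counts, not at encoding quality.

```python
# Max outputs per ratio on g4dn.2xlarge for 1080p30fps -> 1080p30fps,
# copied from the table above (second number of each cell).
OUTPUTS = {
    "Vulkan H264": {"1:1": 6, "2:1": 6, "4:1": 6},
    "FFmpeg H264 (ultrafast)": {"1:1": 5, "2:1": 4, "4:1": 2},
    "FFmpeg H264 (veryfast)": {"1:1": 3, "2:1": 3, "4:1": 1},
    "FFmpeg H264 (fast)": {"1:1": 1, "2:1": 2, "4:1": 1},
}


def vulkan_advantage(pipeline: str, ratio: str) -> float:
    """Return how many times more outputs Vulkan handles than `pipeline`."""
    return OUTPUTS["Vulkan H264"][ratio] / OUTPUTS[pipeline][ratio]


for pipeline in OUTPUTS:
    if pipeline == "Vulkan H264":
        continue
    factors = ", ".join(
        f"{ratio}: {vulkan_advantage(pipeline, ratio):.1f}x"
        for ratio in ("1:1", "2:1", "4:1")
    )
    print(f"{pipeline} -> {factors}")
```

On this instance the advantage grows with the number of inputs per output: against ultrafast it goes from 1.2x at 1:1 to 3.0x at 4:1.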
CPU-only instances
Running Smelter on a CPU-only instance is significantly less cost-efficient than using a GPU instance. Smelter does compositing through GPU shaders, so on a machine without a GPU those shaders have to be emulated in software on the CPU, which is slow and competes for the same cores that are already busy with H264 decoding and encoding. As a result, even a relatively beefy c5 instance handles only a fraction of the workload a comparable g4dn instance can sustain.
c5.2xlarge
CPU: 8vCPU, Memory: 16GB
720p24fps → 720p24fps
| Decoder / Encoder | Input/output ratio 1:1 | Input/output ratio 2:1 | Input/output ratio 4:1 |
|---|---|---|---|
| FFmpeg H264 (ultrafast) | 1 / 1 | 2 / 1 | - |
| FFmpeg H264 (veryfast) | 1 / 1 | 2 / 1 | - |
| FFmpeg H264 (fast) | 1 / 1 | 2 / 1 | - |
c5.4xlarge
CPU: 16vCPU, Memory: 32GB
720p24fps → 720p24fps
| Decoder / Encoder | Input/output ratio 1:1 | Input/output ratio 2:1 | Input/output ratio 4:1 |
|---|---|---|---|
| FFmpeg H264 (ultrafast) | 3 / 3 | 4 / 2 | 4 / 1 |
| FFmpeg H264 (veryfast) | 3 / 3 | 4 / 2 | 4 / 1 |
| FFmpeg H264 (fast) | 2 / 2 | 4 / 2 | 4 / 1 |
1080p30fps → 1080p30fps
| Decoder / Encoder | Input/output ratio 1:1 | Input/output ratio 2:1 | Input/output ratio 4:1 |
|---|---|---|---|
| FFmpeg H264 (ultrafast) | 1 / 1 | 2 / 1 | - |
| FFmpeg H264 (veryfast) | 1 / 1 | 2 / 1 | - |
| FFmpeg H264 (fast) | 1 / 1 | 2 / 1 | - |
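One way to reason about cost without relying on any particular price list is to compute a break-even price ratio: how many times more expensive the GPU instance could be before the CPU-only instance delivers the same outputs per dollar. The sketch below is a rough illustration, not an official sizing tool. `break_even_price_ratio` is a hypothetical helper, the output counts are taken from the 1:1 columns of the g4dn.xlarge (Vulkan H264) and c5.4xlarge (FFmpeg H264 ultrafast) tables on this page, and instance prices are deliberately left out, so check current pricing yourself.

```python
def break_even_price_ratio(gpu_outputs: int, cpu_outputs: int) -> float:
    """Return the GPU/CPU price ratio at which outputs per dollar are equal.

    Below this ratio the GPU instance delivers more outputs per dollar.
    """
    if cpu_outputs <= 0:
        raise ValueError("cpu_outputs must be positive")
    return gpu_outputs / cpu_outputs


# 720p24fps -> 720p24fps at 1:1: 15 outputs (g4dn.xlarge, Vulkan H264)
# versus 3 outputs (c5.4xlarge, FFmpeg H264 ultrafast).
print(break_even_price_ratio(15, 3))  # 5.0

# 1080p30fps -> 1080p30fps at 1:1: 6 outputs versus 1 output.
print(break_even_price_ratio(6, 1))   # 6.0
```

Note that this compares Vulkan against the ultrafast preset, which trades quality for speed, so the comparison at matching quality would favor the GPU instance even more.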