Benchmark data

Each benchmark below measures the maximum number of inputs and outputs that a single server instance can handle. Every configuration is run in three variants that differ in the ratio of inputs to outputs: each output renders 1, 2, or 4 inputs using the Tiles component. Table cells are written as inputs / outputs, so 48 / 12 means 48 inputs rendered into 12 outputs.
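
To make the cell notation concrete, here is a minimal Python sketch of how a table cell could be read. The helper names `parse_cell` and `inputs_per_output` are illustrative and not part of Smelter, and treating a `-` cell as "no result" is an assumption, since the tables do not define it.

```python
from typing import Optional, Tuple


def parse_cell(cell: str) -> Optional[Tuple[int, int]]:
    """Split a table cell such as "48 / 12" into (inputs, outputs).

    A "-" cell is returned as None (assumed here to mean "no result").
    """
    text = cell.strip()
    if text == "-":
        return None
    inputs, outputs = (int(part) for part in text.split("/"))
    return inputs, outputs


def inputs_per_output(cell: str) -> Optional[int]:
    """Return how many inputs each output renders, or None for a "-" cell."""
    parsed = parse_cell(cell)
    if parsed is None:
        return None
    inputs, outputs = parsed
    return inputs // outputs


print(parse_cell("48 / 12"))         # (48, 12)
print(inputs_per_output("48 / 12"))  # 4
```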

You can see the most important benchmarks below. Visit our GitHub repository for more results and the exact implementation of the benchmarks.

Full GPU pipeline

For any meaningful workload, a full GPU pipeline (Vulkan H264 decoding, GPU-based compositing, and Vulkan H264 encoding) is the most cost-efficient way to run Smelter. It avoids CPU/GPU memory transfers and removes CPU encoding as the bottleneck, so a GPU-equipped instance will deliver more throughput per dollar than any CPU-only instance at a comparable price point.

The tables below show all Vulkan results for both `g4dn` instance types. The results are almost identical because the heavy work runs on the GPU, and both instances have the same Nvidia T4.

`g4dn.xlarge`

CPU: 4vCPU, Memory: 16GB, GPU: Nvidia T4

Input	Output	1:1 (inputs / outputs)	2:1 (inputs / outputs)	4:1 (inputs / outputs)
720p24fps	720p24fps	15 / 15	30 / 15	48 / 12
720p24fps	1080p30fps	7 / 7	14 / 7	24 / 6
1080p30fps	1080p30fps	6 / 6	12 / 6	24 / 6
1080p30fps	1440p30fps	3 / 3	8 / 4	12 / 3
1080p30fps	2160p30fps	1 / 1	2 / 1	4 / 1
2160p30fps	2160p30fps	1 / 1	2 / 1	4 / 1

`g4dn.2xlarge`

CPU: 8vCPU, Memory: 32GB, GPU: Nvidia T4

Input	Output	1:1 (inputs / outputs)	2:1 (inputs / outputs)	4:1 (inputs / outputs)
720p24fps	720p24fps	15 / 15	30 / 15	56 / 14
720p24fps	1080p30fps	6 / 6	14 / 7	24 / 6
1080p30fps	1080p30fps	6 / 6	12 / 6	24 / 6
1080p30fps	1440p30fps	3 / 3	6 / 3	12 / 3
1080p30fps	2160p30fps	1 / 1	2 / 1	4 / 1
2160p30fps	2160p30fps	1 / 1	2 / 1	4 / 1
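
One way to check how close the two instances are is to diff the output counts cell by cell. The sketch below copies the number of outputs from the two tables above; the names `XLARGE`, `TWO_XLARGE`, and `differing_cells` are illustrative only.

```python
RATIOS = ("1:1", "2:1", "4:1")

# (input, output) -> outputs sustained at 1:1, 2:1, 4:1 (copied from the tables above)
XLARGE = {
    ("720p24fps", "720p24fps"): (15, 15, 12),
    ("720p24fps", "1080p30fps"): (7, 7, 6),
    ("1080p30fps", "1080p30fps"): (6, 6, 6),
    ("1080p30fps", "1440p30fps"): (3, 4, 3),
    ("1080p30fps", "2160p30fps"): (1, 1, 1),
    ("2160p30fps", "2160p30fps"): (1, 1, 1),
}
TWO_XLARGE = {
    ("720p24fps", "720p24fps"): (15, 15, 14),
    ("720p24fps", "1080p30fps"): (6, 7, 6),
    ("1080p30fps", "1080p30fps"): (6, 6, 6),
    ("1080p30fps", "1440p30fps"): (3, 3, 3),
    ("1080p30fps", "2160p30fps"): (1, 1, 1),
    ("2160p30fps", "2160p30fps"): (1, 1, 1),
}


def differing_cells(a, b):
    """List (config, ratio, outputs_a, outputs_b) for every cell that differs."""
    diffs = []
    for config, outputs_a in a.items():
        for ratio, out_a, out_b in zip(RATIOS, outputs_a, b[config]):
            if out_a != out_b:
                diffs.append((config, ratio, out_a, out_b))
    return diffs


for config, ratio, xl, xl2 in differing_cells(XLARGE, TWO_XLARGE):
    print(f"{config[0]} -> {config[1]} at {ratio}: {xl} vs {xl2} outputs")
```

Only 3 of the 18 cells differ, and the larger instance is ahead in just one of them, which is consistent with the GPU rather than the CPU being the limit.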

CPU vs GPU pipeline on `g4dn` instances

Even when a GPU is available, you can still run all or part of the pipeline on the CPU. The tables below compare the GPU pipeline (Vulkan H264 decode + encode) against CPU-only pipelines (FFmpeg H264 decode + encode at three preset speeds) on the two most common resolutions.

Vulkan H264 on Nvidia cards should produce quality roughly on par with x264’s `medium` preset. Of the presets benchmarked here, `fast` is the closest match; `ultrafast` and `veryfast` trade noticeable quality for speed.

`g4dn.xlarge`

CPU: 4vCPU, Memory: 16GB, GPU: Nvidia T4

720p24fps → 720p24fps

Decoder / Encoder	1:1 (inputs / outputs)	2:1 (inputs / outputs)	4:1 (inputs / outputs)
Vulkan H264	15 / 15	30 / 15	48 / 12
FFmpeg H264 (ultrafast)	6 / 6	10 / 5	12 / 3
FFmpeg H264 (veryfast)	3 / 3	6 / 3	8 / 2
FFmpeg H264 (fast)	2 / 2	4 / 2	4 / 1

1080p30fps → 1080p30fps

Decoder / Encoder	1:1 (inputs / outputs)	2:1 (inputs / outputs)	4:1 (inputs / outputs)
Vulkan H264	6 / 6	12 / 6	24 / 6
FFmpeg H264 (ultrafast)	2 / 2	4 / 2	4 / 1
FFmpeg H264 (veryfast)	1 / 1	2 / 1	-
FFmpeg H264 (fast)	-	2 / 1	-

`g4dn.2xlarge`

CPU: 8vCPU, Memory: 32GB, GPU: Nvidia T4

720p24fps → 720p24fps

Decoder / Encoder	1:1 (inputs / outputs)	2:1 (inputs / outputs)	4:1 (inputs / outputs)
Vulkan H264	15 / 15	30 / 15	56 / 14
FFmpeg H264 (ultrafast)	11 / 11	18 / 9	20 / 5
FFmpeg H264 (veryfast)	8 / 8	14 / 7	20 / 5
FFmpeg H264 (fast)	4 / 4	10 / 5	12 / 3

1080p30fps → 1080p30fps

Decoder / Encoder	1:1 (inputs / outputs)	2:1 (inputs / outputs)	4:1 (inputs / outputs)
Vulkan H264	6 / 6	12 / 6	24 / 6
FFmpeg H264 (ultrafast)	5 / 5	8 / 4	8 / 2
FFmpeg H264 (veryfast)	3 / 3	6 / 3	4 / 1
FFmpeg H264 (fast)	1 / 1	4 / 2	4 / 1
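
To quantify the gap, the following sketch divides the Vulkan output count by each FFmpeg preset's output count at the 1:1 ratio for 720p24fps → 720p24fps. The numbers are copied from the tables above; `OUTPUTS_720P` and `vulkan_advantage` are illustrative names.

```python
# Outputs sustained at the 1:1 ratio, 720p24fps -> 720p24fps (from the tables above)
OUTPUTS_720P = {
    "g4dn.xlarge": {"vulkan": 15, "ultrafast": 6, "veryfast": 3, "fast": 2},
    "g4dn.2xlarge": {"vulkan": 15, "ultrafast": 11, "veryfast": 8, "fast": 4},
}


def vulkan_advantage(outputs):
    """Return Vulkan outputs divided by each FFmpeg preset's outputs."""
    vulkan = outputs["vulkan"]
    return {
        preset: vulkan / count
        for preset, count in outputs.items()
        if preset != "vulkan"
    }


for instance, outputs in OUTPUTS_720P.items():
    for preset, factor in vulkan_advantage(outputs).items():
        print(f"{instance} {preset}: {factor:.2f}x")
```

Against the `fast` preset, the closest quality match, this gives 7.5x on `g4dn.xlarge` and 3.75x on `g4dn.2xlarge`.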

CPU-only instances

Running Smelter on a CPU-only instance is significantly less cost-efficient than using a GPU instance. Smelter does compositing through GPU shaders, so on a machine without a GPU those shaders have to be emulated in software on the CPU, which is slow and competes for the same cores that are already busy with H264 decoding and encoding. As a result, even a relatively beefy c5 instance handles only a fraction of the workload a comparable g4dn instance can sustain.

`c5.2xlarge`

CPU: 8vCPU, Memory: 16GB

720p24fps → 720p24fps

Decoder / Encoder	1:1 (inputs / outputs)	2:1 (inputs / outputs)	4:1 (inputs / outputs)
FFmpeg H264 (ultrafast)	1 / 1	2 / 1	-
FFmpeg H264 (veryfast)	1 / 1	2 / 1	-
FFmpeg H264 (fast)	1 / 1	2 / 1	-

`c5.4xlarge`

CPU: 16vCPU, Memory: 32GB

720p24fps → 720p24fps

Decoder / Encoder	1:1 (inputs / outputs)	2:1 (inputs / outputs)	4:1 (inputs / outputs)
FFmpeg H264 (ultrafast)	3 / 3	4 / 2	4 / 1
FFmpeg H264 (veryfast)	3 / 3	4 / 2	4 / 1
FFmpeg H264 (fast)	2 / 2	4 / 2	4 / 1

1080p30fps → 1080p30fps

Decoder / Encoder	1:1 (inputs / outputs)	2:1 (inputs / outputs)	4:1 (inputs / outputs)
FFmpeg H264 (ultrafast)	1 / 1	2 / 1	-
FFmpeg H264 (veryfast)	1 / 1	2 / 1	-
FFmpeg H264 (fast)	1 / 1	2 / 1	-
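
Cost efficiency depends on instance prices, which are not listed here, but the tables do fix a break-even point: a GPU instance stays cheaper per output as long as its hourly price is below the CPU instance's price multiplied by the ratio of their output counts. The sketch below computes that ratio from the 1:1, 720p24fps → 720p24fps results (Vulkan on `g4dn.xlarge`, FFmpeg `ultrafast` on the `c5` instances); `break_even_price_ratio` is an illustrative name, not part of Smelter.

```python
# Outputs at the 1:1 ratio, 720p24fps -> 720p24fps (from the tables in this document)
GPU_OUTPUTS = 15  # g4dn.xlarge, Vulkan H264
CPU_OUTPUTS = {"c5.2xlarge": 1, "c5.4xlarge": 3}  # FFmpeg H264 (ultrafast)


def break_even_price_ratio(gpu_outputs: int, cpu_outputs: int) -> float:
    """How many times pricier the GPU instance may be before the CPU
    instance becomes cheaper per output."""
    return gpu_outputs / cpu_outputs


for instance, outputs in CPU_OUTPUTS.items():
    ratio = break_even_price_ratio(GPU_OUTPUTS, outputs)
    print(f"g4dn.xlarge vs {instance}: break-even at {ratio:.0f}x the price")
```

Plugging in current prices for the instances you are considering then shows directly which side of the break-even point they fall on.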
