2024-01-23

glGetUniformLocation Benchmark

I was looking for information on whether calling glGetUniformLocation every frame has any performance impact. Everything I found was either unsourced claims or very old data.

Since I couldn't find anything useful, I ended up writing a small benchmark program to test 3 different ways of using uniform locations:

  • Calling glGetUniformLocation every time a uniform is set (Lookup).
  • Getting the location on first use and caching it in a std::map with compile-time keys (Cached).
  • Getting the location at load time and storing it in a variable (Static).
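
The three approaches can be sketched roughly as follows. This is not the benchmark's actual source. fakeGetUniformLocation is a hypothetical stand-in for glGetUniformLocation so the sketch compiles without a GL context, and the Uniform enum, the uniform names "u3" and "u7", and the class names are all made up for illustration. The enum is one guess at what "compile-time keys" could look like.

```cpp
#include <cstdlib>
#include <map>

// Counts how often the (fake) driver lookup is hit.
int g_driverLookups = 0;

// Hypothetical stand-in for glGetUniformLocation: a name like "u7" maps to 7.
int fakeGetUniformLocation(unsigned /*program*/, const char* name) {
    ++g_driverLookups;
    return std::atoi(name + 1);
}

// Assumed compile-time keys for the uniforms of one shader.
enum class Uniform { Tint, Offset };

const char* uniformName(Uniform u) {
    return u == Uniform::Tint ? "u3" : "u7";
}

// Lookup: query the location by name on every use.
int lookupLocation(unsigned program, Uniform u) {
    return fakeGetUniformLocation(program, uniformName(u));
}

// Cached: query on first use, then serve later requests from a std::map.
class CachedLocations {
public:
    explicit CachedLocations(unsigned program) : program_(program) {}

    int get(Uniform u) {
        auto it = cache_.find(u);
        if (it != cache_.end()) {
            return it->second;
        }
        const int location = fakeGetUniformLocation(program_, uniformName(u));
        cache_.emplace(u, location);
        return location;
    }

private:
    unsigned program_;
    std::map<Uniform, int> cache_;
};

// Static: resolve every location once at load time and keep plain ints.
struct StaticLocations {
    int tint;
    int offset;

    explicit StaticLocations(unsigned program)
        : tint(fakeGetUniformLocation(program, uniformName(Uniform::Tint))),
          offset(fakeGetUniformLocation(program, uniformName(Uniform::Offset))) {}
};
```

With Static, setting a uniform only reads an int member. Cached still pays for a map lookup on every use, and Lookup pays for a string-based query on every use.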

The test uses a shader with 20 vec4 uniforms (10 in the vertex shader, 10 in the fragment shader) and sets each uniform 50 times per render, for 1000 uniform updates per render. With fewer than 50 repetitions the time delta was hard to measure. Each run performs 1000 renders to come up with an average time.
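
A measurement loop along these lines could produce such an average. This is only a sketch under assumptions: setAllUniforms, averageRenderMicros and the two constants are hypothetical names, the timing uses std::chrono::steady_clock around each render, and the real benchmark may measure differently.

```cpp
#include <chrono>

constexpr int kUniformCount = 20;  // vec4 uniforms in the test shader
constexpr int kRepeats = 50;       // times each uniform is set per render

// Calls setUniform(index) for every uniform, kRepeats times over,
// which gives 20 * 50 = 1000 uniform updates per render.
template <typename SetFn>
void setAllUniforms(SetFn&& setUniform) {
    for (int repeat = 0; repeat < kRepeats; ++repeat) {
        for (int index = 0; index < kUniformCount; ++index) {
            setUniform(index);
        }
    }
}

// Times each call to render() on the CPU and returns the mean in microseconds.
template <typename RenderFn>
double averageRenderMicros(int renders, RenderFn&& render) {
    using Clock = std::chrono::steady_clock;
    double totalMicros = 0.0;
    for (int i = 0; i < renders; ++i) {
        const auto start = Clock::now();
        render();
        const auto stop = Clock::now();
        totalMicros +=
            std::chrono::duration<double, std::micro>(stop - start).count();
    }
    return totalMicros / renders;
}
```

A run would then pass a lambda that calls setAllUniforms with one of the three location strategies to averageRenderMicros(1000, ...).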

Here are the results from a modern high-end desktop and a 5-year-old laptop, both running Windows 11.

Machine / Method      First  Second  Fastest  Slowest  Average
NVIDIA RTX 4090
  - Lookup             1423    1047      578     2317      929
  - Cached              366     147      105      402      134
  - Static              159      21        5      167       10
i7-8565U / UHD 620
  - Lookup             1300    1115      870     1984     1043
  - Cached              676     671      488     1849      579
  - Static              130     129       73      459      103

Numbers are in microseconds (μs), based on 5 runs of the benchmark program for each machine/method combination. The first-frame value for Cached is misleadingly low: the test loops over the same 20 uniforms 50 times, so only 2% of the calls (20 of 1000) actually reach glGetUniformLocation. With 1000 unique uniforms, the first frame would likely be similar to the Lookup results.

The difference is pretty significant. At 60 fps with 1000 uniform updates per frame, looking up the location every time would use about 5.6% of the frame time on the desktop (about 6.3% on the laptop), versus only 0.1-0.6% (depending on hardware) when storing the location.
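
The percentages follow from dividing the average cost by the frame budget, which at 60 fps is 1,000,000 / 60 ≈ 16,667 μs. A small sketch of that arithmetic, with hypothetical helper names:

```cpp
// Time available per frame, in microseconds, at the given frame rate.
constexpr double frameBudgetMicros(double fps) {
    return 1000000.0 / fps;
}

// Share of one frame, in percent, taken by a cost given in microseconds.
constexpr double percentOfFrame(double costMicros, double fps) {
    return 100.0 * costMicros / frameBudgetMicros(fps);
}
```

With the averages from the table, percentOfFrame(929, 60) comes to about 5.6 for Lookup on the desktop, while the Static averages of 10 and 103 μs come to about 0.06 and 0.6.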

Caching locations in a std::map worked better than looking them up every time, but was still significantly slower than using a custom class with the uniform locations stored in hardcoded member variables.

Benchmark source code available here.