2024-01-23
glGetUniformLocation Benchmark
I was looking for information on whether calling glGetUniformLocation every frame has any performance impact. Everything I found was either unsourced claims or very old data.
Since I couldn't find anything useful, I wrote a small benchmark program to test three different ways of using uniform locations:
- Calling glGetUniformLocation every time (Lookup).
- Getting the location on first use and caching it in a std::map using compile-time keys (Cached).
- Getting the location at load time and storing it in a variable (Static).
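A minimal C++ sketch of the Lookup and Cached strategies is below. It is not the benchmark's source: a real glGetUniformLocation call needs a live GL context, so a hypothetical `fakeGetUniformLocation` stub stands in for it and simply counts how often it runs. The post doesn't say what its "compile-time keys" are, so this sketch assumes a constexpr FNV-1a hash of the uniform name; `uniformKey`, `LocationCache`, `countLookupCalls`, and `countCachedCalls` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <string_view>

// Hypothetical stand-in for glGetUniformLocation(program, name). A real call
// needs a live GL context, so this stub only counts how often it is invoked.
static int g_lookupCalls = 0;
int fakeGetUniformLocation(std::string_view name) {
    ++g_lookupCalls;
    return static_cast<int>(name.size());  // arbitrary fake location
}

// Assumption: the "compile-time key" is a constexpr FNV-1a hash of the name.
constexpr std::uint64_t uniformKey(std::string_view name) {
    std::uint64_t h = 14695981039346656037ull;
    for (char c : name) {
        h ^= static_cast<unsigned char>(c);
        h *= 1099511628211ull;
    }
    return h;
}
// With string literals the key is evaluated entirely at compile time.
static_assert(uniformKey("uColor") != uniformKey("uTint"), "distinct keys");

// Cached: query each location once, then serve it from a std::map.
class LocationCache {
public:
    int get(std::uint64_t key, std::string_view name) {
        auto it = cache_.find(key);
        if (it != cache_.end()) return it->second;  // hit: no query
        int loc = fakeGetUniformLocation(name);     // miss: query once
        cache_.emplace(key, loc);
        return loc;
    }
private:
    std::map<std::uint64_t, int> cache_;
};

// Simulates one render in which `uniforms` names are each set `repeats`
// times, and returns how many location queries were made.
int countLookupCalls(int uniforms, int repeats) {
    g_lookupCalls = 0;
    for (int r = 0; r < repeats; ++r)
        for (int i = 0; i < uniforms; ++i)
            fakeGetUniformLocation("u" + std::to_string(i));  // Lookup: every set
    return g_lookupCalls;
}

int countCachedCalls(int uniforms, int repeats) {
    g_lookupCalls = 0;
    LocationCache cache;
    for (int r = 0; r < repeats; ++r)
        for (int i = 0; i < uniforms; ++i) {
            std::string name = "u" + std::to_string(i);
            // Runtime hash here because the names are generated; with
            // literal names uniformKey("...") can be a constant.
            cache.get(uniformKey(name), name);
        }
    return g_lookupCalls;
}
```

With the benchmark's shape, `countLookupCalls(20, 50)` makes 1000 queries per render, while `countCachedCalls(20, 50)` makes only 20, one the first time each name is seen.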
The test uses a shader with 20 vec4 uniforms (10 in the vertex shader, 10 in the fragment shader) and sets each uniform 50 times per render, since with fewer than 50 sets the time delta was hard to measure. Each run performs 1000 renders, and the results are averaged.
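The First/Second/Fastest/Slowest/Average figures can be produced by a harness along these lines. This is a sketch under assumptions, not the benchmark's code: `timeRenders` and `summarize` are hypothetical names, the render callback would be the code issuing the 20 × 50 uniform sets, and the average here includes the first render, which the original program may or may not do. It measures wall-clock CPU time around each callback with a monotonic clock.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

struct RenderStats {
    double first, second, fastest, slowest, average;  // microseconds
};

// Summarizes per-render timings (assumes at least two samples).
// Note: the average includes the first, warm-up render.
RenderStats summarize(const std::vector<double>& us) {
    assert(us.size() >= 2);
    auto [lo, hi] = std::minmax_element(us.begin(), us.end());
    double sum = std::accumulate(us.begin(), us.end(), 0.0);
    return {us[0], us[1], *lo, *hi, sum / static_cast<double>(us.size())};
}

// Calls `render` `renders` times (e.g. one call = 20 uniforms x 50 sets)
// and returns each call's duration in microseconds.
std::vector<double> timeRenders(const std::function<void()>& render,
                                std::size_t renders) {
    using clock = std::chrono::steady_clock;
    std::vector<double> us;
    us.reserve(renders);
    for (std::size_t i = 0; i < renders; ++i) {
        auto start = clock::now();
        render();
        auto stop = clock::now();
        us.push_back(std::chrono::duration<double, std::micro>(stop - start).count());
    }
    return us;
}
```

A run would then be `summarize(timeRenders(renderOnce, 1000))`, where `renderOnce` is whatever sets the uniforms for one of the three methods.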
Here are the results from a modern high-end desktop and a 5-year-old laptop, both running Windows 11.
| Machine / Method | First frame (μs) | Second frame (μs) | Fastest (μs) | Slowest (μs) | Average (μs) |
|---|---|---|---|---|---|
| NVIDIA RTX 4090 | | | | | |
| - Lookup | 1423 | 1047 | 578 | 2317 | 929 |
| - Cached | 366 | 147 | 105 | 402 | 134 |
| - Static | 159 | 21 | 5 | 167 | 10 |
| i7-8565U / UHD 620 | | | | | |
| - Lookup | 1300 | 1115 | 870 | 1984 | 1043 |
| - Cached | 676 | 671 | 488 | 1849 | 579 |
| - Static | 130 | 129 | 73 | 459 | 103 |
Numbers are in microseconds (μs), based on 5 runs of the benchmark program for each machine/method combination. The first-frame value for Cached is misleading: because it loops over the same 20 variables 50 times, it only needs to call glGetUniformLocation for 2% of the calls (20 of 1000). With 1000 unique uniforms, the first frame would likely be similar to the Lookup results.
The difference is significant. At 60 fps with 1000 uniform updates per frame, looking up the location every frame would use 5.6-6.3% of the frame time on average (depending on hardware), versus only 0.1-0.6% when storing the location.
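The frame-budget percentages come from dividing the Average column by the frame time at 60 fps (1,000,000 / 60 ≈ 16,667 μs). A small helper makes the arithmetic explicit; `frameSharePercent` is a hypothetical name, not part of the benchmark.

```cpp
#include <cassert>

// Percentage of one frame's time budget consumed by `costUs` microseconds
// of work when rendering at `fps` frames per second.
double frameSharePercent(double costUs, double fps) {
    assert(fps > 0.0);
    const double frameUs = 1e6 / fps;  // 60 fps -> about 16,667 us per frame
    return 100.0 * costUs / frameUs;
}
```

For example, 929 μs (RTX 4090, Lookup) is about 5.6% of a frame and 1043 μs (UHD 620, Lookup) about 6.3%, while the Static averages of 10 μs and 103 μs come to about 0.06% and 0.6%.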
Caching the locations in a std::map was faster than looking them up every time, but still significantly slower than using a custom class with the uniform locations hardcoded as member variables.
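For comparison, the Static approach amounts to a class with one member per uniform, filled in once when the shader is loaded. In this sketch `BasicShaderUniforms`, its uniform names, and the `fakeGetUniformLocation` stub (standing in for glGetUniformLocation, which needs a GL context) are all hypothetical.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical stub for glGetUniformLocation; a real call needs a GL context.
static int g_queries = 0;
int fakeGetUniformLocation(std::string_view name) {
    ++g_queries;
    return static_cast<int>(name.size());  // arbitrary fake location
}

// Static approach: one member per uniform, resolved once at load time.
// The uniform names are illustrative, not taken from the benchmark shader.
class BasicShaderUniforms {
public:
    void load() {
        color = fakeGetUniformLocation("uColor");
        offset = fakeGetUniformLocation("uOffset");
    }
    int color = -1;   // -1 mirrors GL's "no such active uniform" value
    int offset = -1;
};
```

Per-frame code then passes `u.color` straight to a call such as glUniform4fv, so no name lookup happens after loading. glGetUniformLocation returns -1 for a name that isn't an active uniform, and glUniform* calls with location -1 are silently ignored.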
Benchmark source code available here.