Devlog

2024-05-26

2024-05-17

2024-05-06

2024-04-14

2024-04-07

2024-01-23

glGetUniformLocation Benchmark

I was looking for information on whether calling glGetUniformLocation every frame has any performance impact. Everything I found was either claims without any sources or very old data.

Since I couldn't find anything useful, I ended up writing a small benchmark program to test 3 different ways of using uniform locations:

  • Calling glGetUniformLocation every time (Lookup).
  • Getting the location on the first call and caching it in a std::map using compile-time keys (Cached).
  • Getting the location at load time and storing it in a variable (Static).
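
The three approaches can be sketched roughly as below. This is a minimal illustration and not the benchmark's actual code: FakeDriver stands in for the real glGetUniformLocation call so the sketch runs without a GL context, and the Uniform enum, the uniform names, and the class names are all hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

using Location = int;

// Hypothetical stand-in for the GL driver. A real program would call
// glGetUniformLocation(program, name) here instead.
struct FakeDriver {
    int lookups = 0;  // how many times the "driver" was queried
    Location getUniformLocation(const std::string& name) {
        ++lookups;
        if (name == "uColor") return 0;
        if (name == "uOffset") return 1;
        return -1;  // glGetUniformLocation also returns -1 for unknown names
    }
};

// Compile-time keys for the cached variant (assumed to be an enum here).
enum class Uniform { Color, Offset };

const char* uniformName(Uniform u) {
    switch (u) {
        case Uniform::Color:  return "uColor";
        case Uniform::Offset: return "uOffset";
    }
    assert(false && "unknown uniform key");
    return "";
}

// Lookup: query the driver on every use.
Location lookupEveryTime(FakeDriver& driver, Uniform u) {
    return driver.getUniformLocation(uniformName(u));
}

// Cached: query on first use, then serve from a std::map.
class CachedLocations {
public:
    explicit CachedLocations(FakeDriver& driver) : driver_(driver) {}
    Location get(Uniform u) {
        auto it = cache_.find(u);
        if (it != cache_.end()) return it->second;
        Location loc = driver_.getUniformLocation(uniformName(u));
        cache_.emplace(u, loc);
        return loc;
    }
private:
    FakeDriver& driver_;
    std::map<Uniform, Location> cache_;
};

// Static: query once at load time and keep plain member variables.
struct StaticLocations {
    Location color;
    Location offset;
    explicit StaticLocations(FakeDriver& driver)
        : color(driver.getUniformLocation("uColor")),
          offset(driver.getUniformLocation("uOffset")) {}
};
```

Setting a uniform 50 times through lookupEveryTime queries the driver 50 times, while CachedLocations queries once and then pays for a map lookup on each use, and StaticLocations only reads a member variable.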

The test uses a shader with 20 vec4 uniforms (10 in the vertex shader, 10 in the fragment shader) and sets each uniform 50 times per render, for 1000 uniform calls per render (with fewer than 50 repeats the time delta was hard to measure). Each run performs 1000 renders to come up with an average time.
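
A timing loop along these lines could produce the numbers reported below. It is only a sketch under assumptions: the function and struct names are made up, the real benchmark may measure differently (for example, whether the average includes the first render), and the setUniform callback stands in for the actual lookup plus glUniform4f call.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <numeric>
#include <vector>

// Summary columns matching the results table (all values in microseconds).
struct RunStats {
    double first, second, fastest, slowest, average;
};

// Times `renders` renders; each render calls setUniform(index) for every
// uniform, `repeats` times over. Returns one sample per render.
template <typename SetUniform>
std::vector<double> timeRenders(int renders, int uniforms, int repeats,
                                SetUniform setUniform) {
    using Clock = std::chrono::steady_clock;
    std::vector<double> samples;
    samples.reserve(renders);
    for (int r = 0; r < renders; ++r) {
        auto start = Clock::now();
        for (int rep = 0; rep < repeats; ++rep)
            for (int u = 0; u < uniforms; ++u)
                setUniform(u);
        auto end = Clock::now();
        samples.push_back(
            std::chrono::duration<double, std::micro>(end - start).count());
    }
    return samples;
}

// Reduces the per-render samples to the five reported columns.
// Assumption: the average is taken over all samples, including the first.
RunStats summarize(const std::vector<double>& samples) {
    assert(samples.size() >= 2);
    RunStats s{};
    s.first = samples[0];
    s.second = samples[1];
    s.fastest = *std::min_element(samples.begin(), samples.end());
    s.slowest = *std::max_element(samples.begin(), samples.end());
    s.average = std::accumulate(samples.begin(), samples.end(), 0.0) /
                static_cast<double>(samples.size());
    return s;
}
```

With the parameters from the text, the call would be timeRenders(1000, 20, 50, setUniform), followed by summarize on the returned samples.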

Here are the results from a modern high-end desktop and a 5-year-old laptop, both running Windows 11.

Machine / Method      First  Second  Fastest  Slowest  Average
NVIDIA RTX 4090
- Lookup               1423    1047      578     2317      929
- Cached                366     147      105      402      134
- Static                159      21        5      167       10
i7-8565U / UHD 620
- Lookup               1300    1115      870     1984     1043
- Cached                676     671      488     1849      579
- Static                130     129       73      459      103

Numbers are in microseconds (μs), based on 5 runs of the benchmark program for each machine/method combination. The first-frame value for "Cached" is misleading: the test loops over all 20 variables 50 times, so only 2% of the calls in that frame need to call glGetUniformLocation. With 1000 unique uniforms, the first frame would likely be similar to the "Lookup" results.

The difference is pretty significant. If running at 60 fps with 1000 uniform calls per frame, looking up the location every time would use about 5.6% of the frame time on the desktop and 6.3% on the laptop, versus only 0.1-0.6% (depending on hardware) when storing the location.
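
The percentages follow from dividing the average time by the frame budget (1,000,000 μs / 60 ≈ 16,667 μs at 60 fps). A small helper, with a hypothetical name, makes the arithmetic explicit:

```cpp
#include <cassert>

// Share of one frame (in percent) consumed by `microseconds` of work
// when rendering at `fps` frames per second.
inline double frameBudgetPercent(double microseconds, double fps) {
    assert(fps > 0.0);
    const double frameMicroseconds = 1000000.0 / fps;
    return microseconds / frameMicroseconds * 100.0;
}
```

Plugging in the averages from the table at 60 fps gives roughly 5.57% (929 μs) and 6.26% (1043 μs) for Lookup, 0.80% and 3.47% for Cached, and 0.06% and 0.62% for Static.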

Caching locations in a std::map worked better than looking them up every time, but was still significantly slower than using a custom class with the uniform variables hardcoded.

Benchmark source code available here.

2024-01-19

ArcRace Update

A small update to my 3D game demo to test some shader and model loading code.

Download (2.4MB, Windows 10/11)

ArcRace 1.1

Libraries
Music & Sound FX

2024-01-11

OpenGL Text using stb_truetype

A quick note on using stb_truetype.h to render text in OpenGL using an alpha mask texture with the stbtt_PackBegin/stbtt_PackFontRange API.

Text rendering

First, load the font. The important data is the pixel data for the texture and the packed character data. The example below packs characters from codepoints 0 to 125. If a starting point other than 0 is used, lookups in the charData array must be offset by that number: with 0-125, uppercase A is at charData[65]; with 32-125, A is at charData[65 - 32].

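stbtt_pack_context packContext;
stbtt_packedchar charData[126];  // one entry per packed codepoint (0 to 125)
unsigned char pixels[TEXTURE_WIDTH * TEXTURE_HEIGHT];  // 1 byte per pixel
// Stride is TEXTURE_WIDTH bytes per row, with 1 pixel of padding between glyphs.
stbtt_PackBegin(&packContext, pixels,
                TEXTURE_WIDTH, TEXTURE_HEIGHT,
                TEXTURE_WIDTH, 1, NULL);
// The last two integer arguments are the first codepoint and the number of
// characters to pack, so codepoints 0 to 125 inclusive means 126 characters.
stbtt_PackFontRange(&packContext, (const unsigned char*)ttfData, 0, TEXTURE_FONT_SIZE,
                    0, 126, charData);
stbtt_PackEnd(&packContext);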

Then put the pixel data into an OpenGL texture. The data is a single-channel monochrome image with one byte per pixel:

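// Rows are tightly packed at 1 byte per pixel; the default unpack alignment
// of 4 only works when TEXTURE_WIDTH is a multiple of 4.
glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
glTexImage2D(GL_TEXTURE_2D, 0, GL_R8,
             TEXTURE_WIDTH, TEXTURE_HEIGHT, 0,
             GL_RED, GL_UNSIGNED_BYTE, pixels);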

To render text, look up each character in the charData array to get its position in the texture as char.x0, char.x1, char.y0, char.y1. The rendered size of the character can be calculated from the offsets as char.xoff2 - char.xoff and char.yoff2 - char.yoff. The xoff values are the start and end offsets from the left. The yoff values are the start and end offsets from the baseline (y values are negative above the baseline and positive below it). The x distance to the next character is available in char.xadvance. This can also be used to calculate the full width of a string before rendering, for things like scaling to fit or center/right-aligning text.
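
As a rough sketch of that lookup, the code below measures a string and builds one textured quad per character. It is not the demo's code: PackedChar is a stand-in with the same fields as stbtt_packedchar so the example compiles without stb_truetype.h, and Quad, findChar, textWidth, and layoutText are hypothetical names. It assumes single-byte text and screen coordinates where y grows downward.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in with the same fields as stbtt_packedchar.
struct PackedChar {
    unsigned short x0, y0, x1, y1;  // glyph rectangle in the texture (pixels)
    float xoff, yoff, xadvance;     // top-left offset and advance
    float xoff2, yoff2;             // bottom-right offset
};

struct Quad {
    float left, top, right, bottom;  // screen position, y grows downward
    float u0, v0, u1, v1;            // normalized texture coordinates
};

// Applies the starting-codepoint offset; returns nullptr if not packed.
const PackedChar* findChar(const std::vector<PackedChar>& charData,
                           int firstCodepoint, unsigned char c) {
    int index = static_cast<int>(c) - firstCodepoint;
    if (index < 0 || index >= static_cast<int>(charData.size())) return nullptr;
    return &charData[index];
}

// Full width of a string: the sum of each character's xadvance.
float textWidth(const std::vector<PackedChar>& charData, int firstCodepoint,
                const std::string& text) {
    float width = 0.0f;
    for (unsigned char c : text) {
        if (const PackedChar* pc = findChar(charData, firstCodepoint, c))
            width += pc->xadvance;
    }
    return width;
}

// One quad per character, starting at penX on the given baseline.
std::vector<Quad> layoutText(const std::vector<PackedChar>& charData,
                             int firstCodepoint, int texW, int texH,
                             const std::string& text,
                             float penX, float baselineY) {
    assert(texW > 0 && texH > 0);
    std::vector<Quad> quads;
    for (unsigned char c : text) {
        const PackedChar* pc = findChar(charData, firstCodepoint, c);
        if (!pc) continue;  // skip characters that were not packed
        quads.push_back({penX + pc->xoff, baselineY + pc->yoff,
                         penX + pc->xoff2, baselineY + pc->yoff2,
                         pc->x0 / float(texW), pc->y0 / float(texH),
                         pc->x1 / float(texW), pc->y1 / float(texH)});
        penX += pc->xadvance;  // move to the next character
    }
    return quads;
}
```

To center a string in a region of a given width, start with penX = (regionWidth - textWidth(...)) / 2. stb_truetype also provides stbtt_GetPackedQuad, which computes a similar quad directly from the packed data.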

MIT licensed demo code available here.

2023-12-31

ArcRace Mini Game

A small game demo made to refresh OpenGL and general 3D programming.

Download (2.4MB, Windows 10/11)

ArcRace 1.0

Libraries
  • GLFW: Window, OpenGL context, and keyboard input
  • Galogen: OpenGL API headers
  • GLM: Vector and matrix math
  • stb_image: Texture loading
  • SoLoud: Sound playback
Music & Sound FX