When I write a fragment shader, I am writing a program that will execute millions of times per frame, once for every pixel on screen, with thousands of those invocations in flight at any moment. I never write the sequencing. No invocation waits on another. There is only the field, firing together.
I kept thinking about this while reading about cortical columns. The brain's visual cortex doesn't process images serially, pixel by pixel, the way a CPU might. It processes entire spatial regions in parallel: columns of neurons responding to orientation, spatial frequency, and phase simultaneously. The architecture is shockingly similar. Not metaphorically. Structurally.
The shader as a field operator
In GLSL, a fragment shader is, to a first approximation, a pure function. It takes a position in screen space and returns a color. No side effects. No shared mutable state. Every invocation is independent of every other. The GPU schedules thousands of these invocations across its shader cores, grouping them into warps (NVIDIA's term; AMD calls them wavefronts): typically 32 threads that execute the same instruction simultaneously on different data.
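// A simple signed-distance field rendered in parallel
#ifdef GL_ES
precision mediump float; // GLSL ES fragment shaders need a default float precision
#endif

uniform vec2 u_resolution; // viewport size in pixels, set by the host application

void main() {
    // Center the coordinates and normalize by the shorter side
    vec2 uv = (gl_FragCoord.xy - u_resolution * 0.5)
            / min(u_resolution.x, u_resolution.y);
    // Every pixel evaluates the same distance function
    float d = length(uv) - 0.3;
    // smoothstep is undefined when edge0 >= edge1, so invert rather than swap the edges
    float c = 1.0 - smoothstep(-0.01, 0.01, d);
    gl_FragColor = vec4(vec3(c), 1.0);
}
// The GPU spreads these invocations across hundreds or thousands of shader lanes. No loop in the source. No iteration.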
This is the thing that changes how you think. When you write a shader, you stop thinking about what happens next and start thinking about what the whole field looks like at once. The question isn't "what does pixel (x, y) become?" The question is "what is the rule that describes the entire image, everywhere, simultaneously?"
To write a shader is to describe a law, not a procedure. The GPU is the instrument that instantiates that law across space.
Personal notes, 2025
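One way to see the "law, not procedure" idea is to compose two shapes. The sketch below is only an illustration: the helper sdCircle, the circle positions, and the radii are made up for the example, and it assumes the host sets u_resolution. The union of two circles is written as a single rule, the minimum of two distance fields, with no mention of any individual pixel.

```glsl
#ifdef GL_ES
precision mediump float;
#endif

uniform vec2 u_resolution; // viewport size in pixels (assumed to be set by the host)

// Signed distance from point p to a circle of radius r centered at c.
// Hypothetical helper, named for this example only.
float sdCircle(vec2 p, vec2 c, float r) {
    return length(p - c) - r;
}

void main() {
    vec2 uv = (gl_FragCoord.xy - u_resolution * 0.5)
            / min(u_resolution.x, u_resolution.y);

    // The rule: the shape exists wherever either circle's distance is negative.
    // min() of two distance fields gives the distance field of their union.
    float d = min(sdCircle(uv, vec2(-0.15, 0.0), 0.2),
                  sdCircle(uv, vec2( 0.15, 0.0), 0.2));

    // 1.0 inside the shape, 0.0 outside, with a soft edge
    float c = 1.0 - smoothstep(-0.01, 0.01, d);
    gl_FragColor = vec4(vec3(c), 1.0);
}
```

Swapping min for max would give the intersection instead. The rule changes; the way it is applied across the field does not.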
Neural columns and the warp
A cortical column is approximately 0.5 mm in diameter and contains on the order of 10,000 neurons. These neurons are organized into six layers, each connecting to different regions: input, processing, output. A column is a functional unit. It responds to a specific feature of the input stimulus, and its neurons tend to fire together when that feature is present.
The GPU's warp, 32 threads executing the same instruction, is functionally analogous. Each thread handles a different pixel coordinate, but they all step through the same shader code together. If threads within a warp take different branches, the hardware runs each branch in turn with the non-participating threads masked off, so the whole warp pays for both paths. Divergence is expensive. Unity is efficient.
GPU warp: 32 threads · same instruction · different data · divergence = serialized branches
Cortical column: ~10k neurons · feature-selective · synchronous firing · inhibition maintains coherence
Both systems pay a cost for divergence. Both optimize for coordinated, uniform response.
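The difference between a coherent and a divergent branch can be sketched in a few lines. Everything named here is an assumption for illustration: u_mode is a hypothetical uniform, and pathA and pathB stand in for two genuinely different, expensive code paths. Whether the second branch actually costs more depends on the hardware, the driver, and how the compiler chooses to flatten it.

```glsl
#ifdef GL_ES
precision mediump float;
#endif

uniform vec2 u_resolution; // viewport size in pixels (assumed)
uniform float u_mode;      // hypothetical uniform: one value shared by every invocation

// Placeholders for two different, expensive code paths.
vec3 pathA(vec2 uv) { return vec3(0.5 + 0.5 * sin(40.0 * uv.x)); }
vec3 pathB(vec2 uv) { return vec3(0.5 + 0.5 * cos(40.0 * uv.y)); }

void main() {
    vec2 uv = gl_FragCoord.xy / u_resolution;

    // Coherent: the condition depends only on a uniform, so every thread
    // in a warp takes the same side and only that side needs to run.
    vec3 coherent;
    if (u_mode > 0.5) {
        coherent = pathA(uv);
    } else {
        coherent = pathB(uv);
    }

    // Potentially divergent: the condition alternates with each pixel column,
    // so neighboring threads in the same warp can want different sides.
    vec3 divergent;
    if (mod(floor(gl_FragCoord.x), 2.0) < 0.5) {
        divergent = pathA(uv);
    } else {
        divergent = pathB(uv);
    }

    gl_FragColor = vec4(mix(coherent, divergent, 0.5), 1.0);
}
```

The two if statements look identical in the source. What differs is how the condition varies across the field, and that is what the warp pays for.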
The inhibitory interneuron as execution mask
The brain uses inhibitory interneurons, neurons that suppress neighboring activity, to maintain the coherence of a column's response. When a column fires for a specific orientation (say, 45 degrees), inhibitory interneurons suppress columns tuned to similar but distinct orientations. This is not unlike a GPU's execution mask switching off the threads that are not on the current path. (GPU shader cores generally do not predict branches the way CPUs do; they mask lanes instead.)
Both are mechanisms for enforcing uniformity in a massively parallel system. The strategies are different at the physical layer, but the computational problem they solve is identical: how do you keep thousands of parallel units executing coherently?
coherence(t) = f(synchrony, inhibition, common_input)
// Same function. Different substrate.
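The suppression of neighboring orientations described above can be drawn as a toy model. This is a deliberately crude sketch, not a physiological model: the tuning width, the inhibition strength, the channel spacing, and the subtractive form are all assumptions chosen for the picture. The x axis is a unit's preferred orientation; the top half shows the raw tuned response to a 45 degree stimulus and the bottom half shows the response after each unit is inhibited by its two neighbors.

```glsl
#ifdef GL_ES
precision mediump float;
#endif

uniform vec2 u_resolution; // viewport size in pixels (assumed)

const float PI = 3.14159265;
const float STIMULUS = PI / 4.0; // stimulus orientation: 45 degrees
const float WIDTH = 0.5;         // tuning width (assumed, arbitrary)
const float K = 0.8;             // inhibition strength (assumed, arbitrary)

// Feedforward response of a unit that prefers orientation `pref`.
// Orientation repeats every PI, hence the doubled angle.
float tuned(float pref) {
    // 0.0 when pref matches the stimulus, 1.0 when orthogonal to it
    float delta = 0.5 * (1.0 - cos(2.0 * (pref - STIMULUS)));
    return exp(-delta / (WIDTH * WIDTH));
}

void main() {
    vec2 uv = gl_FragCoord.xy / u_resolution;

    float pref = uv.x * PI;    // preferred orientation, 0 to PI across the screen
    float spacing = PI / 8.0;  // distance to the neighboring channels (assumed)

    float raw = tuned(pref);
    float neighbors = 0.5 * (tuned(pref - spacing) + tuned(pref + spacing));

    // Subtractive inhibition from the two neighbors, clamped at zero
    float inhibited = max(0.0, raw - K * neighbors);

    // Top half: raw response. Bottom half: response after inhibition.
    float v = uv.y > 0.5 ? raw : inhibited;
    gl_FragColor = vec4(vec3(v), 1.0);
}
```

With these made-up parameters the bright band in the bottom half is narrower than in the top half: the units near 45 degrees survive and their neighbors are pushed toward zero.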
What this changes about how I write shaders
Once you see this, you cannot unsee it. A shader is not a small program. It is a description of a field that a massively parallel physical system will instantiate. Every line of GLSL you write is a constraint on a simultaneously executing ensemble.
I now think about shader branching differently. An if statement in a shader is not a decision; it is a field partition. Wherever the boundary cuts through a warp, the GPU runs both branches for that warp and masks out the unused results. Where a warp sits entirely on one side, it runs only that side. The question is not "should this pixel do A or B?" The question is "what does the boundary between A-space and B-space look like, and is that boundary worth paying for?"
// Don't think: "if this pixel is inside the circle, color it"
// Think: "the circle is a partition of the field"
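float inCircle = step(0.0, 0.3 - length(uv)); // arithmetic, no control flow
float inCircleBranch = length(uv) < 0.3 ? 1.0 : 0.0; // usually compiled to a select, not a real branch
// The first is a field function written as arithmetic.
// The second is a conditional, but one this cheap rarely costs a divergent branch.
// Divergence hurts when the two sides run substantially different code.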
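Taking "what does the boundary look like" literally, the partition can be given a width. The following is a minimal sketch under stated assumptions: the two colors are arbitrary, u_resolution is set by the host, and the edge is softened over roughly one pixel on either side. Instead of choosing between A and B, each pixel blends them according to how far it sits from the boundary.

```glsl
#ifdef GL_ES
precision mediump float;
#endif

uniform vec2 u_resolution; // viewport size in pixels (assumed)

void main() {
    float shortSide = min(u_resolution.x, u_resolution.y);
    vec2 uv = (gl_FragCoord.xy - u_resolution * 0.5) / shortSide;

    // Signed distance to the circle: negative inside, positive outside
    float d = length(uv) - 0.3;

    // Size of one pixel in uv units
    float px = 1.0 / shortSide;

    // 1.0 well inside, 0.0 well outside, smooth across the boundary
    float inside = 1.0 - smoothstep(-px, px, d);

    vec3 colorA = vec3(0.90, 0.40, 0.10); // A-space color (arbitrary)
    vec3 colorB = vec3(0.05, 0.05, 0.10); // B-space color (arbitrary)

    // No branch: every pixel computes the same blend with a different weight
    gl_FragColor = vec4(mix(colorB, colorA, inside), 1.0);
}
```

This only works because both sides are cheap enough to evaluate everywhere. When A and B are expensive and very different, a real branch that follows large coherent regions may still be the better trade.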
The neuroscience literature on perceptual binding — the question of how the brain integrates distributed processing into coherent experience — starts to read like a shader optimization problem. How does the visual system avoid the equivalent of warp divergence? How do distributed columns, each processing a different feature, produce a unified percept?
The question I cannot answer yet
The GPU produces an image. Many cores, unified output. We can see the output. We can measure its pixels. The computation is complete when the frame buffer is filled.
The brain produces something that we cannot yet measure in the same way. The columns fire. The synchrony is real. The binding happens. But the "frame buffer" — the place where experience is assembled — remains unidentified. This is the hard problem, wearing a new coat.
I do not think the GPU analogy will solve this. But I think it clarifies the question. The question is not "how do neurons produce consciousness?" That's too broad. The question might be: what is the readout mechanism? The GPU writes to a frame buffer that the display reads. What reads the cortical output?
I don't know. But I know where I'm looking next.