path: root/src/mesa/drivers
Commit message (Author, Age, Files, Lines)
...
* mesa: In helpers, only check driver capability for meta (Nanley Chery, 2015-11-12, 2 files, -0/+12)
| | | | | | | | | | | Make API context and version checks done by the helper functions pass unconditionally while meta is in progress. This transparently makes extension checks solely dependent on struct gl_extensions while in meta. v2: Use an 8-bit data type instead of a GLuint Signed-off-by: Nanley Chery <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965: Check instructions appear only on supported hardware. (Matt Turner, 2015-11-12, 1 file, -0/+254)
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add initial assembly validation pass. (Matt Turner, 2015-11-12, 5 files, -0/+174)
| | | | | | | Initially just checks that sources are non-NULL, which would have alerted us to the problem fixed by commit 6c846dc5. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add annotation_insert_error() and support for printing errors. (Matt Turner, 2015-11-12, 2 files, -7/+71)
| | | | | | | | Will allow annotations to contain error messages (indicating an instruction violates a rule for instance) that are printed after the disassembly of the block. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Combine assembly annotations if possible. (Matt Turner, 2015-11-12, 1 file, -5/+18)
| | | | | | | | Often annotations are identical between sets of consecutive instructions. We can perhaps avoid some memory allocations by reusing the previous annotation. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Set annotation_info's mem_ctx. (Matt Turner, 2015-11-12, 3 files, -2/+5)
| | | | | | It was being memset to 0 previously. Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Don't consider control flow instructions to have sources. (Matt Turner, 2015-11-12, 1 file, -8/+8)
| | | | | | | | | | | | And why did IFF have a destination? I suspect that once upon a time the disassembler used this information to know which fields to find the jump targets in. The jump targets have moved, so the disassembler has to know how to handle these per-generation anyway. Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Fill out instruction list. (Matt Turner, 2015-11-12, 3 files, -14/+42)
| | | | | | | | | | | Add some instructions: illegal, movi, sends, sendsc. Remove some instructions with reused opcodes: msave, mrestore, push, pop, goto. I did have some gross code for disassembling opcodes per-generation, but there's very little meaningful overlap so it's probably not needed. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Consolidate is_3src() functions. (Matt Turner, 2015-11-12, 3 files, -8/+7)
| | | | Otherwise I'll have to add another later in this series.
* i965/fs/nir: fix the number of register written by FS_OPCODE_GET_BUFFER_SIZE (Samuel Iglesias Gonsálvez, 2015-11-12, 1 file, -2/+14)
| | | | | | | | | | | FS_OPCODE_GET_BUFFER_SIZE is calculated with a resinfo's sampler message. This patch adjusts the number of registers written by the opcode following what the PRM spec says about the number of registers written by the SIMD8 and SIMD16's writeback messages for sampler messages. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/skl/gt4: Fix URB programming restriction. (Ben Widawsky, 2015-11-11, 1 file, -0/+9)
| | | | | | | | | | | | | | The comment in the code details the restriction. Thanks to Ken for having a very helpful conversation with me, and spotting the blurb in the link I sent him :P. There are still stability problems for me on GT4, but this definitely helps with some of the failures. v2: Comment fixes Cc: [email protected] Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Split nir_emit_intrinsic by stage with a general fallback. (Kenneth Graunke, 2015-11-11, 2 files, -277/+381)
| | | | | | | | | | | | | | | | | | | | | | Many intrinsics only apply to a particular stage (such as discard). In other cases, we may want to interpret them differently based on the stage (such as load_primitive_id or load_input). The current method isn't that pretty - we handle all intrinsics in one giant function. Sometimes we assert on stage, sometimes we forget. Different behaviors are handled via if-ladders based on stage. This commit introduces new nir_emit_<stage>_intrinsic() functions, and makes nir_emit_instr() call those. In turn, those fall back to the generic nir_emit_intrinsic() function for cases they don't want to handle specially. This makes it clear which intrinsics only exist in one stage, and makes it easy to handle inputs/outputs differently for various stages. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/brw_reg: Add a brw_VxH_indirect helper (Jason Ekstrand, 2015-11-11, 1 file, -0/+11)
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Print force_writemask_all in dump_instructions(). (Kenneth Graunke, 2015-11-11, 2 files, -0/+6)
| | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Combine BRW_NEW_*_BINDING_TABLE dirty bits. (Kenneth Graunke, 2015-11-11, 5 files, -26/+14)
| | | | | | | | | | | A while back, we moved to directly emitting the Gen7+ state when constructing the binding tables. These flags are only used on Gen4-6, which emit all the binding table pointers at once. We gain nothing by having separate flags, so combine them. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Map GL_PATCHES to 3DPRIM_PATCHLIST_n. (Kenneth Graunke, 2015-11-11, 2 files, -1/+10)
| | | | | | | | | | | | | | | | Inspired by a patch by Fabian Bieler. Fabian defined a _3DPRIM_PATCHLIST_0 macro (which isn't actually a valid topology type); I instead chose to make a macro that takes an argument. He also took the number of patch vertices from _mesa_prim (which was set to ctx->TessCtrlProgram.patch_vertices) - I chose to use it directly to avoid the need for the VBO patch. v2: Change macro to 0x20 + (n - 1) instead of 0x1F + n to better match the documentation (suggested by Ian). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
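As a rough illustration of the v2 macro shape described in this message, the C sketch below maps a patch vertex count n to a topology value. Only the expression 0x20 + (n - 1) comes from the commit message; the names PATCHLIST_TOPOLOGY and patchlist_topology are hypothetical and are not the driver's identifiers.

```c
#include <assert.h>

/* Hypothetical macro name. Only the formula 0x20 + (n - 1) is taken from
 * the commit message; n is the number of vertices per patch, starting at 1. */
#define PATCHLIST_TOPOLOGY(n) (0x20 + ((n) - 1))

/* Hypothetical wrapper that rejects n < 1, since the message notes that
 * PATCHLIST_0 is not a valid topology type. The hardware's upper bound on
 * n is not stated in the message and is not checked here. */
static inline int patchlist_topology(int n)
{
    assert(n >= 1);
    return PATCHLIST_TOPOLOGY(n);
}
```

With this form, n = 1 yields 0x20, the same value the earlier 0x1F + n spelling would produce; the v2 change affects readability, not the result.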
* i965/nir/opt_peephole_ffma: Bypass fusion if any operand of fadd and fmul is a const (Eduardo Lima Mitev, 2015-11-10, 1 file, -0/+31)
| | | | | | | | | | | | | | | | | | | | | | | When both fadd and fmul instructions have at least one operand that is a constant and it is only used once, the total number of instructions can be reduced from 3 (1 ffma + 2 load_const) to 2 (1 fmul + 1 fadd), because the constants will be propagated as immediate operands of fmul and fadd. This patch detects these situations and prevents fusing fmul+fadd into ffma. Shader-db results on i965 Haswell: total instructions in shared programs: 6235835 -> 6225895 (-0.16%) instructions in affected programs: 1124094 -> 1114154 (-0.88%) total loops in shared programs: 1979 -> 1979 (0.00%) helped: 7612 HURT: 843 GAINED: 4 LOST: 0 Reviewed-by: Jason Ekstrand <[email protected]>
* nir/nir_opt_peephole_ffma: Move this lowering pass to the i965 driver (Eduardo Lima Mitev, 2015-11-10, 4 files, -1/+272)
| | | | | | | Because the next patch will add an optimization that is specific to i965, we want to move this lowering pass to that driver altogether. This is safe because i965 is the only consumer. Reviewed-by: Jason Ekstrand <[email protected]>
* glsl: Lower UBO and SSBO access in glsl linker (Kristian Høgsberg Kristensen, 2015-11-10, 2 files, -2/+2)
| | | | | | | | | | | All GLSL IR consumers run this lowering pass so we can move it to the linker. This moves the pass up quite a bit, but that's the point: it needs to run before we throw away information about per-component vector access. Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
* glsl: Drop exec_list argument to lower_ubo_reference (Kristian Høgsberg Kristensen, 2015-11-10, 1 file, -1/+1)
| | | | | | | | | We always pass in shader->ir and we already pass in the shader, so just drop the exec_list. Most passes either take just an exec_list or a shader, so this seems more consistent. Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
* i965/fs: Use regs_read/written for post-RA scheduling in calculate_deps (Jason Ekstrand, 2015-11-07, 1 file, -11/+4)
| | | | | | | | | | | | Previously, we were assuming that everything read/wrote exactly 1 logical GRF (1 in SIMD8 and 2 in SIMD16). This isn't actually true. In particular, the PLN instruction reads 2 logical registers in one of the components. This commit changes post-RA scheduling to use regs_read and regs_written instead so that we add enough dependencies. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92770 Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
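The register accounting in this message can be written out as a one-line helper. This is a sketch that follows the message's own definition (one logical GRF is 1 register in SIMD8 and 2 in SIMD16); the function name physical_grfs is hypothetical and is not part of the scheduler.

```c
#include <assert.h>

/* Physical GRFs covered by `logical_regs` logical registers, using the
 * commit message's accounting: 1 logical GRF = 1 register in SIMD8 and
 * 2 registers in SIMD16. Hypothetical helper for illustration only. */
static unsigned physical_grfs(unsigned logical_regs, unsigned dispatch_width)
{
    assert(dispatch_width == 8 || dispatch_width == 16);
    return logical_regs * (dispatch_width / 8);
}
```

Under the old assumption every source was treated as physical_grfs(1, width). By the same accounting, a source that reads 2 logical registers covers twice that range, so dependencies sized for a single logical register would miss part of it.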
* i965/nir/fs: Add comment for no-op memory barrier functions (Francisco Jerez, 2015-11-06, 1 file, -0/+19)
| | | | Reviewed-by: Jordan Justen <[email protected]>
* i965/nir/fs: Implement new barrier functions for compute shaders (Jordan Justen, 2015-11-06, 1 file, -0/+7)
| | | | | | | | | | | | | | | | | | | | | | For these nir intrinsics, we emit the same code as nir_intrinsic_memory_barrier: * nir_intrinsic_memory_barrier_atomic_counter * nir_intrinsic_memory_barrier_buffer * nir_intrinsic_memory_barrier_image We treat these nir intrinsics as no-ops: * nir_intrinsic_group_memory_barrier * nir_intrinsic_memory_barrier_shared v3: * Add comment for no-op cases (curro) v4: * Moving comment to a separate patch authored by curro Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965: Fix scalar VS float[] and vec2[] output arrays. (Kenneth Graunke, 2015-11-05, 4 files, -2/+17)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The scalar VS backend has never handled float[] and vec2[] outputs correctly (my original code was broken). Outputs need to be padded out to vec4 slots. In fs_visitor::nir_setup_outputs(), we tried to process each vec4 slot by looping from 0 to ALIGN(type_size_scalar(type), 4) / 4. However, this is wrong: type_size_scalar() for a float[2] would return 2, or for vec2[2] it would return 4. This looked like a single slot, even though in reality each array element would be stored in separate vec4 slots. Because of this bug, outputs[] and output_components[] would not get initialized for the second element's VARYING_SLOT, which meant emit_urb_writes() would skip writing them. Nothing used those values, and dead code elimination threw a party. To fix this, we introduce a new type_size_vec4_times_4() function which pads array elements correctly, but still counts in scalar components, generating correct indices in store_output intrinsics. Normally, varying packing avoids this problem by turning varyings into vec4s. So this doesn't actually fix any Piglit or dEQP tests today. However, if varying packing is disabled, things would be broken. Tessellation shaders can't use varying packing, so this fixes various tcs-input Piglit tests on a branch of mine. v2: Shorten the implementation of type_size_4x to a single line (caught by Connor Abbott), and rename it to type_size_vec4_times_4() (renaming suggested by Jason Ekstrand). Use type_size_vec4 rather than using type_size_vec4_times_4 and then dividing by 4. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
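The arithmetic behind this bug can be reproduced with a simplified model in C. The struct array_type and all four function names below are hypothetical stand-ins for the driver's GLSL type helpers; the sketch only covers arrays of float, vec2, vec3, or vec4 and assumes, as the message states, that each array element is stored in its own vec4 slot.

```c
#include <assert.h>

/* Hypothetical, simplified array type: `length` elements, each a vector
 * of `components` floats (1 = float, 2 = vec2, ... 4 = vec4). */
struct array_type {
    unsigned length;
    unsigned components;
};

/* Size in scalar components with no padding: float[2] -> 2, vec2[2] -> 4. */
static unsigned size_scalar(struct array_type t)
{
    return t.length * t.components;
}

/* The old slot count from the message: ALIGN(size_scalar, 4) / 4. */
static unsigned old_slot_count(struct array_type t)
{
    return (size_scalar(t) + 3) / 4;
}

/* Size in vec4 slots: every array element occupies its own slot. */
static unsigned size_vec4(struct array_type t)
{
    assert(t.components >= 1 && t.components <= 4);
    return t.length;
}

/* Padded size, still counted in scalar components. */
static unsigned size_vec4_times_4(struct array_type t)
{
    return size_vec4(t) * 4;
}
```

For float[2] and vec2[2] the old computation yields one slot, while the per-element count yields two, which is why the second element's slot was never initialized.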
* i965/fs: Do not mark used surfaces in FS_OPCODE_GET_BUFFER_SIZE (Iago Toral Quiroga, 2015-11-05, 2 files, -4/+4)
| | | | | | | | Do it in the visitor, like we do for other opcodes. v2: use const, get rid of useless surf_index temporary (Curro) Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4: Do not mark used surfaces in VS_OPCODE_GET_BUFFER_SIZE (Iago Toral Quiroga, 2015-11-05, 2 files, -5/+5)
| | | | | | | | Do it in the visitor, like we do for other opcodes. v2: use const, get rid of useless surf_index temporary (Curro) Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4: Do not mark used direct surfaces in VS_OPCODE_PULL_CONSTANT_LOAD (Iago Toral Quiroga, 2015-11-05, 3 files, -13/+8)
| | | | | | | | | | Right now the generator marks direct surfaces as used but leaves marking of indirect surfaces to the caller. Just make the callers handle marking in both cases for consistency. v2: Use const, do not add unnecessary temporary (Curro) Reviewed-by: Francisco Jerez <[email protected]>
* i965/fs: Do not mark used direct surfaces in UNIFORM_PULL_CONSTANT_LOAD (Iago Toral Quiroga, 2015-11-05, 2 files, -11/+1)
| | | | | | | | Right now the generator marks direct surfaces as used but leaves marking of indirect surfaces to the caller. Just make the callers handle marking in both cases for consistency. Reviewed-by: Francisco Jerez <[email protected]>
* i965/fs: Do not mark direct used surfaces in VARYING_PULL_CONSTANT_LOAD (Iago Toral Quiroga, 2015-11-05, 3 files, -13/+8)
| | | | | | | | | | Right now the generator marks direct surfaces as used but leaves marking of indirect surfaces to the caller. Just make the callers handle marking in both cases for consistency. v2: Use const and remove useless surf_index temporary (Curro) Reviewed-by: Francisco Jerez <[email protected]>
* i965/skl+: Enable support for 16x multisampling (Neil Roberts, 2015-11-05, 2 files, -1/+10)
| | | | Reviewed-by: Ben Widawsky <[email protected]>
* mesa/meta: Use interpolateAtOffset for 16x MSAA copy blit (Neil Roberts, 2015-11-05, 1 file, -2/+37)
| | | | | | | | | | | | | | | | | | | | | | | | | Previously there was a problem in i965 where if 16x MSAA is used then some of the sample positions are exactly on the 0 x or y axis. When the MSAA copy blit shader interpolates the texture coordinates at these sample positions it was possible that it would jump to a neighboring texel due to rounding errors. It is likely that these positions would be used on 16x MSAA because that is where they are defined to be in D3D. To fix that this patch makes it use interpolateAtOffset in the blit shader whenever 16x MSAA is used and the GL_ARB_gpu_shader5 extension is available. This forces it to interpolate the texture coordinates at the pixel center to avoid these problematic positions. This fixes ext_framebuffer_multisample-unaligned-blit and ext_framebuffer_multisample-clip-and-scissor-blit with 16x MSAA on SKL+. v2: Use interpolateAtOffset instead of interpolateAtSample v3: Always try to enable GL_ARB_gpu_shader5 in the shader [Ian Romanick] Reviewed-by: Anuj Phogat <[email protected]>
* meta/blit: Always try to enable GL_ARB_sample_shading (Neil Roberts, 2015-11-05, 1 file, -14/+2)
| | | | | | | | | | | Previously this extension was only enabled when blitting between two multisampled buffers. However I don't think it does any harm to just enable it all the time. The ‘enable’ option is used instead of ‘require’ so that the shader will still compile if the extension isn't available in the cases where it isn't used. This will make the next patch simpler because it wants to add another optional extension. Reviewed-by: Anuj Phogat <[email protected]>
* meta: Support 16x MSAA in the multisample scaled blit shader (Neil Roberts, 2015-11-05, 3 files, -10/+35)
| | | | | | v2: Fix the x_scale in the shader. Remove the doubts in the commit message. Reviewed-by: Anuj Phogat <[email protected]>
* i965/meta: Support 16x MSAA in the meta stencil blit (Neil Roberts, 2015-11-05, 1 file, -5/+17)
| | | | | | | The destination rectangle is now drawn at 4x4 the size and the shader code to calculate the sample number is adjusted accordingly. Acked-by: Ben Widawsky <[email protected]>
* i965/fs/skl+: Fix calculating gl_SampleID for 16x MSAA (Neil Roberts, 2015-11-05, 1 file, -1/+7)
| | | | | | In order to accommodate 16x MSAA, the starting sample pair index is now 3 bits rather than 2 on SKL+. Reviewed-by: Ben Widawsky <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
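The change in field width follows from counting sample pairs, as the short C sketch below shows. It assumes the index counts pairs of samples (samples / 2), which is a reading of the field's name rather than something the message spells out; the function name is hypothetical, and the position of the field within the payload register is not modelled.

```c
#include <assert.h>

/* Bits needed to index every pair of samples. Assumes the index counts
 * pairs (samples / 2); hypothetical helper for illustration only. */
static unsigned sample_pair_index_bits(unsigned samples)
{
    assert(samples >= 2 && samples % 2 == 0);
    unsigned pairs = samples / 2;
    unsigned bits = 0;
    while ((1u << bits) < pairs)
        bits++;
    return bits;
}
```

With 8 samples there are 4 pairs, which fit in 2 bits; 16 samples give 8 pairs, which need 3.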
* i965: Support allocating the MCS buffer for 16x MSAA (Neil Roberts, 2015-11-05, 1 file, -0/+6)
| | | | | | When 16 samples are used the MCS buffer needs 64 bits per pixel. Reviewed-by: Ben Widawsky <[email protected]>
* i965: Support calculating the bits needed to set up 16x MSAA (Neil Roberts, 2015-11-05, 1 file, -1/+1)
| | | | | | | The gen7_surface_msaa_bits function already returns the right values for 16 samples but it just needs its assert to be relaxed. Reviewed-by: Ben Widawsky <[email protected]>
* i965/fs: Add a sampler program key for whether the texture is 16x MSAA (Neil Roberts, 2015-11-05, 3 files, -1/+16)
| | | | | | | | | | | | | | | | When 16x MSAA is used for sampling with texelFetch the compiler needs to use a different instruction which passes more arguments for the MCS data. Previously on skl+ it was unconditionally using this new instruction. However since 16x MSAA is probably going to be pretty rare, it is probably worthwhile to avoid using this instruction for the other sample counts. In order to do that this patch adds a new member to brw_sampler_prog_key_data to track when a sampler refers to a buffer with 16 samples. Note that this isn't done for the vec4 backend because it wouldn't change how many registers it uses. Acked-by: Ben Widawsky <[email protected]>
* i965/vec4/skl+: Use ld2dms_w instead of ld2dms (Neil Roberts, 2015-11-05, 3 files, -2/+18)
| | | | | | | | | In order to support 16x MSAA, skl+ has a wider version of ld2dms that takes two parameters for the MCS data. The MCS data in the response still fits in a single register so we just need to ensure we copy both values rather than just the lower one. Acked-by: Ben Widawsky <[email protected]>
* i965/fs/skl+: Use ld2dms_w instead of ld2dms (Neil Roberts, 2015-11-05, 6 files, -5/+60)
| | | | | | | | | | | | In order to support 16x MSAA, skl+ has a wider version of ld2dms that takes two parameters for the MCS data. The MCS data retrieved from the ld_mcs instruction already returns 4 or 8 registers and is documented to return zeroes for the mcsh value when the sample count is less than 16. v2: Use get_lowered_simd_width to fall back to SIMD8 instructions when the message length would be too long in SIMD16. Reviewed-by: Ben Widawsky <[email protected]>
* i965: Program 16x MSAA sample positions. (Neil Roberts, 2015-11-05, 3 files, -7/+34)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the standard pattern used by the other 3D graphics API. BDW has slots for these values, but they aren't actually used until SKL. Even though the documentation for BDW says they must be zero, it doesn't seem to cause any harm to program them anyway. The comment above for the 8x sample positions says that the hardware implements centroid interpolation by picking the centre-most sample that is inside the primitive. That implies that it might be worthwhile to pick a pattern that includes 0.5,0.5. However by experimentation this doesn't seem to actually be the case. With the sample positions in this patch, if I modify the piglit test below so that it instead reports the centroid position, it reports 0.492188,0.421875 which doesn't match any of the positions. If I modify the sample positions so that they include one at exactly 0.5,0.5 it doesn't help and it reports another position which is even further from the center for some reason. arb_gpu_shader5-interpolateAtSample-different Kenneth Graunke experimented with some other patterns that have a higher standard deviation but I think after some discussion it was decided that it would be better to pick the same pattern as the other graphics API in case there are games that rely on this pattern. (Based on a patch by Kenneth Graunke) Cc: Kenneth Graunke <[email protected]> Reviewed-by: Ben Widawsky <ben at bwidawsk.net>
* i965: Handle 16x MSAA in IMS dimension munging code. (Kenneth Graunke, 2015-11-05, 1 file, -2/+6)
| | | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Neil Roberts <[email protected]> Reviewed-by: Ben Widawsky <[email protected]>
* i965/vec4: select predicate based on writemask for sel emissions (Alejandro Piñeiro, 2015-11-05, 1 file, -1/+17)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Equivalent to commit 8ac3b525c but with sel operations. In this case we select the PredCtrl based on the writemask. This patch helps in cases like this: 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F 2: cmp.nz.f0.0 null:D, vgrf40.xxxx:D, 0D 3: (+f0.0) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD In this case, cmod propagation can't optimize instruction #2, because instructions #1 and #2 have different writemasks, and we can't directly update the writemask of instruction #2 because our code thinks that sel at instruction #3 reads all four channels of the flag, when it actually only reads .x. So, with this patch, the previous case becomes this: 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F 2: cmp.nz.f0.0 null:D, vgrf40.xxxx:D, 0D 3: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD Now only the x channel of the flag is used, allowing dead code elimination to update the writemask at the second instruction: 1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F 2: cmp.nz.f0.0 null.x:D, vgrf40.xxxx:D, 0D 3: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD So now cmod propagation can simplify out #2: 1: cmp.l.f0.0 vgrf40.0.x:F, attr18.wwww:F, vgrf7.xxxx:F 2: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD Shader-db numbers: total instructions in shared programs: 6235835 -> 6228008 (-0.13%) instructions in affected programs: 219850 -> 212023 (-3.56%) total loops in shared programs: 1979 -> 1979 (0.00%) helped: 1192 HURT: 0
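The selection rule this message describes, choosing the predicate control from the destination writemask, might look roughly like the C sketch below. The enum values and the function name are hypothetical placeholders, not the driver's definitions, and the fallback to a normal predicate for multi-channel writemasks is an assumption; the message only shows the single-channel .x case.

```c
#include <assert.h>

/* Hypothetical writemask bits, one per destination channel. */
enum { WM_X = 1, WM_Y = 2, WM_Z = 4, WM_W = 8 };

/* Hypothetical predicate controls: NORMAL reads all four flag channels,
 * REPLICATE_c reads only channel c (printed as f0.0.c in the dumps). */
enum pred_ctrl {
    PRED_NORMAL,
    PRED_REPLICATE_X,
    PRED_REPLICATE_Y,
    PRED_REPLICATE_Z,
    PRED_REPLICATE_W
};

/* Pick a single-channel predicate when the sel writes exactly one
 * channel; otherwise keep the normal predicate (assumed fallback). */
static enum pred_ctrl pred_for_writemask(unsigned writemask)
{
    assert(writemask <= (WM_X | WM_Y | WM_Z | WM_W));
    switch (writemask) {
    case WM_X: return PRED_REPLICATE_X;
    case WM_Y: return PRED_REPLICATE_Y;
    case WM_Z: return PRED_REPLICATE_Z;
    case WM_W: return PRED_REPLICATE_W;
    default:   return PRED_NORMAL;
    }
}
```

Narrowing the flag read in this way is what lets later passes see that only one flag channel is live, as in the listings in the message.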
* i965: check inst->predicate when clearing flag_live at dead code eliminate (Alejandro Piñeiro, 2015-11-04, 2 files, -2/+2)
| | | | | | | Detected by Matt Turner while reviewing commit a59359ecd22154cc2b3f88bb8c599f21af8a3934 Reviewed-by: Matt Turner <[email protected]>
* i965/meta: Assert fast clears and rep clears never overlap (Ben Widawsky, 2015-11-03, 1 file, -0/+2)
| | | | | | | | | | | There is nothing wrong with the code today, but as one modifies the code it turns out to be not too difficult to mess up the code, and this easy assertion should catch such driver implementation failures quickly. Cc: Kristian Høgsberg <[email protected]> Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Chad Versace <[email protected]> Reviewed-by: Neil Roberts <[email protected]>
* i965: enable ARB_arrays_of_arrays (Timothy Arceri, 2015-11-04, 1 file, -0/+1)
| | | | Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* i965: add support for image AoA (Timothy Arceri, 2015-11-04, 2 files, -14/+18)
| | | | | | | | | | | | | V3: clamp array index to the correct size (the size of the current array rather than the inner array) Francisco Jerez. V2: avoid useless zero-initialization and addition for the first AoA level, avoid redundant temporary, make use of type_size_scalar(), rename aoa_size to element_size, assign the indirect indexing temporary directly to image.reladdr, and replace while loop with a for loop. All suggested by Francisco Jerez. Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4: Send from GRF in atomic operations. (Matt Turner, 2015-11-03, 1 file, -12/+18)
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add scalar geometry shader support. (Kenneth Graunke, 2015-11-03, 5 files, -24/+666)
| | | | | | | | | | | | | | | | | | | | This is hidden behind INTEL_SCALAR_GS=1 for now, as we don't yet support instanced geometry shaders, and Orbital Explorer's shader spills like crazy. But the infrastructure is in place, and it's largely working. v2: Lots of rebasing. v3: (feedback from Kristian Høgsberg) - Handle stride and subreg_offset correctly for ATTRs; use a helper. - Fix missing emit_shader_time_end() call. - Delete dead code after early EOT in static vertex case to avoid tripping asserts in emit_shader_time_end(). - Use proper D/UD type in intexp2(). - Fix "EndPrimitve" and "to that" typos. - Assert that invocations == 1 so we know this is missing. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965: Add scalar GS input lowering code. (Kenneth Graunke, 2015-11-03, 1 file, -5/+39)
| | | | | | | | | We really ought to compute the VUE map at link time and stash it, rather than recomputing it here, but with the mess of program structures I wasn't sure where to put it. We can improve that later. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>