aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* i965: Make the fragment shader pull constants index by dwords, not vec4s.Eric Anholt2013-04-016-16/+19
| | | | | | | | | | | | | | | | | We want to load vec4s, since loading a vec4 instead of a dword is basically no increased latency. But for variable indexed access, the previous requirement of aligned vec4s for a sampler LD was hard to implement. Note that this change only affects those messages that use the surface format, like sampler LDs, but not to the untyped data cache loads we've used in other cases. No significant performance difference on my GLSL demo with uniforms forced to take the varying pull constants path (n=4). NOTE: This is a candidate for the 9.1 branch. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Make the constant surface interface take a normal byte size.Eric Anholt2013-04-014-17/+16
| | | | | | | | | This puts the rounding-up logic into the function itself instead of all the callers having to manage it. Also drop an "unused" comment in gen4, as the stride *is* used for texbos (and will be for uniforms soon). NOTE: This is a candidate for the 9.1 branch. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Move varying uniform offset compuation into the helper func.Eric Anholt2013-04-013-11/+13
| | | | | | | | I'm going to want to change the math for gen7 using sampler LD instructions in a way that gets CSE to occur like we'd hope. NOTE: This is a candidate for the 9.1 branch. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Remove creation of a MOV instruction that's never used.Eric Anholt2013-04-011-1/+0
| | | | | | | | We weren't inserting it into the list, so it did nothing. This line was replaced by the MOV/MUL block above. NOTE: This is a candidate for the 9.1 branch. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Allow constant propagation into MACH.Eric Anholt2013-04-011-2/+4
| | | | | | | | This happens quite a bit with varying-index uniform loads. We could also do better by avoiding the MACH entirely, but there's no reason not to at least take this step. Reviewed-by: Kenneth Graunke <[email protected]>
* r600g/llvm: Update LLVM_REVISION.txtVincent Lejeune2013-04-011-1/+1
|
* r600g/llvm: Use stack_size provided from llvm.Vincent Lejeune2013-04-011-0/+1
|
* r600g/llvm: uses function attribute to pass shader typeVincent Lejeune2013-04-011-0/+1
|
* r600g/llvm: Add support for cf_alu native encodeVincent Lejeune2013-04-013-1/+16
|
* ACTIVE_UNIFORM_MAX_LENGTH should include 3 extra characters for arrays.Haixia Shi2013-04-011-2/+4
| | | | | | | | | | | | | | | | | If the active uniform is an array, then the length of the uniform name should include the three extra characters for the "[0]" suffix, which is required by the GL 4.2 spec to be appended to the uniform name in glGetActiveUniform(). This avoids the situation where the output buffer does not have enough space to hold the "[0]" suffix, resulting in an incomplete array specification like "foobar[0". NOTE: This is a candidate for the 9.1 branch. Change-Id: I41e87ba347a7169eec8c575596cc3416adbe0728 Signed-off-by: Haixia Shi <[email protected]> Reviewed-by: Stéphane Marchesin <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/fs: Fix bad interaction between tex swizzles and textureQueryLOD.Matt Turner2013-04-011-1/+1
| | | | | Reported-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove the old brw_optimize() code.Eric Anholt2013-04-013-120/+0
| | | | | | This is now done in the VS backend before instruction emit. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Add a pass to set dependency control fields on instructions.Eric Anholt2013-04-013-0/+126
| | | | | | | This is a more aggressive version of the old brw_optimize() path. Reduces cycles spent in the vertex shader on minecraft by 18.6% +/- 10.0% (n=15). Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Dump shader source for linked shader programs.Eric Anholt2013-04-011-2/+18
| | | | | | | | | | We dump shader source in ir_to_mesa.cpp, and we dump linked programs here, but we had no reference from the linked programs to their source. This was preventing improvement of shader-db to use linked shader programs instead of individual shader files (which is bogus, because it means we optimize out VS outputs, and don't interpolate FS inputs!) Reviewed-by: Kenneth Graunke <[email protected]>
* clover: Fix build with LLVM 3.3Mike Lothian2013-04-011-1/+2
|
* llvmpipe: use triangle subdivision to avoid fixed-point overflow issuesBrian Paul2013-04-013-0/+186
| | | | | | | | | | | | If we're drawing to a surface that's 2048 x 2048 pixels or larger there's danger of fixed-point overflow in the triangle rasterization code. That leads to various rendering glitches. Rather than implement some intricate changes to the rasterization code, simply subdivide triangles into smaller subtriangles to avoid the issue. Only do this when the drawing surface is larger than 2048 by 2048. Reviewed-by: José Fonseca <[email protected]>
* mesa: remove platform checks around __builtin_ffs, __builtin_ffsllBrian Paul2013-04-011-6/+0
| | | | | | | | | | | Use the __builtin_ffs, __builtin_ffsll functions whenever we have GCC, not just for specific platforms. Fixes Solaris build. Note: This is a candidate for the stable branches. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62868 Signed-off-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* docs: add a new page documenting known application issuesBrian Paul2013-04-012-0/+84
| | | | | | Let's try to update this when we find other broken applications... Reviewed-by: José Fonseca <[email protected]>
* drirc: set always_have_depth_buffer for TopogonBrian Paul2013-04-011-0/+6
| | | | Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: Minor comment cleanupAdam Jackson2013-04-011-1/+1
| | | | Signed-off-by: Adam Jackson <[email protected]>
* mesa: fix texture storage multisample prototypes harder.Dave Airlie2013-04-012-4/+4
| | | | | | I just noticed the warnings since I fixed the other bit. Signed-off-by: Dave Airlie <[email protected]>
* r600g/llvm: Update LLVM_REVISIONVincent Lejeune2013-03-311-1/+1
|
* r600g/llvm: use native encode for texVincent Lejeune2013-03-311-23/+27
|
* glapi: fix storage multisample build errorsDave Airlie2013-03-311-2/+2
| | | | | | Reported on #radeon by udovdh Signed-off-by: Dave Airlie <[email protected]>
* docs: mark ARB_texture_storage_multisample doneChris Forbes2013-03-311-1/+1
| | | | | | Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* i965: enable ARB_texture_storage_multisample on Gen6+Chris Forbes2013-03-311-0/+1
| | | | | | | | | This can be enabled everywhere that ARB_texture_multisample is supported -- ARB_texture_storage is supported on everything. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* mesa: allow multisample texture targets in [Get]TexParameter*Chris Forbes2013-03-311-1/+87
| | | | | | | | | | | | | | | | | | | | ARB_texture_storage_multisample allows texture parameters to be queried for TEXTURE_2D_MULTISAMPLE and TEXTURE_2D_MULTISAMPLE_ARRAY targets. Some parameters may also be set, with the following exceptions: - TEXTURE_BASE_LEVEL may not be set to a nonzero value; generates INVALID_OPERATION - any state which appears in the `per-sampler` state table may not be set; generates INVALID_OPERATION V2: Don't introduce bogus handling of TEXTURE_MAX_LEVEL Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* mesa: improve reported function name in Tex*MultisampleChris Forbes2013-03-311-16/+14
| | | | | | | | | Now that there are 4 variants, just pass the function name into teximagemultisample rather than reconstructing it. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* mesa: add enable bit for ARB_texture_storage_multisampleChris Forbes2013-03-312-0/+2
| | | | | | Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* glapi: add definition of ARB_texture_storage_multisampleChris Forbes2013-03-315-2/+68
| | | | | | | | | | Adds XML for the extension, dispatch_sanity enabling, and the two new entrypoints. These are both implemented by calling the shared teximagemultisample() with immutable=GL_TRUE. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* mesa: add support for immutable textures to teximagemultisample()Chris Forbes2013-03-311-3/+28
| | | | | | | | | | | | | | | | | | The new entrypoints will come later, but this adds the actual logic for supporting immutable multisample textures: - The immutability flag is set as desired. - Attempting to modify an immutable multisample texture produces INVALID_OPERATION. Note: The extension spec does not mention adding this behavior to TexImage*Multisample, but it seems like the reasonable thing to do. V2: - Cover missing error cases (unsized formats; texture object zero) Signed-off-by: Chris Forbes <[email protected]> [V1] Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* mesa: extract _mesa_is_legal_tex_storage_format helperChris Forbes2013-03-312-17/+23
| | | | | | | This is about to be used in teximagemultisample() when immutable=true. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* mesa: Delete VERT_ATTRIB_GENERIC_NV and VERT_BIT_GENERIC_NV macros.Kenneth Graunke2013-03-301-10/+0
| | | | | | | | These haven't been used since we deleted NV_vertex_program support. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Fix an inconsistency inb the VUE map with gl_ClipVertex on gen4/5.Eric Anholt2013-03-301-7/+11
| | | | | | | | | | | | | We are intentionally not allocating a slot for gl_ClipVertex. But by leaving the bit set in the slots_valid, the fragment shader's computation of where varyings are in urb entry coming out of the SF would be off by one. Fixes rendering in Freespace 2 SCP, and improves rendering in TF2. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62830 Tested-by: Joaquín Ignacio Aramendía <[email protected]> NOTE: This is a candidate for the 9.1 branch. Reviewed-and-tested-by: Kenneth Graunke <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* intel: Remove a never-taken debug print path.Eric Anholt2013-03-301-5/+0
| | | | | | | | Alessandro Pignotti noted when I added this code in commit 0e723b135bfd59868c92c3ae243f1adaedaec3a5 that it's in the else block for "if (busy)", so this debug print couldn't happen. Reviewed-by: Kenneth Graunke <[email protected]>
* st/mesa: add ir_lod case in GLSL->TGSI code to silence warningBrian Paul2013-03-291-0/+3
|
* glsl: Generated masked write instead of vector array index for UBO loweringIan Romanick2013-03-291-7/+3
| | | | | | | | | | | | | | | When reading a column from a row-major matrix, we would slot the single value read into the vector using an ir_dereference_array of the vector with a constant index. This will (eventually) get optimized to a masked-write, so just generate the masked write in the first place. v2: Remove unused variable 'chan'. Suggested by Ken. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Cc: Eric Anholt <[email protected]>
* glsl: Replace open-coded dot-product with dotIan Romanick2013-03-291-4/+5
| | | | | | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Cc: Eric Anholt <[email protected]> Cc: Paul Berry <[email protected]>
* glsl: Replace constant-index vector array accesses with swizzlesIan Romanick2013-03-292-87/+87
| | | | | | | | | | | | | | | | | | | | | | | | | | Search and replace: ][0] -> ].x ][1] -> ].y ][2] -> ].z ][3] -> ].w Fixes piglit tests inverse-mat[234].{vert,frag}. These tests call the inverse function with constant parameters and expect proper constant folding to happen. My suspicion is that this patch papers over some bug in constant propagation involving array accesses. Either way, all of these accesses eventually get lowered to swizzles. This cuts out the middle man (saving a trivial amount of CPU). NOTE: This is a candidate for the 9.1 branch. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Cc: Eric Anholt <[email protected]> Cc: Paul Berry <[email protected]>
* glsl: Add missing bool case in glsl_type::get_scalar_typeIan Romanick2013-03-291-0/+2
| | | | | | | | | | | | Since the case was missing bec4->get_scalar_type() would return bvec4, but vec4->get_scalar_type() would return float. NOTE: This is a candidate for stable branches. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Fix INTEL_DEBUG=shader_time for fragment shaders with discards.Kenneth Graunke2013-03-295-6/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "discard" instructions generate HALT instructions which jump to a final HALT near the end of the shader. Previously, fs_generator created this final jump target when it saw the first FS_OPCODE_FB_WRITE, causing it to jump right before the FB write epilogue. This is normally good. However, INTEL_DEBUG=shader_time also has an epilogue section which records the final timestamp. The frontend emits IR for this just before FS_OPCODE_FB_WRITE. Unfortunately, this led to the following ordering: 1. Shader Time Epilogue 2. Final HALT (where discards jump) 3. Framebuffer Write Epilogue This meant that discarded pixels completely skipped the shader time epilogue, causing no ending timestamp to be written. This obviously led to inaccurate results. This patch adds a new FS_OPCODE_PLACEHOLDER_HALT in the IR stream just before any epilogue sections. This is where the final HALT should be generated, and makes it easy to ensure the correct ordering: 1. Final HALT 2. Shader Time Epilogue 3. Framebuffer Write Epilogue For shaders that don't discard, this opcode compiles away to nothing. The scheduler adds barrier dependencies to make sure that it doesn't get moved above any FS_OPCODE_DISCARD_JUMP instructions. One 8-wide shader in GLBenchmark 2.7 dropped from 2291.67 Gcycles to a mere 5.13 Gcycles. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Add names for all instructions to dump_instruction() in FS and VS.Eric Anholt2013-03-294-25/+113
| | | | | | | I'd previously added the minimum names to understand my dumps, but this makes dumps in general much easier to read. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Enable ARB_texture_query_lod.Matt Turner2013-03-292-2/+4
| | | | | v2: Support Ironlake as well. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Generate LOD sampler message from ir_lod.Matt Turner2013-03-295-1/+20
| | | | | v2: Support Ironlake as well. Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: Implement ARB_texture_query_lodDave Airlie2013-03-2918-18/+101
| | | | | | | | | | | | | | | | | | | v2 [mattst88]: - Rebase. - #define GL_ARB_texture_query_lod to 1. - Remove comma after ir_lod in ir.h for MSVC. - Handled ir_lod in ir_hv_accept.cpp, ir_rvalue_visitor.cpp, opt_tree_grafting.cpp. - Rename textureQueryLOD to textureQueryLod, see https://www.khronos.org/bugzilla/show_bug.cgi?id=821 - Fix ir_reader of (lod ...). v3 [mattst88]: - Rename textureQueryLod to textureQueryLOD, pending resolution of Khronos 821. - Add ir_lod case to ir_to_mesa.cpp. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Use measured Gen7 instruction timings on Gen6.Matt Turner2013-03-291-1/+4
| | | | | | | | | | | | | | | | | | x before + after +------------------------------------------------------------------------------+ | x x + | | xx ++ x + | | xx ++ + xx ++ | |x xxx x+++++ + xxx x*x+*+++ + x +| | |_____|____________A______A____M____M_|_______| | +------------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 23 8083.78 8287.83 8205.55 8162.7461 68.307951 + 23 8107.56 8358.74 8224.33 8186.1765 71.506301 No difference proven at 95.0% confidence Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Increase and document MAD latency on Gen7.Matt Turner2013-03-291-4/+18
| | | | | | | 58% of mad(8) generated in shader-db are reading registers from the same bank. Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Add LRP instruction latency.Matt Turner2013-03-291-0/+26
| | | | | | | | Set its latency to what happens to be the default floating-point instruction latency. One day we may want to handle latency based on register bank information. Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Add Haswell cycle timingsMatt Turner2013-03-291-9/+9
| | | | Reviewed-by: Eric Anholt <[email protected]>
* i965: Note that write-after-write dependencies are blocking.Matt Turner2013-03-291-1/+1
| | | | Reviewed-by: Eric Anholt <[email protected]>