summaryrefslogtreecommitdiffstats
path: root/src/mesa
Commit message (Collapse)AuthorAgeFilesLines
* i965/vs: Add a pass to set dependency control fields on instructions.Eric Anholt2013-04-013-0/+126
| | | | | | | This is a more aggressive version of the old brw_optimize() path. Reduces cycles spent in the vertex shader on minecraft by 18.6% +/- 10.0% (n=15). Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Dump shader source for linked shader programs.Eric Anholt2013-04-011-2/+18
| | | | | | | | | | We dump shader source in ir_to_mesa.cpp, and we dump linked programs here, but we had no reference from the linked programs to their source. This was preventing improvement of shader-db to use linked shader programs instead of individual shader files (which is bogus, because it means we optimize out VS outputs, and don't interpolate FS inputs!) Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: remove platform checks around __builtin_ffs, __builtin_ffsllBrian Paul2013-04-011-6/+0
| | | | | | | | | | | Use the __builtin_ffs, __builtin_ffsll functions whenever we have GCC, not just for specific platforms. Fixes Solaris build. Note: This is a candidate for the stable branches. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62868 Signed-off-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* drirc: set always_have_depth_buffer for TopogonBrian Paul2013-04-011-0/+6
| | | | Reviewed-by: Jose Fonseca <[email protected]>
* mesa: fix texture storage multisample prototypes harder.Dave Airlie2013-04-012-4/+4
| | | | | | I just noticed the warnings since I fixed the other bit. Signed-off-by: Dave Airlie <[email protected]>
* i965: enable ARB_texture_storage_multisample on Gen6+Chris Forbes2013-03-311-0/+1
| | | | | | | | | This can be enabled everywhere that ARB_texture_multisample is supported -- ARB_texture_storage is supported on everything. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* mesa: allow multisample texture targets in [Get]TexParameter*Chris Forbes2013-03-311-1/+87
| | | | | | | | | | | | | | | | | | | | ARB_texture_storage_multisample allows texture parameters to be queried for TEXTURE_2D_MULTISAMPLE and TEXTURE_2D_MULTISAMPLE_ARRAY targets. Some parameters may also be set, with the following exceptions: - TEXTURE_BASE_LEVEL may not be set to a nonzero value; generates INVALID_OPERATION - any state which appears in the `per-sampler` state table may not be set; generates INVALID_OPERATION V2: Don't introduce bogus handling of TEXTURE_MAX_LEVEL Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* mesa: improve reported function name in Tex*MultisampleChris Forbes2013-03-311-16/+14
| | | | | | | | | Now that there are 4 variants, just pass the function name into teximagemultisample rather than reconstructing it. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* mesa: add enable bit for ARB_texture_storage_multisampleChris Forbes2013-03-312-0/+2
| | | | | | Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* glapi: add definition of ARB_texture_storage_multisampleChris Forbes2013-03-313-2/+33
| | | | | | | | | | Adds XML for the extension, dispatch_sanity enabling, and the two new entrypoints. These are both implemented by calling the shared teximagemultisample() with immutable=GL_TRUE. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* mesa: add support for immutable textures to teximagemultisample()Chris Forbes2013-03-311-3/+28
| | | | | | | | | | | | | | | | | | The new entrypoints will come later, but this adds the actual logic for supporting immutable multisample textures: - The immutability flag is set as desired. - Attempting to modify an immutable multisample texture produces INVALID_OPERATION. Note: The extension spec does not mention adding this behavior to TexImage*Multisample, but it seems like the reasonable thing to do. V2: - Cover missing error cases (unsized formats; texture object zero) Signed-off-by: Chris Forbes <[email protected]> [V1] Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* mesa: extract _mesa_is_legal_tex_storage_format helperChris Forbes2013-03-312-17/+23
| | | | | | | This is about to be used in teximagemultisample() when immutable=true. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* mesa: Delete VERT_ATTRIB_GENERIC_NV and VERT_BIT_GENERIC_NV macros.Kenneth Graunke2013-03-301-10/+0
| | | | | | | | These haven't been used since we deleted NV_vertex_program support. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Fix an inconsistency inb the VUE map with gl_ClipVertex on gen4/5.Eric Anholt2013-03-301-7/+11
| | | | | | | | | | | | | We are intentionally not allocating a slot for gl_ClipVertex. But by leaving the bit set in the slots_valid, the fragment shader's computation of where varyings are in urb entry coming out of the SF would be off by one. Fixes rendering in Freespace 2 SCP, and improves rendering in TF2. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62830 Tested-by: Joaquín Ignacio Aramendía <[email protected]> NOTE: This is a candidate for the 9.1 branch. Reviewed-and-tested-by: Kenneth Graunke <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* intel: Remove a never-taken debug print path.Eric Anholt2013-03-301-5/+0
| | | | | | | | Alessandro Pignotti noted when I added this code in commit 0e723b135bfd59868c92c3ae243f1adaedaec3a5 that it's in the else block for "if (busy)", so this debug print couldn't happen. Reviewed-by: Kenneth Graunke <[email protected]>
* st/mesa: add ir_lod case in GLSL->TGSI code to silence warningBrian Paul2013-03-291-0/+3
|
* i965: Fix INTEL_DEBUG=shader_time for fragment shaders with discards.Kenneth Graunke2013-03-295-6/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "discard" instructions generate HALT instructions which jump to a final HALT near the end of the shader. Previously, fs_generator created this final jump target when it saw the first FS_OPCODE_FB_WRITE, causing it to jump right before the FB write epilogue. This is normally good. However, INTEL_DEBUG=shader_time also has an epilogue section which records the final timestamp. The frontend emits IR for this just before FS_OPCODE_FB_WRITE. Unfortunately, this led to the following ordering: 1. Shader Time Epilogue 2. Final HALT (where discards jump) 3. Framebuffer Write Epilogue This meant that discarded pixels completely skipped the shader time epilogue, causing no ending timestamp to be written. This obviously led to inaccurate results. This patch adds a new FS_OPCODE_PLACEHOLDER_HALT in the IR stream just before any epilogue sections. This is where the final HALT should be generated, and makes it easy to ensure the correct ordering: 1. Final HALT 2. Shader Time Epilogue 3. Framebuffer Write Epilogue For shaders that don't discard, this opcode compiles away to nothing. The scheduler adds barrier dependencies to make sure that it doesn't get moved above any FS_OPCODE_DISCARD_JUMP instructions. One 8-wide shader in GLBenchmark 2.7 dropped from 2291.67 Gcycles to a mere 5.13 Gcycles. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Add names for all instructions to dump_instruction() in FS and VS.Eric Anholt2013-03-294-25/+113
| | | | | | | I'd previously added the minimum names to understand my dumps, but this makes dumps in general much easier to read. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Enable ARB_texture_query_lod.Matt Turner2013-03-291-1/+3
| | | | | v2: Support Ironlake as well. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Generate LOD sampler message from ir_lod.Matt Turner2013-03-295-1/+20
| | | | | v2: Support Ironlake as well. Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: Implement ARB_texture_query_lodDave Airlie2013-03-293-0/+5
| | | | | | | | | | | | | | | | | | | v2 [mattst88]: - Rebase. - #define GL_ARB_texture_query_lod to 1. - Remove comma after ir_lod in ir.h for MSVC. - Handled ir_lod in ir_hv_accept.cpp, ir_rvalue_visitor.cpp, opt_tree_grafting.cpp. - Rename textureQueryLOD to textureQueryLod, see https://www.khronos.org/bugzilla/show_bug.cgi?id=821 - Fix ir_reader of (lod ...). v3 [mattst88]: - Rename textureQueryLod to textureQueryLOD, pending resolution of Khronos 821. - Add ir_lod case to ir_to_mesa.cpp. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Use measured Gen7 instruction timings on Gen6.Matt Turner2013-03-291-1/+4
| | | | | | | | | | | | | | | | | | x before + after +------------------------------------------------------------------------------+ | x x + | | xx ++ x + | | xx ++ + xx ++ | |x xxx x+++++ + xxx x*x+*+++ + x +| | |_____|____________A______A____M____M_|_______| | +------------------------------------------------------------------------------+ N Min Max Median Avg Stddev x 23 8083.78 8287.83 8205.55 8162.7461 68.307951 + 23 8107.56 8358.74 8224.33 8186.1765 71.506301 No difference proven at 95.0% confidence Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Increase and document MAD latency on Gen7.Matt Turner2013-03-291-4/+18
| | | | | | | 58% of mad(8) generated in shader-db are reading registers from the same bank. Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Add LRP instruction latency.Matt Turner2013-03-291-0/+26
| | | | | | | | Set its latency to what happens to be the default floating-point instruction latency. One day we may want to handle latency based on register bank information. Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Add Haswell cycle timingsMatt Turner2013-03-291-9/+9
| | | | Reviewed-by: Eric Anholt <[email protected]>
* i965: Note that write-after-write dependencies are blocking.Matt Turner2013-03-291-1/+1
| | | | Reviewed-by: Eric Anholt <[email protected]>
* i965: Reword comment about the shared mathbox.Matt Turner2013-03-291-4/+4
| | | | Reviewed-by: Eric Anholt <[email protected]>
* mesa: provide default implementation of QuerySamplesForFormatChris Forbes2013-03-293-1/+21
| | | | | | | | | | | | | | Previously at least i915 failed to provide an implementation, but exposed ARB_internalformat_query anyway, leading to crashes when QueryInternalformativ was called. Default implementation just returns 1 for everything, so is suitable for any driver which does not support multisampling. V2: - Move from intel to core mesa. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: handle STATE_CURRENT_ATTRIB_MAYBE_VP_CLAMPED for parameter printingMarek Olšák2013-03-281-0/+3
| | | | Reviewed-by: Brian Paul <[email protected]>
* i965: Tidy shader time printing code by using printf's field widths.Kenneth Graunke2013-03-281-12/+4
| | | | | | | | | | | | | We can use %-6s%-6s rather than manually counting characters, resulting in much more readable code. This necessitates a small secondary change: using "total fs16" and "" now causes the "" string to be padded out to 6 characters, resulting in too much whitespace. Splitting it into "total" and "fs16" produces the same output as before. Reviewed-by: Eric Anholt <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]>
* i965/vs: Include URB payload setup in shader_time.Eric Anholt2013-03-282-4/+11
| | | | | | | | This much more accurately reflects the cost of the vertex shader, since the payload setup is often a significant fraction of the instructions in the VS. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Use a send from a 2-register VGRF for shader time writes.Eric Anholt2013-03-282-14/+13
| | | | | | | This will let us emit it later, after we're setting up MRFs for the URB write. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Teach copy propagation about sends from GRFs.Eric Anholt2013-03-283-7/+29
| | | | | | | This incidentally also teaches it a bit about gen6 math -- we now allow unswizzled, unmodified GRF temps as the sources for math. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Prepare split_virtual_grfs() for the presence of SENDs from GRFs.Eric Anholt2013-03-282-20/+45
| | | | | | v2: Fix silly bool handling, and don't add new tabs. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Include everything but the final FB write in shader_time.Eric Anholt2013-03-282-5/+15
| | | | | | | | Previously, if you just wrote a constant color to the render target, no time got noted at all. This is convenient for doing single-instruction timings, but not so much for actual program analysis. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Switch shader_time writes to using GRFs.Eric Anholt2013-03-286-19/+63
| | | | | | | | | This avoids conflicts between shader_time and FB writes, so we can include more of the program under our profiling. This does mean hiding more of the message setup from the optimizer, which doesn't have a way to handle multi-reg sends from GRFs. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Provide more detailed information to match shader_time to programs.Eric Anholt2013-03-281-13/+50
| | | | | | | | | Ken asked me the other day what -1 vs 0 vs 3 vs other meant in our shader names, and I realized that it was really unclear. I'd like to do even better, like noting which one is the clear shader, but that would require exposing the metaops struct to the driver. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Track ARB program state along with GLSL state for shader_time.Eric Anholt2013-03-286-29/+47
| | | | | | This will let us do much better printouts for non-GLSL programs. Reviewed-by: Kenneth Graunke <[email protected]>
* st/mesa: remove leftover printfs from ReadPixelsMarek Olšák2013-03-281-3/+0
| | | | Oops, I thought I had removed all debugging code.
* i965/fs: Improve performance of copy propagation dataflow using bitsets.Eric Anholt2013-03-281-33/+34
| | | | | | Reduces compile time of l4d2's slowest shader by 17.8% +/- 1.3% (n=10). Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: only check sample count if we actually wanted multisamplingChris Forbes2013-03-271-9/+10
| | | | | | | | | | | | | Fixes various test fallout from 90b5a2425a on Pineview, which claims to support ARB_internalformat_query but doesn't actually provide the driverfunc. That driver is still broken [GetInternalformativ will still segfault!] but it was silly to be going through the sample count logic in the nonmultisampling case at all. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* glsl_to_tgsi: avoid creating arrays if driver doesn't support themChristian König2013-03-261-1/+3
| | | | | | Avoid creating arrays if we replace indirect addressing anyway. Signed-off-by: Christian König <[email protected]>
* glsl_to_tgsi: make simplify_cmp work with arraysChristian König2013-03-261-1/+1
| | | | | | | | | Even when we have arrays it is possible for simplify_cmp to work on temps, just not on arrays. Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=62696 Signed-off-by: Christian König <[email protected]>
* st/mesa: fix crash with blit-based GetTexImageMarek Olšák2013-03-261-1/+1
| | | | | | https://bugs.freedesktop.org/show_bug.cgi?id=62573 Tested-by: Andreas Boll <[email protected]>
* cso: add constant buffer save/restore feature for postprocessingMarek Olšák2013-03-261-3/+5
| | | | | Postprocessing is an internal meta op and should restore the states it changes.
* swrast: init vars to silence warningsBrian Paul2013-03-251-4/+4
| | | | Reviewed-by: Jose Fonseca <[email protected]>
* android: fix Android.mk bug in mesa/drivers/dri/commonAdrian Marius Negreanu2013-03-251-1/+2
| | | | | | | | | | | target-specific variables are undefined when used as pre-requisites. instead, use secondary-expansion. I noticed this when building the patch: i965: Add a driconf option to disable flush throttling Signed-off-by: Adrian Marius Negreanu <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965: Shrink brw_vue_map struct.Paul Berry2013-03-242-2/+10
| | | | | | | | | | | | This patch changes the arrays in brw_vue_map (which only ever contain values from -1 to 58) from ints to signed chars. This reduces the size of the struct from 488 bytes to 136 bytes. Reviewed-by: Kenneth Graunke <[email protected]> v2: fix STATIC_ASSERT to use 127 instead of 128. Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Rename vp_outputs_written to input_slots_valid.Paul Berry2013-03-243-7/+7
| | | | | | | | | | | | | | With the introduction of geometry shaders, fragment inputs will no longer come exclusively from the vertex shader; sometimes they come from the geometry shader. So the name "vp_outputs_written" will become a misnomer. This patch renames vp_outputs_written to input_slots_valid, to reflect the true meaning of the bitfield from the fragment shader's point of view: it indicates which of the possible input slots contain valid data that was written by the previous shader stage. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Use brw.vue_map_geom_out instead of VS output VUE map where appropriate.Paul Berry2013-03-247-31/+29
| | | | | | | | | | | | | This patch modifies post-GS pipeline stages (transform feedback, clip, sf, fs) to refer to the VUE map through brw->vue_map_geom_out rather than brw->vs.prog_data->vue_map. This ensures that when geometry shader support is added, these pipeline stages will consult the geometry shader output VUE map when appropriate, rather than the vertex shader output VUE map. v2: Fixed some stale "CACHE_NEW_VS_PROG" comments. Reviewed-by: Kenneth Graunke <[email protected]>