aboutsummaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965
Commit message (Collapse)AuthorAgeFilesLines
...
* i965: Make INTEL_DEBUG=mip print out whether HiZ is enabled.Kenneth Graunke2014-06-161-0/+2
| | | | | | | | | We only enable HiZ for miplevels which are aligned on 8x4 blocks. When debugging HiZ failures, it's useful to know whether a particular miplevel is using HiZ or not. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/cs: Use override structure rather than separate env varJordan Justen2014-06-162-4/+2
| | | | | | | | | | | | In 25268b93, we added a new environment variable (INTEL_COMPUTE_SHADER) to allow some constant values to be upgraded for the ARB_compute_shader extension. Now, we can look to see if the extension was enabled via the MESA_EXTENSION_OVERRIDE environment variable. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* Enable GL_ARB_explicit_uniform_location in the drivers.Tapani Pälli2014-06-161-0/+1
| | | | | | | v2: enable also for i915 (Ian) Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Petri Latvala <[email protected]>
* i965/vec4: Use the sampler for pull constant loads on Broadwell.Kenneth Graunke2014-06-151-8/+8
| | | | | | | | | | | | | | | | | | | | | We've used the LD sampler message for pull constant loads on earlier hardware for some time, and also were already using it for the FS on Broadwell. This patch makes us use it for Broadwell VS/GS as well. I believe that when I wrote this code in 2012, we still used the data port in some cases, and I somehow neglected to convert it while rebasing. Improves performance in GLBenchmark 2.7 Egypt by 416.978% +/- 2.25821% (n = 17). Many other applications should benefit similarly: this speeds up uniform array access in the VS, which is commonly used for skinning shaders, among other things. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Tested-by: Ben Widawsky <[email protected]> Cc: "10.2" <[email protected]>
* i965: Add missing newlines to a few perf_debug messages.Kenneth Graunke2014-06-151-2/+2
| | | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: "10.2" <[email protected]>
* i965: Drop Broadwell perf_debugs about missing MOCS that aren't missing.Kenneth Graunke2014-06-152-4/+0
| | | | | | | | | I actually added MOCS support for these things, but forgot to delete the corresponding perf_debug() warnings. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: "10.2" <[email protected]>
* i965: Add missing MOCS setup for 3DSTATE_INDEX_BUFFER on Broadwell.Kenneth Graunke2014-06-151-3/+1
| | | | | | | | Somehow I missed this when adding all of the other MOCS values. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: "10.2" <[email protected]>
* i965/vec4: Fix dead code elimination for VGRFs of size > 1.Kenneth Graunke2014-06-151-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | When faced with code such as: mov vgrf31.0:UD, 960D mov vgrf31.1:UD, vgrf30.xxxx:UD The dead code eliminator didn't consider reg_offsets, so it decided that the second instruction was writing was writing to the same register as the first one, and eliminated the first one. But they're actually different registers. This fixes INTEL_DEBUG=shader_time for vertex shaders. In the above code, vgrf31.0 represents the offset into the shader_time buffer where the data should be written, and vgrf31.1 represents the actual time data. With a completely undefined offset, results were...unexpected. I think this is probably one of the few cases (maybe only case) where we generate multiple MOVs to a large VGRF. Normally, we just use them as texturing results; the other SEND-from-GRF uses a size 1 VGRF. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79029 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: [email protected]
* i965: Add SHADER_OPCODE_SHADER_TIME_ADD to dump_instructions() decode.Kenneth Graunke2014-06-151-0/+2
| | | | | | | "shader_time_add" is a lot more informative than "op152". Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Set the fast clear color value for texture surfacesNeil Roberts2014-06-122-2/+6
| | | | | | | | | | | | | | When a multisampled texture is used for sampling the fast clear color value needs to be programmed into the surface state. This was being left as all zeroes so if the surface was cleared to a value other than black then it wouldn't work properly. This doesn't matter for single-sample textures because in that case the MCS buffer is resolved before it is used as a texture source. https://bugs.freedesktop.org/show_bug.cgi?id=79729 Reviewed-by: Chris Forbes <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Cc: "10.1 10.2" <[email protected]>
* i965: Fix disassembly of BLORP clear programs.Kenneth Graunke2014-06-121-1/+1
| | | | | | | Too many levels of indirection. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Move FB write default state mashing in a level.Kenneth Graunke2014-06-121-7/+7
| | | | | | | | | | We only need to alter the default state if we're emitting MOVs for header related fields. So, we can simply move the push/pop of state in to the if (header_present) block, bypassing it in the common case. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79903
* i965: Fix Haswell discard regressions since Gen4-5 line AA fix.Kenneth Graunke2014-06-121-2/+7
| | | | | | | | | | | | | | | | In commit dc2d3a7f5c217a7cee92380fbf503924a9591bea, Iago accidentally moved fire_fb_write() above the brw_pop_insn_state(), which caused the SEND to lose its predication and change from WE_normal to WE_all. Haswell uses predicated SENDs for discards, so this broke Piglit's tests for discards. We want the Gen4-5 MOV to be uncompressed, unpredicated, and unmasked, but the actual FB write itself should respect those. So, pop state first, and force it again around the single MOV. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79903
* i965: Use brw->gen in some generation checks.Matt Turner2014-06-115-11/+17
| | | | | | | Will simplify the automated conversion if we want to allow compiling the driver for a single generation. Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/fs: Clean up tabs in brw_fs_cse.cpp.Matt Turner2014-06-111-43/+43
| | | | I'm adding vec4 CSE, and I want to diff the files.
* i965/vec4: Emit smarter code for b2f of a comparisonIan Romanick2014-06-112-0/+48
| | | | | | | | | | | | | | | | | | | | | | | | Previously we would emit the comparison, emit an AND to mask off extra bits from the comparison result, then convert the result to float. Now, do the comparison, then use a cleverly constructed SEL to pick either 0.0f or 1.0f. No piglit regressions on Ivybridge. total instructions in shared programs: 1642311 -> 1639449 (-0.17%) instructions in affected programs: 136533 -> 133671 (-2.10%) GAINED: 0 LOST: 0 Programs that are affected appear to save between 1 and 5 instuctions (just by skimming the output from shader-db report.py. v2: s/b2i/b2f/ in commit subject (noticed by Chris Forbes). Remove extraneous fix_3src_operand (suggested by Matt). The latter change required swapping the order of the operands and using predicate_inverse. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Silence a couple unused parameter warningsIan Romanick2014-06-111-2/+2
| | | | | | | | brw_vec4_visitor.cpp:2717:1: warning: unused parameter 'ir' [-Wunused-parameter] brw_vec4_visitor.cpp:2723:1: warning: unused parameter 'ir' [-Wunused-parameter] Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Add GPU BLIT of texture image to PBO in Intel driverJon Ashburn2014-06-101-0/+110
| | | | | | | | | | | | | | | | | | | Add Intel driver hook for glGetTexImage to accelerate the case of reading texture image into a PBO. This case gets huge performance gains by using GPU BLIT directly to PBO rather than GPU BLIT to temporary texture followed by memcpy. No regressions on Piglit tests with Intel driver. Performance gain (1280 x 800 FBO, Ivybridge): glGetTexImage + glMapBufferRange with patch 1.45 msec glGetTexImage + glMapBufferRange without patch 4.68 msec v3: (by Kenneth Graunke) - Fix compile after Eric's change to drop the tiling argument to intel_miptree_create_for_bo. - Add GL_TEXTURE_3D to blacklisted texture targets to prevent Piglit regressions. - Squash in several whitespace and coding style fixes.
* i965: Invalidate live intervals when inserting Gen4 SEND workarounds.Kenneth Graunke2014-06-101-0/+6
| | | | | | | | | We need to invalidate the live intervals when inserting new instructions. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: [email protected]
* i965: Don't use the head sentinel as an fs_inst in Gen4 workaround code.Kenneth Graunke2014-06-101-1/+1
| | | | | | | | | | | When walking backwards, we want to stop at the head sentinel, which is where scan_inst->prev->prev == NULL, not scan_inst->prev == NULL. Fixes random crashes, as well as valgrind errors. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: [email protected]
* i965/fs: Combine generate_math[12]_gen6 methods.Kenneth Graunke2014-06-102-33/+13
| | | | | | | | | | | | | | These used to call different math emitters (brw_math vs. brw_math2). Now that they both call gen6_math, they're virtually identical. When unrolling SIMD16 to multiple SIMD8 operations, we should take care not to apply sechalf to brw_null_reg for src1. Otherwise, we'd end up with BRW_ARF_NULL + 1 as the register number, and I'm not sure if that's valid. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/fs: Drop the generate_math[12]_gen7 methods.Kenneth Graunke2014-06-102-30/+5
| | | | | | | | | | These functions are basically identical, so we should combine them. However, they're so trivial, we may as well just fold them into their only call sites. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/vec4: Combine generate_math[12]_gen6 methods.Kenneth Graunke2014-06-102-28/+12
| | | | | | | | | These are trivial to combine: we should just avoid checking the second operand if it's brw_null_reg. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/vec4: Drop the generate_math2_gen7() method.Kenneth Graunke2014-06-102-14/+1
| | | | | | | | | It's now a single line of code, so we may as well fold it into the caller. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Rename brw_math to gen4_math.Kenneth Graunke2014-06-106-46/+46
| | | | | | | | | Usually, I try to use "brw" for functions that apply to all generations, and "gen4" for dead end/legacy code that is only used on Gen4-5. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Split Gen4-5 and Gen6+ MATH instruction emitters.Kenneth Graunke2014-06-104-89/+39
| | | | | | | | | | | | | | | | | | | | Our existing functions, brw_math and brw_math2, had unclear roles: Gen4-5 used brw_math for both unary and binary math functions; it never used brw_math2. Since operands are already in message registers, this is reasonable. Gen6+ used brw_math for unary math functions, and brw_math2 for binary math functions, duplicating a lot of code. The only real difference was that brw_math used brw_null_reg() for src1. This patch improves brw_math2's assertions to allow both unary and binary operations, renames it to gen6_math(), and drops the Gen6+ code out of brw_math(). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Make src_reg::equals() take a constant reference, not a pointer.Kenneth Graunke2014-06-103-14/+14
| | | | | | | This is more typical C++ style. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Don't set the "switch" flag on control flow instructions on Gen6+.Kenneth Graunke2014-06-101-6/+4
| | | | | | | | | | | | | | Thread switching on control flow instructions is a documented workaround for Gen4-5 errata. As far as I can tell, it hasn't been needed since Sandybridge. Thread switching is not free, so in theory this may help performance slightly. Flow control instructions with the "switch" flag cannot be compacted, so removing it will make these instructions compactable. (Of course, we still have to implement compaction for flow control instructions...) Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Allow CSE on math opcodes on Gen6+.Kenneth Graunke2014-06-101-0/+11
| | | | | | | | | | total instructions in shared programs: 2081469 -> 2081248 (-0.01%) instructions in affected programs: 22606 -> 22385 (-0.98%) No programs were hurt by this patch. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Replace open-coded linked list with exec_list.Matt Turner2014-06-104-62/+45
| | | | Reviewed-by: Ian Romanick <[email protected]>
* i965: Make gen7_pi field of brw_instruction use unsigned instead of GLuintKristian Høgsberg2014-06-091-12/+12
| | | | | | | | Nothing else uses GL-types here. Signed-off-by: Kristian Høgsberg <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Don't include mtypes.h in brw_disasm.cKristian Høgsberg2014-06-091-2/+0
| | | | | | | | It's not used. Signed-off-by: Kristian Høgsberg <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/fs: initialize src as reg_undef for texture opcodes on Gen4.Matt Turner2014-06-091-6/+6
| | | | Untested.
* i965/fs: initialize src as reg_undef for texture opcodes on Gen5/6.Tapani Pälli2014-06-091-9/+9
| | | | | | | | | | | | Commit 07af0ab changed fs_inst to have 0 sources for texture opcodes in emit_texture_gen5 (Ironlake, Sandybrige) while fs_generator still uses a single source from brw_reg struct. Patch sets src as reg_undef which matches the behavior before the constructor got changed. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]> Signed-off-by: Tapani Pälli <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79534
* i965/disasm: Properly debug negate source modifier for logical instructionsAbdiel Janulgue2014-06-091-3/+21
| | | | | Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Abdiel Janulgue <[email protected]>
* i965/vec4: skip copy-propate for logical instructions with negated src entriesAbdiel Janulgue2014-06-091-0/+17
| | | | | | | | | The negation source modifier on src registers has changed meaning in Broadwell when used with logical operations. Don't copy propagate when negate src modifier is set and when the destination instruction is a logical op. Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Abdiel Janulgue <[email protected]>
* i965/fs: skip copy-propate for logical instructions with negated src entriesAbdiel Janulgue2014-06-091-0/+17
| | | | | | | | | The negation source modifier on src registers has changed meaning in Broadwell when used with logical operations. Don't copy propagate when negate src modifier is set and when the destination instruction is a logical op. Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Abdiel Janulgue <[email protected]>
* i965/fs: Refactor check for potential copy propagated instructions.Abdiel Janulgue2014-06-091-10/+17
| | | | | Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Abdiel Janulgue <[email protected]>
* i965: Ensure that we end instruction streams properly.Iago Toral Quiroga2014-06-091-0/+2
| | | | | | | | | | | | | Threads must terminate with a SEND message to a particular shared function, such as a URB write or FB write, so the instruction stream really shouldn't ever end in an IF/ELSE/ENDIF or similar block structure. However, if the instruction stream (incorrectly) ends in a block structure the last block's end pointer will not be set, leading to a crash later on in fs_live_variables::setup_def_use(). It is better to detect this earlier, so assert on that. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Add Gen < 6 runtime checks for line antialiasing.Iago Toral Quiroga2014-06-092-27/+67
| | | | | | | | | | | | | | | | | | | | | | | In Gen < 6 the hardware generates a runtime bit that indicates whether AA data has to be sent as part of the framebuffer write SEND message. This affects the specific case where we have setup antialiased line rendering and we render polygons which have one face setup in GL_LINE mode (line antialiasing will be used) and the other one in GL_FILL mode (no line antialiasing needed). Currently we are not doing this runtime test and instead we always send AA data, which produces incorrect rendering of the GL_FILL face of the polygon in in the aforementioned scenario (verified in ironlake and gm45). In Gen4 this is, likely, a regression introduced with commit 098acf6c843. In Gen5 this has never worked properly. Gen > 5 are not affected by this. The patch fixes the problem by adding the appropriate runtime check and adjusting the framebuffer write message accordingly in the conflictive scenario. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78679 Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Let the gen < 8 generator know about runtime_check_aads_emitIago Toral Quiroga2014-06-094-3/+7
| | | | | | | In gen < 6 we need to produce conditional code based on this flag when doing framebuffer writes. Reviewed-by: Kenneth Graunke <[email protected]>
* Revert "i965: Move brw_land_fwd_jump() to compilation unit of its use."Iago Toral Quiroga2014-06-073-16/+21
| | | | | | | | | | This reverts commit f3cb2e6ed7059b22752a6b7d7a98c07ba6b5552e. brw_land_fwd_jump() is convenient wherever we produce JMPI instructions and we will use JMPI to implement framebuffer writes that involve line antialiasing in gen < 6. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Fix else and brace placement in brw_eu_emit.c.Kenneth Graunke2014-06-071-28/+13
| | | | | | | | I'm making a lot of changes to this area, and I figured I may as well not conflate these trivial changes. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Drop the remaining default predication whacking.Kenneth Graunke2014-06-072-5/+1
| | | | | | | | | | With my earlier cleaning in place (see git log brw_eu_emit.c), nothing relies on the instruction emitters for IF/WHILE/JMPI disabling predication. Drop it in favor of making callers do the right thing explicitly. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/sf: Use brw_set_default_predicate_control().Kenneth Graunke2014-06-071-2/+2
| | | | | | | This is a bit tidier than poking at p->current directly. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Support GL_CLAMP natively on Broadwell.Kenneth Graunke2014-06-053-4/+13
| | | | | | | | | | | | The new hardware actually supports this OpenGL 1.x feature natively, so we can finally drop our shader workarounds. Not many applications use GL_CLAMP, and most use it unintentionally, but it's trivial to do right, so we should. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: "10.2" <[email protected]>
* i965: Pass brw to translate_wrap_mode().Kenneth Graunke2014-06-053-8/+9
| | | | | | | | This lets us do generation checks. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: "10.2" <[email protected]>
* i965: use _mesa_align_malloc in intel_miptree_map_movntdqaTapani Pälli2014-06-051-2/+2
| | | | | | | | | This fixes case where we have 1x1 size buffer and misalignment is 0. Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79616
* i965/fs: Allow array dereference of HW_REG.Chris Forbes2014-06-051-1/+1
| | | | | | | | | | | | | | When dereferencing an element of gl_SampleMaskIn[], the source register here will be a HW_REG rather than a VGRF because the payload slot is now exposed directly. Fixes an assertion failure in the Piglit test: tests/spec/arb_gpu_shader5/execution/samplemaskin-basic Signed-off-by: Chris Forbes <[email protected]> Cc: Kenneth Graunke <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Fix copy and pasted values in Broadwell code.Kenneth Graunke2014-06-031-10/+21
| | | | | Signed-off-by: Kenneth Graunke <[email protected]> Cc: "10.2" <[email protected]>