aboutsummaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri
Commit message (Collapse)AuthorAgeFilesLines
* i965: Rename brw_wm_sampler_state.c to brw_sampler_state.c.Kenneth Graunke2014-08-023-2/+2
| | | | | | | | | | | | | | | When the driver was originally written, it only supported texturing in the pixel shader backend; vertex and geometry shader texturing came much later. Originally, the pixel shader was referred to as "WM" (the Windowizer/Masker unit). So, this code happened to only be relevant for the WM stage, at the time. However, sampler state really applies to all stages, so putting "wm" in the filename doesn't make sense. I dropped it in gen7_sampler_state.c; at this point the asymmetry just trips people up. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/blorp: Don't set min_mag_neq bit in Gen6 SAMPLER_STATE.Kenneth Graunke2014-08-021-2/+0
| | | | | | | | | The "Min/Mag State Not Equal" bit is supposed to be set when the min/mag filters or address rounding modes differ. BLORP uses identical min/mag settings, so the bit should be unset. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/miptree: Layout 1D Array as 2D Array with height of 1Jordan Justen2014-08-011-0/+20
| | | | | | | | | | | | | | | | | | | 1D array miptrees were being laid out as a 2D texture with 1 slice. This happened due to the mesa core storing the 1D array slice count in the height field. On Intel hardware, we want to create a 2D array with a height of 1 for the 1D array case. Fixes assertion failure in piglit (gen6, gen8): spec/glsl-1.30/execution/tex-miplevel-selection textureOffset 1DArrayShadow In release builds of Mesa, this test was observed to cause a GPU hang on gen8. Signed-off-by: Jordan Justen <[email protected]> Cc: "10.2" <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81450 Tested-by: Ben Widawsky <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* xmlconfig: Use program_invocation_short_name when building for cygwinYaakov Selkowitz2014-07-291-0/+2
| | | | | | | | | mesa/mesa/src/mesa/drivers/dri/common/xmlconfig.c:104:10: warning: #warning "Per application configuration won't work with your OS version." [-Wcpp] # warning "Per application configuration won't work with your OS version." Signed-off-by: Yaakov Selkowitz <[email protected]> Reviewed-by: Jon TURNEY <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* i965/fs: Decide predicate/predicate_inverse outside of the for loop.Matt Turner2014-07-241-9/+14
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Swap if/else conditions in SEL peephole.Matt Turner2014-07-241-3/+3
| | | | | | Will clarify make the next commit easier to read. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Improve dead control flow elimination.Matt Turner2014-07-241-10/+15
| | | | | | | | ... to eliminate an ELSE instruction followed immediately by an ENDIF. instructions in affected programs: 704 -> 700 (-0.57%) Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Accelerate uploads of RGBA and BGRA GL_UNSIGNED_INT_8_8_8_8_REV texturesJason Ekstrand2014-07-231-1/+5
| | | | | | | | | | Since intel is always going to be little-endian, GL_UNSIGNED_INT_8_8_8_8_REV is the same as GL_UNSIGNED_BYTE for RGBA and BGRA textures, so the same acceleration code will work. We might as well use it. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Set LastRT on the final FB write on Broadwell.Kenneth Graunke2014-07-231-4/+2
| | | | | | | | | | | | | | | | | In Piglit's EXT_framebuffer_multisample/alpha-to-coverage-dual-src-blend test, key->nr_color_regions == 2, but the dual source blend FB write has ir->target set to 0. So we failed to set "Last Render Target Select" on any FB write message. We only emit one FB write per render target, so my comment about setting LastRT on every FB write directed at the last color region is a bit... misinformed. According to the documentation, depth buffer writes and scoreboard updates happen on the FB write with LastRT set, so I believe we want to set it only once. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: "10.2" <[email protected]>
* i965: Port INTEL_DEBUG=optimizer to the vec4 backend.Kenneth Graunke2014-07-231-6/+36
| | | | | | | Largely via copy and paste. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Save the gl_shader_stage enum in backend_visitor.Kenneth Graunke2014-07-232-1/+4
| | | | | | | | | This will be useful for INTEL_DEBUG=optimizer in the vec4 backend, which needs to know whether it's currently processing a VS or GS. It isn't worth adding virtual methods for this case. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Don't print WE_normal in disassembly.Kenneth Graunke2014-07-231-1/+1
| | | | | | | | | Dropping this helps most lines fit in an 80 column terminal. The absence of WE_normal also helps call attention to WE_all, where something unusual is going on. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* meta: Add a meta implementation of GL_ARB_clear_textureNeil Roberts2014-07-231-0/+1
| | | | | | | | | | | | | | | | | | | | Adds an implementation of the ClearTexSubImage driver entry point that tries to set up an FBO to render to the texture and then calls glClearBuffer with a scissor to perform the actual clear. If an FBO can't be created for the texture then it will fall back to using _mesa_store_ClearTexSubImage. When used in combination with _mesa_store_ClearTexSubImage this should provide an implementation that works for all DRI-based drivers. However as this has only been tested with the i965 driver it is currently only enabled there. v2: Only enable the extension for the i965 driver instead of all DRI drivers. Remove an unnecessary goto. Don't require GL_ARB_framebuffer_object. Add some more comments. v3: Use glClearBuffer* to avoid having to modify glClearColor and friends. Handle sRGB textures. Explicitly disable dithering. Reviewed-by: Topi Pohjolainen <topi.pohjolainen at intel.com>
* i965/fs: Fix gl_SampleMask handling for SIMD16 on Gen8+.Kenneth Graunke2014-07-211-5/+0
| | | | | | | | | | | | We actually want to use mov(16), not mov(8). Fixes 7 Piglit tests: ARB_sample_shading/builtin-gl-sample-mask [2468] and ARB_sample_shading/builtin-gl-sample-mask-simple [468]. Signed-off-by: Kenneth Graunke <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80991 Reviewed-by: Matt Turner <[email protected]> Cc: "10.2" <[email protected]>
* i965/fs: Fix gl_SampleID for 2x MSAA and SIMD16 mode.Kenneth Graunke2014-07-213-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | We might be able to do this without an extra program key field, but this is non-invasive and fixes the bug, for now. This fixes the following Piglit tests on Broadwell: - ARB_sample_shading/builtin-gl-sample-id 2 - ARB_sample_shading/builtin-gl-sample-position 2 - EXT_framebuffer_multisample/multisample-blit 2 color - EXT_framebuffer_multisample/multisample-blit 2 color linear - EXT_framebuffer_multisample/multisample-blit 2 depth - EXT_framebuffer_multisample/no-color 2 depth combined - EXT_framebuffer_multisample/no-color 2 depth separate - EXT_framebuffer_multisample/no-color 2 depth single - EXT_framebuffer_multisample/no-color 2 depth-computed combined - EXT_framebuffer_multisample/no-color 2 depth-computed separate - EXT_framebuffer_multisample/no-color 2 depth-computed single - EXT_framebuffer_multisample/unaligned-blit 2 color msaa - EXT_framebuffer_multisample/unaligned-blit 2 depth msaa Signed-off-by: Kenneth Graunke <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80991 Reviewed-by: Matt Turner <[email protected]> Cc: "10.2" <[email protected]>
* i965: Add missing persample_shading field to brw_wm_debug_recompile.Kenneth Graunke2014-07-211-0/+2
| | | | | | | | Otherwise, the performance warning for shader recompiles will just say "something else". Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/disasm: Don't disassemble the URB complete field on Broadwell.Kenneth Graunke2014-07-211-2/+4
| | | | | | | | It doesn't exist, so attempting to read it will trigger generation assertions in the brw_inst API. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Disable hex offset printing in disassembly.Kenneth Graunke2014-07-211-1/+2
| | | | | | | | | | | | | | | Printing the hex offsets makes it basically impossible to diff assembly: if you add even a single instruction, the entire shader shows up as a difference. So, every time I want to compare assembly, I have to strip this out. The hex offsets might be useful when debugging compaction, or when inspecting the program cache buffer. Since it's occasionally useful, but uncommon, this patch disables it by default, but makes it easy to re-enable it temporarily when the need arises. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Use foreach_inst_in_block a couple more places.Matt Turner2014-07-212-8/+2
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Replace cfg instances with calls to calculate_cfg().Matt Turner2014-07-215-22/+22
| | | | | | | | | | | Avoids regenerating it unnecessarily. Every program in shader-db improved, none by an amount less than a 1/3 reduction. One Dota2 shader decreased from 62 -> 24. cfg calculations: 429492 -> 193197 (-55.02%) Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/cfg: Add a foreach_block_and_inst macro.Matt Turner2014-07-211-0/+4
| | | | | | Will let us abstract how the instructions are stored. Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Add cfg to backend_visitor.Matt Turner2014-07-219-33/+48
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Silence unused parameter warningIan Romanick2014-07-191-1/+1
| | | | | | | brw_fs_visitor.cpp:2400:1: warning: unused parameter 'ir' [-Wunused-parameter] Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Silence 'comparison is always true' warningIan Romanick2014-07-191-2/+0
| | | | | | | | | | | | | The parameter is an int16_t, and we're check that it's value will fit in 16-bits. Yes, the value that is stored in 16-bits will surely fit in 16-bits. brw_inst.h: In function 'brw_inst_set_gen6_jump_count': brw_inst.h:321:66: warning: comparison is always true due to limited range of data type [-Wtype-limits] brw_inst.h:321:66: warning: comparison is always true due to limited range of data type [-Wtype-limits] Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Silence many unused parameter warningsIan Romanick2014-07-191-0/+10
| | | | | | | | brw_inst.h: In function 'brw_inst_set_src1_vstride': brw_inst.h:118:76: warning: unused parameter 'brw' [-Wunused-parameter] Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* Add support for RGBA8 and RGBX8 textures in intel_texsubimage_tiled_memcpyJason Ekstrand2014-07-171-0/+11
| | | | | | Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965: Improve debug output in intelTexImage and intelTexSubimageJason Ekstrand2014-07-172-1/+9
| | | | | Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* st/mesa,gallium: add a workaround for Unigine Heaven 4.0 and Valley 1.0Marek Olšák2014-07-182-3/+22
| | | | | | | Most (all?) Unigine shaders fail to compile without this if sample shading is advertised. This is, of course, Unigine developers' fault. Reviewed-by: Brian Paul <[email protected]>
* i965: Fix z_offset computation in intel_miptree_unmap_depthstencil()Anuj Phogat2014-07-171-2/+2
| | | | | | | | | | | | | | | | The bug is triggered by using glTexSubImage2d() with GL_DEPTH_STENCIL as base internal format and non-zero x, y offsets. Currently x, y offsets are ignored while updating the texture image. Fixes Khronos GLES3 CTS tests: npot_tex_sub_image_2d npot_tex_sub_image_3d npot_pbo_tex_sub_image_2d npot_pbo_tex_sub_image_2d Cc: <[email protected]> Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* Revert "i965: Extend compute-to-mrf pass to understand blocks of MOVs"Anuj Phogat2014-07-171-53/+10
| | | | | | | | | | | | | | This reverts commit bbefb15e01e1c16af69646898918982ae00f8c92. Fixes the 11 regressions caused in framebuffer_blit tests in Khronos GLES3 CTS tests: Original patch reduced the instruction count but had no performance benefits. So, it's safe to revert it without causing any performance regressions. Signed-off-by: Anuj Phogat <[email protected]> Acked-by: Kristian Høgsberg <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i915: Fix up intelInitScreen2 for DRI3Adel Gadllah2014-07-171-1/+2
| | | | | | | | | | | | | | | | | Commit 442442026eb updated both i915 and i965 for DRI3 support, but one check in intelInitScreen2 was missed for i915 causing crashes when trying to use i915 with DRI3. So fix that up. Reported-by: Igor Gnatenko <[email protected]> References: https://bugzilla.redhat.com/show_bug.cgi?id=1115323 References: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=754297 Tested-by: František Zatloukal <[email protected]> Tested-by: Dirk Griesbach <[email protected]> Signed-off-by: Adel Gadllah <[email protected]> Acked-by: Kenneth Graunke <[email protected]> Cc: "10.2" <[email protected]>
* Revert "i965: Implement GL_PRIMITIVES_GENERATED with non-zero streams."Kenneth Graunke2014-07-162-26/+7
| | | | | | | | | | | | | | | | This reverts commit 3178d2474ae5bdd1102fb3d76a60d1d63c961ff5. This caused GPU hangs on Ivybridge for some users and huge (80%) performance regressions across the board on multiple platforms. We need to find a better solution. I've made several attempts, but none of them have worked yet. In the meantime, we should revert this. Reverting it breaks GL_PRIMITIVES_GENERATED for non-zero streams, but that's okay, since we don't expose GL_ARB_gpu_shader5 yet. Fixes Piglit's EXT_transform_feedback/generatemipmap prims_generated test case on Haswell.
* i965: Don't copy propagate abs into Broadwell logic instructions.Kenneth Graunke2014-07-152-12/+6
| | | | | | | | | | | | It's not clear what abs on logical instructions means on Broadwell, and it doesn't appear to do anything sensible. Fixes 270 Piglit tests (the bitand/bitor/bitxor tests with abs). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81157 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: "10.2" <[email protected]>
* i965/fs: Use WE_all for gl_SampleID header register munging.Kenneth Graunke2014-07-151-5/+9
| | | | | | | | | | | | This code should execute without regard to the currently executing channels. Asking for gl_SampleID inside control flow might break in strange ways. It appears to break even at the top of the program in SIMD16 mode occasionally as well. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Cc: [email protected]
* i965/fs: Set force_uncompressed and force_sechalf on samplepos setup.Kenneth Graunke2014-07-151-6/+8
| | | | | | | | | | | | | | gen8_fs_generator uses these to decide whether to set the execution size to 8 or 16, so we incorrectly made both of these MOVs the full width in SIMD16 shaders. (It happened to work out on Gen4-7.) Setting them should also help inform optimization passes what's really going on, which could help avoid bugs. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Cc: [email protected]
* i965: Set execution size to 8 for instructions with force_sechalf set.Kenneth Graunke2014-07-151-1/+1
| | | | | | | | | | | | | | | | | Both inst->force_uncompressed and inst->force_sechalf mean that the generated instruction should be uncompressed and have an execution size of 8. We don't require the visitor to set both flags - setting inst->force_sechalf by itself is supposed to be enough. On Gen4-7, guess_execution_size() demoted instructions to 8-wide based on the default compression state. On Gen8+, we instead set a default execution size, which worked great...except that we forgot to check inst->force_sechalf when deciding whether to use 8 or 16. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Cc: [email protected]
* exec_list: Make various places use the new length() method.Connor Abbott2014-07-152-7/+2
| | | | | | | | | | Instead of hand-rolling it. v2 [mattst88]: Rename get_size to length. Expand comment in ir_reader. Reviewed-by: Ian Romanick <[email protected]> [v1] Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Connor Abbott <[email protected]>
* i965/fs: Relax interference check in register coalescing.Matt Turner2014-07-151-11/+12
| | | | | | | | | | | | | A similar attempt was made in commit 5ff1e446 and was reverted in commit a39428cf after causing a regression in an ES 3 conformance test. The test still passes after this commit. total instructions in shared programs: 1994827 -> 1992858 (-0.10%) instructions in affected programs: 128247 -> 126278 (-1.54%) GAINED: 0 LOST: 1 Acked-by: Kenneth Graunke <[email protected]>
* i965/fs: Perform CSE on sends-from-GRF rather than textures.Matt Turner2014-07-151-1/+1
| | | | | | | | | Should potentially allow a few more cases, while avoiding doing CSE on texture operations on Gen <= 6 with the MRF. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80211 Reviewed-by: Kenneth Graunke <[email protected]> Tested-by: lu hua <[email protected]>
* i965: Initialize new chunks of realloc'd memory.Matt Turner2014-07-151-0/+4
| | | | | | | Otherwise we'd compare uninitialized pointers with NULL and dereference, leading to crashes. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Invalidate live intervals in opt_cse, not _local.Matt Turner2014-07-141-3/+3
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Move aeb list into opt_cse_local.Matt Turner2014-07-142-7/+7
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Invalidate live intervals in opt_cse, not _local.Matt Turner2014-07-141-3/+3
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Move aeb list into opt_cse_local.Matt Turner2014-07-142-7/+7
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* Avoid mesa_dri_drivers import lib being installedJon TURNEY2014-07-131-2/+1
| | | | | | | | | | | | | | On Cygwin and MinGW, linking a shared library also generates an import library Use a wildcard which also matches the name of the megadriver import lib, mesa_dri_drivers.dll.a, so that is also removed after megadriver symlinks are created (This then matches src/gallium/targets/dri/Makefile.am, which already does things this way) Signed-off-by: Jon TURNEY <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* i965/vec4: Silence warnings about unhandled interpolation opsChris Forbes2014-07-131-0/+3
| | | | Signed-off-by: Chris Forbes <[email protected]>
* i965/fs: add support for ir_*_interpolate_at_* expressionsChris Forbes2014-07-132-2/+150
| | | | | | | | | | | | | | SIMD8-only for now. V5: - Fix style complaints - Move prototype to be with other oddball emit functions - Use unreachable() instead of assert() where possible V6: - Describe what is happening with the clamping - Add reg_width to make some expressions clearer Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Skip channel expressions splitting for interpolationChris Forbes2014-07-131-0/+25
| | | | | | | | The backend will have to do a message send, so we want to keep these in one piece, just like texture ops. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: add generator support for pixel interpolator queryChris Forbes2014-07-134-0/+59
| | | | | | | | | | V5: - Split into separate opcodes - Pass message data in src1 immediate - Put noperspective bit in fs_inst rather than adding any junk to backend_instruction Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: add low-level support for send to pixel interpolatorChris Forbes2014-07-132-0/+38
| | | | | Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>