path: root/src/mesa/drivers/dri
Commit message (Author, Age, Files, Lines)
* android: dri: link against libmesa_util (Emil Velikov, 2015-04-22, 1 file, -1/+2)
  The DRI modules depend on symbols provided by libmesa_util.
  Cc: "10.5" <[email protected]>
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Chih-Wei Huang <[email protected]>
  (cherry picked from commit 618885f71fcacb3d68bf37fa23be36830d4178d2)
* android: dri/common: conditionally include drm_cflags/set __NOT_HAVE_DRM_H (Emil Velikov, 2015-04-22, 1 file, -0/+14)
  Otherwise we'll fail to find the drm.h header.
  Cc: "10.4 10.5" <[email protected]>
  Signed-off-by: Emil Velikov <[email protected]>
  (cherry picked from commit 8d90bfb724f89b04d703f869362cf2fc2a3d7567)
* android: add $(mesa_top)/src include to the whole of mesa (Emil Velikov, 2015-04-22, 1 file, -1/+0)
  Many parts of mesa already have the include, while others depend on it but lack it. Add it once in the top-level makefile and be done with it.
  Cc: "10.4 10.5" <[email protected]>
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Chih-Wei Huang <[email protected]>
  (cherry picked from commit 6fb801786604c270fae99c3d665dcebaa0bff3a6)
* android: use LOCAL_SHARED_LIBRARIES over TARGET_OUT_HEADERS (Emil Velikov, 2015-04-22, 1 file, -1/+0)
  ... to manage the LIBDRM*_CFLAGS. The former is the approach recommended by the Android build system developers, while the latter has been deprecated for quite some time.
  Cc: "10.4 10.5" <[email protected]>
  Signed-off-by: Emil Velikov <[email protected]>
  (cherry picked from commit 86919352e3da1c80409fdcb67c36f29a9687b7a9)
* drirc: Add "Second Life" quirk (allow_glsl_extension_directive_midshader). (Kenneth Graunke, 2015-04-22, 1 file, -0/+4)
  Appears to fix shader compilation. Tested by starting the client, dragging the "quality and speed" slider back and forth, and watching the console output: instead of piles of "shader failed to compile", the CPU seems to be busy compiling shaders. I haven't actually tried to play.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=69226
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71591
  Cc: [email protected]
  (cherry picked from commit 00bf7d2e9cd60dbd82d25b459c448e11c545a89a)
* i965: Rewrite ir_tex to ir_txl with lod 0 for vertex shaders (Kristian Høgsberg, 2015-04-22, 1 file, -0/+9)
  The ir_tex opcode turns into a sample or sample_c message, which will try to compute derivatives to determine the lod. This produces garbage for non-fragment shaders, where the sample coordinates don't correspond to subspans. We fix this by rewriting the opcode from ir_tex to ir_txl and setting the lod to 0.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89457
  Cc: "10.5" <[email protected]>
  Signed-off-by: Kristian Høgsberg <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  (cherry picked from commit 993a6288f72fa98932df7cdb6f64d9dd645e670d)
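The rewrite described in the commit above can be pictured with a small sketch. The types and names (`enum tex_opcode`, `struct tex_instr`, `lower_tex_for_stage`) are hypothetical simplifications, not Mesa's `ir_texture` API; the sketch only captures the rule: outside the fragment stage, an implicit-lod lookup becomes an explicit lookup at lod 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the GLSL IR texture opcodes; the real
 * ir_texture class carries much more state. */
enum tex_opcode { TEX_TEX, TEX_TXL };

struct tex_instr {
   enum tex_opcode op;
   float lod;               /* only meaningful for TEX_TXL */
};

/* Implicit-lod sampling (TEX_TEX) relies on derivatives across a subspan,
 * which only fragment shaders have.  In any other stage, rewrite it to an
 * explicit-lod lookup at level 0. */
static void lower_tex_for_stage(struct tex_instr *ir, bool is_fragment_stage)
{
   if (!is_fragment_stage && ir->op == TEX_TEX) {
      ir->op = TEX_TXL;
      ir->lod = 0.0f;
   }
}
```

Fragment-stage lookups are left untouched, since there the derivative-based lod is exactly what ir_tex is supposed to compute.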
* i965: Flush batchbuffer containing the query on glQueryCounter. (Mathias Froehlich, 2015-04-22, 1 file, -0/+2)
  This change fixes a regression with timer queries introduced in commit 3eb6258. There, the pending batchbuffer is flushed only if glEndQuery is executed. This change adds such a flush to glQueryCounter, which also schedules a value query just like glEndQuery does. The patch fixes GPU timer queries going mad from within osgviewer.
  Reviewed-by: Kenneth Graunke <[email protected]>
  Signed-off-by: Mathias Froehlich <[email protected]>
  Cc: [email protected]
  (cherry picked from commit 1e1d5456ba3dff82301ad4bbdde2fb6e2f562fe3)
* i965: Fix URB size for CHV (Ville Syrjälä, 2015-04-08, 1 file, -1/+1)
  Increase the device info .urb.size for CHV to match the default URB size (192kB).
  Reviewed-by: Kenneth Graunke <[email protected]>
  Signed-off-by: Ville Syrjälä <[email protected]>
  (cherry picked from commit 970dc2360372a7859691d690bd2f1976c3c97fb0)
* i965: Do not render primitives in non-zero streams when TF is disabled (Iago Toral Quiroga, 2015-04-08, 1 file, -0/+13)
  Haswell hardware seems to ignore the Render Stream Select bits of the 3DSTATE_STREAMOUT packet when the SOL stage is disabled, even though the PRM says otherwise. Because of this, all primitives are sent down the pipeline for rasterization, which is wrong. If SOL is enabled, Render Stream Select is honored and primitives bound to non-zero streams are discarded after stream output.
  Since the only purpose of primitives sent to non-zero streams is to be recorded by transform feedback, we can simply discard all geometry bound to non-zero streams when transform feedback is disabled, preventing it from ever reaching the rasterization stage.
  Notice that this patch introduces a small change in behavior when a geometry shader emits more vertices than the declared maximum: before, a vertex emitted to a non-zero stream while TF was disabled would still count toward the maximum number of output vertices declared by the shader. With this change, such vertices are ignored entirely and won't increase the output vertex count, leaving more room for other (hopefully more useful) vertices.
  Fixes piglit test arb_gpu_shader5-emitstreamvertex_nodraw on Haswell and Broadwell.
  v2 (Ken): Drop the is_haswell check in favor of doing this unconditionally. Broadwell needs the workaround as well, and it doesn't hurt to do it in general. Also tweak comments; the Haswell PRM does actually mention this ("Command Reference: Instructions" page 797).
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83962
  Reviewed-by: Kenneth Graunke <[email protected]>
  Cc: [email protected]
  (cherry picked from commit 2042a2f961a07e04eaca0347e42859c249325531)
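A minimal sketch of the emit behavior described in the commit above, assuming a hypothetical per-shader counter (`struct gs_emit_state` and `gs_emit_vertex` are illustrative names, not driver code). It shows the behavior change the message calls out: with transform feedback off, vertices sent to a non-zero stream are dropped before the `max_vertices` check, so they no longer use up output slots.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical geometry-shader emit state, for illustration only. */
struct gs_emit_state {
   bool xfb_enabled;        /* transform feedback active? */
   unsigned max_vertices;   /* max_vertices declared by the shader */
   unsigned vertex_count;   /* vertices emitted so far */
};

/* Returns true if the vertex is actually emitted.  Vertices sent to a
 * non-zero stream while transform feedback is off are discarded before
 * the max_vertices check, so they do not consume output slots. */
static bool gs_emit_vertex(struct gs_emit_state *s, unsigned stream)
{
   if (stream != 0 && !s->xfb_enabled)
      return false;                 /* discarded, not counted */
   if (s->vertex_count >= s->max_vertices)
      return false;                 /* over the declared limit */
   s->vertex_count++;
   return true;
}
```

For example, with `max_vertices = 2` and TF disabled, an `EmitStreamVertex(1)` followed by two stream-0 emits now produces both stream-0 vertices.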
* i965: Add forgotten multi-stream code to Gen8 SOL state. (Kenneth Graunke, 2015-04-08, 1 file, -0/+9)
  Fixes Piglit's arb_gpu_shader5-xfb-streams-without-invocations.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Chris Forbes <[email protected]>
  Cc: [email protected]
  (cherry picked from commit f368d0fa1fe37a58780ee555d4a9ccf15474782b)
* i965: Fix instanced geometry shaders on Gen8+. (Kenneth Graunke, 2015-04-08, 1 file, -0/+2)
  Jordan added this in commit 741782b5948bb3d01d699f062a37513c2e73b076 for Gen7 platforms. I missed this when adding the Broadwell code.
  Fixes Piglit's spec/arb_gpu_shader5/invocation-id-{basic,in-separate-gs} with MESA_EXTENSION_OVERRIDE=GL_ARB_gpu_shader5 set.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Chris Forbes <[email protected]>
  Cc: [email protected]
  (cherry picked from commit f9e5dc0a85df8dbfb8213ff772dfeb218972db12)
* xmlpool: don't forget to ship the MOS (Emil Velikov, 2015-04-08, 1 file, -1/+8)
  This will allow us to finally remove python from the list of build-time dependencies, assuming you're building from a release tarball, of course :-)
  Cc: Bernd Kuhls <[email protected]>
  Reported-by: Bernd Kuhls <[email protected]>
  Cc: "10.5" <[email protected]>
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
  (cherry picked from commit a665b9b3c89095923cf2251895afc69c9f79aafe)
* i965: Set nr_params to the number of uniform components in the VS/GS path. (Francisco Jerez, 2015-03-26, 3 files, -15/+4)
  Both do_vs_prog and do_gs_prog initialize brw_stage_prog_data::nr_params to the number of uniform *vectors* required by the shader rather than the number of uniform components, contradicting the comment. This is inconsistent with what the state upload code and the scalar path expect, but it happens to work until Gen8 because vec4_visitor interprets it as a number of vectors on construction and later overwrites its original value with the number of uniform components referenced by the shader.
  Also, there's no need to add the number of samplers; they're not actually passed in as uniforms.
  Fixes a memory corruption issue on BDW with SIMD8 VS.
  Cc: "10.5" <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  (cherry picked from commit fd149628e142af769c1c0ec037bc297d8a3e871f)
  [Emil Velikov: s/DIV_ROUND_UP/CEILING/]
  Signed-off-by: Emil Velikov <[email protected]>
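To make the vectors-versus-components mismatch in the commit above concrete: a vec4 uniform slot holds four 32-bit components, so a shader using three vec4 slots has 12 components, not 3. One plausible way such a mismatch turns into memory corruption is an array sized from the smaller count but indexed per component. The helpers below are hypothetical illustrations of the two conversions; the rounding-up direction is what a macro like `DIV_ROUND_UP`/`CEILING` computes.

```c
#include <assert.h>

/* A vec4 uniform slot holds four 32-bit components. */
#define COMPONENTS_PER_VEC4 4u

/* Hypothetical helper: nr_params should count scalar components,
 * not vec4 slots. */
static unsigned nr_params_from_vec4_slots(unsigned uniform_vec4_slots)
{
   return uniform_vec4_slots * COMPONENTS_PER_VEC4;
}

/* The inverse, rounding up: how many vec4 slots hold n components. */
static unsigned vec4_slots_from_components(unsigned components)
{
   return (components + COMPONENTS_PER_VEC4 - 1) / COMPONENTS_PER_VEC4;
}
```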
* i965: Fix out-of-bounds accesses into pull_constant_loc array (Iago Toral Quiroga, 2015-03-11, 1 file, -2/+7)
  The piglit test glsl-fs-uniform-array-loop-unroll.shader_test was designed to do an out-of-bounds access into a uniform array to make sure that we handle that situation gracefully inside the driver. However, as Ken describes in bug 79202, Valgrind reports that this leads to an out-of-bounds access in fs_visitor::demote_pull_constants(). Before accessing the pull_constant_loc array, we should make sure that the uniform we are trying to access is valid.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=79202
  Reviewed-by: Matt Turner <[email protected]>
  (cherry picked from commit 6ac1bc90c4a7a6f32901a9782e14b090f6fe5270)
  Nominated-by: Matt Turner <[email protected]>
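The fix in the commit above amounts to a bounds check before indexing. A minimal sketch, with hypothetical names (`lookup_pull_constant_loc`, `num_uniforms`) standing in for the actual `fs_visitor` state:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch: look up the pull-constant location for a uniform
 * index, refusing indices outside the array.  Such indices can appear
 * when a shader indexes past the end of a uniform array. */
static bool lookup_pull_constant_loc(const int *pull_constant_loc,
                                     unsigned num_uniforms,
                                     int uniform, int *loc_out)
{
   if (uniform < 0 || (unsigned)uniform >= num_uniforms)
      return false;      /* invalid uniform: don't touch the array */
   *loc_out = pull_constant_loc[uniform];
   return true;
}
```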
* i965/fs: Don't issue FB writes for bound but unwritten color targets. (Kenneth Graunke, 2015-03-11, 1 file, -3/+9)
  We used to loop over all color attachments and emit FB writes for each one, even if the shader didn't write to a corresponding output variable. Those color attachments would be filled with garbage (undefined values).
  Football Manager binds a framebuffer with 4 color attachments, but draws to it using a shader that only writes to gl_FragData[0..2]. This meant that color attachment 3 would be filled with garbage, resulting in rendering artifacts. Now we skip writing to it, fixing rendering.
  Writes to gl_FragColor initialize outputs[0..nr_color_regions-1] to GRFs, while writes to gl_FragData[i] initialize outputs[i].
  Thanks to Jason Ekstrand for tracking this down.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86747
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  Cc: [email protected]
  (cherry picked from commit e95969cd9548033250ba12f2adf11740319b41e7)
  Conflicts:
    src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
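A hedged sketch of the loop described in the commit above, using a hypothetical `written[]` array to stand for "the shader initialized outputs[i]" (gl_FragColor marks every region, gl_FragData[i] marks only region i). It is not the actual `fs_visitor` code and ignores details such as how the thread is terminated when no output is written.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DRAW_BUFFERS 8

/* Hypothetical sketch: returns how many FB writes would be emitted and
 * records their target indices, skipping regions the shader never wrote. */
static unsigned plan_fb_writes(const bool written[MAX_DRAW_BUFFERS],
                               unsigned nr_color_regions,
                               unsigned targets_out[MAX_DRAW_BUFFERS])
{
   unsigned n = 0;
   for (unsigned target = 0; target < nr_color_regions; target++) {
      if (!written[target])
         continue;          /* leave this attachment's contents untouched */
      targets_out[n++] = target;
   }
   return n;
}
```

For the Football Manager case (4 attachments, gl_FragData[0..2] written), the sketch plans writes to targets 0, 1 and 2 only.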
* i965/fs: Make emit_shader_time_end() insert before EOT. (Kenneth Graunke, 2015-03-11, 2 files, -23/+18)
  Previously, we emitted the shader-time epilogue from emit_fb_writes(), in the middle of looping through color regions (or from emit_urb_writes() for the VS). This code was duplicated several times and rather awkward. I need to fix a bug in our FB write handling, and it will be a lot easier if we move emit_shader_time_end() out of there.
  Now, we simply emit FB writes/URB writes, and subsequently have emit_shader_time_end() insert instructions before the final SEND with EOT. Not only is this simpler, it's actually a slight improvement: we now include the MOVs that set up the final FB write payload in our shader-time measurements.
  Note that INTEL_DEBUG=shader_time only exists on Gen7+, and uses send-from-GRF. (In the past, we might have hit trouble where both attempt to use MRFs for messages; that's not a problem now.)
  v2: Rebase on v3 of the previous patch and other shader_time fixes.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Topi Pohjolainen <[email protected]> [v1]
  Acked-by: Matt Turner <[email protected]>
  Cc: [email protected]
  (cherry picked from commit 4ebeb71573ad44f7657810dc5dd2c9030e3e63db)
  Conflicts:
    src/mesa/drivers/dri/i965/brw_fs.cpp
* i965/fs: Make get_timestamp() pass back the MOV rather than emitting it. (Kenneth Graunke, 2015-03-11, 2 files, -5/+16)
  This makes another part of the INTEL_DEBUG=shader_time code emittable at arbitrary locations, rather than just at the end of the instruction stream.
  v2: Don't lose smear! Caught by Topi Pohjolainen.
  v3: Don't set smear on the destination of the MOV. Thanks Topi!
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
  Cc: [email protected]
  (cherry picked from commit e43af8d09f919d02b5ac0810c1c0f1783cbef6ef)
* i965/fs: Make emit_shader_time_write return rather than emit. (Kenneth Graunke, 2015-03-11, 2 files, -10/+8)
  Instead of calling emit_shader_time_write(), we now do emit(SHADER_TIME_ADD(...)). The advantage is that we can also insert a shader time write at an arbitrary location in the instruction stream, rather than being restricted to emitting at the end.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
  Cc: [email protected]
  (cherry picked from commit bea854c7f33cc10b8292f931f114afc4f88a8dd4)
* i965/fs: Set smear on shader_time diff register. (Kenneth Graunke, 2015-03-11, 1 file, -0/+1)
  The ADD(diff, diff, fs_reg(-2u)) instruction reads diff, which is a width-1 register. We need to read it as <0,1,0> with a subreg of 0, which is what smear accomplishes.
  Fixes assertion:
    brw_eu_emit.c:285: validate_reg: Assertion `hstride == 0' failed.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86974
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
  Cc: [email protected]
  (cherry picked from commit f1adc45dbe649cdd4538fb96f6d2a27328bbfba1)
  Conflicts:
    src/mesa/drivers/dri/i965/brw_fs.cpp
* i965/fs: Set force_writemask_all on shader_time instructions. (Kenneth Graunke, 2015-03-11, 1 file, -2/+7)
  These computations don't have anything to do with the currently executing channels, so they should use force_writemask_all. This fixes assertion failures.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86974
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
  Cc: [email protected]
  (cherry picked from commit ef9cc7d0c176669c03130abf576f2b700be39514)
  Conflicts:
    src/mesa/drivers/dri/i965/brw_fs.cpp
* i965: Avoid applying negate to wrong MAD source. (Matt Turner, 2015-03-07, 1 file, -15/+13)
  For some given GLSL IR like (+ (neg x) (* 1.2 x)), the try_emit_mad function would see that one of the +'s sources was a negate expression and set mul_negate = true without confirming that it was actually a multiply.
  Cc: 10.5 <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89315
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89095
  Reviewed-by: Ian Romanick <[email protected]>
  (cherry picked from commit d528907fd2950c7bb968fff66dd79863cd128890)
  [Emil Velikov: drop the changes in brw_vec4_visitor.cpp]
  Signed-off-by: Emil Velikov <[email protected]>
  Conflicts:
    src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
    src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
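The bug in the commit above is a missing check when looking through a negate. A sketch over a hypothetical minimal expression tree (`struct expr` and `match_mul` are illustrative, not GLSL IR): a `neg` only sets the negate flag if its operand really is a multiply, so the `(neg x)` operand of `(+ (neg x) (* 1.2 x))` is rejected as a multiply candidate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal expression tree. */
enum expr_op { EXPR_ADD, EXPR_MUL, EXPR_NEG, EXPR_VAR, EXPR_CONST };

struct expr {
   enum expr_op op;
   const struct expr *src[2];
};

/* Returns the multiply feeding an add operand (looking through one
 * negate) and reports whether it was negated.  A negate only counts if
 * its operand really is a multiply. */
static const struct expr *match_mul(const struct expr *e, bool *negate)
{
   *negate = false;
   if (e->op == EXPR_NEG) {
      if (e->src[0]->op != EXPR_MUL)
         return NULL;       /* (neg x) where x is not a MUL: no match */
      *negate = true;
      e = e->src[0];
   }
   return e->op == EXPR_MUL ? e : NULL;
}
```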
* intel: fix EGLImage renderbuffer _BaseFormat (Frank Henigman, 2015-03-07, 2 files, -3/+2)
  Correctly set the _BaseFormat field when creating a gl_renderbuffer with EGLImage storage.
  Change-Id: I8c9f7302d18b617f54fa68304d8ffee087ed8a77
  Signed-off-by: Frank Henigman <[email protected]>
  Reviewed-by: Stéphane Marchesin <[email protected]>
  Reviewed-by: Chad Versace <[email protected]>
  (cherry picked from commit e43729943e67972e547a19123fb3afca6b77202b)
  Nominated-by: Chad Versace <[email protected]>
* i965/vec4: Don't lose the saturate modifier in copy propagation. (Andrey Sudnik, 2015-03-07, 1 file, -1/+1)
  Cc: 10.4, 10.5 <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89224
  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  (cherry picked from commit 0dfec59a2785cf7a87ee5128889ecebe810b611b)
* i965: Split Gen4-5 BlitFramebuffer code; prefer BLT over Meta. (Kenneth Graunke, 2015-03-07, 1 file, -1/+49)
  A while back I switched intel_blit_framebuffer to prefer Meta over the BLT. This meant that Gen8 platforms would start using the 3D engine for blits, just like we do on Gen6-7.5.
  However, I hadn't considered Gen4-5 when making that change. The BLT engine appears to be substantially faster on 965GM than using Meta to drive the 3D engine. This isn't too surprising: original Gen4 doesn't support tile offsets (that came with G45), and the level/layer fields don't work for cubemap rendering, so for inconvenient miplevel alignments we end up blitting or copying data to/from temporaries in order to render to it. We may as well just use the blitter.
  I chose to use the BLT on Gen4-5 because they use the same ring for both 3D and BLT; Gen6+ splits it out.
  Fixes regressions on 965GM due to botched tile offset code (we should fix those properly as well, but they're longstanding bugs; for now, put things back to the status quo).
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89430
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
  Cc: "10.5" <[email protected]>
  (cherry picked from commit aa0705c06c03d2b882ac7b185ed123bc8a10d716)
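The resulting policy from the commit above, sketched with hypothetical names: `choose_blit_path` and the two "can this path handle the request" flags are assumptions of this sketch, and the real `intel_blit_framebuffer` works per buffer mask. It prefers the BLT on Gen4-5 and Meta on Gen6+, and, as a further assumption, falls back to the other path when the preferred one cannot do the blit.

```c
#include <assert.h>
#include <stdbool.h>

enum blit_path { BLIT_PATH_NONE, BLIT_PATH_BLT, BLIT_PATH_META };

/* Gen4-5 share one ring between 3D and BLT and lack cheap tile-offset
 * support, so prefer the BLT engine there; Gen6+ prefers Meta. */
static enum blit_path choose_blit_path(int gen, bool blt_can_do,
                                       bool meta_can_do)
{
   bool prefer_blt = gen <= 5;

   if (prefer_blt ? blt_can_do : meta_can_do)
      return prefer_blt ? BLIT_PATH_BLT : BLIT_PATH_META;
   /* Preferred path unavailable: try the other one. */
   if (prefer_blt ? meta_can_do : blt_can_do)
      return prefer_blt ? BLIT_PATH_META : BLIT_PATH_BLT;
   return BLIT_PATH_NONE;
}
```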
* i965: Tell intel_get_memcpy() which direction the memcpy() is going. (Matt Turner, 2015-03-07, 5 files, -42/+106)
  The SSSE3 swizzling code was written for fast uploads to the GPU and assumed the destination was always 16-byte aligned. When we began using this code for fast downloads as well, we didn't do anything to account for the fact that the destination pointer given by glReadPixels() or glGetTexImage() is not guaranteed to be suitably aligned.
  With SSSE3 enabled (at compile time), some applications would crash when an SSE aligned-store instruction tried to store to an unaligned destination (or an assertion that the destination is aligned would trigger).
  To remedy this, tell intel_get_memcpy() whether we're uploading or downloading so that it can select whether to assume the destination or the source is aligned, respectively.
  Cc: 10.5 <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89416
  Tested-by: Uriy Zhuravlev <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  (cherry picked from commit 2e4c95dfe2cb205c327ceaa12b44a9273bdb20dc)
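A sketch of the direction-dependent alignment assumption from the commit above. Names are hypothetical (`enum memcpy_direction`, `assumed_aligned_ptr`, `is_aligned16`); the real `intel_get_memcpy()` selects a copy function, which this sketch does not model. The idea: on upload the GPU-side destination is the pointer that can be assumed 16-byte aligned, and on download it is the source, while the application's pointer (e.g. from glReadPixels()) may be unaligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum memcpy_direction { UPLOAD_TO_GPU, DOWNLOAD_FROM_GPU };

/* Which pointer may the fast path assume is 16-byte aligned? */
static const void *assumed_aligned_ptr(enum memcpy_direction dir,
                                       const void *dst, const void *src)
{
   return dir == UPLOAD_TO_GPU ? dst : src;
}

static bool is_aligned16(const void *p)
{
   return ((uintptr_t)p & 15u) == 0;
}
```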
* i965/fs: Don't propagate cmod to inst with different type. (Matt Turner, 2015-03-07, 2 files, -0/+38)
  Cc: 10.5 <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89317
  Reviewed-by: Kenneth Graunke <[email protected]>
  (cherry picked from commit 1e128e9b69c6336762a2b6ee5d356c763b9ae3b0)
* i965/fs: Don't use backend_visitor::instructions after creating the CFG. (Matt Turner, 2015-03-07, 1 file, -10/+0)
  This is a fix for a regression introduced in commit a9f8296d ("i965/fs: Preserve the CFG in a few more places.").
  The errata this code works around is described in a comment before the function:
    "[DevBW, DevCL] Errata: A destination register from a send can not be used as a destination register until after it has been sourced by an instruction with a different destination register."
  The framebuffer write's sources must be in message registers, which SEND instructions cannot have as a destination. There's no way for this errata to affect anything at the end of the program. Just remove the code.
  Cc: 10.4, 10.5 <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84613
  Reviewed-by: Kenneth Graunke <[email protected]>
  (cherry picked from commit e214000f258ae564e64d839cccee9418526f226b)
* i965: Consider scratch writes to have side effects. (Matt Turner, 2015-03-07, 1 file, -0/+1)
  We could do better by tracking scratch reads and writes.
  Cc: 10.5 <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88793
  Reviewed-by: Jason Ekstrand <[email protected]>
  (cherry picked from commit da20bf068ef0f816968d9bc4dfea81facf0fd680)
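One reason this matters (an assumption here; the commit message does not say which pass misbehaved): an optimization like dead-code elimination may only drop instructions whose results are unused and that have no side effects, and a scratch write's result lives in memory rather than in a register. A hypothetical predicate illustrating the conservative choice; the opcode names are placeholders, not i965's opcode enum.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opcode subset. */
enum opc { OPC_MOV, OPC_ADD, OPC_FB_WRITE, OPC_SCRATCH_WRITE, OPC_SCRATCH_READ };

/* Without tracking which scratch reads consume a scratch write, the only
 * safe answer is that a scratch write always has a side effect. */
static bool has_side_effects(enum opc op)
{
   switch (op) {
   case OPC_FB_WRITE:
   case OPC_SCRATCH_WRITE:
      return true;
   default:
      return false;
   }
}
```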
* i965/gs: Check newly-generated GS-out VUE map against correct stage (Chris Forbes, 2015-03-07, 1 file, -1/+1)
  Previously, we compared our new GS-out VUE map to the existing *VS*-out VUE map, which is bogus. This would mostly manifest as redundant dirty flagging where the GS is in use but the VS and GS output layouts differ; but there is a scary case where we would fail to flag a GS-out layout change if it happened to match the VS-out layout.
  Signed-off-by: Chris Forbes <[email protected]>
  Cc: "10.5, 10.4" <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88885
  (cherry picked from commit b51ff50a767cc78d678ed3d2c25995f5c4194fea)
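A sketch of the corrected comparison from the commit above, using a hypothetical, simplified `struct vue_map` and `update_gs_out_vue_map`. It also exercises the "scary case": a new GS-out map that happens to equal the VS-out map is still reported as a change, because it is compared with the previous GS-out map.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_VARYING 32

/* Hypothetical, simplified VUE map: which varying sits in each slot.
 * All-int members, so memcmp sees no padding bytes. */
struct vue_map {
   int num_slots;
   int slot_to_varying[MAX_VARYING];
};

struct pipeline_state {
   struct vue_map vs_out;
   struct vue_map gs_out;
};

/* Store a newly computed GS-out map and report whether the layout
 * changed, comparing against the previous *GS*-out map. */
static bool update_gs_out_vue_map(struct pipeline_state *st,
                                  const struct vue_map *new_map)
{
   bool changed = memcmp(&st->gs_out, new_map, sizeof(*new_map)) != 0;
   st->gs_out = *new_map;
   return changed;
}
```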
* i965/vec4: Fix implementation of i2b. (Matt Turner, 2015-03-07, 1 file, -1/+1)
  I broke this in commit 2881b123d. I must have misread i2b as b2i.
  Cc: 10.5 <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88246
  Reviewed-by: Ian Romanick <[email protected]>
  (cherry picked from commit 43ef2657a08f850c5756f28520f2cbe506807f24)
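For reference, the two conversions mixed up in the commit above have different semantics at the GLSL level, shown here as plain C. The hardware representation of `true` in registers is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* i2b: integer -> boolean.  Any non-zero integer is true. */
static bool i2b(int x) { return x != 0; }

/* b2i: boolean -> integer.  true becomes 1, false becomes 0. */
static int b2i(bool b) { return b ? 1 : 0; }
```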
* i965/fs/nir: Use emit_math for nir_op_fpow (Ian Romanick, 2015-03-07, 1 file, -1/+1)
  It appears that all the other instructions that need it already use it. This one just got missed.
  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  Cc: "10.5" <[email protected]>
  (cherry picked from commit b8a1637119249c1d5e76c27d0053360bbb7f4e77)
* xmlpool: make sure we ship options.h (Emil Velikov, 2015-03-02, 1 file, -1/+1)
  The header is included in ../xmlpool.h, which in turn is used directly in a number of places in mesa. Note that we could also add it (alongside t_option.h) to noinst_HEADERS, but neither solution fixes the issue that brought us here, namely: do not regenerate the headers if they already exist.
  Cc: "10.5" <[email protected]>
  Signed-off-by: Emil Velikov <[email protected]>
* i965/skl: Implement WaDisable1DDepthStencil (Neil Roberts, 2015-02-27, 1 file, -0/+12)
  Skylake+ doesn't support setting a depth buffer to a 1D surface, but it does allow pretending it's a 2D texture with a height of 1 instead.
  This fixes the GL_DEPTH_COMPONENT_* tests of the copyteximage piglit test (and also seems to avoid a subsequent GPU hang).
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89037
  Reviewed-by: Kenneth Graunke <[email protected]>
  (cherry picked from commit 5b29b2922afe2b8167a589fc2896a071fc85b693)
  Nominated-by: Ian Romanick <[email protected]>
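A sketch of the workaround's idea from the commit above, with a hypothetical surface descriptor (`struct depth_surface`, `fixup_1d_depth`); `gen >= 9` stands for Skylake+.

```c
#include <assert.h>

/* Hypothetical surface descriptor. */
enum surface_type { SURFTYPE_1D, SURFTYPE_2D };

struct depth_surface {
   enum surface_type type;
   unsigned width, height;
};

/* A 1D depth/stencil buffer is programmed as a 2D surface that is one
 * texel tall; older generations are left as they were. */
static struct depth_surface fixup_1d_depth(struct depth_surface s, int gen)
{
   if (gen >= 9 && s.type == SURFTYPE_1D) {
      s.type = SURFTYPE_2D;
      s.height = 1;
   }
   return s;
}
```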
* i965: Link test programs with gtest before pthreads. (Matt Turner, 2015-02-24, 1 file, -10/+10)
  Cc: "10.5" <[email protected]>
  Bugzilla: https://bugs.gentoo.org/show_bug.cgi?id=540962
  (cherry picked from commit 0b6d43e329d194b01ab5cd554617f79a13f6669a)
* i965/vec4: Add and use byte-MOV instruction for unpack 4x8. (Matt Turner, 2015-02-24, 4 files, -2/+21)
  Previously we were using a B/UB source in an Align16 instruction, which is illegal. For some reason it works on all platforms except Broadwell.
  Cc: "10.5" <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86811
  Reviewed-by: Ian Romanick <[email protected]>
  (cherry picked from commit e0137fd6f720e4977466b1760ac02a72c5abceb8)
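For context on the commit above, this is what a 4x8 unpack computes, shown for GLSL's `unpackUnorm4x8` in plain C: each output component comes from one byte of the input, least significant byte first, scaled from [0, 255] to [0.0, 1.0]. Extracting those bytes is the job the new byte-MOV does in the vec4 backend; how the backend actually lowers the operation is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* GLSL unpackUnorm4x8 semantics: byte i (least significant first)
 * becomes component i, divided by 255. */
static void unpack_unorm_4x8(uint32_t packed, float out[4])
{
   for (int i = 0; i < 4; i++) {
      uint8_t byte = (uint8_t)(packed >> (8 * i));   /* extract byte i */
      out[i] = (float)byte / 255.0f;
   }
}
```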
* i965/fs: Consider MOV.SAT to interfere if it has a source modifier. (Matt Turner, 2015-02-24, 2 files, -4/+52)
  The saturate propagation pass recognizes that the second instruction below does not interfere with an attempt to propagate the saturate modifier from instruction 3 to 1.
    1: add(8) dst0 src0 src1
    2: mov.sat(8) dst1 dst0
    3: mov.sat(8) dst2 dst0
  Unfortunately, we did not consider the case of instruction 2 having a source modifier on dst0. Take for instance:
    1: add(8) dst0 src0 src1
    2: mov.sat(8) dst1 -dst0
    3: mov.sat(8) dst2 dst0
  Consider such an instruction to interfere. This increases instruction counts in Anomaly 2, which could be a bug fix depending on the values the first instruction produces.
    instructions in affected programs: 53228 -> 53934 (1.33%)
    HURT: 360
  Cc: <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  (cherry picked from commit 7f8dd91d166e49d7da98f90d6428dc2705fb96d0)
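Why the source modifier matters in the commit above, as a worked case: if instruction 1 becomes `add.sat`, then `dst0` lies in [0, 1] and instruction 2 computes sat(-sat(x)), which is always 0, whereas the original sat(-x) is 0.5 for x = -0.5. A plain `mov.sat` is safe because sat(sat(x)) = sat(x). A hypothetical interference check over a simplified instruction record (not `fs_inst`):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical minimal instruction record. */
struct inst {
   bool is_mov;
   bool saturate;
   int  src_reg;
   bool src_negate;   /* source modifier: -src */
   bool src_abs;      /* source modifier: |src| */
};

/* Does `use`, which may read `reg`, block moving the saturate onto the
 * instruction that defines `reg`? */
static bool interferes_with_sat_propagation(const struct inst *use, int reg)
{
   if (use->src_reg != reg)
      return false;                 /* doesn't read reg at all */
   if (use->is_mov && use->saturate && !use->src_negate && !use->src_abs)
      return false;                 /* plain mov.sat: sat(sat(x)) == sat(x) */
   return true;                     /* any other read sees a changed value */
}
```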
* i965/fs: Use fs_inst::overwrites_reg() in saturate propagation. (Matt Turner, 2015-02-24, 2 files, -4/+44)
  This is safer and matches the conditional_mod propagation pass.
  Cc: <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  (cherry picked from commit 871ad3f08bc34e16fdd728e9a4821b9a83e509f0)
* i965/fs: Add unit tests for saturate propagation pass. (Matt Turner, 2015-02-24, 2 files, -0/+362)
  Cc: <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  (cherry picked from commit bf3389ec49a158e0b66db8e038d801eacabd20f1)
* i965: Prefer Meta over the BLT for BlitFramebuffer. (Kenneth Graunke, 2015-02-24, 1 file, -7/+7)
  There's some debate about whether we should use Meta or BLORP, but either should run circles around the BLT engine. In particular, this means that Gen8+ will use the 3D engine for blits, like we do on Gen6-7.
  Improves performance in "copypixrate -blit -back" (from Mesa demos) by 232.037% +/- 3.15795% (n=10) on Broadwell GT3e.
  v2: Rebase on Laura's changes.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
  Cc: "10.5" <[email protected]>
  (cherry picked from commit d523fefa756eef9c7a2c0d91cf4c2df10b89ed2a)
* i965/vec4/vp: Use vec4_visitor::CMP. (Matt Turner, 2015-02-24, 1 file, -2/+1)
  ... instead of emit(BRW_OPCODE_CMP, ...).
  In commit 6b3a301f I changed vec4_visitor::CMP to set the destination's type to that of src0. In the following commit (2335153f) I removed an apparently now unnecessary workaround for Gen8 that did the same thing.
  But there was a single place that emitted a CMP instruction without using the vec4_visitor::CMP function. Use it there.
  And change dst_null_d to dst_null_f for good measure, since ARB vp doesn't have integers.
  Cc: "10.5" <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89032
  Reviewed-by: Kenneth Graunke <[email protected]>
  (cherry picked from commit 72b9f8db2a035b52dbd59af0d2a6441cbde11a4c)
* i965: Fix integer border color on Haswell. (Kenneth Graunke, 2015-02-24, 3 files, -0/+66)
  +82 Piglits; 100% of border color tests now pass on Haswell.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Chris Forbes <[email protected]>
  Cc: [email protected]
  (cherry picked from commit 08a06b6b891df456902f5e170f1d82236d0c73d2)
* i965: Use a gl_color_union for sampler border color. (Kenneth Graunke, 2015-02-24, 1 file, -53/+52)
  This should have no effect, but will make it easier to implement other bug fixes.
  v2: Eliminate "unsigned one" local; just use the value where necessary.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Cc: [email protected]
  (cherry picked from commit e1e73443c572b5432ef66a923fe64b73467f411b)
* i965: Override swizzles for integer luminance formats. (Kenneth Graunke, 2015-02-24, 1 file, -0/+8)
  The hardware's integer luminance formats are completely unusable; currently we fall back to RGBA. This means we need to override the texture swizzle to obtain the XXX1 values expected for luminance formats.
  Fixes spec/EXT_texture_integer/texwrap formats bordercolor [swizzled] on Broadwell; 100% of border color tests now pass on Broadwell.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Chris Forbes <[email protected]>
  Cc: [email protected]
  (cherry picked from commit 8cb18760cccf2c89d94c50ff14b330ec2d5c4a3c)
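A sketch of the swizzle override described in the commit above, with hypothetical names (`enum swz`, `apply_swizzle`): the texel is fetched as RGBA, and the XXX1 swizzle replicates the first channel into RGB and forces alpha to 1 (integer 1, since these are integer formats).

```c
#include <assert.h>

/* Hypothetical channel selectors: a source channel or a constant. */
enum swz { SWZ_X, SWZ_Y, SWZ_Z, SWZ_W, SWZ_ZERO, SWZ_ONE };

static int apply_swizzle_channel(const int rgba[4], enum swz s)
{
   switch (s) {
   case SWZ_ZERO: return 0;
   case SWZ_ONE:  return 1;
   default:       return rgba[s];   /* SWZ_X..SWZ_W index the texel */
   }
}

static void apply_swizzle(const int rgba[4], const enum swz sel[4],
                          int out[4])
{
   for (int i = 0; i < 4; i++)
      out[i] = apply_swizzle_channel(rgba, sel[i]);
}
```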
* i965: Add more stringent blitter assertions (Ben Widawsky, 2015-02-07, 1 file, -0/+3)
  Blits to or from a Y-tiled surface must always be a multiple of the tile size. From page 16 of the HSW PRM (https://01.org/linuxgraphics/sites/default/files/documentation/intel-gfx-prm-osrc-hsw-memory-views.pdf#16):
    "The pitch of a tiled enclosing region must be an integral number of tile widths"
  Signed-off-by: Ben Widawsky <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
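A sketch of the pitch rule quoted in the commit above, as a predicate with hypothetical names (`enum tiling`, `blit_pitch_is_valid`). It assumes the standard Intel tile geometries (X-tile 512 bytes x 8 rows, Y-tile 128 bytes x 32 rows) and checks only the pitch; other blitter constraints are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

enum tiling { TILING_NONE, TILING_X, TILING_Y };

/* Tile width in bytes for each tiling mode. */
static unsigned tile_width_bytes(enum tiling t)
{
   switch (t) {
   case TILING_X: return 512;
   case TILING_Y: return 128;
   default:       return 1;   /* linear: no tile-width constraint here */
   }
}

/* "The pitch of a tiled enclosing region must be an integral number of
 * tile widths." */
static bool blit_pitch_is_valid(unsigned pitch_bytes, enum tiling t)
{
   return pitch_bytes % tile_width_bytes(t) == 0;
}
```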
* i965: Consolidate some of the intel_blit logic (Ben Widawsky, 2015-02-07, 1 file, -20/+8)
  An upcoming patch is going to introduce some code here, and organizing the code as this patch does makes it a bit easier to read later. There should be no functional change here.
  Signed-off-by: Ben Widawsky <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* i965/vec4: Correct MUL destination hazard (Ben Widawsky, 2015-02-06, 1 file, -4/+4)
  As it turns out, we were overthinking the cause of the hang on Cherryview. It's simply an erratum for Cherryview.
    commit 88fea85f09e2252035bec66ab26c375b45b000f5
    Author: Ben Widawsky <[email protected]>
    Date:   Fri Nov 21 10:47:41 2014 -0800
        i965/vec4/gen8: Handle the MUL dest hazard exception
  This is an explanation of why we never saw the hang on BDW.
  NOTE: The problem the original patch was trying to fix does still exist. It will have to be fixed at some point.
  v2: Modify commit message, s/CHV/BDW
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=84212
  Signed-off-by: Ben Widawsky <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* i965: Fix INTEL_DEBUG=shader_time for SIMD8 VS (and GS). (Kenneth Graunke, 2015-02-05, 1 file, -9/+25)
  We were incorrectly attributing VS time to FS8 on Gen8+, which now uses fs_visitor for vertex shaders.
  We don't hit this for geometry shaders yet, but we may as well add support now: the fix is obvious, and we'd just forget later.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Francisco Jerez <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
* i965/fs: Use inst->eot rather than opcodes in register allocation. (Kenneth Graunke, 2015-02-05, 1 file, -11/+10)
  Previously, we special-cased FB writes and URB writes in the register allocation code. What we really wanted was to handle any message with EOT set. This saves us from extending the list with new opcodes in the future.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Ben Widawsky <[email protected]>
* i965/fs: Delete is_last_send(); just check inst->eot. (Kenneth Graunke, 2015-02-05, 1 file, -14/+1)
  This helper function basically just checks inst->eot, but also asserts that only opcodes we expect to terminate threads have EOT set. As far as I'm aware, we've never had such a bug.
  Removing it means that we don't have to extend the list for new opcodes. Cherryview and Skylake introduce an optimization where sampler messages can have EOT set; scalar GS/HS/DS will likely introduce new opcodes as well.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Ben Widawsky <[email protected]>
* i965: Remove now unnecessary Gen8 CMP destination type override. (Matt Turner, 2015-02-04, 1 file, -8/+0)
  Reviewed-by: Kenneth Graunke <[email protected]>