path: root/src/mesa
Commit message | Author | Age | Files | Lines
* mesa/dri: always link against shared glapi | Emil Velikov | 2017-05-12 | 1 | -7/+9
  Analogous to previous commit. Check with the extensive commit description and bug report referenced.
  Cc: [email protected]
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
  (cherry picked from commit 51accecce7755be9b7eb1baadaec7e4b7d1011af)
* i965/vec4: don't modify regioning parameters to the sources of DF align1 instructions | Samuel Iglesias Gonsálvez | 2017-05-12 | 1 | -8/+1
  The regioning parameters are now properly set by convert_to_hw_regs() and we don't need to fix them in the generator. The fix previously done in the generator was, strictly speaking, wrong for any non-identity regions.
  Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
  Cc: "17.1" <[email protected]>
  Reviewed-by: Francisco Jerez <[email protected]>
  (cherry picked from commit f57e234fdd52331d0aa6656a36efdebea9d11e9d)
  [Andres Gomez: resolve trivial conflicts]
  Signed-off-by: Andres Gomez <[email protected]>
  Conflicts: src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
* i965/vec4: fix register width for DF VGRF and UNIFORM | Samuel Iglesias Gonsálvez | 2017-05-12 | 1 | -5/+7
  On gen7, the swizzles used in DF align16 instructions work for an element size of 32 bits, so we can address only 2 consecutive DFs. As we assumed that in the rest of the code and prepare the instructions for this (scalarize_df()), we need to set it to two again.
  However, for DF align1 instructions, a width of 2 is wrong as we are not reading the data we want. For example, a uniform would have a region of <0, 2, 1> so it would repeat the first 2 DFs, when we wanted to access the first 4.
  This patch sets the default one to 4 and then modifies the width of align16 instruction's DF sources when we translate the logical swizzle to the physical one.
  v2:
  - Remove conditional (Curro).
  Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
  Cc: "17.1" <[email protected]>
  Reviewed-by: Francisco Jerez <[email protected]>
  (cherry picked from commit aaeb1c99beed39d85c300ebdb8a7bf056ee6717c)
* i965/vec4: fix vertical stride to avoid breaking region parameter rule | Samuel Iglesias Gonsálvez | 2017-05-12 | 1 | -16/+30
  From IVB PRM, vol4, part3, "General Restrictions on Regioning Parameters":
  "If ExecSize = Width and HorzStride ≠ 0, VertStride must be set to Width * HorzStride."
  In the next patch, we are going to modify the region parameter for uniforms and vgrf. Uniforms that are the source of DF align1 instructions will have <0, 4, 1> regioning and the execsize for those instructions will be 4, so they will break the regioning rule. This will be the same for VGRF sources where we use the vstride == 0 exploit.
  As we know we are not going to cross the GRF boundary with that execsize and parameters (not even with the exploit), we just fix the vstride here.
  v2:
  - Move is_align1_df() (Curro)
  - Refactor exec_size == width calculation (Curro)
  Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
  Cc: "17.1" <[email protected]>
  Reviewed-by: Francisco Jerez <[email protected]>
  (cherry picked from commit 7f728bce811fc283e672e3a07b008bb7b52de35e)
  [Andres Gomez: use original is_align1_df]
  Signed-off-by: Andres Gomez <[email protected]>
  Conflicts: src/mesa/drivers/dri/i965/brw_vec4.cpp
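  The regioning rule quoted in this commit message can be illustrated with a small sketch. It is not the code from brw_vec4.cpp: `fixed_vstride` is a hypothetical helper, and strides are expressed as plain element counts rather than hardware encodings.

```c
#include <assert.h>

/* Hypothetical helper (not Mesa code): return the vertical stride a source
 * region should carry so that it obeys the quoted IVB PRM rule:
 * "If ExecSize = Width and HorzStride != 0, VertStride must be set to
 * Width * HorzStride."  All values are element counts, not HW encodings. */
static unsigned
fixed_vstride(unsigned exec_size, unsigned width, unsigned hstride,
              unsigned vstride)
{
   assert(width != 0);

   if (exec_size == width && hstride != 0)
      return width * hstride;

   /* The rule does not apply; keep the stride that was requested. */
   return vstride;
}
```

  For the uniform case described above, a <0, 4, 1> region read with an execution size of 4 would come back with a vertical stride of 4, while the same region read with an execution size of 8 would be left untouched.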
* st/mesa: move duplicated st_ws_framebuffer() function into header file | Brian Paul | 2017-05-10 | 3 | -28/+18
  Reviewed-by: Marek Olšák <[email protected]>
  (cherry picked from commit b71ef173a5a61a667380dc77f5ae1f7e8c0c2fb8)
* mesa: validate sampler type across the whole program | Timothy Arceri | 2017-04-26 | 3 | -6/+24
  Until now we were only making sure types were the same within a single stage. This looks to have regressed with 953a0af8e3f73.
  Fixes: 953a0af8e3f73 ("mesa: validate sampler uniforms during gluniform calls")
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Tapani Pälli <[email protected]>
  https://bugs.freedesktop.org/show_bug.cgi?id=97524
  (cherry picked from commit d682f8aa8e0edd166166f87fcd774dd2d57b4180)
  [Andres Gomez: there was an intermediate cleanup but this commit basically brings everything that was missing back]
  Signed-off-by: Andres Gomez <[email protected]>
  Conflicts: src/mesa/main/uniforms.c
* st/mesa: automake: honour the vdpau header install location | Emil Velikov | 2017-04-26 | 1 | -0/+1
  If VDPAU is installed in a non-default location, we'll fail to find the headers and error at build time.
    ../../src/gallium/include/state_tracker/vdpau_dmabuf.h:37:25: fatal error: vdpau/vdpau.h: No such file or directory
     #include <vdpau/vdpau.h>
     ^
  Fixes: faba96bc60b ("st/vdpau: add new interop interface")
  Cc: Christian König <[email protected]>
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
  (cherry picked from commit 51c0c213b7fa53b249e9fcb9004a3ba1076fe773)
* i965/vec4: Avoid reswizzling MACH instructions in opt_register_coalesce(). | Kenneth Graunke | 2017-04-26 | 1 | -0/+7
  opt_register_coalesce() was optimizing sequences such as:
    mul(8) acc0:D, attr18.xyyy:D, attr19.xyyy:D
    mach(8) vgrf5.xy:D, attr18.xyyy:D, attr19.xyyy:D
    mov(8) m4.zw:F, vgrf5.xxxy:F
  into:
    mul(8) acc0:D, attr18.xyyy:D, attr19.xyyy:D
    mach(8) m4.zw:D, attr18.xxxy:D, attr19.xxxy:D
  This doesn't work - if we're going to reswizzle MACH, we'd need to reswizzle the MUL as well. Here, the MUL fills the accumulator's .zw components with attr18.yy * attr19.yy. But the MACH instruction expects .z to contain attr18.x * attr19.x. Bogus results ensue.
  No change in shader-db on Haswell.
  Prevents regressions in Timothy's patches to use enhanced layouts for varying packing (which rearrange code just enough to trigger this pre-existing bug, but were fine themselves).
  Acked-by: Timothy Arceri <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  (cherry picked from commit 2faf227ec2e22c7a37e0a54783a3f0a0062ac852)
  Squashed with commit:
  i965/vec4: Use reads_accumulator_implicitly(), not MACH checks.
  Curro pointed out that I should not just check for MACH, but use the reads_accumulator_implicitly() helper, which would also prevent the same bug with MAC and SADA2 (if we ever decide to use them).
  Cc: [email protected]
  Reviewed-by: Francisco Jerez <[email protected]>
  (cherry picked from commit 6b10c37b9c3a73add73f444fe1aee73c9ec82c94)
* intel/fs: Take into account amount of data read in spilling cost heuristic. | Francisco Jerez | 2017-04-26 | 1 | -1/+1
  Until now the spilling cost calculation was neglecting the amount of data read from the register. This caused it to make suboptimal decisions in some cases leading to higher memory bandwidth usage than necessary.
  Improves Unigine Heaven performance by ~4% on BDW, reversing an unintended FPS regression from my previous commit 147e71242ce539ff28e282f009c332818c35f5ac with n=12 and statistical significance 5%. In addition SynMark2 OglCSDof performance is improved by an additional ~5% on SKL, and a Kerbal Space Program apitrace around the Moho planet I can provide on request improves by ~20%.
  Cc: <[email protected]>
  Reviewed-by: Plamena Manolova <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  (cherry picked from commit 58324389be7bc7c5e10093b9cc0a8efa9b4c93a9)
  [Andres Gomez: resolve trivial conflicts]
  Signed-off-by: Andres Gomez <[email protected]>
  Conflicts: src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
* intel/fs: Use regs_written() in spilling cost heuristic for improved accuracy. | Francisco Jerez | 2017-04-26 | 1 | -2/+1
  This is what we use later on to compute the number of registers that will actually get spilled to memory, so it's more likely to match reality than the current open-coded approximation.
  Cc: <[email protected]>
  Reviewed-by: Plamena Manolova <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  (cherry picked from commit ecc19e12dca95d2571d3761dea6dec24b061013c)
  [Andres Gomez: resolve trivial conflicts]
  Signed-off-by: Andres Gomez <[email protected]>
  Conflicts: src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
* st/mesa: invalidate the readpix cache in st_indirect_draw_vbo | Marek Olšák | 2017-04-26 | 1 | -0/+2
  Cc: <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
  (cherry picked from commit 7cd6e2df65de9e2f0d77022a64c4e48ca2ebcb33)
* vbo: fix gl_DrawID handling in glMultiDrawArrays | Nicolai Hähnle | 2017-04-26 | 1 | -6/+15
  Fixes a bug in KHR-GL45.shader_draw_parameters_tests.ShaderMultiDrawArraysParameters.
  Cc: [email protected]
  Reviewed-by: Marek Olšák <[email protected]>
  (cherry picked from commit 51deba0eb35d0d27560bb7dad24b8d39abb58be6)
* mesa: move glMultiDrawArrays to vbo and fix error handling | Nicolai Hähnle | 2017-04-26 | 5 | -18/+126
  When any count[i] is negative, we must skip all draws.
  Moving to vbo makes the subsequent change easier.
  v2:
  - provide the function in all contexts, including GLES
  - adjust validation accordingly to include the xfb check
  v3:
  - fix mix-up of pre- and post-xfb prim count (Nils Wallménius)
  Cc: [email protected]
  Reviewed-by: Timothy Arceri <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
  (cherry picked from commit 42d5465b9ba85b4918b9e6fb57994720e3c8a80b)
  [Andres Gomez: resolve trivial conflicts]
  Signed-off-by: Andres Gomez <[email protected]>
  Conflicts: src/mesa/main/varray.c
* mesa: extract need_xfb_remaining_prims_check | Nicolai Hähnle | 2017-04-26 | 1 | -20/+28
  The same logic needs to be applied to glMultiDrawArrays.
  Cc: [email protected]
  Reviewed-by: Timothy Arceri <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
  (cherry picked from commit 756e9ebbdd84018382908d3556973a62dbda09ca)
* mesa: fix remaining xfb prims check for GLES with multiple instances | Nicolai Hähnle | 2017-04-26 | 1 | -1/+1
  Found by inspection.
  Cc: [email protected]
  Reviewed-by: Timothy Arceri <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
  (cherry picked from commit ea9a8940cadb30ac8d72a26b82bdb54872c0e199)
* i965: Set kernel features before computing max GL version. | Kenneth Graunke | 2017-04-12 | 1 | -24/+24
  We check these bitfields when computing the Haswell max GL version. We need to set them ahead of time, or they won't exist, and all our checks will fail. That sets the max core profile GL version to 4.2.
  This introduces the bizarre situation where asking for a GL context with version 4.3+ fails, but asking for a GL core profile context with version <= 4.2 actually promotes you to a 4.5 context.
  GLX_MESA_query_renderer also reported the bogus 4.2 value. Now it shows 4.5.
  Cc: "17.0" <[email protected]>
  Reported-and-tested-by: Rafael Ristovski <[email protected]>
  (cherry picked from commit 02ccd8f52cffcc25e5fefdd0f900cf04230395f4)
  [Emil Velikov: resolve trivial conflicts]
  Signed-off-by: Emil Velikov <[email protected]>
  Conflicts: src/mesa/drivers/dri/i965/intel_screen.c
* i965: Skip register write detection when possible. | Kenneth Graunke | 2017-04-12 | 1 | -2/+8
  Detecting register write support by trial and error introduces a stall at screen creation time, which it would be nice to avoid.
  Certain command parser versions guarantee this will work (see the giant comment in intelInitScreen2 below, or a few commits ago):
  - Ivybridge: version >= 1 (kernel v3.16)
  - Baytrail: version >= 2 (kernel v3.19)
  - Haswell: version >= 7 (kernel v4.8)
  For simplicity, we don't bother with version 1 in this patch.
  This assumes that the user hasn't disabled aliasing PPGTT via a kernel command line parameter. Don't do that - you're only breaking things.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
  (cherry picked from commit 5e29af5f772c1e1b02a4cc46d2f7d3b5d2151ad8)
* i965: Set screen->cmd_parser_version to 0 if we can't write registers. | Kenneth Graunke | 2017-04-12 | 1 | -6/+11
  If we can't write registers, then the effective command parser version is 0 - it may exist, but it's not usefully enabling anything.
  See kernel commit 1ca3712ca3429a617ed6c5f87718e4f6fe4ae0c6 (in v4.8) where the kernel starts doing this for us. This makes us do more or less the same thing on older kernels.
  This should preserve a bit of sanity by allowing us to perform a screen->cmd_parser_version >= N check to determine that we really can use the features promised by command parser version N.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
  (cherry picked from commit 31693a13f8fbc52d4f19f1e8800a4edabeecbe19)
  [Emil Velikov: resolve trivial conflicts]
  Signed-off-by: Emil Velikov <[email protected]>
  Conflicts: src/mesa/drivers/dri/i965/intel_screen.c
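  The idea in this commit message, clamping the command parser version to 0 when register writes do not work so that a single comparison suffices later, might look roughly like the sketch below. Both function names are hypothetical, and the real driver stores the value in screen->cmd_parser_version rather than passing it around.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch (not the intel_screen.c code): a parser that cannot
 * write registers is treated as if it did not exist at all. */
static int
effective_cmd_parser_version(int reported_version, bool can_write_registers)
{
   assert(reported_version >= 0);
   return can_write_registers ? reported_version : 0;
}

/* With the clamp above, a feature promised by parser version N needs only
 * one comparison against the effective version. */
static bool
has_cmd_parser_feature(int effective_version, int required_version)
{
   return effective_version >= required_version;
}
```

  The design point is that every later feature check stays a one-line version comparison and does not need to repeat the register-write probe.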
* i965: Document the sad story of the kernel command parser. | Kenneth Graunke | 2017-04-12 | 1 | -0/+97
  This should help us figure out the complexities of which kernel versions we need to get various features on various platforms.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
  (cherry picked from commit 4a2ad6b145b4dd0d19a8e5e0ee6bed09e08ce0eb)
* i965/blorp: Bump the batch space estimate | Jason Ekstrand | 2017-04-12 | 1 | -1/+1
  Commit f938354362655a378d474c5f79c52cea9852ab91 recently increased the alignment on vertex buffer data from 32 to 64. This caused us to consume a bit more batch than we were before and we now go over the estimate by a small amount on certain blits on gen8+. This commit bumps the gen8 batch estimate by a bit to compensate. Haswell and older still seem to be well within the limit.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100582
  Reviewed-by: Iago Toral Quiroga <[email protected]>
  Acked-by: Kenneth Graunke <[email protected]>
  Cc: "13.0 17.0" <[email protected]>
  (cherry picked from commit c9c39812b91c8104bc0bea16053312547846249c)
* i965/blorp: Align vertex buffers to 64B | Jason Ekstrand | 2017-04-12 | 1 | -1/+13
  Reviewed-by: Kenneth Graunke <[email protected]>
  Cc: "13.0 17.0" <[email protected]>
  (cherry picked from commit f938354362655a378d474c5f79c52cea9852ab91)
  [Emil Velikov: brw_state_batch has different signature]
  Signed-off-by: Emil Velikov <[email protected]>
  Conflicts: src/mesa/drivers/dri/i965/genX_blorp_exec.c
* i965/fs: Always provide a default LOD of 0 for TXS and TXL | Jason Ekstrand | 2017-04-12 | 1 | -9/+9
  We already provide a default LOD for textureQueryLevels and texture() on non-fragment stages. However, there are more cases where one is needed such as textureSize(gsampler2DMS*) in SPIR-V. Instead of trying to list out all of the cases one at a time, just provide the default for all TXS and TXL operations.
  This fixes a shader validation error in the new Sascha deferredmultisampling demo which uses textureSize(gsampler2DMS).
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100391
  Reviewed-by: Anuj Phogat <[email protected]>
  Cc: "13.0 17.0" <[email protected]>
  (cherry picked from commit 3503b2714b98684a2ceba5f4fd9a5bfbfbcaad38)
* st: Add cubeMapFace parameter to st_finalize_texture. | Michal Srb | 2017-04-12 | 6 | -7/+9
  st_finalize_texture always accesses the image at face 0, but it may not be set if we are working with a cubemap that had another face set.
  This fixes a crash in piglit same-attachment-glFramebufferTexture2D-GL_DEPTH_STENCIL_ATTACHMENT.
  Cc: [email protected]
  Reviewed-by: Nicolai Hähnle <[email protected]>
  (cherry picked from commit 52f9ccefcb75a9d42307890d7714b1cd92e864cb)
* i965/fs: Don't emit SEL instructions for type-converting MOVs. | Matt Turner | 2017-03-29 | 1 | -0/+2
  SEL can only convert between a few integer types, which we basically never do.
  Fixes fs/vs-double-uniform-array-direct-indirect-non-uniform-control-flow
  Cc: [email protected]
  Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
  Acked-by: Francisco Jerez <[email protected]>
  (cherry picked from commit 7dccd38b400d3a65da20ddefe282a7bb0b7ccb58)
* mesa/main: fix MultiDrawElements[BaseVertex] validation of primcount | Nicolai Hähnle | 2017-03-29 | 2 | -3/+23
  primcount must be a GLsizei as in the signature for MultiDrawElements or bad things can happen. Furthermore, an error should be flagged when primcount is negative.
  Curiously, this code used to work somewhat correctly even when primcount was negative, because the loop that checks count[i] would iterate out of bounds and almost certainly hit a negative value at some point.
  Found by an ASAN error in GL45-CTS.gtf32.GL3Tests.draw_elements_base_vertex.draw_elements_base_vertex_primcount
  Note that the OpenGL spec seems to have s/primcount/drawcount/ at some point, and the code still reflects the old language.
  v2: provide the correct spec quotes (pointed out by Ian)
  Cc: [email protected]
  Reviewed-by: Marek Olšák <[email protected]> (v1)
  Reviewed-by: Ian Romanick <[email protected]>
  (cherry picked from commit c11dcfb5e9b051b9036949b3e40a9dc15138bd97)
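  The validation order described in this commit message (reject a negative primcount first, then inspect each count[i] within bounds) can be sketched as follows. `multi_draw_counts_valid` is a hypothetical stand-in, plain `int` is used in place of GLsizei to keep the sketch self-contained, and the actual error reporting is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the validation described above (not the Mesa
 * function).  Returns false when the caller should flag an error and skip
 * every draw in the batch. */
static bool
multi_draw_counts_valid(const int *count, int primcount)
{
   /* A negative primcount must be rejected before the loop; otherwise the
    * loop bound itself is meaningless. */
   if (primcount < 0)
      return false;

   assert(primcount == 0 || count != NULL);

   for (int i = 0; i < primcount; i++) {
      if (count[i] < 0)
         return false;   /* one bad entry invalidates all draws */
   }
   return true;
}
```

  Keeping primcount signed is what makes the first check possible; with an unsigned type a negative value from the application would silently become a very large loop bound.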
* i965: Fall back to GL 4.2/4.3 on Haswell if the kernel isn't new enough. | Kenneth Graunke | 2017-03-29 | 1 | -2/+9
  In commit d2590eb65ff28a9cbd592353d15d7e6cbd2c6fc6 I enabled GL 4.5 on Haswell...but failed to check if we could do indirect compute shader dispatch...and query buffer objects.
  Indirect compute shader dispatch requires command parser version 5 (kernel commit 7b9748cb513a6bef4af87b79f0da3ff7e8b56cd8, which is in Linux v4.4). On earlier kernels we would have disabled ARB_compute_shader, which is a mandatory part of OpenGL 4.3+.
  Query buffer objects currently require MI_MATH and MI_LOAD_REGISTER_REG, which mean command parser version 7 (Linux v4.8). On earlier kernels we would have disabled ARB_query_buffer_object, which is a mandatory part of OpenGL 4.4+.
  The new version support looks like:
  - Kernel 4.1 and older => OpenGL 3.3
  - Kernel 4.2-4.3 => OpenGL 4.2
  - Kernel 4.4-4.7 => OpenGL 4.3
  - Kernel 4.8+ => OpenGL 4.5
  Cc: "17.0" <[email protected]>
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
  (cherry picked from commit 9b324e4dca4754801e5db59aba0ab559f2cf35ea)
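  The kernel-to-OpenGL mapping listed in this commit message can be written down as a small lookup. This is only an illustration of the table: `hsw_max_gl_version` is a hypothetical name, and the driver itself keys off the command parser version and kernel feature bits rather than a kernel version number.

```c
#include <assert.h>

/* Hypothetical helper (not Mesa code): map a Linux kernel version to the
 * maximum OpenGL version, times 10, that the commit message lists for
 * Haswell.  Assumes minor versions below 100. */
static int
hsw_max_gl_version(int kernel_major, int kernel_minor)
{
   assert(kernel_major >= 0 && kernel_minor >= 0 && kernel_minor < 100);

   const int kernel = kernel_major * 100 + kernel_minor;

   if (kernel >= 408)
      return 45;   /* MI_MATH and MI_LOAD_REGISTER_REG: query buffer objects */
   if (kernel >= 404)
      return 43;   /* indirect compute shader dispatch */
   if (kernel >= 402)
      return 42;
   return 33;
}
```

  For example, a 4.7 kernel falls in the 4.4-4.7 band and yields 43, while anything from 4.8 onwards yields 45.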
* intel: Correct the BDW surface state size | Nanley Chery | 2017-03-29 | 1 | -3/+2
  The PRMs state that this packet is 16 DWORDS long. Ensure that the last three DWORDS are zeroed as required by the hardware when allocating a null surface state.
  Cc: <[email protected]>
  Signed-off-by: Nanley Chery <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
  (cherry picked from commit 7c50f9903f58ef04ff393505a383d06f499f1fdc)
* st/mesa: set result writemask based on ir type | Ilia Mirkin | 2017-03-29 | 1 | -0/+1
  This prevents textureQueryLevels, which maps as LODQ, from ending up with an xyzw writemask, which is illegal.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100061
  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: [email protected]
  Reviewed-by: Marek Olšák <[email protected]>
  (cherry picked from commit dab88e9af7a35ebcdd0fc87df97f4b13e908552a)
* i965/gen8+: Do full stall when switching pipeline | Topi Pohjolainen | 2017-03-29 | 1 | -1/+2
  just as earlier gens do.
  CC: "17.0 13.0" <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96743
  Reviewed-by: Jason Ekstrand <[email protected]>
  Signed-off-by: Topi Pohjolainen <[email protected]>
  (cherry picked from commit bd25d9670b466043cdb5d9668f82accbd587c889)
* i965: move brw_define.h ifndef guard to the top | Emil Velikov | 2017-03-16 | 1 | -3/+3
  Cc: [email protected]
  Signed-off-by: Emil Velikov <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  (cherry picked from commit 077078ce77e8653725def01ed291eb486989a9ad)
  [Emil Velikov: resolve trivial conflicts]
  Signed-off-by: Emil Velikov <[email protected]>
  Conflicts: src/mesa/drivers/dri/i965/brw_defines.h
* mesa: Avoid read of uninitialized variable | Robert Foss | 2017-03-15 | 1 | -1/+1
  The is_color_attachement variable is later read when handling two separate error cases, where only one of the cases results in the variable being initialized.
  This can be avoided by giving the variable a safe default value.
  Coverity-Id: 1398631
  Cc: [email protected]
  Signed-off-by: Robert Foss <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
  (cherry picked from commit 88becf73022d780cfd0d7dbc5bb3911f8b0d2b11)
* st/mesa: inform the driver of framebuffer changes before compute dispatches | Nicolai Hähnle | 2017-03-15 | 1 | -1/+9
  Even though compute shaders cannot access the framebuffer, there is a synchronization issue when a compute dispatch accesses a texture that was previously bound and drawn to as a framebuffer.
  Section 9.3 (Feedback Loops Between Textures and the Framebuffer) of the OpenGL 4.5 spec rather implicitly clarifies that undefined behavior results if the texture is still attached to the currently bound framebuffer. However, the feedback loop is broken when the application changes the framebuffer binding before a compute dispatch, and the state tracker needs to let the driver know about this.
  Fixes GL45-CTS.compute_shader.pipeline-post-fs on SI family Radeons.
  Cc: [email protected]
  Signed-off-by: Marek Olšák <[email protected]>
  (cherry picked from commit 40c77bbf83a369f21c5a95f14417348aae2dbe42)
* st/glsl_to_tgsi: avoid iterating past the head of the instruction list | Nicolai Hähnle | 2017-03-15 | 1 | -2/+9
  exec_node::get_prev() does not guard against going past the beginning of the list, so we need to add explicit checks here.
  Found by ASAN in piglit arb_shader_storage_buffer_object-rendering.
  Cc: [email protected]
  Signed-off-by: Marek Olšák <[email protected]>
  (cherry picked from commit 911391bd70fe30ad970c5e56632b2d7ccc29d955)
* i965/fs: emit MOV_INDIRECT with the source with the right register type | Samuel Iglesias Gonsálvez | 2017-03-15 | 1 | -1/+1
  This was hiding bugs as it retyped the source to the destination's type.
  Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
  Cc: "17.0" <[email protected]>
  Reviewed-by: Francisco Jerez <[email protected]>
  (cherry picked from commit 0dddad5b1bb3b05190074a71d274c04c0b5ea700)
* i965/fs: fix source type when emitting MOV_INDIRECT to read ICP handles | Samuel Iglesias Gonsálvez | 2017-03-15 | 1 | -3/+3
  When generating the MOV INDIRECT instruction, the source type is ignored and it is set to the destination's type. However, this is going to change in a later patch, so we need to explicitly set the proper source type.
  brw_vec8_grf() creates a float-typed fs_reg by default, when the ICP handle is actually unsigned. This patch fixes these cases before applying the aforementioned patch.
  Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
  Cc: "17.0" <[email protected]>
  Reviewed-by: Francisco Jerez <[email protected]>
  (cherry picked from commit d8122128bc6bd291ff0abcb7f2e52d9cdc631527)
* i965/fs: fix indirect load DF uniforms on BSW/BXT | Samuel Iglesias Gonsálvez | 2017-03-15 | 1 | -21/+20
  The lowered BSW/BXT indirect move instructions had incorrect source types, which luckily wasn't causing incorrect assembly to be generated due to the bug fixed in the next patch, but would have confused the remaining back-end IR infrastructure due to the mismatch between the IR source types and the emitted machine code.
  v2:
  - Improve commit log (Curro)
  - Fix read_size (Curro)
  - Fix DF uniform array detection in assign_constant_locations() when it is accessed with 32-bit MOV_INDIRECTs on BSW/BXT.
  v3:
  - Move changes in assign_constant_locations() to other patch.
  Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
  Cc: "17.0" <[email protected]>
  Reviewed-by: Francisco Jerez <[email protected]>
  (cherry picked from commit 56266df7ed9dbdf63acfd58944442893b4cd0c0b)
* i965/fs: detect different bit size accesses to uniforms to push them in proper locations | Samuel Iglesias Gonsálvez | 2017-03-15 | 1 | -16/+34
  Previously, if we had accesses with different sizes to the same uniform, we might not push it aligned with the bigger one. This is a problem on BSW/BXT when we access an array of DF uniforms with both direct and indirect addressing because for the latter we use 32-bit MOV INDIRECT instructions. However this problem can happen with other generations and bitsizes.
  Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
  Cc: "17.0" <[email protected]>
  Reviewed-by: Francisco Jerez <[email protected]>
  (cherry picked from commit a497ab6838ae5a9898abfed82f7bc8295b490911)
* i965/fs: mark last DF uniform array element as 64 bit live one | Samuel Iglesias Gonsálvez | 2017-03-15 | 1 | -0/+3
  This bug can cause us to miss the end of a contiguous area and push larger areas than the real ones.
  Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
  Cc: "17.0" <[email protected]>
  Reviewed-by: Francisco Jerez <[email protected]>
  (cherry picked from commit 7427425247d80c9f59a3c3ad2dfeeb2429de6f67)
* st/mesa: set blend state for PBO readbacks | Marek Olšák | 2017-03-15 | 1 | -0/+6
  v2: restore the state
  Cc: 13.0 17.0 <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
  (cherry picked from commit cc2f92b09f8ab0470106185585fdc1282da523e6)
* st/mesa: reset sample_mask, min_sample, and render_condition for PBO ops | Marek Olšák | 2017-03-15 | 2 | -0/+13
  Cc: 13.0 17.0 <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
  (cherry picked from commit a40b76143d8b929412bed6fbed04810902844c40)
* intel/blorp: Explicitly flush all allocated state | Jason Ekstrand | 2017-03-01 | 1 | -0/+8
  Found by inspection. However, I expect it fixes real bugs when using blorp from Vulkan on little-core platforms.
  Reviewed-by: Lionel Landwerlin <[email protected]>
  Cc: "13.0 17.0" <[email protected]>
  (cherry picked from commit 075ed20614e91110322aadff44dbd4c1ca2422e8)
* i965/fs: fix uninitialized memory access | Lionel Landwerlin | 2017-03-01 | 1 | -3/+2
  Found while running shader-db under valgrind.
  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
  Cc: "13.0 17.0" <[email protected]>
  (cherry picked from commit a0ac118398c924f2ae75e5649fbaacd95abd231f)
* i965/fs: Fix the inline nir_op_pack_double optimization | Jason Ekstrand | 2017-03-01 | 1 | -29/+0
  We can only do the optimization if the source *is* SSA.
  Reviewed-by: Kenneth Graunke <[email protected]>
  Cc: "13.0 17.0" <[email protected]>
  (cherry picked from commit a4393bd97fe62e8299273bae769201c5c9c816ea)
  Squashed with commit:
  i965/fs: Remove the inline pack_double_2x32 optimization
  It's broken in a number of ways. In particular, a bunch of the conditions are backwards so it doesn't actually detect what it's supposed to detect. Since it's been broken, it hasn't actually been helping anything so just deleting it isn't a regression.
  This (and removing another optimization) were done on master in commit b07381161777ba5d5f4a1d713f7655bcaede4139.
  Cc: "Kenneth Grunke" <[email protected]>
  Cc: "Mark Janes" <[email protected]>
  [Emil Velikov: patch is a backport of the below "cherry pick"]
  Fixes: a4393bd97fe ("i965/fs: Fix the inline nir_op_pack_double optimization")
  (cherry picked from commit b07381161777ba5d5f4a1d713f7655bcaede4139)
* mesa: Do (TCS && !TES) draw time validation in ES as well. | Kenneth Graunke | 2017-02-23 | 1 | -19/+26
  Now that we have OES_tessellation_shader, the same situation can occur in ES too, not just GL core profile. Having a TCS but no TES may confuse drivers - i965 crashes, for example.
  This prevents regressions in ES31-CTS.core.tessellation_shader.single.xfb_captures_data_from_correct_stage with some SSO pipeline validation changes I'm making.
  v2: Add an ES spec citation (suggested by Alejandro)
  Cc: "17.0" <[email protected]>
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Alejandro Piñeiro <[email protected]>
  (cherry picked from commit 05a56893aa2570cb1f6e61e3c9cf365266ea1d3a)
* i965/sampler_state: Set the "Base Mip Level" field on Sandy Bridge | Jason Ekstrand | 2017-02-23 | 2 | -1/+20
  Fixes two GL ES 3.0 CTS tests on Sandy Bridge:
    ES3-CTS.functional.texture.mipmap.cube.base_level.linear_linear
    ES3-CTS.functional.texture.mipmap.cube.base_level.linear_nearest
  Reviewed-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Cc: "17.0 13.0" <[email protected]>
  (cherry picked from commit c59d1ea51bd0809761094e54c66bf3a200d964ff)
* i965/sampler_state: Pass texObj into update_sampler_state | Jason Ekstrand | 2017-02-23 | 1 | -6/+4
  Reviewed-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Cc: "17.0 13.0" <[email protected]>
  (cherry picked from commit c4f8f395b291a88eb74b07b90a4028ef4f026f58)
* i965/sampler_state: Clamp min/max LOD to 14 on gen7+ | Jason Ekstrand | 2017-02-23 | 1 | -2/+5
  Reviewed-by: Lionel Landwerlin <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Cc: "17.0" <[email protected]>
  (cherry picked from commit 9df3778016e9153bc8759f84075db2d62a62a596)
* st/mesa: don't pass compare mode for stencil-sampled textures | Ilia Mirkin | 2017-02-23 | 1 | -1/+1
  Fixes dEQP-GLES31.functional.stencil_texturing.misc.compare_mode_effect
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
  Cc: [email protected]
  (cherry picked from commit 3970257cef5e0c7b5b31c023450f1ea55b784e88)
* Revert "i965: Disable guardband clipping in the smaller-than-viewport case." | Kenneth Graunke | 2017-02-10 | 1 | -31/+0
  This reverts commit 0bac2551e40410e2251daf4fd9faf69310ab34ce.
  Now that we position the guardband correctly (applying translations in addition to scaling) and made it as large (or larger) than the render target, this shouldn't be necessary. Now we leave guardband clipping enabled 100% of the time, like the Windows driver does.
  Fixes GL45-CTS.gtf21.GL2FixedTests.clip.clip. It tries to draw a 16384x64 rectangle, and it appears that some kind of numerical imprecision in the clipper results in some edge pixels going missing. The Windows driver passes this test because of guardband clipping.
  Cc: "17.0" <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  (cherry picked from commit ce8a63de6dffd4a7bc704b63bdd48a63798a438e)
* i965: Always scissor on Gen6-7.5 instead of disabling guardband. | Kenneth Graunke | 2017-02-10 | 3 | -48/+5
  Previously we disabled the guardband when the viewport was smaller than the framebuffer on Gen6-7.5, to prevent portions of primitives from being drawn outside of the viewport. On Gen8+, we relied on the viewport extents test to effectively scissor this away for us.
  We can simply always enable scissoring instead. We already include the viewport in the scissor rectangle, so this will effectively do the viewport extents test for us. (The only difference is that the scissor rectangle doesn't support sub-pixel values. I think that's okay.)
  Given that the viewport extents test is essentially a second scissor, and is enabled for basically all 3D drawing on Gen8+, it stands to reason that scissoring is cheap. Enabling the guardband reduces the cost of clipping, which is expensive.
  The Windows driver appears to never disable guardband clipping, and appears to use scissoring in this case. I don't know if they leave it on universally though.
  This fixes misrendering in Blender, where the "floor plane" grid lines started rendering at wrong angles after I disabled XY clipping of line primitives. Enabling the guardband seems to solve the issue.
  Cc: "17.0" <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99339
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  (cherry picked from commit ece0e535a44c228dd994861592deb155c14740d8)