summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* i965/gen4-5: Emit MI_FLUSH as required prior to switching pipelines.Francisco Jerez2016-01-141-0/+13
| | | | | | | | | AFAIK brw_emit_select_pipeline() is only called once during context init on Gen4-5, at which point the pipeline is likely to be already idle so it may just happen to work by luck regardless of the MI_FLUSH. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gen6-7: Implement stall and flushes required prior to switching pipelines.Francisco Jerez2016-01-141-0/+37
| | | | | | | | | | | | | | | | | Switching the current pipeline while it's not completely idle or the read and write caches aren't flushed can lead to corruption. Fixes misrendering of at least the following Khronos CTS test: ES31-CTS.shader_image_load_store.basic-allTargets-store-fs The stall and flushes are no longer required on Gen8+. v2: Emit PIPE_CONTROL with non-zero post-sync op before the write cache flush on SNB due to hardware bug. (Ken) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93323 Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gen8+: Invalidate color calc state when switching to the GPGPU pipeline.Francisco Jerez2016-01-141-0/+20
| | | | | | | | | | | This hardware bug can cause a hang on context restore while the current pipeline is set to GPGPU (BDWGFX HSD 1909593). In addition to clearing the valid bit, mark the CC state as dirty to make sure that the CC indirect state pointer is re-emitted when we switch back to the 3D pipeline. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add state bit to trigger re-emission of color calculator state.Francisco Jerez2016-01-143-0/+4
| | | | | | | | | | | | | This will be used on Gen8+ to make sure that the color calculator state pointers are re-emitted when switching back to the 3D pipeline after some GPGPU workload due to a hardware workaround. There are other state bits already defined that could be used to achieve the same effect but they all cause a ton of unrelated state to be re-emitted (e.g. BRW_NEW_STATE_BASE_ADDRESS), so just define a new one, state bits are cheap. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nv50/ir: rebase indirect temp arrays to 0, so that we use less lmem spaceIlia Mirkin2016-01-141-14/+44
| | | | | | | | | | | | | Reduces local memory usage in a lot of Metro 2033 Redux and a few KSP shaders: total local used in shared programs : 54116 -> 30372 (-43.88%) Probably modest advantage to execution, but it's an imporant prerequisite to dropping some of the TGSI optimizations done by the state tracker. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: only use FILE_LOCAL_MEMORY for temp arrays that use indirectionIlia Mirkin2016-01-141-15/+50
| | | | | | | | | | | | | | | | | | | Previously we were treating any indirect temp array usage to mean that everything should end up in lmem. The MemoryOpt pass would clean a lot of that up later, but in the meanwhile we would lose a lot of opportunity for optimization. This helps a lot of Metro 2033 Redux and a handful of KSP shaders: total instructions in shared programs : 6288373 -> 6261517 (-0.43%) total gprs used in shared programs : 944051 -> 945131 (0.11%) total local used in shared programs : 54116 -> 54116 (0.00%) A typical case is for register usage to double and for instructions to halve. A future commit can also optimize local memory usage size to be reduced with better packing. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: be careful about propagating very large offsets into const loadIlia Mirkin2016-01-144-1/+19
| | | | | | | | | | | | | Indirect constbuf indexing works by using very large offsets. However if an indirect constbuf index load is const-propagated, it becomes a very large const offset. Take that into account when legalizing the SSA by moving the high parts of that offset into the file index. Also disallow very large (or small) indices on most other instructions. This fixes regressions in ubo_array_indexing/*-two-arrays piglit tests. Fixes: abd326e81b (nv50/ir: propagate indirect loads into instructions) Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: allow fragment shader inputs to use indirect indexingIlia Mirkin2016-01-141-1/+1
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* st/mesa: use surface format to generate mipmaps when availableIlia Mirkin2016-01-141-2/+8
| | | | | | | | This fixes the recently posted mipmap + texture views piglit test. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0 11.1" <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* radeonsi: don't miss changes to SPI_TMPRING_SIZEMarek Olšák2016-01-141-2/+7
| | | | | | | | | | I'm not sure about the consequences of this bug, but it's definitely dangerous. This applies to SI, CIK, VI. Cc: 11.0 11.1 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* svga: add DXGenMips command supportCharmaine Lee2016-01-1410-26/+144
| | | | | | | | | | | | | For those formats that support hw mipmap generation, use the DXGenMips command. Otherwise fallback to the mipmap generation utility. Tested with piglit, OpenGL apps (Heaven, Turbine, Cinebench) v2: make sure the texture surface was created with the render target bind flag set relocation flag to SVGA_RELOC_WRITE for the texture surface Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* svga: add num-generate-mipmap HUD queryCharmaine Lee2016-01-143-1/+12
| | | | | | | | The actual increment of the num-generate-mipmap counter will be done in a subsequent patch when hw generate mipmap is supported. Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* gallium/st: add pipe_context::generate_mipmap()Charmaine Lee2016-01-1420-5/+89
| | | | | | | | | | | | | | | | This patch adds a new interface to support hardware mipmap generation. PIPE_CAP_GENERATE_MIPMAP is added to allow a driver to specify if this new interface is supported; if not supported, the state tracker will fallback to mipmap generation by rendering/texturing. v2: add PIPE_CAP_GENERATE_MIPMAP to the disabled section for all drivers v3: add format to the generate_mipmap interface to allow mipmap generation using a format other than the resource format v4: fix return type of trace_context_generate_mipmap() Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* st/mesa: declare struct pipe_screen in st_cb_bufferobjects.hBrian Paul2016-01-141-0/+1
| | | | To silence a compiler warning. Trivial.
* nir: Lower bitfield_extract.Matt Turner2016-01-146-0/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | The OpenGL specifications for bitfieldExtract() says: The result will be undefined if <offset> or <bits> is negative, or if the sum of <offset> and <bits> is greater than the number of bits used to store the operand. Therefore passing bits=32, offset=0 is legal and defined in GLSL. But the earlier SM5 ubfe/ibfe opcodes are specified to accept a bitfield width ranging from 0-31. As such, Intel and AMD instructions read only the low 5 bits of the width operand, making them not able to implement the GLSL-specified behavior directly. This commit adds ubfe/ibfe operations from SM5 and a lowering pass for bitfield_extract to to handle the trivial case of <bits> = 32 as bitfieldExtract: bits > 31 ? value : bfe(value, offset, bits) Fixes: ES31-CTS.shader_bitfield_operation.bitfieldExtract.uvec3_0 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595 Reviewed-by: Connor Abbott <[email protected]> Tested-by: Marta Lofstedt <[email protected]>
* nir: Handle <bits>=32 case in bitfield_insert lowering.Matt Turner2016-01-142-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The OpenGL specifications for bitfieldInsert() says: The result will be undefined if <offset> or <bits> is negative, or if the sum of <offset> and <bits> is greater than the number of bits used to store the operand. Therefore passing bits=32, offset=0 is legal and defined in GLSL. But the earlier SM5 bfi opcode is specified to accept a bitfield width ranging from 0-31. As such, Intel and AMD instructions read only the low 5 bits of the width operand, making them not able to implement the GLSL-specified behavior directly. This commit fixes the lowering of bitfield_insert to handle the trivial case of <bits> = 32 as bitfieldInsert: bits > 31 ? insert : bfi(bfm(bits, offset), insert, base) Fixes: ES31-CTS.shader_bitfield_operation.bitfieldInsert.uint_2 ES31-CTS.shader_bitfield_operation.bitfieldInsert.uvec4_3 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92595 Reviewed-by: Connor Abbott <[email protected]> Tested-by: Marta Lofstedt <[email protected]>
* st/mesa: add check for color logicop in blit_copy_pixels()Brian Paul2016-01-141-0/+1
| | | | | | | | We check that a bunch of raster operations are disabled in blit_copy_pixels(). We also need to check that color logicop is disabled. Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: do not reallocate user memory buffersNicolai Hähnle2016-01-144-8/+43
| | | | | | | | | The whole point of AMD_pinned_memory is that applications don't have to map buffers via OpenGL - but they're still allowed to, so make sure we don't break the link between buffer object and user memory unless explicitly instructed to. Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: implement PIPE_CAP_INVALIDATE_BUFFERNicolai Hähnle2016-01-145-9/+22
| | | | Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: reset valid_buffer_range on PIPE_TRANSFER_DISCARD_WHOLE_RESOURCENicolai Hähnle2016-01-141-0/+3
| | | | | | | This accomodates a streaming pattern where the discard flag is set when the application wraps back to the beginning of the buffer. Reviewed-by: Marek Olšák <[email protected]>
* st/mesa: implement Driver.InvalidateBufferSubDataNicolai Hähnle2016-01-143-3/+33
| | | | Reviewed-by: Marek Olšák <[email protected]>
* st/mesa: use pipe->invalidate_resource instead of buffer re-allocationNicolai Hähnle2016-01-141-13/+18
| | | | | | | Drivers are expected to avoid unnecessary work when possible in this code path. Reviewed-by: Marek Olšák <[email protected]>
* gallium: add PIPE_CAP_INVALIDATE_BUFFERNicolai Hähnle2016-01-1417-2/+23
| | | | | | | | | It makes sense to re-use pipe->invalidate_resource for the purpose of glInvalidateBufferData, but this function is already implemented in vc4 where it doesn't have the expected behavior. So add a capability flag to indicate that the driver supports the expected behavior. Reviewed-by: Marek Olšák <[email protected]>
* mesa: add Driver.InvalidateBufferSubDataNicolai Hähnle2016-01-142-8/+9
| | | | Reviewed-by: Ian Romanick <[email protected]>
* mesa: fix the checks in _mesa_InvalidateBuffer(Sub)DataNicolai Hähnle2016-01-141-3/+15
| | | | | | | | Change the check to be in line with what the quoted spec fragment says. I have sent out a piglit test for this as well. Reviewed-by: Ian Romanick <[email protected]>
* winsys/radeon: fix warnings about incompatible pointer typesNicolai Hähnle2016-01-141-6/+6
| | | | | | | Some confusion between pb_buffer and radeon_bo as well as between radeon_drm_winsys and radeon_winsys. Reviewed-by: Marek Olšák <[email protected]>
* texobj: Check completeness with InternalFormat rather than Mesa formatNeil Roberts2016-01-141-1/+1
| | | | | | | | | | | | | | The internal Mesa format used for a texture might not match the one requested in the internalFormat when the texture was created, for example if the driver is internally remapping RGB textures to RGBA. Otherwise it can cause false positives for completeness if one mipmap image is created as RGBA and the other as RGB because they would both have an RGBA Mesa format. If we check the InternalFormat instead then we are directly checking the API usage which I think better matches the intention of the check. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93700 Reviewed-by: Anuj Phogat <[email protected]>
* i965: Remove unused hw_must_use_separate_stencilBen Widawsky2016-01-133-5/+1
| | | | | | | | | | I spotted this while looking for what needs updating in future platforms. I'm too lazy to go through the git logs, but it was probably missed by Jason when all the brw refactoring happened. Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Drop extra newline from shader compile messages.Matt Turner2016-01-132-2/+2
| | | | | Ilia changed shader-db's run.c to not expect messages to contain a newline in shader-db commit 51bbc8035.
* nir: Change bfm's semantics to match Intel/AMD/SM5.Matt Turner2016-01-131-3/+6
| | | | | | | | | | | | Intel/AMD's hardware instructions do not handle arguments of 32. Constant evaluation should not produce a result different from the hardware instruction. The s/1ull/1u/ change is intentional: previously we wanted defined behavior for the "1 << 32" case, but we're making this case undefined so we can make it 1u and save ourselves a 64-bit operation. Reviewed-by: Ian Romanick <[email protected]>
* glsl: Fix undefined shifts.Matt Turner2016-01-132-7/+7
| | | | | | | Shifting into the sign bit is undefined, as is shifting by 32. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* glsl: Handle failure of Python codegen scripts.Matt Turner2016-01-131-5/+5
| | | | | | | | If a Python codegen script failed, it would write a zero-byte file, which on subsequent invocations of make would trick it into thinking the file was appropriately generated. Reviewed-by: Ian Romanick <[email protected]>
* glsl, nir: Make ir_triop_bitfield_extract a vectorized operation.Kenneth Graunke2016-01-136-15/+20
| | | | | | | | | | | | | | | | | | | | We would like to be able to combine result.x = bitfieldExtract(src0.x, src1.x, src2.x); result.y = bitfieldExtract(src0.y, src1.y, src2.y); result.z = bitfieldExtract(src0.z, src1.z, src2.z); result.w = bitfieldExtract(src0.w, src1.w, src2.w); into a single ivec4 bitfieldInsert operation. This should be possible with most drivers. This patch changes the offset and bits parameters from scalar ints to ivecN or uvecN. The type of all three operands will be the same, for simplicity. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* glsl, nir: Make ir_quadop_bitfield_insert a vectorized operation.Kenneth Graunke2016-01-137-18/+24
| | | | | | | | | | | | | | | | | | | | We would like to be able to combine result.x = bitfieldInsert(src0.x, src1.x, src2.x, src3.x); result.y = bitfieldInsert(src0.y, src1.y, src2.y, src3.y); result.z = bitfieldInsert(src0.z, src1.z, src2.z, src3.z); result.w = bitfieldInsert(src0.w, src1.w, src2.w, src3.w); into a single ivec4 bitfieldInsert operation. This should be possible with most drivers. This patch changes the offset and bits parameters from scalar ints to ivecN or uvecN. The type of all four operands will be the same, for simplicity. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* glsl: Delete the ir_binop_bfm and ir_triop_bfi opcodes.Kenneth Graunke2016-01-1313-145/+16
| | | | | | | | | | | | | TGSI doesn't use these - it just translates ir_quadop_bitfield_insert directly. NIR can handle ir_quadop_bitfield_insert as well. These opcodes were only used for i965, and with Jason's recent patches, we can do this lowering in NIR (which also gains us SPIR-V handling). So there's not much point to retaining this GLSL IR lowering code. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* nir: Fix constant evaluation of bfm.Matt Turner2016-01-131-1/+1
| | | | | | | | NIR's bfm, like Intel/AMD's hardware instructions and GLSL IR's ir_binop_bfm takes <bits> as src0 and <offset> as src1. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/fs: Skip assertion on NaN.Matt Turner2016-01-131-1/+2
| | | | | | | | | A shader in Unreal4 uses the result of divide by zero in its color output, producing NaN and triggering this assertion since NaN is not equal to itself. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93560 Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/fs: Add debugging to constant combining pass.Matt Turner2016-01-131-1/+20
| | | | Reviewed-by: Iago Toral Quiroga <[email protected]>
* meta: remove const qualifier on _mesa_meta_fb_tex_blit_begin()Brian Paul2016-01-132-2/+2
| | | | | | To silence a compiler warning about a const/non-const mismatch. Reviewed-by: Ian Romanick <[email protected]>
* st/mesa: fix incorrect buffer token passed to _mesa_BindFramebuffer()Brian Paul2016-01-131-2/+2
| | | | | | | | I added this code right at the end, and got it wrong. Only used by the WGL_ARB_render_texture code. Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Charmaine Lee <[email protected]>
* docs: add news item and link release notes for 11.1.1Emil Velikov2016-01-132-0/+7
| | | | Signed-off-by: Emil Velikov <[email protected]>
* docs: add sha256 checksums for 11.1.1Emil Velikov2016-01-131-1/+2
| | | | | Signed-off-by: Emil Velikov <[email protected]> (cherry picked from commit 4b2d9f29e9b75cbbeb76ccf753a256e11f07ee1a)
* docs: add release notes for 11.1.1Emil Velikov2016-01-131-0/+196
| | | | | Signed-off-by: Emil Velikov <[email protected]> (cherry picked from commit 330aa44a0da7548000a6b2fc2bb580e9c8e733cc)
* i965/gen9: Don't allow the RGBX formats for texturing/renderingNeil Roberts2016-01-131-0/+28
| | | | | | | | | | | | | | | | | The RGBX surface formats aren't renderable so we internally remap them to RGBA when rendering. They are retained as RGBX when used as textures. However since the previous patch fast clears are disabled for surfaces that use a different format for rendering than for texturing. To avoid this situation we can just pretend not to support RGBX formats at all. This will cause the upper layers of mesa to pick an RGBA format internally instead. This should be safe because we always override the alpha component to 1.0 for RGBX in the texture swizzle anyway. We could also do this for all gens except that it's a bit more difficult when the hardware doesn't support texture swizzling. Gens using the blorp have further problems because that doesn't implement this swizzle override. Reviewed-by: Anuj Phogat <[email protected]>
* radeonsi: move POSITION and FACE fragment shader inputs to system valuesMarek Olšák2016-01-133-45/+25
| | | | | | And FACE becomes integer instead of float. Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: simplify gl_FragCoord behaviorMarek Olšák2016-01-131-23/+22
| | | | | | It will become a system value, not an input. Reviewed-by: Edward O'Callaghan <[email protected]>
* glsl: add image_format check in cross_validate_globals()Samuel Iglesias Gonsálvez2016-01-131-0/+6
| | | | | | | | | | | Fixes CTS test: ES31-CTS.shader_image_load_store.negative-linkErrors Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93410 Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* mesa: do not validate io of non-compute and compute stageTapani Pälli2016-01-131-0/+7
| | | | | | | | | Fixes regression on SSO tests that have both non-compute and compute programs in a program pipeline. Signed-off-by: Tapani Pälli <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93532 Reviewed-by: Marta Lofstedt <[email protected]>
* glsl: add packed varyings for outputs with single stage programTapani Pälli2016-01-131-7/+2
| | | | | | | | | | | | | | Commit 8926dc8 added a check where we add packed varyings of output stage only when we have multiple stages, however duplicates are already handled by changes in commit 0508d950 and we want to add outputs also in case where we have only one stage. Fixes regression caused by 8926dc8 for following test: ES31-CTS.program_interface_query.separate-programs-vertex Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Marta Lofstedt <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* llvmpipe: (trivial) use cast wrapper for __m128d to __m128 castsRoland Scheidegger2016-01-131-2/+2
| | | | some compiler was unhappy.