aboutsummaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
...
* i965/sampler_state: Clamp min/max LOD to 14 on gen7+Jason Ekstrand2017-02-121-2/+5
| | | | | | Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: "17.0" <[email protected]>
* st/mesa: don't pass compare mode for stencil-sampled texturesIlia Mirkin2017-02-121-1/+1
| | | | | | | | Fixes dEQP-GLES31.functional.stencil_texturing.misc.compare_mode_effect Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Cc: [email protected]
* nv50,nvc0: use alternate samplers for stencilIlia Mirkin2017-02-121-3/+3
| | | | | | | | The blob uses these, and it fixes a bunch of dEQP stencil sampling tests involving border colors. Probably the Z-based samplers work somehow differently wrt border colors when using the stencil swizzle. Signed-off-by: Ilia Mirkin <[email protected]>
* radv: Fix radv_GetPhysicalDeviceQueueFamilyProperties2KHR.Bas Nieuwenhuizen2017-02-131-9/+36
| | | | | | | The struct have different size, so the arrays have different stride. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* etnaviv: Set shader instruction area correctly for GC3000Wladimir J. van der Laan2017-02-121-5/+21
| | | | | | | | | | | | | - Use the same instruction area on GC3000 as the Vivante driver. This allows the same number of instructions on GC3000 as GC2000 instead of half. - Makes sure that the "PE to FE" stall before updating the shader code or constants is hit (which is conditional on vs_offset > 0x4000). This is necessary on GC3000 too, it increases stability. Signed-off-by: Wladimir J. van der Laan <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]>
* etnaviv: Update hw header filesWladimir J. van der Laan2017-02-125-48/+160
| | | | | | | | Update from etnaviv repository rnndb. This adds some newly discovered state for GC3000 (and some GC2000) features. Signed-off-by: Wladimir J. van der Laan <[email protected]> Acked-by: Christian Gmeiner <[email protected]>
* radv: reduce CPU overhead merging bo lists.Dave Airlie2017-02-121-1/+11
| | | | | | | | | | | | Just noticed we do a fair bit of unneeded searching here. Since we know that the buffers in a CS are unique already, the first time we get any buffers, we can just memcpy those into place, and when we are searching for subsequent CSes, we only have to search up until where the previous unique buffers were. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* nvc0: set the render condition in the compute objectIlia Mirkin2017-02-111-2/+10
| | | | | | | Fixes GL45-CTS.compute_shader.conditional-dispatching Signed-off-by: Ilia Mirkin <[email protected]> Cc: [email protected]
* gm107/ir: fix address offset bitfield for ATOMSIlia Mirkin2017-02-111-1/+1
| | | | | | | Fixes GL45-CTS.compute_shader.atomic-case1 on Maxwell Signed-off-by: Ilia Mirkin <[email protected]> Cc: [email protected]
* nv50/ir: convert an ATOM.EXCH without a destination into a storeIlia Mirkin2017-02-111-0/+5
| | | | | | | | | | On SM35 there does not appear to be a way to emit a ATOM.EXCH with a null destination. This should be functionally equivalent to a plain store however, so just do that. Fixes GL45-CTS.compute_shader.atomic-case2 on SM35. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: fix 64-bit integer query buffer writesIlia Mirkin2017-02-113-20/+37
| | | | | | | The former logic just plain didn't work at all. We need to write the subsequent dword to the next buffer location. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: return a register when retrieving thread id sysvalIlia Mirkin2017-02-111-1/+1
| | | | | | | | | | We have logic to short-circuit such retrievals to zero. However "zero" was an immediate, and some logic expected to get registers (to later be propagated). Fix this by using loadImm. Fixes GL45-CTS.gpu_shader5.images_array_indexing Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: add missing break after DSSGIlia Mirkin2017-02-111-0/+1
| | | | | | Recently broken during int64 addition. Signed-off-by: Ilia Mirkin <[email protected]>
* etnaviv: shader-db tracesChristian Gmeiner2017-02-114-1/+47
| | | | | Signed-off-by: Christian Gmeiner <[email protected]> Reviewed-By: Wladimir J. van der Laan <[email protected]>
* etnaviv: keep track of emitted loopsChristian Gmeiner2017-02-112-0/+7
| | | | | | Signed-off-by: Christian Gmeiner <[email protected]> Reviewed-by: Lucas Stach <[email protected]> Reviewed-by: Wladimir J. van der Laan <[email protected]>
* etnaviv: wire up core pipe_debug_callbackChristian Gmeiner2017-02-112-0/+15
| | | | | Signed-off-by: Christian Gmeiner <[email protected]> Reviewed-by: Lucas Stach <[email protected]>
* glsl: non-last member unsized array on SSBO must fail compilation on GLSL ES 3.1Jose Maria Casanova Crespo2017-02-101-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | From GLSL ES 3.10 spec, section 4.1.9 "Arrays": "If an array is declared as the last member of a shader storage block and the size is not specified at compile-time, it is sized at run-time. In all other cases, arrays are sized only at compile-time." In desktop GLSL it is allowed to have unsized-arrays that are not last, as long as we can determine that they are implicitly sized, which is detected at link-time. With this patch Mesa reports a compilation error as glslang does with the following shader: buffer SSBO { vec4 data[]; vec4 moreData;}; void main (void) { } Fixes: dEQP-GLES31.functional.debug.negative_coverage.log.shader.compile_compute_shader dEQP-GLES31.functional.debug.negative_coverage.callbacks.shader.compile_compute_shader dEQP-GLES31.functional.debug.negative_coverage.get_error.shader.compile_compute_shader Cc: "17.0" <[email protected]> Signed-off-by: Jose Maria Casanova Crespo <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* vc4: Enable glSampleMask() even when !rasterizer->multisample.Eric Anholt2017-02-101-2/+1
| | | | | | | | gallium's blitter expects that it can set the sample mask even when the rasterizer doesn't have the flag on. Between this and the previous test, 10 new ext_framebuffer_multisample tests start passing.
* vc4: Respect glSampleMask() even when we're not writing color.Eric Anholt2017-02-101-3/+13
| | | | | | gallium's quad-based blitter for copying MSAA depth textures expects to be able to do 4 passes updating a sample at a time using glSampleMask, and there's no color buffer bound when it's doing that.
* vc4: Use the nir_builder helper for loading sample mask.Eric Anholt2017-02-101-10/+1
|
* vc4: Use accurate 1/w in coordinate shader as well as vert shader.Eric Anholt2017-02-101-1/+1
| | | | | We probably shouldn't be emitting different scaled viewport coordinates between vertex and coord.
* vc4: Drop VS inputs to 8.Eric Anholt2017-02-101-4/+1
| | | | | | | In the hardware we only get to declare 8 vertex elements (GLES2's minimum), so we should be exposing that number here. Fixes an assertion failure in piglit texrect-many, at the expense of various GL 2.0-ish minmax tests now complaining that our count is too low.
* vc4: Avoid emitting small immediates for UBO indirect load address guards.Eric Anholt2017-02-105-4/+20
| | | | | | | | | | | | The kernel will reject our shader if we emit one here, and having 4, 8, or 12 as the top end of our UBO clamp rare is enough that it's not worth making the kernel let us. Fixes piglit fs-const-array-of-struct and fs-const-array-of-struct-of-array since recent GLSL linking changes made us get this as an indirect load of a uniform, instead of a tempoary. Cc: "13.0 17.0" <[email protected]>
* util/disk_cache: use stat() to check if entry is a directoryTimothy Arceri2017-02-101-9/+24
| | | | | | | | d_type is not supported on all systems. Tested-by: Vinson Lee <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97967
* st/nine: update configure options in the READMEEmil Velikov2017-02-101-2/+1
| | | | | Cc: Axel Davy <[email protected]> Signed-off-by: Emil Velikov <[email protected]>
* loader: unconditionally include unistd.h and stdlib.hNicolai Hähnle2017-02-101-2/+2
| | | | | | | | | | | | | | | | | Otherwise we would fail with "implicit declaration of function" geteuid and getenv respectively. To trigger (re)move the libdrm.pc file and use the following: $ ./autogen.sh --disable-egl --disable-gbm --disable-dri \ --with-dri-drivers=swrast --with-gallium-drivers=swrast $ make Cc: Vinson Lee <[email protected]> Fixes: 3f462050c ("loader: Add an environment variable to override driver name choice. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99701 v2: [Emil: handle stdlib.h add commit message] Signed-off-by: Emil Velikov <[email protected]>
* intel/blorp: do not return const data by get_px_size_sa()Emil Velikov2017-02-101-1/+1
| | | | | | | | | | | | | Not much point in the const qualifier since we provide a copy to the user. Resolves the following -Wignored-qualifiers warning. src/intel/blorp/blorp_blit.c:1857:8: warning: 'const' type qualifier on return type has no effect [-Wignored-qualifiers] v2: keep const qualifier of local variable. Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* gallium/radeon: use staging for texture read mappings from GTT WCMarek Olšák2017-02-101-4/+5
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: ignore the level parameter in buffer_transfer_mapMarek Olšák2017-02-101-5/+4
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: fix performance of buffer readbacksMarek Olšák2017-02-101-8/+9
| | | | | | | | | | | | | | | We want cached GTT for all non-persistent read mappings. Set level = 0 on purpose. Use dma_copy, because resource_copy_region causes a failure in the PBO read of piglit/getteximage-luminance. If Rocket League used the READ flag, it should get cached GTT. v2: mask out UNSYNCHRONIZED Cc: 13.0 17.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: align vertex buffer descriptor list size for optimal prefetchMarek Olšák2017-02-104-2/+7
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: align shader binaries to CP DMA alignment for optimal prefetchMarek Olšák2017-02-101-1/+2
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move CP_DMA_ALIGNMENT definitionMarek Olšák2017-02-102-10/+10
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove SI_CONTEXT_FLUSH_AND_INV_FRAMEBUFFERMarek Olšák2017-02-103-6/+6
| | | | | | not necessary Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove separate CB/DB_META flush flagsMarek Olšák2017-02-103-17/+8
| | | | | | not used separately Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: reduce the number of FMASK input coordinatesMarek Olšák2017-02-101-7/+3
| | | | | | | | | Before: image_load v3, v[0:3] ... After: image_load v3, v[0:1] ... Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: write shader asm annotated with wave info into GPU hang reportsMarek Olšák2017-02-103-3/+252
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Note that the disassembly is written twice - first the unmodified compiler output and then the wave-annotated output only if there are waves executing the shader. Sample output from a real GPU hang most likely caused by image_sample: The number of active waves = 28 Pixel Shader - annotated disassembly: s_mov_b64 s[6:7], exec ; BE86017E [PC=0x10f3e3800, off=0, size=4] s_wqm_b64 exec, exec ; BEFE077E [PC=0x10f3e3804, off=4, size=4] ... image_sample v[7:9], v[0:1], s[12:19], s[20:23] dmask:0x7 ; F0800700 00A30700 [PC=0x10f3e3a94, off=660, size=8] s_buffer_load_dword s20, s[0:3], 0x50 ; C0220500 00000050 [PC=0x10f3e3a9c, off=668, size=8] s_load_dwordx4 s[24:27], s[4:5], 0x170 ; C00A0602 00000170 [PC=0x10f3e3aa4, off=676, size=8] s_load_dwordx8 s[12:19], s[4:5], 0x140 ; C00E0302 00000140 [PC=0x10f3e3aac, off=684, size=8] s_buffer_load_dword s11, s[0:3], 0x5c ; C02202C0 0000005C [PC=0x10f3e3ab4, off=692, size=8] s_buffer_load_dword s21, s[0:3], 0x54 ; C0220540 00000054 [PC=0x10f3e3abc, off=700, size=8] s_buffer_load_dword s22, s[0:3], 0x58 ; C0220580 00000058 [PC=0x10f3e3ac4, off=708, size=8] s_waitcnt vmcnt(0) ; BF8C0F70 [PC=0x10f3e3acc, off=716, size=4] ^ SE0 SH0 CU1 SIMD1 WAVE0 EXEC=aaaaaaa555aaaaaa INST32=BF8C0F70 ^ SE0 SH0 CU1 SIMD2 WAVE0 EXEC=aaaa85555555552a INST32=BF8C0F70 ^ SE0 SH0 CU1 SIMD3 WAVE0 EXEC=000000000000000a INST32=BF8C0F70 ^ SE0 SH0 CU6 SIMD1 WAVE0 EXEC=25a5a5aa82aaaaaa INST32=BF8C0F70 ^ SE0 SH0 CU6 SIMD3 WAVE0 EXEC=50aaaa8fffa55555 INST32=BF8C0F70 ^ SE0 SH0 CU7 SIMD0 WAVE0 EXEC=5554aaaaaaa1a555 INST32=BF8C0F70 ^ SE0 SH0 CU7 SIMD0 WAVE1 EXEC=aaaa5555ffffffff INST32=BF8C0F70 ^ SE0 SH0 CU7 SIMD1 WAVE0 EXEC=555557aaaaaaaaa5 INST32=BF8C0F70 ^ SE0 SH0 CU7 SIMD3 WAVE0 EXEC=5555aaaaaaaaaa85 INST32=BF8C0F70 ^ SE1 SH0 CU3 SIMD1 WAVE0 EXEC=aaaaaaaaaaaaaaaa INST32=BF8C0F70 ^ SE1 SH0 CU4 SIMD0 WAVE0 EXEC=aaaaaaaa5a5a5a5a INST32=BF8C0F70 ^ SE1 SH0 CU4 SIMD1 WAVE0 EXEC=aaaaaaa5a5a5a4a5 INST32=BF8C0F70 ^ SE1 SH0 CU4 SIMD2 WAVE0 EXEC=5555555000000000 INST32=BF8C0F70 ^ SE1 SH0 CU4 SIMD3 WAVE0 EXEC=aa555554155aaaaa INST32=BF8C0F70 ^ SE1 SH0 CU5 SIMD0 WAVE0 EXEC=55ffff55555555aa INST32=BF8C0F70 ^ SE1 SH0 CU5 SIMD1 WAVE0 EXEC=555555555aaaaaaa INST32=BF8C0F70 ^ SE1 SH0 CU5 SIMD2 WAVE0 EXEC=a0aaaaaaa8555555 INST32=BF8C0F70 ^ SE1 SH0 CU5 SIMD3 WAVE0 EXEC=8aaaaaaaaaaaa555 INST32=BF8C0F70 ^ SE1 SH0 CU6 SIMD0 WAVE0 EXEC=000000002aaaaaaa INST32=BF8C0F70 ^ SE2 SH0 CU1 SIMD0 WAVE0 EXEC=5aaaa5400aaaa15a INST32=BF8C0F70 ^ SE2 SH0 CU1 SIMD1 WAVE0 EXEC=00aaaaaaaa5555aa INST32=BF8C0F70 ^ SE2 SH0 CU1 SIMD2 WAVE0 EXEC=aa00005555554555 INST32=BF8C0F70 ^ SE2 SH0 CU1 SIMD3 WAVE0 EXEC=aaaaaaa000000000 INST32=BF8C0F70 ^ SE3 SH0 CU4 SIMD0 WAVE0 EXEC=5555aaaaaaaaaaaa INST32=BF8C0F70 ^ SE3 SH0 CU4 SIMD2 WAVE0 EXEC=ffaaaaaaaaaa5555 INST32=BF8C0F70 ^ SE3 SH0 CU4 SIMD3 WAVE0 EXEC=aaaa55555555aa00 INST32=BF8C0F70 ^ SE3 SH0 CU5 SIMD0 WAVE0 EXEC=00aaaaaaaaaaaa5a INST32=BF8C0F70 ^ SE3 SH0 CU5 SIMD1 WAVE0 EXEC=5a555555005555ff INST32=BF8C0F70 v_mul_f32_e32 v7, s6, v7 ; 0A0E0E06 [PC=0x10f3e3ad0, off=720, size=4] ... Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: write wave information into GPU hang reportsMarek Olšák2017-02-101-0/+20
| | | | | | | | UMR is our new debugging tool. It must have +s set for Mesa to use it without root privileges: sudo chmod +s .../umr Reviewed-by: Nicolai Hähnle <[email protected]>
* tgsi-dump: dump label if instruction has oneMarc-André Lureau2017-02-101-11/+13
| | | | | | | | | | | | | | The instruction has an associated label when Instruction.Label == 1, as can be seen in ureg_emit_label() or tgsi_build_full_instruction(). This fixes dump generating extra :0 labels on conditionals, and virgl parsing more than the expected tokens and eventually reaching "Illegal command buffer" (when parsing more than a safety margin of 10 we currently have). Signed-off-by: Marc-André Lureau <[email protected]> Cc: "13.0 17.0" <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* tgsi: remove ureg_label_insnMarc-André Lureau2017-02-102-38/+0
| | | | | | | Unused since commit 2897cb3dba9287011f9c43cd2f214100952370c0. Signed-off-by: Marc-André Lureau <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv: handle queue submission with no cs but semaphoresDave Airlie2017-02-091-2/+20
| | | | | | | | | It's legal to submit just semaphores with no command streams, this patch fixes this case by emitting the empty cs, it also handles the fence emission for this case better. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* util/disk_cache: error check asprintf()Timothy Arceri2017-02-101-5/+7
| | | | | | | Fixes: f3d911463e8 "util/disk_cache: stop using ralloc_asprintf() unnecessarily" Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* nvc0/ir: fix ubo max clamp, reset file indexIlia Mirkin2017-02-091-1/+3
| | | | | | | | | | | We just increased the max UBO, so we should also increase the clamp that we do for robustness. Similarly, as we're including the fileIndex in the new indirect value, we should reset fileIndex to 0 so that it is not added in a second time. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Cc: [email protected]
* nv50/ir: always return 0 when trying to read thread id along unit dimIlia Mirkin2017-02-094-5/+17
| | | | | | | | | | | | | Many many many compute shaders only define a 1- or 2-dimensional block, but then continue to use system values that take the full 3d into account (like gl_LocalInvocationIndex, etc). So for the special case that a dimension is exactly 1, we know that the thread id along that axis will always be 0, so return it as such and allow constant folding to fix things up. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Pierre Moreau <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* nvc0/ir: fix robustness guarantees for constbuf loads on kepler+ computeIlia Mirkin2017-02-091-25/+22
| | | | | | | | | | | | | Kepler and up unfortunately only support up to 8 constbufs. We work around this by loading from constbufs as if they were storage buffers. However we were not consistently applying limits to loads from these buffers. Make sure to do the same thing we do for storage buffers. Fixes GL45-CTS.robust_buffer_access_behavior.uniform_buffer Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Cc: [email protected]
* nvc0: increase number of ubo binding pointsIlia Mirkin2017-02-091-3/+2
| | | | | | | | | | | | | Apparently GL 4.5 requires 14 of these (there's a "*" in the spec, but it's unclear what it refers to). We need to expose an extra binding point for the "program parameters", which means this must be 15. Remove the last vestige of the "use c14 for immediates" idea. Fixes GL45-CTS.shading_language_420pack.binding_uniform_block_array Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Cc: [email protected]
* nvc0: expose int64Ilia Mirkin2017-02-091-1/+1
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: make it possible to have the flags def in def0Ilia Mirkin2017-02-095-12/+15
| | | | | | | There's all kinds of logic that doesn't like there being holes in defs or srcs lists. Avoid them. This also fixes the sched logic for maxwell. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: add support for 64-bit shift lowering on SM20/SM30Ilia Mirkin2017-02-091-6/+62
| | | | | | | Unfortunately there is no SHF.L/SHF.R instruction pre-SM35. So we have to do a bit more work to get the job done. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: add support for all the new int64 tgsi opcodesIlia Mirkin2017-02-096-5/+302
| | | | | | | | | | | | A few thoughts: - Some of that LegalizeSSA logic should really live much earlier and be subject to the likes of DCE and other useful passes - Some of the "lowering" done in from_tgsi should be done later so that proper optimization might be done. However this all works and the above can be improved upon later. Signed-off-by: Ilia Mirkin <[email protected]>