summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* glsl: split layout_defaults into specific typesTimothy Arceri2016-01-201-4/+22
| | | | | | | | This will allow merging of duplicate layout qualifiers as allowed by ARB_shading_language_420pack Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* glsl: allow duplicate layout-qualifier-namesTimothy Arceri2016-01-201-1/+2
| | | | | | | | | | | | | | | This is added by ARB_enhanced_layouts although it doesn't fit into any of the six main changes so we enable this independently. From the ARB_enhanced_layouts spec: "More than one layout qualifier may appear in a single declaration. Additionally, the same layout-qualifier-name can occur multiple times within a layout qualifier or across multiple layout qualifiers in the same declaration" Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/vec4: Spaces around operators.Matt Turner2016-01-191-1/+1
|
* i965: Inform compiler of variable range to silence warning.Matt Turner2016-01-191-1/+2
| | | | | | | Extends commit 6531ccb70 to silence the warning in release builds as well. Reviewed-by: Ilia Mirkin <[email protected]>
* glsl: Restore Mesa-style to shader_enums.c/h.Matt Turner2016-01-192-16/+24
|
* st/va: fix motion adaptive deinterlacingChristian König2016-01-191-1/+1
| | | | Signed-off-by: Christian König <[email protected]>
* util/u_pstipple.c: copy immediates during transformationNicolai Hähnle2016-01-191-0/+1
| | | | | | | | | | | Apparently, nobody has combined stippling with a fragment shader containing immediates in almost five years... Fixes a bug in Kodi with radeonsi reported by Christian König. Cc: "11.0 11.1" <[email protected]> Tested-by: Christian König <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* mesa: Move sanity check of BindVertexBuffer for OpenGL ES 3.1Marta Lofstedt2016-01-192-1/+5
| | | | | | | | | Sanity check of BindVertexBuffer for OpenGL ES in _mesa_handle_bind_buffer_gen breaks OpenGL ES 2 conformance. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93426 Signed-off-by: Marta Lofstedt <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* glsl: fix interface block error messageTimothy Arceri2016-01-191-1/+1
| | | | | | | | Print the stream value not the pointer to the expression, also use the unsigned format specifier. Cc: 11.1 <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: swap the least-ref'd source into src1 when both const/immIlia Mirkin2016-01-181-10/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The whole point of inlining sources is to reduce loads. We can end up in a situation where one value is used a lot of times, and one value is used only once per instruction. The once-per-instruction one is the one that should get inlined, but with the previous algorithm, it was given no preference. This flips things around to preferring putting less-referenced values into src1 which increases the likelihood of them being inlined. While we're at it, adjust the heuristic to not treat 0 as an immediate, as well as (effectively) check for situations where LIMMs can't be loaded. All this yields improvements on nvc0: total instructions in shared programs : 6261157 -> 6255985 (-0.08%) total gprs used in shared programs : 945082 -> 943417 (-0.18%) total local used in shared programs : 30372 -> 30288 (-0.28%) total bytes used in shared programs : 50089256 -> 50047880 (-0.08%) local gpr inst bytes helped 21 822 3332 3332 hurt 0 278 565 565 And more importantly avoids generating really bad code with SSBOs, where we end up checking a lot of different values (usually immediates) against the length. On nv50 we get comparable results, and even improve packing (bytes went down more than instructions): total instructions in shared programs : 6346564 -> 6341277 (-0.08%) total gprs used in shared programs : 728719 -> 725131 (-0.49%) total local used in shared programs : 3552 -> 3552 (0.00%) total bytes used in shared programs : 43995688 -> 43932928 (-0.14%) local gpr inst bytes helped 0 1380 3252 3774 hurt 0 287 1710 1365 Signed-off-by: Ilia Mirkin <[email protected]>
* st/mesa: restore the stObj's size if it was cleared outIlia Mirkin2016-01-181-0/+6
| | | | | | | | | | | | | An issue could still occur if the base level is set, but fixing that would require a lot more logic. This fixes the recently-failing texelFetch 3D tests because the mipmaps were no longer being generated, which in turn caused the copying logic to be hit, which in turn didn't work because of the broken width/height/depth. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* freedreno/a4xx: use smaller threadsize for more registersRob Clark2016-01-181-2/+5
| | | | | | | | Once we go past half of the "GPR" register file, it seems like we need to run frag shader with smaller threadsize. (The vertex shader already runs at TWO_QUADS, which is the minimum.) Signed-off-by: Rob Clark <[email protected]>
* freedreno: per-generation OUT_IB packetRob Clark2016-01-189-6/+43
| | | | | | | | | | Some a4xx firmware doesn't implement the "PFD" (prefetch-disabled) version of the CP_INDIRECT_BUFFER packet. So allow for PFD vs PFE per generation. Switch a3xx and a4xx over to using prefetch-enabled version (which is also what blob does.. it seems only on a2xx we cannot use PFE). Signed-off-by: Rob Clark <[email protected]>
* gallium: bundle the compat header u_pwr8.h in the tarballEmil Velikov2016-01-181-0/+1
| | | | Signed-off-by: Emil Velikov <[email protected]>
* mapi: include gl.xml in the tarballEmil Velikov2016-01-181-0/+1
| | | | Signed-off-by: Emil Velikov <[email protected]>
* i965: adding missing headers to the dist tarballEmil Velikov2016-01-181-0/+2
| | | | Signed-off-by: Emil Velikov <[email protected]>
* st/va: add motion adaptive deinterlacing v2Christian König2016-01-184-7/+82
| | | | | | v2: minor cleanup Signed-off-by: Christian König <[email protected]>
* gallium/radeon: Rename do_invalidate_resource to invalidate_bufferMichel Dänzer2016-01-181-4/+6
| | | | | | | And only call it from r600_invalidate_resource for buffer resources. Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* st/dri: Don't call invalidate_resource for NULL depth/stencil buffersMichel Dänzer2016-01-181-2/+4
| | | | | | | Fixes crash in 4 EGL piglit tests with radeonsi. Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Avoid warning about LLVM generating R_0286D0_SPI_PS_INPUT_ADDRMichel Dänzer2016-01-181-0/+3
| | | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Alex Deucher <[email protected]>
* radeonsi: Print "LLVM emitted unknown config register" warning only onceMichel Dänzer2016-01-181-2/+9
| | | | | | | Say "LLVM" instead of "Compiler" for clarity. Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* llvmpipe: use vpkswss when dst is signedOded Gabbay2016-01-181-16/+15
| | | | | | | | | | | | | | | | | | | This patch fixes a bug when building a pack instruction. For POWER (altivec), in case the destination is signed and the src width is 32, we need to use vpkswss. The original code used vpkuwus, which emits an unsigned result. This fixes the following piglit tests on ppc64le: - spec@arb_color_buffer_float@gl_rgba8-drawpixels - shaders@glsl-fs-fogscale I've also corrected some coding style issues in the function. v2: Returned else statements to vmware style Signed-off-by: Oded Gabbay <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* glsl: fix subroutine lowering reusing actual parmatersDave Airlie2016-01-181-5/+19
| | | | | | | | | | | | | | | One of the oglconform tests was crashing here, and it was due to not cloning the actual parameters before creating the new call. This makes a call clone function that does the right things to make sure we clone all the needed info, and points the callee at it. (It differs from ->clone due to this). this may fix https://bugs.freedesktop.org/show_bug.cgi?id=93722, I had this patch in my cts fixes tree, but hadn't had time to make sure I liked it. Cc: "11.0 11.1" <[email protected]> Signed-off-by: Dave Airlie <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* glsl: remove special case for detecting stream duplicatesTimothy Arceri2016-01-181-5/+0
| | | | | | | Any duplicates in a single declaration will already fail the generic duplicates test due to the explicit_stream flag being set. Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* glsl: add missing explicit_stream flag to has_layout()Timothy Arceri2016-01-181-1/+2
| | | | | | | | | This will allow the ARB_shading_language_420pack rules in glsl_parser.yy for catching duplicate layout qualifiers to be triggered for the stream identifier rather than relying on the code meant to catch duplicates within a single layout(...) Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* mesa: fix segfault in glUniformSubroutinesuiv()Timothy Arceri2016-01-181-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | From Section 7.9 (SUBROUTINE UNIFORM VARIABLES) of the OpenGL 4.5 Core spec: "The command void UniformSubroutinesuiv(enum shadertype, sizei count, const uint *indices); will load all active subroutine uniforms for shader stage shadertype with subroutine indices from indices, storing indices[i] into the uniform at location i. The indices for any locations between zero and the value of ACTIVE_SUBROUTINE_UNIFORM_LOCATIONS minus one which are not used will be ignored." V2: simplify NULL check suggested by Jason. Acked-by: Jason Ekstrand <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Cc: "11.0 11.1" [email protected] https://bugs.freedesktop.org/show_bug.cgi?id=93731
* glsl: fix segfault linking subroutine uniform with explicit locationTimothy Arceri2016-01-181-1/+1
| | | | | Reviewed-by: Dave Airlie <[email protected]> Cc: "11.0 11.1" [email protected]
* gm107/ir: don't do indirect frag shader inputs on GM107Ilia Mirkin2016-01-171-0/+1
| | | | | | | | Apparently the IPA op decided to stop working with offsets. Need to figure out if we need to do an AL2P situation or something similar. For now just turn it back off. Signed-off-by: Ilia Mirkin <[email protected]>
* tgsi: initialize Atomic field in tgsi_default_declarationIlia Mirkin2016-01-171-0/+1
| | | | | | | | Spotted by Coverity. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* nvc0: bsp_bo can't be nullIlia Mirkin2016-01-171-1/+1
| | | | | | | We already deref it earlier. And these are all allocated on load. Spotted by Coverity. Signed-off-by: Ilia Mirkin <[email protected]>
* llvmpipe: fix arguments order given to vec_andcOded Gabbay2016-01-172-1/+7
| | | | | | | | | | | | | | | | This patch fixes a classic "confuse the enemy" bug. _mm_andnot_si128 (SSE) and vec_andc (VMX) do the same operation, but the arguments are opposite. _mm_andnot_si128 performs "r = (~a) & b" while vec_andc performs "r = a & (~b)" To make sure this error won't return in another place, I added a wrapper function, vec_andnot_si128, in u_pwr8.h, which makes the swap inside. Signed-off-by: Oded Gabbay <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* freedreno/ir3: fix mad 3rd src delay calcRob Clark2016-01-171-1/+1
| | | | | | | In fad158a0 ("freedreno/ir3: array rework") the src # (n) shifted by one, but missed updating delay-slot calc. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: better array register allocationRob Clark2016-01-162-9/+51
| | | | | | | Detect arrays which don't conflict with each other and allow overlapping register allocation. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: array offset can be negativeRob Clark2016-01-165-12/+13
| | | | | | | | | | | | | | | | | | | | | | It at least happens with some piglit tests, like $piglit/bin/vp-address-01 VERT DCL IN[0] DCL IN[1] DCL OUT[0], POSITION DCL OUT[1], COLOR DCL CONST[0..7] DCL ADDR[0] 0: ARL ADDR[0].x, IN[1].xxxx 1: MOV_SAT OUT[1], CONST[ADDR[0].x-1] 2: DP4 OUT[0].x, CONST[4], IN[0] 3: DP4 OUT[0].y, CONST[5], IN[0] 4: DP4 OUT[0].z, CONST[6], IN[0] 5: DP4 OUT[0].w, CONST[7], IN[0] 6: END Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: workaround bug/featureRob Clark2016-01-161-0/+9
| | | | | | | | | | Seems like in certain cases, we cannot use c<a0.x+0> as the third src to cat3 instructions. This may be slightly conservative, we may only have this restriction when the first src is also const. This fixes, for example, +24/-0 of the variable-indexing piglit tests. Signed-off-by: Rob Clark <[email protected]>
* ttn: use writemask for store_varRob Clark2016-01-161-26/+2
| | | | | | | Only user is freedreno, and after array-rework it can cope. Avoids generating loads for a store. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: array reworkRob Clark2016-01-169-363/+365
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: refactor/simplify cpRob Clark2016-01-161-87/+82
| | | | | | | | | If we handle separately the special case of eliminating output mov (which includes keeps and various other cases where we don't have a consuming instruction's src register to collapse things into), we can simplify the logic. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: fix incorrect decoding of mov instructionsRob Clark2016-01-161-1/+1
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: remove unused tgsi tokens ptrRob Clark2016-01-161-1/+0
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: bit of ra refactorRob Clark2016-01-161-25/+20
| | | | | | | Shuffle things slightly, passing instr-data to ra_name() to reduce the number of places where we need to add support for array names. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: cosmetic de-indentRob Clark2016-01-161-36/+34
| | | | | | Collapse two nested if's into one to reduce indent level. Signed-off-by: Rob Clark <[email protected]>
* ttn: add missing writemask on store_outputRob Clark2016-01-161-0/+1
| | | | | Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir/print: const_index is signedRob Clark2016-01-161-1/+1
| | | | | | | Noticed this with $piglit/bin/vp-address-01 Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nir: few missing struct namesRob Clark2016-01-161-3/+3
| | | | | | | | | | | | | nir.h is a bit inconsistent about 'typedef struct {} nir_foo' vs 'typedef struct nir_foo {} nir_foo'. But missing struct name tags is inconvenient when you need a fwd declaration without pulling in all of nir. So add missing struct name tag for nir_variable, and a couple other spots where it would likely be useful. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* nv50/ir: add saturate support on ex2Ilia Mirkin2016-01-162-0/+6
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* gallivm: avoid crashing in mod by 0 with llvmpipeJeff Muizelaar2016-01-161-2/+16
| | | | | | | This adds code that is basically the same as the code in umod, udiv and idiv. However, unlike idiv we return -1. Reviewed-by: Roland Scheidegger <[email protected]>
* glsl: Allow implicit int -> uint conversions for bitwise operators (&, ^, |).Kenneth Graunke2016-01-151-8/+38
| | | | | | | | | | | | | | | | | | The ARB has decided that implicit conversions should be performed for bitwise operators in future language revisions. Implementations of current language revisions may or may not perform them. This patch makes Mesa apply implicti conversions even on current language versions. Applications appear to expect this behavior, and there's really no downside to doing so. Fixes shader compilation in Shadow of Mordor. Bugzilla: https://www.khronos.org/bugzilla/show_bug.cgi?id=1405 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Cc: [email protected]
* i965/fs: Always set channel 2 of texture headers in some stagesJason Ekstrand2016-01-151-0/+8
| | | | | | | | | | | | | | | | In the vertex and fragment stages, the hardware is nice to us and leaves g0.2 zerod out for us so we can use it for headers. However, in compute, geometry, and tessellation stages, the hardware is not so nice. In particular, for compute shaders on BDW, the hardware places some debug bits in 23:15. As it happens, bit 15 is interpreted by the sampler as the alpha channel mask. This means that if you use a texturing instruction with a header in a compute shader, you may randomly get the alpha channel disabled. Since channel masks affect the return length of the sampler message, this can lead the GPU to expect a different mlen to the one you specified in the shader and this, in turn, hangs your GPU. Cc: "11.1" <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/fs/generator: Take an actual shader stage rather than a stringJason Ekstrand2016-01-157-11/+14
| | | | | | Cc: "11.1" <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Matt Turner <[email protected]>