aboutsummaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* nv50/ir: don't flip SHL(ADD) into ADD(SHL) if ADD sources have modifiersIlia Mirkin2016-01-201-0/+2
| | | | | Fixes: 31fde8fa (nv50/ir: flip shl(add, imm) into add(shl, imm)) Signed-off-by: Ilia Mirkin <[email protected]>
* gk110/ir: fix load from shared memoryIlia Mirkin2016-01-201-1/+1
| | | | | | It was accidentally using the store opcode. Signed-off-by: Ilia Mirkin <[email protected]>
* gk110/ir: add partial BAR supportIlia Mirkin2016-01-201-2/+18
| | | | | | This is enough for the plain TGSI BARRIER implementation. Signed-off-by: Ilia Mirkin <[email protected]>
* Revert "glsl: move uniform calculation to link_uniforms"Tapani Pälli2016-01-203-85/+24
| | | | This reverts commit 4475d8f9169195baefa893b9b147fe20414cda7c.
* glsl: move uniform calculation to link_uniformsTapani Pälli2016-01-203-24/+85
| | | | | | | | | | | | | | | | | | | | | Patch moves uniform calculation to happen during link_uniforms, this is possible with help of UniformRemapTable that has all the reserved locations. Location assignment for implicit locations is changed so that we utilize also the 'holes' that explicit uniform location assignment might have left in UniformRemapTable, this makes it possible to fit more uniforms as previously we were lazy here and wasting space. Fixes following CTS tests: ES31-CTS.explicit_uniform_location.uniform-loc-mix-with-implicit-max ES31-CTS.explicit_uniform_location.uniform-loc-mix-with-implicit-max-array v2: code cleanups, increment NumUniformRemapTable correctly, fix find_empty_block to work properly and add some more comments. Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Marta Lofstedt <[email protected]>
* glsl: add missing explicit_image_format flag to has_layout()Timothy Arceri2016-01-201-0/+1
| | | | | | | | | | | Fixes piglit regression after fixes to duplicate layout rules. Previously catching multiple layouts was relying on the code meant to catch duplicates within a single layout(...), this change triggers the rules for multiple layouts. Cc: Mark Janes <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* llvmpipe: turn depth clears into full depth/stencil clears for d24x8 formatsRoland Scheidegger2016-01-201-11/+14
| | | | | | | | | | | | If we have a d24x8 format, there is no stencil. Therefore, we can always clear these bits too, which means this will be some kind of memset rather than read-modify-write. This is good for some 7% increase or so in gears with huge window size - seems to have a bigger effect if things aren't in caches. Of course, any real app won't spend nearly as much time comparatively in clearing depth buffer in the first place, so the speedup will be much lower. Reviewed-by: Jose Fonseca <[email protected]>
* i965: Implement compute sampler state atom.Francisco Jerez2016-01-194-1/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes a number of GLES31 CTS failures and hangs on various hardware: ES31-CTS.texture_gather.plain-gather-depth-2d ES31-CTS.texture_gather.plain-gather-depth-2darray ES31-CTS.texture_gather.plain-gather-depth-cube ES31-CTS.texture_gather.offset-gather-depth-2d ES31-CTS.texture_gather.offset-gather-depth-2darray ES31-CTS.layout_binding.sampler2D_layout_binding_texture_ComputeShader ES31-CTS.layout_binding.sampler2DArray_layout_binding_texture_ComputeShader ES31-CTS.explicit_uniform_location.uniform-loc-types-samplers ES31-CTS.compute_shader.resources-texture Some of them were actually passing by luck on some generations even though we weren't uploading sampler state tables explicitly for the compute stage, most likely because they relied on the cached sampler state left from previous rendering to be close enough. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92589 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93312 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93325 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93407 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93725 Reported-by: Marta Lofstedt <[email protected]> Reviewed-by: Marta Lofstedt <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Trigger CS state reemission when new sampler state is uploaded.Francisco Jerez2016-01-192-1/+2
| | | | | | | | | | This reuses the NEW_SAMPLER_STATE_TABLE state bit (currently only used on pre-Gen7 hardware) to signal that the sampler state tables have changed in order to make sure that the GPGPU interface descriptor is updated. Reviewed-by: Marta Lofstedt <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* glsl: Don't abbreviate tessellation shader stage names.Kenneth Graunke2016-01-191-2/+2
| | | | | | | | | | | | | | | I have a patch that writes shaders as .shader_test files, and it uses this function to create the headers (i.e. [vertex shader]). [tess ctrl shader] isn't a valid shader_runner header - it's spelled out as [tessellation control shader]. There's no real reason to abbreviate it, so spell it out. v2: Rebase on Rob's patches to move the code. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* mesa: remove link validation that should be done elsewhereTimothy Arceri2016-01-201-60/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Even if re-linking fails rendering shouldn't fail as the previous succesfully linked program will still be available. It also shouldn't be possible to have an unlinked program as part of the current rendering state. This fixes a subtest in: ES31-CTS.sepshaderobjs.StateInteraction This change should improve performance on CPU limited benchmarks as noted in commit d6c6b186cf308f. >From Section 7.3 (Program Objects) of the OpenGL 4.5 spec: "If a program object that is active for any shader stage is re-linked unsuccessfully, the link status will be set to FALSE, but any existing executables and associated state will remain part of the current rendering state until a subsequent call to UseProgram, UseProgramStages, or BindProgramPipeline removes them from use. If such a program is attached to any program pipeline object, the existing executables and associated state will remain part of the program pipeline object until a subsequent call to UseProgramStages removes them from use. An unsuccessfully linked program may not be made part of the current rendering state by UseProgram or added to program pipeline objects by UseProgramStages until it is successfully re-linked." "void UseProgram(uint program); ... An INVALID_OPERATION error is generated if program has not been linked, or was last linked unsuccessfully. The current rendering state is not modified." V2: apply the rule to both core and compat. Cc: Tapani Pälli <[email protected]> Cc: Brian Paul <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* glsl: allow multiple layout qualifiers for a single declarationTimothy Arceri2016-01-203-19/+28
| | | | | | | | | | | | | | | | | | | | From the ARB_shading_language_420pack spec: "More than one layout qualifier may appear in a single declaration. If the same layout-qualifier-name occurs in multiple layout qualifiers for the same declaration, the last one overrides the former ones." The parser was already failing correctly when the extension is not available but testing for duplicates within a single layout qualifier was still causing this to fail when available as both cases share the same function for merging. Here we add a parameter to differentiate between the two uses and apply it to the duplicate test. Acked-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* glsl: update parser to allow duplicate default layout qualifiersTimothy Arceri2016-01-203-15/+73
| | | | | | | | | | | | | | | | In order to only create a single node for each default declaration we add a new boolean parameter to the in/out merge function to only create one once we reach the rightmost layout qualifier. From the ARB_shading_language_420pack spec: "More than one layout qualifier may appear in a single declaration. If the same layout-qualifier-name occurs in multiple layout qualifiers for the same declaration, the last one overrides the former ones." Acked-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* glsl: move default layout qualifier rules out of the parserTimothy Arceri2016-01-202-28/+23
| | | | | Acked-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* glsl: split layout_defaults into specific typesTimothy Arceri2016-01-201-4/+22
| | | | | | | | This will allow merging of duplicate layout qualifiers as allowed by ARB_shading_language_420pack Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* glsl: allow duplicate layout-qualifier-namesTimothy Arceri2016-01-201-1/+2
| | | | | | | | | | | | | | | This is added by ARB_enhanced_layouts although it doesn't fit into any of the six main changes so we enable this independently. From the ARB_enhanced_layouts spec: "More than one layout qualifier may appear in a single declaration. Additionally, the same layout-qualifier-name can occur multiple times within a layout qualifier or across multiple layout qualifiers in the same declaration" Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/vec4: Spaces around operators.Matt Turner2016-01-191-1/+1
|
* i965: Inform compiler of variable range to silence warning.Matt Turner2016-01-191-1/+2
| | | | | | | Extends commit 6531ccb70 to silence the warning in release builds as well. Reviewed-by: Ilia Mirkin <[email protected]>
* glsl: Restore Mesa-style to shader_enums.c/h.Matt Turner2016-01-192-16/+24
|
* st/va: fix motion adaptive deinterlacingChristian König2016-01-191-1/+1
| | | | Signed-off-by: Christian König <[email protected]>
* util/u_pstipple.c: copy immediates during transformationNicolai Hähnle2016-01-191-0/+1
| | | | | | | | | | | Apparently, nobody has combined stippling with a fragment shader containing immediates in almost five years... Fixes a bug in Kodi with radeonsi reported by Christian König. Cc: "11.0 11.1" <[email protected]> Tested-by: Christian König <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* mesa: Move sanity check of BindVertexBuffer for OpenGL ES 3.1Marta Lofstedt2016-01-192-1/+5
| | | | | | | | | Sanity check of BindVertexBuffer for OpenGL ES in _mesa_handle_bind_buffer_gen breaks OpenGL ES 2 conformance. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93426 Signed-off-by: Marta Lofstedt <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* glsl: fix interface block error messageTimothy Arceri2016-01-191-1/+1
| | | | | | | | Print the stream value not the pointer to the expression, also use the unsigned format specifier. Cc: 11.1 <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: swap the least-ref'd source into src1 when both const/immIlia Mirkin2016-01-181-10/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The whole point of inlining sources is to reduce loads. We can end up in a situation where one value is used a lot of times, and one value is used only once per instruction. The once-per-instruction one is the one that should get inlined, but with the previous algorithm, it was given no preference. This flips things around to preferring putting less-referenced values into src1 which increases the likelihood of them being inlined. While we're at it, adjust the heuristic to not treat 0 as an immediate, as well as (effectively) check for situations where LIMMs can't be loaded. All this yields improvements on nvc0: total instructions in shared programs : 6261157 -> 6255985 (-0.08%) total gprs used in shared programs : 945082 -> 943417 (-0.18%) total local used in shared programs : 30372 -> 30288 (-0.28%) total bytes used in shared programs : 50089256 -> 50047880 (-0.08%) local gpr inst bytes helped 21 822 3332 3332 hurt 0 278 565 565 And more importantly avoids generating really bad code with SSBOs, where we end up checking a lot of different values (usually immediates) against the length. On nv50 we get comparable results, and even improve packing (bytes went down more than instructions): total instructions in shared programs : 6346564 -> 6341277 (-0.08%) total gprs used in shared programs : 728719 -> 725131 (-0.49%) total local used in shared programs : 3552 -> 3552 (0.00%) total bytes used in shared programs : 43995688 -> 43932928 (-0.14%) local gpr inst bytes helped 0 1380 3252 3774 hurt 0 287 1710 1365 Signed-off-by: Ilia Mirkin <[email protected]>
* st/mesa: restore the stObj's size if it was cleared outIlia Mirkin2016-01-181-0/+6
| | | | | | | | | | | | | An issue could still occur if the base level is set, but fixing that would require a lot more logic. This fixes the recently-failing texelFetch 3D tests because the mipmaps were no longer being generated, which in turn caused the copying logic to be hit, which in turn didn't work because of the broken width/height/depth. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* freedreno/a4xx: use smaller threadsize for more registersRob Clark2016-01-181-2/+5
| | | | | | | | Once we go past half of the "GPR" register file, it seems like we need to run frag shader with smaller threadsize. (The vertex shader already runs at TWO_QUADS, which is the minimum.) Signed-off-by: Rob Clark <[email protected]>
* freedreno: per-generation OUT_IB packetRob Clark2016-01-189-6/+43
| | | | | | | | | | Some a4xx firmware doesn't implement the "PFD" (prefetch-disabled) version of the CP_INDIRECT_BUFFER packet. So allow for PFD vs PFE per generation. Switch a3xx and a4xx over to using prefetch-enabled version (which is also what blob does.. it seems only on a2xx we cannot use PFE). Signed-off-by: Rob Clark <[email protected]>
* gallium: bundle the compat header u_pwr8.h in the tarballEmil Velikov2016-01-181-0/+1
| | | | Signed-off-by: Emil Velikov <[email protected]>
* mapi: include gl.xml in the tarballEmil Velikov2016-01-181-0/+1
| | | | Signed-off-by: Emil Velikov <[email protected]>
* i965: adding missing headers to the dist tarballEmil Velikov2016-01-181-0/+2
| | | | Signed-off-by: Emil Velikov <[email protected]>
* st/va: add motion adaptive deinterlacing v2Christian König2016-01-184-7/+82
| | | | | | v2: minor cleanup Signed-off-by: Christian König <[email protected]>
* gallium/radeon: Rename do_invalidate_resource to invalidate_bufferMichel Dänzer2016-01-181-4/+6
| | | | | | | And only call it from r600_invalidate_resource for buffer resources. Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* st/dri: Don't call invalidate_resource for NULL depth/stencil buffersMichel Dänzer2016-01-181-2/+4
| | | | | | | Fixes crash in 4 EGL piglit tests with radeonsi. Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Avoid warning about LLVM generating R_0286D0_SPI_PS_INPUT_ADDRMichel Dänzer2016-01-181-0/+3
| | | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Alex Deucher <[email protected]>
* radeonsi: Print "LLVM emitted unknown config register" warning only onceMichel Dänzer2016-01-181-2/+9
| | | | | | | Say "LLVM" instead of "Compiler" for clarity. Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* llvmpipe: use vpkswss when dst is signedOded Gabbay2016-01-181-16/+15
| | | | | | | | | | | | | | | | | | | This patch fixes a bug when building a pack instruction. For POWER (altivec), in case the destination is signed and the src width is 32, we need to use vpkswss. The original code used vpkuwus, which emits an unsigned result. This fixes the following piglit tests on ppc64le: - spec@arb_color_buffer_float@gl_rgba8-drawpixels - shaders@glsl-fs-fogscale I've also corrected some coding style issues in the function. v2: Returned else statements to vmware style Signed-off-by: Oded Gabbay <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* glsl: fix subroutine lowering reusing actual parmatersDave Airlie2016-01-181-5/+19
| | | | | | | | | | | | | | | One of the oglconform tests was crashing here, and it was due to not cloning the actual parameters before creating the new call. This makes a call clone function that does the right things to make sure we clone all the needed info, and points the callee at it. (It differs from ->clone due to this). this may fix https://bugs.freedesktop.org/show_bug.cgi?id=93722, I had this patch in my cts fixes tree, but hadn't had time to make sure I liked it. Cc: "11.0 11.1" <[email protected]> Signed-off-by: Dave Airlie <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* glsl: remove special case for detecting stream duplicatesTimothy Arceri2016-01-181-5/+0
| | | | | | | Any duplicates in a single declaration will already fail the generic duplicates test due to the explicit_stream flag being set. Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* glsl: add missing explicit_stream flag to has_layout()Timothy Arceri2016-01-181-1/+2
| | | | | | | | | This will allow the ARB_shading_language_420pack rules in glsl_parser.yy for catching duplicate layout qualifiers to be triggered for the stream identifier rather than relying on the code meant to catch duplicates within a single layout(...) Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* mesa: fix segfault in glUniformSubroutinesuiv()Timothy Arceri2016-01-181-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | From Section 7.9 (SUBROUTINE UNIFORM VARIABLES) of the OpenGL 4.5 Core spec: "The command void UniformSubroutinesuiv(enum shadertype, sizei count, const uint *indices); will load all active subroutine uniforms for shader stage shadertype with subroutine indices from indices, storing indices[i] into the uniform at location i. The indices for any locations between zero and the value of ACTIVE_SUBROUTINE_UNIFORM_LOCATIONS minus one which are not used will be ignored." V2: simplify NULL check suggested by Jason. Acked-by: Jason Ekstrand <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Cc: "11.0 11.1" [email protected] https://bugs.freedesktop.org/show_bug.cgi?id=93731
* glsl: fix segfault linking subroutine uniform with explicit locationTimothy Arceri2016-01-181-1/+1
| | | | | Reviewed-by: Dave Airlie <[email protected]> Cc: "11.0 11.1" [email protected]
* gm107/ir: don't do indirect frag shader inputs on GM107Ilia Mirkin2016-01-171-0/+1
| | | | | | | | Apparently the IPA op decided to stop working with offsets. Need to figure out if we need to do an AL2P situation or something similar. For now just turn it back off. Signed-off-by: Ilia Mirkin <[email protected]>
* tgsi: initialize Atomic field in tgsi_default_declarationIlia Mirkin2016-01-171-0/+1
| | | | | | | | Spotted by Coverity. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* nvc0: bsp_bo can't be nullIlia Mirkin2016-01-171-1/+1
| | | | | | | We already deref it earlier. And these are all allocated on load. Spotted by Coverity. Signed-off-by: Ilia Mirkin <[email protected]>
* llvmpipe: fix arguments order given to vec_andcOded Gabbay2016-01-172-1/+7
| | | | | | | | | | | | | | | | This patch fixes a classic "confuse the enemy" bug. _mm_andnot_si128 (SSE) and vec_andc (VMX) do the same operation, but the arguments are opposite. _mm_andnot_si128 performs "r = (~a) & b" while vec_andc performs "r = a & (~b)" To make sure this error won't return in another place, I added a wrapper function, vec_andnot_si128, in u_pwr8.h, which makes the swap inside. Signed-off-by: Oded Gabbay <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* freedreno/ir3: fix mad 3rd src delay calcRob Clark2016-01-171-1/+1
| | | | | | | In fad158a0 ("freedreno/ir3: array rework") the src # (n) shifted by one, but missed updating delay-slot calc. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: better array register allocationRob Clark2016-01-162-9/+51
| | | | | | | Detect arrays which don't conflict with each other and allow overlapping register allocation. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: array offset can be negativeRob Clark2016-01-165-12/+13
| | | | | | | | | | | | | | | | | | | | | | It at least happens with some piglit tests, like $piglit/bin/vp-address-01 VERT DCL IN[0] DCL IN[1] DCL OUT[0], POSITION DCL OUT[1], COLOR DCL CONST[0..7] DCL ADDR[0] 0: ARL ADDR[0].x, IN[1].xxxx 1: MOV_SAT OUT[1], CONST[ADDR[0].x-1] 2: DP4 OUT[0].x, CONST[4], IN[0] 3: DP4 OUT[0].y, CONST[5], IN[0] 4: DP4 OUT[0].z, CONST[6], IN[0] 5: DP4 OUT[0].w, CONST[7], IN[0] 6: END Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: workaround bug/featureRob Clark2016-01-161-0/+9
| | | | | | | | | | Seems like in certain cases, we cannot use c<a0.x+0> as the third src to cat3 instructions. This may be slightly conservative, we may only have this restriction when the first src is also const. This fixes, for example, +24/-0 of the variable-indexing piglit tests. Signed-off-by: Rob Clark <[email protected]>
* ttn: use writemask for store_varRob Clark2016-01-161-26/+2
| | | | | | | Only user is freedreno, and after array-rework it can cope. Avoids generating loads for a store. Signed-off-by: Rob Clark <[email protected]>