path: root/src
Commit message | Author | Age | Files | Lines
* vc4: Avoid loading from the texture during non-utile-aligned glTexImage(). | Eric Anholt | 2016-10-13 | 1 | -12/+34
    Previously, the plan was "if the width/height we have to load/store isn't the size the user is planning on writing, then we need to load the old contents out beforehand to prevent writing back undefined". However, when we're doing glTexImage() we often end up aligning the width/height into the padding of the texture, and we don't actually need to read out that padding.
    Improves x11perf -aatrapezoid100 performance from ~460/sec to ~700/sec.
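
For illustration only, the decision described above might be sketched as follows. This is not the vc4 code: the `rect` type, the `needs_preload` helper, and the utile dimensions passed in are all hypothetical, and the sketch assumes that texels beyond the logical level size count as padding that never needs to be read back.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical rectangle type; not a vc4 structure. */
struct rect { uint32_t x, y, w, h; };

static uint32_t round_down(uint32_t v, uint32_t a) { return v / a * a; }
static uint32_t round_up(uint32_t v, uint32_t a) { return (v + a - 1) / a * a; }
static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }

/*
 * Returns true when the utile-aligned store region covers texels that the
 * caller is not writing AND that lie inside the logical level, so the old
 * contents would have to be loaded first. Texels outside the level are
 * treated as padding (assumption) and are clipped away before comparing.
 */
bool needs_preload(struct rect write, uint32_t level_w, uint32_t level_h,
                   uint32_t utile_w, uint32_t utile_h)
{
    uint32_t x0 = round_down(write.x, utile_w);
    uint32_t y0 = round_down(write.y, utile_h);
    uint32_t x1 = min_u32(round_up(write.x + write.w, utile_w), level_w);
    uint32_t y1 = min_u32(round_up(write.y + write.h, utile_h), level_h);

    return x0 != write.x || y0 != write.y ||
           x1 != write.x + write.w || y1 != write.y + write.h;
}
```

With a 10x10 level and 4x4 utiles, a full-level write needs no preload under this sketch even though the aligned store is 12x12, while a 5x10 write does.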
* st/nine: Fix possible segfault in surface ctor | Axel Davy | 2016-10-13 | 1 | -2/+2
    Regression introduced by ba0274c7d6c3b77a36bbe1b444f427b0c873e2f3.
    Check that the resource exists before assigning it a flag (and use This->base.resource instead of pResource, since the former may hold a newly allocated resource, while the latter would be NULL).
    This should reintroduce the behaviour of the previous code.
    Signed-off-by: Axel Davy <[email protected]>
* st/nine: Remove useless code in nine_shader | Axel Davy | 2016-10-13 | 1 | -5/+0
    Since 1604efa6fda9b780e8537a131ad77f3e83e5a67a, lconsti and lconstb don't need to be initialized. Remove some leftovers from the previous code (which now makes invalid use of ARRAY_SIZE on a pointer instead of an array).
    Reported by Coverity.
    Signed-off-by: Axel Davy <[email protected]>
* gallium/os: Use unsigned integers for size computation | Axel Davy | 2016-10-13 | 1 | -2/+2
    Use uint64_t instead of int64_t in the calculation, as the result is uint64_t.
    Signed-off-by: Axel Davy <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
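
A minimal sketch of the idea, assuming the size is a page count multiplied by a page size. The `total_bytes` helper is hypothetical and is not the function touched by the commit; it only shows both operands being widened to the unsigned result type before the multiplication.

```c
#include <stdint.h>

/*
 * Hypothetical helper: multiply a page count by a page size.
 * Both operands are converted to uint64_t first, so the arithmetic is
 * carried out in the same unsigned 64-bit type as the result.
 */
uint64_t total_bytes(long page_count, long page_size)
{
    return (uint64_t)page_count * (uint64_t)page_size;
}
```

For example, `total_bytes(1L << 20, 4096)` yields 4294967296, which would not fit in a 32-bit intermediate.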
* nvc0: enable ARB_enhanced_layouts | Samuel Pitoiset | 2016-10-13 | 1 | -1/+1
    All ARB_enhanced_layouts piglit tests pass without any changes in our compiler.
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
* radv: fix the wayland wsi busy bit | Dave Airlie | 2016-10-14 | 1 | -1/+1
    Signed-off-by: Dave Airlie <[email protected]>
* anv: fix the wayland wsi busy flag setting | Dave Airlie | 2016-10-14 | 1 | -1/+1
    Cc: "12.0" <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* radv: Use new image load/store intrinsic signatures v2 | Tom Stellard | 2016-10-14 | 1 | -25/+108
    These were changed in LLVM r284024.
    v2:
    - Only use float types for vdata of llvm.amdgcn.image.store. LLVM doesn't support integer types for this intrinsic.
    Signed-off-by: Dave Airlie <[email protected]>
* radv: Fix incorrect comment | Tom Stellard | 2016-10-14 | 1 | -2/+2
    Signed-off-by: Dave Airlie <[email protected]>
* radv: fix identity swizzle handling | Dave Airlie | 2016-10-14 | 1 | -8/+10
    The identity swizzle should operate exactly like an .r = R, .g = G, .b = B, .a = A swizzle.
    This fixes a bunch of the 16-bit BGRA blit tests dEQP-VK.api.copy_and_blit.blit_image.all_formats.b4g4r4a4*
    Signed-off-by: Dave Airlie <[email protected]>
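
The rule stated above can be sketched as a small resolver. The `swizzle` enum and `resolve_swizzle` function are hypothetical stand-ins, not the radv implementation: an identity request on channel N resolves to the Nth of R, G, B, A, and every other request passes through unchanged.

```c
#include <assert.h>

/* Hypothetical stand-in for the API's component swizzle values. */
enum swizzle { SWZ_IDENTITY, SWZ_ZERO, SWZ_ONE, SWZ_R, SWZ_G, SWZ_B, SWZ_A };

/*
 * Resolve the swizzle requested for one destination channel
 * (0 = .r, 1 = .g, 2 = .b, 3 = .a). Identity behaves exactly like
 * naming the channel's own component explicitly.
 */
enum swizzle resolve_swizzle(enum swizzle requested, unsigned channel)
{
    static const enum swizzle identity[4] = { SWZ_R, SWZ_G, SWZ_B, SWZ_A };

    assert(channel < 4);
    if (requested == SWZ_IDENTITY)
        return identity[channel];
    return requested;
}
```

Resolving identity up front means any later format-specific remapping (for example for a BGRA layout) sees the same input it would for an explicit R/G/B/A swizzle.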
* anv/wsi: fix apps that acquire multiple images up front | Dave Airlie | 2016-10-14 | 2 | -0/+2
    This fix was found in the radv codebase when running dota2, no idea if anyone has reported it on anv, but the same problem occurs.
    Once an image is acquired we need to mark it busy.
    Acked-by: Edward O'Callaghan <[email protected]>
    Cc: "12.0" <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* radv/wsi: fix app that acquire multiple images up front | Dave Airlie | 2016-10-14 | 2 | -0/+2
    dota2 does multiple acquires followed by multiple queues. This bug manifested itself as a random hang in the xshmfence code when dota2 was drawing its menus. It also occurred when running dota2 under phoronix-test-suite.
    The fix is to mark the image busy as soon as it is acquired, so nobody else can acquire it. We have to trust Vulkan apps to eventually submit it.
    Acked-by: Edward O'Callaghan <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
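
A minimal sketch of the fix as described, using a hypothetical `image` struct and `acquire_next_image` helper rather than the real WSI code: the busy flag is set at acquire time, so a second acquire before any submit cannot hand out the same image.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical swapchain image state; not the radv/anv structure. */
struct image { bool busy; };

/*
 * Return the index of the first free image and mark it busy right away.
 * Returns -1 if every image is already acquired. Clearing the flag again
 * (when the image is released back to the swapchain) is out of scope here.
 */
int acquire_next_image(struct image *images, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (!images[i].busy) {
            images[i].busy = true;
            return (int)i;
        }
    }
    return -1;
}
```

Without the `busy = true` line, two back-to-back acquires would both return index 0, which matches the multiple-acquire pattern the message attributes to dota2.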
* anv: initialise and increment send_sbc | Dave Airlie | 2016-10-14 | 1 | -0/+2
    At least set this to not be uninitialised memory.
    Cc: "12.0" <[email protected]>
    Signed-off-by: Dave Airlie <[email protected]>
* radeonsi: adjust and clean up Z_ORDER and EXEC_ON_x settings | Marek Olšák | 2016-10-13 | 2 | -22/+32
    The table was copied from the Vulkan driver.
    The comment lines are as long as the table for cosmetic reasons.
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: disable ReZ | Marek Olšák | 2016-10-13 | 1 | -7/+4
    This is a serious performance fix. Discovered by luck.
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94354
    Cc: 12.0 <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: implement TC-compatible HTILE | Marek Olšák | 2016-10-13 | 9 | -24/+185
    so that decompress blits aren't needed and depth texturing needs less memory bandwidth.
    Z16 and Z24 are promoted to Z32_FLOAT by the driver, because TC-compatible HTILE only supports Z32_FLOAT. This doubles memory footprint for Z16. The format promotion is not visible to state trackers.
    This is part of TC-compatible renderbuffer compression, which has 3 parts: DCC, HTILE, FMASK. Only TC-compatible FMASK compression is missing now.
    I don't see a measurable increase in performance though. (I tested Talos Principle and DiRT: Showdown, the latter is improved by 0.5%, which is almost noise, and it originally used layered Z16, so at least we know that Z16 promoted to Z32F isn't slower now)
    Tested-by: Edmondo Tommasina <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: add PIPE_RESOURCE_FLAG_TEXTURING_MORE_LIKELY | Marek Olšák | 2016-10-13 | 2 | -1/+3
    For performance tuning in drivers. It filters out window system framebuffers and OpenGL renderbuffers.
    radeonsi will use this to guess whether a depth buffer will be read by a shader. There is no guarantee about what will actually happen.
    This is a departure from PIPE_BIND flags which are defined to be strict but they are useless in practice.
    Acked-by: Roland Scheidegger <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix regression in image atomics | Nicolai Hähnle | 2016-10-13 | 1 | -1/+1
    Caused by a bad rebase when pushing commit 76a940893.
* st/mesa: fix vertex elements setup for doubles | Nicolai Hähnle | 2016-10-13 | 1 | -48/+50
    Whether one or two slots are taken up by one API array depends on the vertex shader, not on how the array is configured. When an array is set up with fewer components than the shader expects, the high components are undefined.
    Fixes GL45-CTS.vertex_attrib_binding.basic-inputL-case1.
    Cc: [email protected]
    Reviewed-by: Dave Airlie <[email protected]>
* st/glsl_to_tgsi: remove unnecessary ir_instruction argument from get_opcode | Nicolai Hähnle | 2016-10-13 | 1 | -3/+3
    Reviewed-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
    Reviewed-by: Dave Airlie <[email protected]>
* st/glsl_to_tgsi: fix textureGatherOffset with indirectly loaded offsets | Nicolai Hähnle | 2016-10-13 | 1 | -1/+17
    Cc: [email protected]
    Reviewed-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
    Reviewed-by: Dave Airlie <[email protected]>
* st/glsl_to_tgsi: simplify translate_tex_offset | Nicolai Hähnle | 2016-10-13 | 1 | -50/+14
    This fixes a bug with offsets from uniforms which seems to have only been noticed as a crash in piglit's arb_gpu_shader5/compiler/builtin-functions/fs-gatherOffset-uniform-offset.frag on radeonsi.
    Cc: [email protected]
    Reviewed-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
    Reviewed-by: Dave Airlie <[email protected]>
* radeonsi: fix the coordinate overloading of llvm.amdgcn.image.atomic.cmpswap.* | Nicolai Hähnle | 2016-10-13 | 1 | -2/+7
    Fixes GL45-CTS.shader_image_load_store.basic-allTargets-atomic*
    Reviewed-by: Dave Airlie <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* radv: Return correct result in EnumeratePhysicalDevices | Nicolas Koch | 2016-10-13 | 1 | -0/+2
    If pPhysicalDevices is too small for all physical devices, the driver must return VK_INCOMPLETE. Since only a single physical device is supported, this is only the case when *pPhysicalDeviceCount == 0 && pPhysicalDevices != NULL.
    Signed-off-by: Dave Airlie <[email protected]>
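
The single-device case can be sketched as below. The names are hypothetical stand-ins (`RESULT_SUCCESS` and `RESULT_INCOMPLETE` for VK_SUCCESS and VK_INCOMPLETE, `device_handle` for the physical device handle), so this is an illustration of the rule rather than the radv function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the Vulkan result codes and handle type. */
enum result { RESULT_SUCCESS, RESULT_INCOMPLETE };
typedef int device_handle;

/*
 * Two-call enumeration for a driver that exposes exactly one device:
 *  - devices == NULL: report how many devices exist.
 *  - devices != NULL: *count is the array capacity on input and the
 *    number of handles written on output; a capacity of 0 cannot hold
 *    the one device, so the result is "incomplete".
 */
enum result enumerate_devices(device_handle the_device,
                              uint32_t *count, device_handle *devices)
{
    if (devices == NULL) {
        *count = 1;
        return RESULT_SUCCESS;
    }
    if (*count < 1)
        return RESULT_INCOMPLETE; /* nothing written, *count stays 0 */

    devices[0] = the_device;
    *count = 1;
    return RESULT_SUCCESS;
}
```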
* st/mesa: only flip stipple pattern for winsys fbo's | Ilia Mirkin | 2016-10-12 | 1 | -3/+7
    Gallium is completely oblivious to whether the fbo is flipped or not. Only flip the stipple pattern when the fbo is flipped as well. Otherwise the driver has no idea when to unflip the pattern.
    Fixes bin/gl-2.1-polygon-stipple-fs -fbo
    Signed-off-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Brian Paul <[email protected]>
    Tested-by: Brian Paul <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
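
As a rough sketch of the conditional flip, assuming the usual 32x32 polygon stipple stored as 32 rows of 32 bits: the `prepare_stipple` helper and its `flip_y` parameter are hypothetical, and the caller is assumed to pass `flip_y = true` only when the bound framebuffer is itself flipped.

```c
#include <stdbool.h>
#include <stdint.h>

#define STIPPLE_ROWS 32

/*
 * Copy a stipple pattern, reversing the row order only when requested.
 * When flip_y is false the pattern is handed on unchanged.
 */
void prepare_stipple(uint32_t out[STIPPLE_ROWS],
                     const uint32_t in[STIPPLE_ROWS], bool flip_y)
{
    for (int row = 0; row < STIPPLE_ROWS; row++) {
        int src_row = flip_y ? STIPPLE_ROWS - 1 - row : row;
        out[row] = in[src_row];
    }
}
```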
* swr: automake: add ar_eventhandlerfile_h.template to the tarball | Emil Velikov | 2016-10-12 | 1 | -1/+2
    Signed-off-by: Emil Velikov <[email protected]>
* radv: add all headers to the sources list | Emil Velikov | 2016-10-12 | 1 | -1/+12
    Otherwise they'll be missing from the tarball and the build will fail.
    Signed-off-by: Emil Velikov <[email protected]>
* nvc0/ir: fix textureGather with a single offset | Ilia Mirkin | 2016-10-12 | 1 | -2/+2
    Recent fix for non-const offsets broke the case of a single offset (vs 4 offsets). The later code relies on the offs array to contain null values to tell whether they should be added onto the srcs list.
    Fixes: 5239bd592 ("nvc0/ir: fix overwriting of value backing non-constant gather offset")
    Signed-off-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Samuel Pitoiset <[email protected]>
    Cc: [email protected]
* nv50/ir: copy over value's register id when resolving merge of a phi | Ilia Mirkin | 2016-10-12 | 1 | -1/+3
    The offset needs to be properly copied over to the phi value, otherwise it will get assigned to the base of the merge instead of the proper location.
    Signed-off-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Samuel Pitoiset <[email protected]>
    Cc: [email protected]
* st/mesa: enable ARB_enhanced_layouts and turn the cap on | Nicolai Hähnle | 2016-10-12 | 4 | -3/+10
    v2: mark llvmpipe & softpipe properly as well (Jason Wood)
    Reviewed-by: Edward O'Callaghan <[email protected]>
    Reviewed-by: Dave Airlie <[email protected]>
* st/glsl_to_tgsi: adjust swizzles and writemasks for explicit components | Nicolai Hähnle | 2016-10-12 | 1 | -19/+49
    Reviewed-by: Edward O'Callaghan <[email protected]>
    Reviewed-by: Dave Airlie <[email protected]>
* st/glsl_to_tgsi: explicitly track all input and output declaration | Nicolai Hähnle | 2016-10-12 | 1 | -154/+171
    In order to be able to emit overlapping input and output array declarations, we flip the logic of emitting those declarations on its head: rather than iterating over slots and emitting the corresponding declarations, we iterate over the declarations from GLSL and emit those.
    v2: fix some regressions related to structs
    v3: fix a regression in geometry and tessellation shader array handling
    Acked-by: Edward O'Callaghan <[email protected]> (v2)
    Reviewed-by: Dave Airlie <[email protected]> (v2)
* st/glsl_to_tgsi: mark "gaps" in input/output arrays as used | Nicolai Hähnle | 2016-10-12 | 1 | -8/+24
    In some cases, a shader may have an input/output array but not use some entries in the middle. This happens with eON games, for example.
    We emit declarations that cover the entire array range even if there are some unused gaps. This patch now reflects that in the InputsRead etc. fields to ensure the various input/outputMapping arrays are actually correct, which will be important when we re-jiggle the way declarations are emitted.
    v2: fix a typo (Edward O'Callaghan)
    Reviewed-by: Edward O'Callaghan <[email protected]>
    Reviewed-by: Dave Airlie <[email protected]>
* st/glsl_to_tgsi: disable on-the-fly peephole for 64-bit operations | Nicolai Hähnle | 2016-10-12 | 1 | -0/+4
    This optimization is incorrect with 64-bit operations, because the channel-splitting logic in emit_asm ends up being applied twice to the source operands.
    A lucky coincidence of how the writemask test works resulted in this optimization basically never being applied anyway. As far as I can tell, the only case where it would (incorrectly) have been applied is something like

        dvec2 d;
        float x = (float)d.y;

    which nobody seems to have ever done. But the moral equivalent does occur in one of the component layout piglit tests.
    Cc: [email protected]
    Reviewed-by: Edward O'Callaghan <[email protected]>
    Reviewed-by: Dave Airlie <[email protected]>
* st/glsl_to_tgsi: simpler fixup of empty writemasks | Nicolai Hähnle | 2016-10-12 | 1 | -27/+10
    Empty writemasks mean "copy everything", so we can always just use the number of vector elements (which uses the GLSL meaning here, i.e. each double is a single element/writemask bit).
    Reviewed-by: Edward O'Callaghan <[email protected]>
    Reviewed-by: Dave Airlie <[email protected]>
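
The fixup rule can be illustrated with a few lines of C. The `fixup_writemask` helper is hypothetical and only mirrors the sentence above: an empty mask is replaced by one bit per vector element, counted in the GLSL sense, and any non-empty mask is left alone.

```c
#include <assert.h>

/*
 * An empty writemask means "copy everything": expand it to one bit per
 * vector element (1 to 4 elements). Non-empty masks pass through.
 */
unsigned fixup_writemask(unsigned writemask, unsigned vector_elements)
{
    assert(vector_elements >= 1 && vector_elements <= 4);
    if (writemask == 0)
        return (1u << vector_elements) - 1u;
    return writemask;
}
```

Under this sketch a two-element value (a vec2 or a dvec2 alike) with an empty mask becomes 0x3, and a four-element value becomes 0xF.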
* st/glsl_to_tgsi: explicit handling of writemask for depth/stencil export | Nicolai Hähnle | 2016-10-12 | 1 | -8/+17
    Reviewed-by: Edward O'Callaghan <[email protected]>
    Reviewed-by: Dave Airlie <[email protected]>
* glsl: dump explicit location when printing IR | Nicolai Hähnle | 2016-10-12 | 1 | -3/+7
    Reviewed-by: Kenneth Graunke <[email protected]>
    Reviewed-by: Edward O'Callaghan <[email protected]>
    Reviewed-by: Dave Airlie <[email protected]>
* tgsi/ureg: add ureg_DECL_output_layout | Nicolai Hähnle | 2016-10-12 | 2 | -13/+38
    For specifying an exact location/component.
    v2: change the order of parameters (Dave)
    Reviewed-by: Edward O'Callaghan <[email protected]> (v1)
    Reviewed-by: Dave Airlie <[email protected]> (v1)
* tgsi/ureg: add layout/component input declarations | Nicolai Hähnle | 2016-10-12 | 2 | -12/+76
    v2: change the order of parameters (Dave)
    Reviewed-by: Edward O'Callaghan <[email protected]> (v1)
    Reviewed-by: Dave Airlie <[email protected]> (v1)
* tgsi/scan: fix num_inputs/num_outputs for shaders with overlapping arrays | Nicolai Hähnle | 2016-10-12 | 1 | -8/+2
    v2: remove a tautological left-over assert (Marek)
    Reviewed-by: Edward O'Callaghan <[email protected]> (v1)
    Reviewed-by: Dave Airlie <[email protected]> (v1)
* gallium: add PIPE_CAP_TGSI_ARRAY_COMPONENTS | Nicolai Hähnle | 2016-10-12 | 17 | -0/+24
    This is a screen cap because drivers are expected to support it either for all shader types or for none of them.
    Reviewed-by: Edward O'Callaghan <[email protected]>
    Reviewed-by: Dave Airlie <[email protected]>
* radeonsi: Use the new image load/store intrinsic signatures | Tom Stellard | 2016-10-12 | 1 | -14/+45
    This patch requires LLVM r284024 or newer.
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: Add function for converting LLVM type to intrinsic string | Tom Stellard | 2016-10-12 | 1 | -10/+32
    The existing function only worked for integer types.
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: Refactor image store/load intrinsic name creation | Tom Stellard | 2016-10-12 | 1 | -11/+18
    Reviewed-by: Nicolai Hähnle <[email protected]>
* winsys/amdgpu: fix infinite loop w/ RADEON_NOOP=1 caused by unsubmitted fences | Marek Olšák | 2016-10-12 | 1 | -2/+5
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix R600_DEBUG=precompile for shader-db | Marek Olšák | 2016-10-12 | 1 | -0/+6
    radeonsi no longer supports pixel shaders without interpolation optimizations, which led to assertion failures in si_shader_ps when running shader-db.
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use TC write-back instead of full cache invalidation | Marek Olšák | 2016-10-12 | 3 | -13/+7
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: implement TC L2 write-back (flush) without cache invalidation | Marek Olšák | 2016-10-12 | 2 | -28/+74
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't invalidate VMEM L1 for memory barriers for index buffers | Marek Olšák | 2016-10-12 | 1 | -3/+4
    Reviewed-by: Nicolai Hähnle <[email protected]>
* nv50/ir: optimize ADD(SHL(a, b), c) to SHLADD(a, b, c) | Samuel Pitoiset | 2016-10-12 | 1 | -0/+87
    total instructions in shared programs :2286901 -> 2284473 (-0.11%)
    total gprs used in shared programs    :335256 -> 335273 (0.01%)
    total local used in shared programs   :31968 -> 31968 (0.00%)

                    local        gpr       inst      bytes
        helped           0         41        852        852
          hurt           0         44         23         23

    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
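
To illustrate the rewrite named in the subject line, here is a toy expression-tree version. The `node` type, the `fold_shladd` pass, and the `eval` helper are all hypothetical and much simpler than the nv50 IR; any extra conditions the real pass may impose (for example on the shift operand) are not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy IR, not the nv50 IR: a node is a constant or an operation. */
enum op { OP_VALUE, OP_SHL, OP_ADD, OP_SHLADD };

struct node {
    enum op op;
    struct node *src[3];
    uint32_t value; /* only used by OP_VALUE */
};

/* Rewrite ADD(SHL(a, b), c) or ADD(c, SHL(a, b)) into SHLADD(a, b, c). */
bool fold_shladd(struct node *n)
{
    if (n->op != OP_ADD)
        return false;

    for (int i = 0; i < 2; i++) {
        struct node *shl = n->src[i];
        if (shl && shl->op == OP_SHL) {
            struct node *other = n->src[1 - i];
            n->op = OP_SHLADD;
            n->src[0] = shl->src[0];
            n->src[1] = shl->src[1];
            n->src[2] = other;
            return true;
        }
    }
    return false;
}

/* Reference evaluator used to check that the rewrite preserves the value. */
uint32_t eval(const struct node *n)
{
    switch (n->op) {
    case OP_VALUE:
        return n->value;
    case OP_SHL:
        return eval(n->src[0]) << (eval(n->src[1]) & 31);
    case OP_ADD:
        return eval(n->src[0]) + eval(n->src[1]);
    case OP_SHLADD:
        return (eval(n->src[0]) << (eval(n->src[1]) & 31)) + eval(n->src[2]);
    }
    return 0;
}
```

The fold replaces two operations with one, which fits the drop in total instruction count reported above; the slight rise in gprs for some programs is not something this toy model tries to explain.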