summaryrefslogtreecommitdiffstats
path: root/src/gallium
Commit message (Collapse)AuthorAgeFilesLines
* tgsi: fix comment typo in tgsi_ureg.cBrian Paul2016-10-131-1/+1
| | | | Trivial.
* vc4: Avoid loading from the texture during non-utile-aligned glTexImage().Eric Anholt2016-10-131-12/+34
| | | | | | | | | | | | | Previously, the plan was "if the width/height we have to load/store isn't the size the user is planning on writing, then we need to load the old contents out beforehand to prevent writing back undefined". However, when we're doing glTexImage() we often end up aligning the width/height into the padding of the texture, and we don't actually need to read out that padding. Improves x11perf -aatrapezoid100 performance from ~460/sec to ~700/sec.
* st/nine: Fix possible segfault in surface ctorAxel Davy2016-10-131-2/+2
| | | | | | | | | | | | | | | | Regression introduced by ba0274c7d6c3b77a36bbe1b444f427b0c873e2f3 Check the resource exists before assigning it a flag (and use This->base.resource instead of pResource, since the former may have a newly allocate resource, while the latter would be NULL). This should reintroduce the behaviour of previous code. Signed-off-by: Axel Davy <[email protected]>
* st/nine: Remove useless code in nine_shaderAxel Davy2016-10-131-5/+0
| | | | | | | | | | | | | Since 1604efa6fda9b780e8537a131ad77f3e83e5a67a, lconsti and lconstb don't need to be initialized. Remove some leftovers from the previous code (which has now invalid use of ARRAY_SIZE on a pointer instead of an array). Reported by Coverity. Signed-off-by: Axel Davy <[email protected]>
* gallium/os: Use unsigned integers for size computationAxel Davy2016-10-131-2/+2
| | | | | | | | Use uint64_t instead of int64_t in the calculation, as the result is uint64_t. Signed-off-by: Axel Davy <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* nvc0: enable ARB_enhanced_layoutsSamuel Pitoiset2016-10-131-1/+1
| | | | | | | | All ARB_enhanced_layouts piglit tests pass without any changes in our compiler. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* radeonsi: adjust and clean up Z_ORDER and EXEC_ON_x settingsMarek Olšák2016-10-132-22/+32
| | | | | | | The table was copied from the Vulkan driver. The comment lines are as long as the table for cosmetic reasons. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: disable ReZMarek Olšák2016-10-131-7/+4
| | | | | | | | | This is a serious performance fix. Discovered by luck. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94354 Cc: 12.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: implement TC-compatible HTILEMarek Olšák2016-10-139-24/+185
| | | | | | | | | | | | | | | | | | | | | so that decompress blits aren't needed and depth texturing needs less memory bandwidth. Z16 and Z24 are promoted to Z32_FLOAT by the driver, because TC-compatible HTILE only supports Z32_FLOAT. This doubles memory footprint for Z16. The format promotion is not visible to state trackers. This is part of TC-compatible renderbuffer compression, which has 3 parts: DCC, HTILE, FMASK. Only TC-compatible FMASK compression is missing now. I don't see a measurable increase in performance though. (I tested Talos Principle and DiRT: Showdown, the latter is improved by 0.5%, which is almost noise, and it originally used layered Z16, so at least we know that Z16 promoted to Z32F isn't slower now) Tested-by: Edmondo Tommasina <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium: add PIPE_RESOURCE_FLAG_TEXTURING_MORE_LIKELYMarek Olšák2016-10-131-0/+1
| | | | | | | | | | | | | | For performance tuning in drivers. It filters out window system framebuffers and OpenGL renderbuffers. radeonsi will use this to guess whether a depth buffer will be read by a shader. There is no guarantee about what will actually happen. This is a departure from PIPE_BIND flags which are defined to be strict but they are useless in practice. Acked-by: Roland Scheidegger <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix regression in image atomicsNicolai Hähnle2016-10-131-1/+1
| | | | Caused by a bad rebase when pushing commit 76a940893.
* radeonsi: fix the coordinate overloading of llvm.amdgcn.image.atomic.cmpswap.*Nicolai Hähnle2016-10-131-2/+7
| | | | | | | Fixes GL45-CTS.shader_image_load_store.basic-allTargets-atomic* Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* swr: automake: add ar_eventhandlerfile_h.template to the tarballEmil Velikov2016-10-121-1/+2
| | | | Signed-off-by: Emil Velikov <[email protected]>
* nvc0/ir: fix textureGather with a single offsetIlia Mirkin2016-10-121-2/+2
| | | | | | | | | | | Recent fix for non-const offsets broke the case of a single offset (vs 4 offsets). The later code relies on the offs array to contain null values to tell whether they should be added onto the srcs list. Fixes: 5239bd592 ("nvc0/ir: fix overwriting of value backing non-constant gather offset") Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Cc: [email protected]
* nv50/ir: copy over value's register id when resolving merge of a phiIlia Mirkin2016-10-121-1/+3
| | | | | | | | | | The offset needs to be properly copied over to the phi value, otherwise it will get assigned to the base of the merge instead of the proper location. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Cc: [email protected]
* st/mesa: enable ARB_enhanced_layouts and turn the cap onNicolai Hähnle2016-10-123-3/+3
| | | | | | | v2: mark llvmpipe & softpipe properly as well (Jason Wood) Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* tgsi/ureg: add ureg_DECL_output_layoutNicolai Hähnle2016-10-122-13/+38
| | | | | | | | | For specifying an exact location/component. v2: change the order of parameters (Dave) Reviewed-by: Edward O'Callaghan <[email protected]> (v1) Reviewed-by: Dave Airlie <[email protected]> (v1)
* tgsi/ureg: add layout/component input declarationsNicolai Hähnle2016-10-122-12/+76
| | | | | | | v2: change the order of parameters (Dave) Reviewed-by: Edward O'Callaghan <[email protected]> (v1) Reviewed-by: Dave Airlie <[email protected]> (v1)
* tgsi/scan: fix num_inputs/num_outputs for shaders with overlapping arraysNicolai Hähnle2016-10-121-8/+2
| | | | | | | v2: remove a tautological left-over assert (Marek) Reviewed-by: Edward O'Callaghan <[email protected]> (v1) Reviewed-by: Dave Airlie <[email protected]> (v1)
* gallium: add PIPE_CAP_TGSI_ARRAY_COMPONENTSNicolai Hähnle2016-10-1217-0/+24
| | | | | | | | This is a screen cap because drivers are expected to support it either for all shader types or for none of them. Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radeonsi: Use the new image load/store intrinsic signaturesTom Stellard2016-10-121-14/+45
| | | | | | This patch requires LLVM r284024 or newer. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: Add function for converting LLVM type to intrinsic stringTom Stellard2016-10-121-10/+32
| | | | | | The existing function only worked for integer types. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: Refactor image store/load intrinsic name creationTom Stellard2016-10-121-11/+18
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* winsys/amdgpu: fix infinite loop w/ RADEON_NOOP=1 caused by unsubmitted fencesMarek Olšák2016-10-121-2/+5
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix R600_DEBUG=precompile for shader-dbMarek Olšák2016-10-121-0/+6
| | | | | | | radeonsi no longer supports pixel shaders without interpolation optimizations, which led to assertion failures in si_shader_ps when running shader-db. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use TC write-back instead of full cache invalidationMarek Olšák2016-10-123-13/+7
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: implement TC L2 write-back (flush) without cache invalidationMarek Olšák2016-10-122-28/+74
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't invalidate VMEM L1 for memory barriers for index buffersMarek Olšák2016-10-121-3/+4
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* nv50/ir: optimize ADD(SHL(a, b), c) to SHLADD(a, b, c)Samuel Pitoiset2016-10-121-0/+87
| | | | | | | | | | | | | total instructions in shared programs :2286901 -> 2284473 (-0.11%) total gprs used in shared programs :335256 -> 335273 (0.01%) total local used in shared programs :31968 -> 31968 (0.00%) local gpr inst bytes helped 0 41 852 852 hurt 0 44 23 23 Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* draw: initialize shader inputsRoland Scheidegger2016-10-121-0/+7
| | | | | | | | | This should make the code more robust if a shader tries to use inputs which aren't defined by the vertex element layout (which usually shouldn't happen). No piglit change. Reviewed-by: Brian Paul <[email protected]>
* trace: add invalidate_resource callbackIlia Mirkin2016-10-111-0/+21
| | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: emit TA_CS_BC_BASE_ADDR on SI only if the kernel allows itMarek Olšák2016-10-111-1/+6
| | | | | | | Reviewed-by: Edmondo Tommasina <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* swr: [rasterizer archrast] update proto fileTim Rowley2016-10-111-2/+56
| | | | Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer archrast] add support for stats filesTim Rowley2016-10-114-20/+57
| | | | | | Only stat and counter events are saved to the event files. Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer jitter] remove architecture overrideTim Rowley2016-10-111-41/+1
| | | | Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer jitter] adjust jitmanager assertTim Rowley2016-10-111-1/+4
| | | | Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer] eliminate unused label warnings on gccTim Rowley2016-10-112-0/+8
| | | | Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] implement depth bounds testTim Rowley2016-10-116-9/+101
| | | | Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] update/add formatsTim Rowley2016-10-117-1705/+2592
| | | | Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] SwrStoreTiles api changeTim Rowley2016-10-117-19/+27
| | | | | | | SwrStoreTiles now takes a mask of surfaces to store. Reduces overhead when storing multiple render targets. Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer scripts] add ENABLE_ASSERT_DIALOGS knob for windowsTim Rowley2016-10-111-0/+8
| | | | Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer archrast] add mako templateTim Rowley2016-10-114-2/+117
| | | | | | Add template for generating code to save events to a file. Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] disable cull for rect_listTim Rowley2016-10-111-0/+8
| | | | Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] add support for "RAW" surface formatTim Rowley2016-10-112-0/+29
| | | | Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] align Macrotile FIFO memory to SIMD sizeTim Rowley2016-10-114-9/+23
| | | | | | | | | Align and use streaming store instructions for BE fifo queues. Provides slightly faster enqueue and doesn't pollute the caches. Add appropriate memory fences to ensure streaming writes are globally visible. Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer common] remove threadviz codeTim Rowley2016-10-113-94/+0
| | | | Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer memory] split load/store for compile speedTim Rowley2016-10-1117-1836/+2290
| | | | Signed-off-by: Tim Rowley <[email protected]>
* i915g: fix incorrect gl_FragCoord valueNicholas Bishop2016-10-101-1/+6
| | | | | | | | | | | | | | | | | | | | | | | On Intel Pineview M hardware, the i915 gallium driver doesn't output the correct gl_FragCoord. It seems to always have an X coord of 0.0 and a Y coord of the window's height in pixels, e.g. 600.0f or such. I believe this is a regression caused in part by this commit: afa035031ff9e0c07a2297d864e46c76f7bfff58 The old behavior used the output at index zero, while the new behavior uses actual zeroes. In the case of gl_FragCoord the output at index zero happened to be the correct one, so the behavior appeared correct although the code already had a bug. Fixed by checking for I915_SEMANTIC_POS when setting up texCoords. If the generic_mapping is I915_SEMANTIC_POS, look for the TGSI_SEMANTIC_POSITION instead of a TGSI_SEMANTIC_GENERIC output. https://bugs.freedesktop.org/show_bug.cgi?id=97477 Reviewed-by: Stéphane Marchesin <[email protected]> Tested-by: Stéphane Marchesin <[email protected]>
* st/nine: More checks for GetRenderTargetDataAxel Davy2016-10-101-0/+2
| | | | | | | Fixes a wine test crash Signed-off-by: Axel Davy <[email protected]> Reviewed-by: Patrick Rudolph <[email protected]>
* st/nine: Add debug output for lost devicesPatrick Rudolph2016-10-101-0/+2
| | | | | | | Add debug output to ease debugging. Signed-off-by: Patrick Rudolph <[email protected]> Reviewed-by: Axel Davy <[email protected]>