Commit message | Author | Age | Files | Lines
* nir/algebraic: Make an optimization more specific | Jason Ekstrand | 2018-12-16 | 1 | -1/+1

    Later in this series, bool is not going to imply 32-bit.

    Reviewed-by: Eric Anholt <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
    Tested-by: Bas Nieuwenhuizen <[email protected]>
* nir: Drop support for lower_b2f | Jason Ekstrand | 2018-12-16 | 2 | -7/+1

    This was originally added for the out-of-tree Mali driver but I think we've all agreed it's easy enough for them to just do in their back-end.

    Reviewed-by: Alyssa Rosenzweig <[email protected]>
    Reviewed-by: Eric Anholt <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
    Tested-by: Bas Nieuwenhuizen <[email protected]>
* nir/algebraic: Optimize x2b(xneg(a)) -> a | Jason Ekstrand | 2018-12-16 | 1 | -0/+2

    Shader-db results on Kaby Lake:

        total instructions in shared programs: 15072525 -> 15072525 (0.00%)
        instructions in affected programs: 0 -> 0
        helped: 0
        HURT: 0

    This helps prevent regressions in later commits.

    Reviewed-by: Eric Anholt <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
    Tested-by: Bas Nieuwenhuizen <[email protected]>
* nir/constant_folding: Fix source bit size logic | Jason Ekstrand | 2018-12-16 | 1 | -1/+2

    Instead of looking at input_sizes[i] which contains the number of components for each source, we look at the bit size of input_types[i]. This fixes a regression in the 1-bit boolean series though I have no idea how we haven't seen it before now.

    Fixes: 35baee5dce5 "nir/constant_folding: fix incorrect bit-size check"
    Fixes: 9076c4e289d "nir: update opcode definitions for different bit sizes"
    Reviewed-by: Eric Anholt <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
    Tested-by: Bas Nieuwenhuizen <[email protected]>
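The mix-up described in this message can be illustrated with a small sketch. Everything below is hypothetical and only loosely modelled on the description: the `sketch_` names, the struct layout, and the type encoding (a base-type flag OR'd with a bit size) are assumptions for illustration, not NIR's actual definitions.

```c
#include <assert.h>

/* Hypothetical type encoding: a base-type flag OR'd with a bit size
 * (8, 16, 32 or 64). A bit size of 0 means "unsized". */
#define SKETCH_TYPE_SIZE_MASK 0x78u
#define SKETCH_TYPE_FLOAT     0x80u

/* Hypothetical per-opcode description with two parallel arrays. */
struct sketch_op_info {
    unsigned num_inputs;
    unsigned input_sizes[4]; /* number of components per source */
    unsigned input_types[4]; /* base type | bit size per source */
};

static unsigned sketch_type_bit_size(unsigned type)
{
    return type & SKETCH_TYPE_SIZE_MASK;
}

/* Mistaken lookup: treats a component count as if it were a type. */
static unsigned source_bit_size_wrong(const struct sketch_op_info *info,
                                      unsigned i)
{
    assert(i < info->num_inputs);
    return sketch_type_bit_size(info->input_sizes[i]);
}

/* Intended lookup: reads the bit size out of the source's type. */
static unsigned source_bit_size(const struct sketch_op_info *info,
                                unsigned i)
{
    assert(i < info->num_inputs);
    return sketch_type_bit_size(info->input_types[i]);
}
```

For a four-component 32-bit float source, the intended lookup reports 32, while the mistaken one inspects the component count 4 and reports 0 under this assumed encoding.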
* nir/tgsi: Use nir_bany in ttn_kill_if | Jason Ekstrand | 2018-12-16 | 1 | -3/+1

    Reviewed-by: Eric Anholt <[email protected]>
* nir/lower_idiv: Use ilt instead of bit twiddling | Jason Ekstrand | 2018-12-16 | 1 | -1/+1

    The previous code was creating a boolean by doing an arithmetic right-shift by 31 which produces a boolean which is true if the argument is negative. This is the same as the expression r < 0 which is much simpler and doesn't depend on NIR's representation of booleans.

    Reviewed-by: Eric Anholt <[email protected]>
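The equivalence this message relies on can be checked in plain C. This is a minimal sketch rather than the NIR code itself; the function names are made up for illustration, and the arithmetic shift is emulated with unsigned arithmetic because right-shifting a negative signed value is implementation-defined in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old idiom: an arithmetic right-shift by 31 copies the sign bit into every
 * bit, giving 0xffffffff for negative inputs and 0 otherwise. Emulated here
 * as 0 minus the logically shifted sign bit. */
static uint32_t sign_mask_by_shift(int32_t r)
{
    return (uint32_t)0 - ((uint32_t)r >> 31);
}

/* New idiom: a plain comparison, with no assumption about how "true" is
 * encoded. */
static bool is_negative(int32_t r)
{
    return r < 0;
}

/* Both idioms agree on which inputs count as "true". */
static void check_equivalence(int32_t r)
{
    assert((sign_mask_by_shift(r) != 0) == is_negative(r));
}
```

The shift form only works as a boolean if "true" is represented as an all-ones 32-bit value, which is the dependency the comparison form avoids.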
* v3d: Use the original bit size when scalarizing uniform loads. | Eric Anholt | 2018-12-16 | 1 | -1/+2

    Prevents a regression in jekstrand's 1-bit series.

    Reviewed-by: Jason Ekstrand <[email protected]>
* vc4: Use the original bit size when scalarizing uniform loads. | Eric Anholt | 2018-12-16 | 1 | -1/+2

    Prevents a regression in jekstrand's 1-bit series.

    Reviewed-by: Jason Ekstrand <[email protected]>
* ac: split 16-bit ssbo loads that may not be dword aligned | Rhys Perry | 2018-12-16 | 1 | -0/+2

    Fixes: 7e7ee826982 ('ac: add support for 16bit buffer loads')
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108114
    Signed-off-by: Rhys Perry <[email protected]>
    Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: refactor visit_load_buffer | Rhys Perry | 2018-12-16 | 2 | -44/+42

    This is so that we can split different types of loads more easily.

    Signed-off-by: Rhys Perry <[email protected]>
    Reviewed-by: Samuel Pitoiset <[email protected]>
* nir: fix constness in nir_intrinsic_align() | Rhys Perry | 2018-12-16 | 1 | -1/+1

    Signed-off-by: Rhys Perry <[email protected]>
    Reviewed-by: Samuel Pitoiset <[email protected]>
* clover: Fix build after clang r348827 | Jan Vesely | 2018-12-16 | 1 | -1/+6

    CodeGenOptions were moved to Basic.

    Signed-off-by: Jan Vesely <[email protected]>
    Reviewed-by: Aaron Watry <[email protected]>
    Tested-by: Aaron Watry <[email protected]>
    Reviewed-by: Kai Wasserbäch <[email protected]>
    CC: [email protected]
* glx: Fix compilation with GLX_USE_WINDOWSGL | Jon Turney | 2018-12-15 | 1 | -2/+4

    Sadly, the GLX_USE_APPLEGL and GLX_USE_WINDOWSGL cases are not identical (because GLX_USE_WINDOWSGL uses vtables rather than a maze of ifdefs).

    Include <sys/time.h> again, as functions prototyped by it are used in the GLX_USE_WINDOWSGL path.

    Make the include guard around the __glxGetMscRate() definition match the one at its declaration again, as it's referenced from dri_common.c which is built for GLX_USE_WINDOWSGL.

    Fixes: a95ec138 ("glx: mandate xf86vidmode only for "drm" dri platforms")
    Signed-off-by: Jon Turney <[email protected]>
    Reviewed-by: Emil Velikov <[email protected]>
* v3d: Drop in a bunch of notes about performance improvement opportunities. | Eric Anholt | 2018-12-14 | 5 | -2/+74

    These have all been floating in my head, and while I've thought about encoding them in issues on gitlab once they're enabled, they also make sense to just have in the area of the code you'll need to work in.
* v3d: Do uniform pretty-printing in the QPU dump. | Eric Anholt | 2018-12-14 | 3 | -1/+62

    If you're trying to trace what's going on in a QPU dump, this will definitely help you find your way.
* v3d: Use the uniform pretty-printer in v3d_write_uniforms()'s debug code. | Eric Anholt | 2018-12-14 | 1 | -1/+3

    This will be a lot easier than my usual "38400.000000? that looks like a viewport scale" decoding strategy.
* v3d: Move uniform pretty-printing to its own helper function. | Eric Anholt | 2018-12-14 | 2 | -71/+77

    I want to reuse it in the QPU dump.
* v3d: Move uinfo->data[] dereference to the top of v3d_write_uniforms(). | Eric Anholt | 2018-12-14 | 1 | -15/+13

    Follows 3954331aff23 ("vc4: Pull uinfo->data[i] dereference out to the top of the loop.") which showed a large performance win for vc4, but also cleans up the code a decent bit.
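The pattern named in this commit, loading `uinfo->data[i]` once at the top of the loop body instead of in each case, might look roughly like the sketch below. The struct, enum, and function are hypothetical stand-ins (prefixed `sketch_`), not the driver's real types; only the hoisting itself comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical uniform kinds, for illustration only. */
enum sketch_contents {
    SKETCH_CONSTANT, /* data is written out unchanged */
    SKETCH_SCALED    /* data is multiplied by a caller-supplied scale */
};

/* Hypothetical uniform list: parallel arrays of kind and payload. */
struct sketch_uniform_list {
    const enum sketch_contents *contents;
    const uint32_t *data;
    size_t count;
};

static void sketch_write_uniforms(const struct sketch_uniform_list *uinfo,
                                  uint32_t scale, uint32_t *out)
{
    assert(uinfo != NULL && out != NULL);

    for (size_t i = 0; i < uinfo->count; i++) {
        /* Single dereference, hoisted to the top of the loop body, so no
         * case below has to repeat uinfo->data[i]. */
        const uint32_t data = uinfo->data[i];

        switch (uinfo->contents[i]) {
        case SKETCH_CONSTANT:
            out[i] = data;
            break;
        case SKETCH_SCALED:
            out[i] = data * scale;
            break;
        }
    }
}
```

Whether a compiler could have hoisted the load on its own depends on aliasing; with writes through `out` inside the loop, doing it by hand removes that uncertainty and shortens every case.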
* v3d: Avoid assertion failures when removing end-of-shader instructions. | Eric Anholt | 2018-12-14 | 1 | -0/+6

    After generating VIR, we leave c->cursor pointing at the end of the shader. If the shader had dead code at the end (for example from preamble instructions in a shader with no side effects), we would hit an assertion failure because the cursor was left pointing at freed memory. Since anything following DCE should be setting up a new cursor anyway, just clear the cursor at the start.
* v3d: Add support for draw indirect for GLES3.1. | Eric Anholt | 2018-12-14 | 3 | -2/+70

    In trying to enable compute shaders, I found that a bunch of deqp-gles31's compute stuff wanted to interact with indirect dispatch. This was easy to do on its own.
* v3d: Add missing flagging of SYNCB as a TSY op. | Eric Anholt | 2018-12-14 | 1 | -0/+1

    Fixes: f2e41daac577 ("broadcom/vc5: Update QPU instruction pack/unpack for v4.2.")
* v3d: Make sure that a thrsw doesn't split a multop from its umul24. | Eric Anholt | 2018-12-14 | 1 | -0/+1

    The thrsw will invalidate rtop, just like accumulators and flags. Caught by simulator assertions in CS imulextended/umulextended tests.

    Fixes: 90269ba35333 ("broadcom/vc5: Use THRSW to enable multi-threaded shaders.")
* v3d: Add safety checks for resource_create(). | Eric Anholt | 2018-12-14 | 1 | -0/+6

    This should ease my debugging next time I screw it up.
* v3d: Add support for texturing from linear. | Eric Anholt | 2018-12-14 | 6 | -3/+110

    Just like vc4, we have to support linear shared BOs for X11 on arbitrary displays. When we're faced with a request to texture from one of those, make a shadow image that we copy using the TFU at the start of the draw call.
* v3d: Add support for using the TFU to do some blits. | Eric Anholt | 2018-12-14 | 1 | -42/+129

    This will be useful in particular for blits from raster to UIF for X11.
* v3d: Don't forget to bump the number of writes when doing TFU ops. | Eric Anholt | 2018-12-14 | 1 | -0/+2

    generatemipmap is just filling out the rest of the mipmap that's already been written (by a mapping or a draw call), so it didn't matter. As I reuse the TFU code for linear-to-UIF conversions, it'll start mattering.
* v3d: Set up the right stride for raster TFU. | Eric Anholt | 2018-12-14 | 1 | -1/+1

    I didn't have any raster images in the generatemipmap path, so the pixels-vs-bytes mixup didn't matter here.
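A pixels-vs-bytes stride mix-up is a units error, and the conversion between the two is a single division by the bytes per pixel. The helper below is a hypothetical sketch of that conversion. The name, the parameters, and the direction of the conversion (bytes stored, pixels wanted) are assumptions; the commit message does not say which unit the hardware field expects.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a row stride in bytes to a row stride in pixels.
 * cpp is the number of bytes per pixel. */
static uint32_t sketch_stride_in_pixels(uint32_t stride_bytes, uint32_t cpp)
{
    /* A raster row is expected to hold a whole number of pixels. */
    assert(cpp != 0 && stride_bytes % cpp == 0);
    return stride_bytes / cpp;
}
```

For a 4-byte-per-pixel format with a 1024-byte row, the two units differ by a factor of four (1024 bytes versus 256 pixels), so passing one where the other is expected makes each row start at the wrong address.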
* v3d: Don't forget to wait for our TFU job before rendering from it. | Eric Anholt | 2018-12-14 | 1 | -0/+8

    Otherwise we may race to read old contents. This didn't show up in the CTS and piglit for me, but it did once I started using the TFU to do linear->UIF blits for X11.

    Fixes: 2ebca177dc18 ("v3d: Use the TFU to do generatemipmap.")
* nvc0: always keep TSC slot 0 bound to fix TXF | Ilia Mirkin | 2018-12-14 | 2 | -0/+21

    Same as on nv50, the TXF op always uses the TSC bound to slot 0, returning blank values if nothing is bound. An earlier change arranges for the TSC entries list to always have valid data at entry 0, so here we just make use of it.

    Fixes arb_texture_buffer_object-subdata-sync among others.

    Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: replace use of explicit default_tsc with entry 0 | Ilia Mirkin | 2018-12-14 | 6 | -22/+25

    This was used for implementing FBFETCH. However that uses TXF, which doesn't do much with a TSC. The only important bit is that sRGB-decoding works as expected, which we can achieve since all samplers we ever generate enable sRGB-decoding.

    Always point to entry 0 in the TSC table, and ensure that even before it ever gets initialized, the sRGB-decoding enable bit is set.

    Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a6xx: fix corrupted uniforms | Rob Clark | 2018-12-14 | 1 | -1/+2

    For older gens, fd_wfi() is used to conditionally insert a WFI if there hasn't already been one since the last draw. But this doesn't work out well with stateobj, since the order in which the stateobj is evaluated might not be what you expect. (I.e. the stateobj might not be evaluated until a later draw if there is no geometry from the current draw in a given tile.)

    Signed-off-by: Rob Clark <[email protected]>
* pci_ids: add new vega20 pci id | Alex Deucher | 2018-12-14 | 1 | -0/+1

    Reviewed-by: Marek Olšák <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Cc: [email protected]
* pci_ids: add new vega10 pci ids | Alex Deucher | 2018-12-14 | 1 | -1/+7

    Reviewed-by: Marek Olšák <[email protected]>
    Signed-off-by: Alex Deucher <[email protected]>
    Cc: [email protected]
* i965/gen9: Add workarounds for object preemption. | Rafael Antognolli | 2018-12-14 | 1 | -0/+63

    Gen9 hardware requires some workarounds to disable preemption depending on the type of primitive being emitted.

    We implement this by adding a function that checks the primitive type and number of instances right before the 3DPRIMITIVE.

    For now, we just ignore blorp. The only primitive it emits is 3DPRIM_RECTLIST, and since it's not listed in the workarounds, we can safely leave preemption enabled when it happens. Or it will be disabled by a previous 3DPRIMITIVE, which should be fine too.

    v3:
     - Apply missing workarounds for instanced rendering and line loop (Ken)
     - Move workaround code to brw_draw_single_prim()

    Signed-off-by: Rafael Antognolli <[email protected]>
    Cc: Kenneth Graunke <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gen10+: Enable object level preemption. | Rafael Antognolli | 2018-12-14 | 4 | -1/+36

    Set bit when initializing context.

    v3:
     - Always toggle preemption bool to false before enabling it for the first time, so the state gets emitted (Chris Wilson).
     - Emit end of pipe sync with PIPE_CONTROL_RENDER_TARGET_FLUSH (Ken)

    Signed-off-by: Rafael Antognolli <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* intel/genxml: Add register for object preemption. | Rafael Antognolli | 2018-12-14 | 3 | -0/+24

    Signed-off-by: Rafael Antognolli <[email protected]>
    Reviewed-by: Kenneth Graunke <[email protected]>
* util/slab: Rename slab_mempool typed parameters to mempool | Ian Romanick | 2018-12-14 | 2 | -14/+14

    Now everything with type 'struct slab_child_pool *' is named pool, and everything with type 'struct slab_mempool *' is named mempool.

    Signed-off-by: Ian Romanick <[email protected]>
* nir/phi_builder: Internal users should use nir_phi_builder_value_set_block_def too | Ian Romanick | 2018-12-14 | 1 | -2/+2

    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* etnaviv: drop redundant ctx function parameter | Christian Gmeiner | 2018-12-14 | 1 | -4/+3

    There is no need to have an extra ctx parameter as all the other parameters carry all the needed information.

    Signed-off-by: Christian Gmeiner <[email protected]>
    Reviewed-by: Lucas Stach <[email protected]>
* genxml: Consistently use a numeric "MOCS" field | Kenneth Graunke | 2018-12-14 | 16 | -260/+177

    When we first started using genxml, we decided to represent MOCS as an actual structure, and pack values. However, in many places, it was more convenient to use a numeric value rather than treating it as a struct, so we added secondary setters in a bunch of places as well.

    We were not entirely consistent, either. Some places only had one. Gen6 had both kinds of setters for STATE_BASE_ADDRESS, but newer gens only had the struct-based setters. The names were sometimes "Constant Buffer Object Control State" instead of "Memory", making it harder to find. Many had prefixes like "Vertex Buffer MOCS"...in a vertex buffer packet...which is a bit redundant.

    On modern hardware, MOCS is simply an index into a table, but we were still carrying around the structure with an "Index to MOCS Table" field, in addition to the direct numeric setters. This is clunky - we really just want a number on new hardware.

    This patch eliminates the struct-based setters, and makes the numeric setters be consistently called "MOCS". We leave the struct definition around on Gen7-8 for reference purposes, but it is unused.

    v2: Drop bonus "Depth Buffer MOCS" fields on Gen7.5 and Gen9

    Reviewed-by: Jordan Justen <[email protected]>
    Reviewed-by: Kristian H. Kristensen <[email protected]>
* nir: fix opt_if_loop_last_continue() | Timothy Arceri | 2018-12-14 | 1 | -2/+6

    The pass did not correctly handle loops ending in:

        if ssa_7 {
            block block_8:
            /* preds: block_7 */
            continue
            /* succs: block_1 */
        } else {
            block block_9:
            /* preds: block_7 */
            break
            /* succs: block_11 */
        }

    The break will get eliminated by another opt but if this pass gets called first (as it does on RADV) we ended up inserting instructions after the break.

    Fixes: 5921a19d4b0c ("nir: add if opt opt_if_loop_last_continue()")
    Reviewed-by: Dave Airlie <[email protected]>
* freedreno/a6xx: fix resource_copy_region() | Rob Clark | 2018-12-13 | 1 | -9/+24

    pctx->resource_copy_region() needs to fall back to sw copy for non-renderable formats. Previously, anything the blitter could not handle fell back to 3d, which won't work if 3d can't render to the dst format either.

    Instead, rework things to fall back to fd_resource_copy_region(), which will try the 3d core and then fall back to memcpy().

    Fixes (for example) dEQP-GLES3.functional.texture.format.sized.2d.rgb9_e5_pot

    Signed-off-by: Rob Clark <[email protected]>
* freedreno: move fd_resource_copy_region() | Rob Clark | 2018-12-13 | 3 | -62/+73

    Code-motion prep for next patch.

    Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: more blitter fixes | Rob Clark | 2018-12-13 | 1 | -10/+22

    Signed-off-by: Rob Clark <[email protected]>
* freedreno: update generated headers | Rob Clark | 2018-12-13 | 8 | -30/+39

    Signed-off-by: Rob Clark <[email protected]>
* gallium/aux: add is_unorm() helper | Rob Clark | 2018-12-13 | 2 | -0/+24

    We already had one for is_snorm() but not unorm.

    Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: fix blitter crash | Rob Clark | 2018-12-13 | 1 | -0/+17

    Fixes a crash with unsupported formats in dEQP-GLES3.functional.texture.format.sized.2d.rgb9_e5_pot

    Also fixes gpu hangs with some formats that are supported, but for which we don't know what internal-format to use for the blitter, e.g. dEQP-GLES3.functional.texture.format.sized.2d_array.rgb10_a2_pot

    Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: don't remove unused input components | Rob Clark | 2018-12-13 | 1 | -1/+7

    Fixes: 0d240c22141 freedreno/ir3: don't fetch unused tex components
    Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: fix crash | Rob Clark | 2018-12-13 | 1 | -14/+8

    Fixes a crash in dEQP-GLES3.functional.shaders.fragdepth.compare.fragcoord_z

    Fixes: 0d240c22141 freedreno/ir3: don't fetch unused tex components
    Signed-off-by: Rob Clark <[email protected]>
* freedreno: also set DUMP flag on shaders | Rob Clark | 2018-12-13 | 5 | -20/+22

    If we emit a shader as a pointer to a GEM object, also set the RELOC_DUMP flag as a hint to the kernel that this is a useful buffer to snapshot for debug dumps.

    Signed-off-by: Rob Clark <[email protected]>