summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* clover: Fix build after clang r348827Jan Vesely2018-12-161-1/+6
| | | | | | | | | | CodeGenOptions were moved to Basic. Signed-off-by: Jan Vesely <[email protected]> Reviewed-by: Aaron Watry <[email protected]> Tested-by: Aaron Watry <[email protected]> Reviewed-by: Kai Wasserbäch <[email protected]> CC: [email protected]
* glx: Fix compilation with GLX_USE_WINDOWSGLJon Turney2018-12-151-2/+4
| | | | | | | | | | | | | | | | Sadly, the GLX_USE_APPLEGL and GLX_USE_WINDOWSGL cases are not identical (because GLX_USE_WINDOWSGL uses vtables rather than a maze of ifdefs) Include <sys/time.h> again, as functions prototyped by it are used in the GLX_USE_WINDOWSGL path. Make the include guard around the __glxGetMscRate() definition match the one at it's declaration again, as it's referenced from dri_common.c which is built for GLX_USE_WINDOWSGL. Fixes: a95ec138 ("glx: mandate xf86vidmode only for "drm" dri platforms") Signed-off-by: Jon Turney <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* v3d: Drop in a bunch of notes about performance improvement opportunities.Eric Anholt2018-12-145-2/+74
| | | | | | These have all been floating in my head, and while I've thought about encoding them in issues on gitlab once they're enabled, they also make sense to just have in the area of the code you'll need to work in.
* v3d: Do uniform pretty-printing in the QPU dump.Eric Anholt2018-12-143-1/+62
| | | | | If you're trying to trace what's going on in a QPU dump, this will definitely help you find your way.
* v3d: Use the uniform pretty-printer in v3d_write_uniforms()'s debug code.Eric Anholt2018-12-141-1/+3
| | | | | This will be a lot easier than my usual "38400.000000? that looks like a viewport scale" decoding strategy.
* v3d: Move uniform pretty-printing to its own helper function.Eric Anholt2018-12-142-71/+77
| | | | I want to reuse it in the QPU dump.
* v3d: Move uinfo->data[] dereference to the top of v3d_write_uniforms().Eric Anholt2018-12-141-15/+13
| | | | | | Follows 3954331aff23 ("vc4: Pull uinfo->data[i] dereference out to the top of the loop.") which showed a large performance win for vc4, but also cleans up the code a decent bit.
* v3d: Avoid assertion failures when removing end-of-shader instructions.Eric Anholt2018-12-141-0/+6
| | | | | | | | | After generating VIR, we leave c->cursor pointing at the end of the shader. If the shader had dead code at the end (for example from preamble instructions in a shader with no side effects), we would assertion fail that we were leaving the cursor pointing at freed memory. Since anything following DCE should be setting up a new cursor anyway, just clear the cursor at the start.
* v3d: Add support for draw indirect for GLES3.1.Eric Anholt2018-12-143-2/+70
| | | | | | In trying to enable compute shaders, I found that a bunch of deqp-gles31's compute stuff wanted to interact with indirect dispatch. This was easy to do on its own.
* v3d: Add missing flagging of SYNCB as a TSY op.Eric Anholt2018-12-141-0/+1
| | | | Fixes: f2e41daac577 ("broadcom/vc5: Update QPU instruction pack/unpack for v4.2.")
* v3d: Make sure that a thrsw doesn't split a multop from its umul24.Eric Anholt2018-12-141-0/+1
| | | | | | | The thrsw will invalidate rtop, just like accumulators and flags. Caught by simulator assertions in CS imulextended/umulextended tests. Fixes: 90269ba35333 ("broadcom/vc5: Use THRSW to enable multi-threaded shaders.")
* v3d: Add safety checks for resource_create().Eric Anholt2018-12-141-0/+6
| | | | This should ease my debugging next time I screw it up.
* v3d: Add support for texturing from linear.Eric Anholt2018-12-146-3/+110
| | | | | | | Just like vc4, we have to support linear shared BOs for X11 on arbitrary displays. When we're faced with a request to texture from one of those, make a shadow image that we copy using the TFU at the start of the draw call.
* v3d: Add support for using the TFU to do some blits.Eric Anholt2018-12-141-42/+129
| | | | This will be useful in particular for blits from raster to UIF for X11.
* v3d: Don't forget to bump the number of writes when doing TFU ops.Eric Anholt2018-12-141-0/+2
| | | | | | generatemipmap is just filling out the rest of the mipmap that's already been written (by a mapping or a draw call), so it didn't matter. As I reuse the TFU code for linear-to-UIF conversions, it'll start mattering.
* v3d: Set up the right stride for raster TFU.Eric Anholt2018-12-141-1/+1
| | | | | I didn't have any raster images in the generatemipmap path, so the pixels-vs-bytes mixup didn't matter here.
* v3d: Don't forget to wait for our TFU job before rendering from it.Eric Anholt2018-12-141-0/+8
| | | | | | | | Otherwise we may race to read old contents. This didn't show up in the CTS and piglit for me, but it did once I started using the TFU to do linear->UIF blits for X11. Fixes: 2ebca177dc18 ("v3d: Use the TFU to do generatemipmap.")
* nvc0: always keep TSC slot 0 bound to fix TXFIlia Mirkin2018-12-142-0/+21
| | | | | | | | | | | | Same as on nv50, the TXF op always uses the TSC bound to slot 0, returning blank values if nothing is bound. An earlier change arranges for the TSC entries list to always have valid data at entry 0, so here we just make use of it. Fixes arb_texture_buffer_object-subdata-sync among others. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: replace use of explicit default_tsc with entry 0Ilia Mirkin2018-12-146-22/+25
| | | | | | | | | | | This was used for implementing FBFETCH. However that uses TXF, which doesn't do much with a TSC. The only important bit is that sRGB-decoding works as expected, which we can achieve since all samplers we ever generate enable sRGB-decoding. Always point to entry 0 in the TSC table, and ensure that even before it ever gets initialized, the sRGB-decoding enable bit is set. Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/a6xx: fix corrupted uniformsRob Clark2018-12-141-1/+2
| | | | | | | | | | For older gen's fd_wfi() is used to conditionally insert a WFI if there hasn't already been one since last draw. But this doesn't work out well with stateobj since the order the stateobj is evaluated might not be what you expect. (Ie. stateobj might not be evaluated until a later draw if there is no geometry from the current draw in a given tile.) Signed-off-by: Rob Clark <[email protected]>
* i965/gen9: Add workarounds for object preemption.Rafael Antognolli2018-12-141-0/+63
| | | | | | | | | | | | | | | | | | | | | Gen9 hardware requires some workarounds to disable preemption depending on the type of primitive being emitted. We implement this by adding a function that checks the primitive type and number of instances right before the 3DPRIMITIVE. For now, we just ignore blorp. The only primitive it emits is 3DPRIM_RECTLIST, and since it's not listed in the workarounds, we can safely leave preemption enabled when it happens. Or it will be disabled by a previous 3DPRIMITIVE, which should be fine too. v3: - Apply missing workarounds for instanced rendering and line loop (Ken) - Move workaround code to brw_draw_single_prim() Signed-off-by: Rafael Antognolli <[email protected]> Cc: Kenneth Graunke <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gen10+: Enable object level preemption.Rafael Antognolli2018-12-144-1/+36
| | | | | | | | | | | | Set bit when initializing context. v3: - Always toggle preemption bool to false before enabling it for the first time, so the state gets emitted (Chris Wilson). - Emit end of pipe sync with PIPE_CONTROL_RENDER_TARGET_FLUSH (Ken) Signed-off-by: Rafael Antognolli <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/genxml: Add register for object preemption.Rafael Antognolli2018-12-143-0/+24
| | | | | Signed-off-by: Rafael Antognolli <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* util/slab: Rename slab_mempool typed parameters to mempoolIan Romanick2018-12-142-14/+14
| | | | | | | Now everything with type 'struct slab_child_pool *' is name pool, and everything with type 'struct slab_mempool *' is named mempool. Signed-off-by: Ian Romanick <[email protected]>
* nir/phi_builder: Internal users should use ↵Ian Romanick2018-12-141-2/+2
| | | | | | | nir_phi_builder_value_set_block_def too Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* etnaviv: drop redundant ctx function parameterChristian Gmeiner2018-12-141-4/+3
| | | | | | | | There is no need to have an extra ctx paramter as all the other parameters carry all the needed information. Signed-off-by: Christian Gmeiner <[email protected]> Reviewed-by: Lucas Stach <[email protected]>
* genxml: Consistently use a numeric "MOCS" fieldKenneth Graunke2018-12-1416-260/+177
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When we first started using genxml, we decided to represent MOCS as an actual structure, and pack values. However, in many places, it was more convenient to use a numeric value rather than treating it as a struct, so we added secondary setters in a bunch of places as well. We were not entirely consistent, either. Some places only had one. Gen6 had both kinds of setters for STATE_BASE_ADDRESS, but newer gens only had the struct-based setters. The names were sometimes "Constant Buffer Object Control State" instead of "Memory", making it harder to find. Many had prefixes like "Vertex Buffer MOCS"...in a vertex buffer packet...which is a bit redundant. On modern hardware, MOCS is simply an index into a table, but we were still carrying around the structure with an "Index to MOCS Table" field, in addition to the direct numeric setters. This is clunky - we really just want a number on new hardware. This patch eliminates the struct-based setters, and makes the numeric setters be consistently called "MOCS". We leave the struct definition around on Gen7-8 for reference purposes, but it is unused. v2: Drop bonus "Depth Buffer MOCS" fields on Gen7.5 and Gen9 Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kristian H. Kristensen <[email protected]>
* nir: fix opt_if_loop_last_continue()Timothy Arceri2018-12-141-2/+6
| | | | | | | | | | | | | | | | | | | | | | | The pass did not correctly handle loops ending in: if ssa_7 { block block_8: /* preds: block_7 */ continue /* succs: block_1 */ } else { block block_9: /* preds: block_7 */ break /* succs: block_11 */ } The break will get eliminated by another opt but if this pass gets called first (as it does on RADV) we ended up inserting instructions after the break. Fixes: 5921a19d4b0c ("nir: add if opt opt_if_loop_last_continue()") Reviewed-by: Dave Airlie <[email protected]>
* freedreno/a6xx: fix resource_copy_region()Rob Clark2018-12-131-9/+24
| | | | | | | | | | | | | | pctx->resource_copy_region() needs to fall back to sw copy for non-renderable formats. But previously for things that we could not use the blitter for, would fall back to 3d. Which won't work if 3d can't render to the dst format either. Instead rework things to fallback to fd_resource_copy_region(), which will try 3d core and then fall back to memcpy(). Fixes (for example) dEQP-GLES3.functional.texture.format.sized.2d.rgb9_e5_pot Signed-off-by: Rob Clark <[email protected]>
* freedreno: move fd_resource_copy_region()Rob Clark2018-12-133-62/+73
| | | | | | Code-motion prep for next patch. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: more blitter fixesRob Clark2018-12-131-10/+22
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: update generated headersRob Clark2018-12-138-30/+39
| | | | Signed-off-by: Rob Clark <[email protected]>
* gallium/aux: add is_unorm() helperRob Clark2018-12-132-0/+24
| | | | | | We already had one for is_snorm() but not unorm. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: fix blitter crashRob Clark2018-12-131-0/+17
| | | | | | | | | | Fixes a crash with unsupported formats in dEQP-GLES3.functional.texture.format.sized.2d.rgb9_e5_pot Also fixes gpu hangs with some formats that are supported, but which we don't know what internal-format to use for the blitter, for ex dEQP-GLES3.functional.texture.format.sized.2d_array.rgb10_a2_pot Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: don't remove unused input componentsRob Clark2018-12-131-1/+7
| | | | | Fixes: 0d240c22141 freedreno/ir3: don't fetch unused tex components Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: fix crashRob Clark2018-12-131-14/+8
| | | | | | | Fixes a crash in dEQP-GLES3.functional.shaders.fragdepth.compare.fragcoord_z Fixes: 0d240c22141 freedreno/ir3: don't fetch unused tex components Signed-off-by: Rob Clark <[email protected]>
* freedreno: also set DUMP flag on shadersRob Clark2018-12-135-20/+22
| | | | | | | | If we emit shader as a pointer to a GEM object, also set the RELOC_DUMP flag as a hint to kernel that this is a useful buffer to snapshot for debug dumps. Signed-off-by: Rob Clark <[email protected]>
* freedreno: debug GEM obj namesRob Clark2018-12-1313-21/+91
| | | | | | | With a recent enough kernel, set debug names for GEM BOs, which will show up in $debugfs/gem Signed-off-by: Rob Clark <[email protected]>
* freedreno/drm: sync uapi and enable softpinRob Clark2018-12-136-25/+30
| | | | | | | | | Pull in updated UAPI and use kernel API version to enable softpin. Since MSM_SUBMIT_BO_DUMP flag was added at same time, use that to signal to kernel that cmdstream buffers are useful to dump for debugging/cmdstream-traces. Signed-off-by: Rob Clark <[email protected]>
* nir: Move intel's half-float image store lowering to to nir_format.h.Eric Anholt2018-12-132-8/+15
| | | | | | | | I needed the same function for v3d. This was originally in d3e046e76c06 ("nir: Pull some of intel's image load/store format conversion to nir_format.h") before we made am istake about simplifying the function. Reviewed-by: Jason Ekstrand <[email protected]>
* Revert "intel: Simplify the half-float packing in image load/store lowering."Eric Anholt2018-12-131-2/+8
| | | | | | | | This reverts commit 06fbcd2cd5cc5702c9039c26d20082a99bc157bf. nir_pack_half_2x16_split *isn't* vectorizable, it's 1-component only, thus why we had this split-scalar code in the first place. Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Print the format of image variables.Eric Anholt2018-12-131-0/+47
| | | | | | | | This helps a lot when debugging image load/store lowering on large testcases. Unfortunately the Mesa enum name stuff is under src/mesa and we can't get at it from the compiler. Reviewed-by: Jason Ekstrand <[email protected]>
* mesa/st: Expose compute shaders when NIR support is advertised.Eric Anholt2018-12-132-8/+14
| | | | | | | | We have a NIR path, and V3D doesn't have TGSI input for compute (only what TTN can handle for the various gallium-internal shaders). Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* radv/xfb: fix counter buffer bounds checks.Dave Airlie2018-12-131-2/+2
| | | | | | | | | | | | If we gave this function 0 counter buffers, we'd still try and access pCounterBuffers[0] as this check was incorrect. Fixes crash with ext_transform_feedback-pipeline-basic-primgen on zink on radv. Fixes: 677b496b6 (radv: fix begin/end transform feedback with 0 counter buffers.) Signed-off-by: Dave Airlie <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* i965: Enable nir_opt_idiv_const for 32 and 64-bit integersJason Ekstrand2018-12-131-1/+3
| | | | | | | | | | | | | | | | | | | | | The pass should work for all bit sizes but it's less clear that the extra instructions are worth it on small integers. Also, the hardware doesn't do mul_high on anything other than 32-bit integers and, absent any decent mechanism for testing the pass on 8 and 16-bit types, it's probably best to just leave it disabled for now. Shader-db results on Sky Lake: total instructions in shared programs: 15105795 -> 15111403 (0.04%) instructions in affected programs: 72774 -> 78382 (7.71%) helped: 0 HURT: 265 Note that hurt here actually means helped because we're getting rid of integer quotient operations (which are a send on some platforms!) and replacing them with fairly cheap ALU ops. Reviewed-by: Ian Romanick [email protected]
* i965/vec4: Implement nir_op_uadd_satJason Ekstrand2018-12-131-0/+6
| | | | Reviewed-by: Ian Romanick [email protected]
* i965/fs: Implement nir_op_uadd_satIan Romanick2018-12-131-0/+5
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add a pass for lowering integer division by constantsJason Ekstrand2018-12-134-0/+219
| | | | | | | | | | | It's a reasonably well-known fact in the world of compilers that integer divisions by constants can be replaced by a multiply, an add, and some shifts. This commit adds such an optimization to NIR for easiest case of udiv. Other division operations will be added in following commits. In order to provide some additional driver control, the pass takes a minimum bit size to optimize. Reviewed-by: Ian Romanick [email protected]
* nir: Add a saturated unsigned integer add opcodeIan Romanick2018-12-131-0/+2
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* nir/lower_int64: Add support for [iu]mul_highJason Ekstrand2018-12-132-0/+67
| | | | Reviewed-by: Ian Romanick [email protected]