path: root/src
Commit message | Author | Age | Files | Lines
* radv: set ACCESS_NON_READABLE on stores for copy/fill/clear meta shaders | Samuel Pitoiset | 2019-04-15 | 2 | -0/+3
  The compiler will emit GLC=1.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: Use local buffers for the global bo list. | Bas Nieuwenhuizen | 2019-04-15 | 3 | -2/+8
  Even if we don't use local buffers in general. Turns out that even though the performance is not the best the kernel still does it better than our own list.

  We still have to keep the radv bo list for buffers that are shared externally.

  This improves Talos on lowest quality setting (so as CPU bound as possible) by ~10% if the global bo list is enabled.

  Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: Move has_local_buffers disable to radeonsi. | Bas Nieuwenhuizen | 2019-04-15 | 2 | -3/+5
  In radv we had a separate flag to actually use it + an env option to experimentally use it. The common code setting has_local_buffers to false of course broke that experimental option.

  Also the "enable on APU" did not make sense for RADV as it is still disabled by default.

  Fixes: b21a4efb553 "radv/winsys: allow local BOs on APUs"
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: Add bolist RADV_PERFTEST flag. | Bas Nieuwenhuizen | 2019-04-15 | 2 | -0/+3
  To test global_bo_list performance.

  Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: fix incorrect bindless atomic code in visit_image_atomic | Marek Olšák | 2019-04-15 | 1 | -3/+3
  Coverity: CID 1444664

  Fixes: d62d434fe920 ("ac/nir_to_llvm: add image bindless support")
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Timothy Arceri <[email protected]>
* nir,ac/nir: fix cube_face_coord | Rhys Perry | 2019-04-15 | 2 | -8/+24
  Seems it was missing the "/ ma + 0.5" and the order was swapped.

  Fixes: a1a2a8dfda7b9cac7e ('nir: add AMD_gcn_shader extended instructions')
  Signed-off-by: Rhys Perry <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
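  For readers unfamiliar with the "/ ma + 0.5" step named in the cube_face_coord commit, the following is a minimal sketch of cube-face coordinate normalization. It follows the face-selection table from the OpenGL specification; it is not the Mesa code touched by the commit. The names `face_coord`, `abs_f` and `cube_face_coord` are hypothetical, and treating `ma` as the absolute value of the major axis (with an explicit 0.5 scale) is an assumption, since the commit message does not say how the hardware scales `ma`.

```c
/* Illustrative only: hypothetical names, OpenGL face-selection convention. */
struct face_coord {
    float s; /* horizontal coordinate within the selected face, in [0, 1] */
    float t; /* vertical coordinate within the selected face, in [0, 1] */
};

static float abs_f(float x)
{
    return x < 0.0f ? -x : x;
}

/* Maps a non-zero direction vector to 2D coordinates on its cube face. */
static struct face_coord cube_face_coord(float rx, float ry, float rz)
{
    float ax = abs_f(rx), ay = abs_f(ry), az = abs_f(rz);
    float sc, tc, ma;

    if (ax >= ay && ax >= az) {        /* +X or -X face */
        ma = ax;
        sc = rx >= 0.0f ? -rz : rz;
        tc = -ry;
    } else if (ay >= az) {             /* +Y or -Y face */
        ma = ay;
        sc = rx;
        tc = ry >= 0.0f ? rz : -rz;
    } else {                           /* +Z or -Z face */
        ma = az;
        sc = rz >= 0.0f ? rx : -rx;
        tc = -ry;
    }

    /* Divide by the major axis, then shift from [-0.5, 0.5] to [0, 1]. */
    struct face_coord out = { 0.5f * sc / ma + 0.5f, 0.5f * tc / ma + 0.5f };
    return out;
}
```

  Without the divide and the 0.5 offset, the result would stay in unnormalized direction-vector units rather than in the [0, 1] range of a face.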
* anv: Update to use the new features struct names | Jason Ekstrand | 2019-04-15 | 1 | -6/+6
  These were updated in version 1.1.106 of vulkan.h to make more sense with the extension names. We may as well keep with the times.

  Acked-by: Dave Airlie <[email protected]>
  Reviewed-by: Lionel Landwerlin <[email protected]>
* vulkan: Update the XML and headers to 1.1.106 | Jason Ekstrand | 2019-04-15 | 1 | -18/+64
  Acked-by: Dave Airlie <[email protected]>
  Acked-by: Lionel Landwerlin <[email protected]>
* nir: fix packing components with arrays | Timothy Arceri | 2019-04-15 | 1 | -1/+2
  When gathering info for unmovable types we need to handle arrays. While we don't support packing/moving arrays we do support packing scalar components with these arrays.

  Fixes piglit: tests/spec/arb_enhanced_layouts/execution/component-layout/vs-fs-array-interleave-range.shader_test

  Fixes: 5eb17506e159 ("nir: do not pack varying with different types")
  Reviewed-by: Samuel Pitoiset <[email protected]>
* radv: enable VK_KHR_shader_float16_int8 | Samuel Pitoiset | 2019-04-15 | 2 | -1/+2
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* spirv: add SpvCapabilityFloat16 support | Samuel Pitoiset | 2019-04-15 | 2 | -1/+5
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* intel: Emit 3DSTATE_VF_STATISTICS dynamically | Kenneth Graunke | 2019-04-14 | 6 | -11/+35
  Pipeline statistics queries should not count BLORP's rectangles.

    (23) How do operations like Clear, TexSubImage, etc. affect the results of the newly introduced queries?

    DISCUSSION: Implementations might require "helper" rendering commands be issued to implement certain operations like Clear, TexSubImage, etc.

    RESOLVED: They don't. Only application submitted rendering commands should have an effect on the results of the queries.

  Piglit's arb_pipeline_statistics_query-vert_adj exposes this bug when the driver is hacked to always perform glBufferData via a GPU staging copy (for debugging purposes).

  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Lionel Landwerlin <[email protected]>
* nir/validate: Require unused bits of nir_const_value to be zero | Jason Ekstrand | 2019-04-14 | 2 | -20/+41
  Reviewed-by: Karol Herbst <[email protected]>
* nir/load_const_to_scalar: Get rid of a bit size switch statement | Jason Ekstrand | 2019-04-14 | 1 | -19/+1
  Now that nir_const_value is a scalar, we don't need the switch on bit size in order to pluck off components properly.

  Reviewed-by: Karol Herbst <[email protected]>
* spirv: Drop some unneeded bit size switch statements | Jason Ekstrand | 2019-04-14 | 1 | -62/+4
  Now that nir_const_value is a scalar, we don't need the switch on bit size in order to copy components around properly.

  Reviewed-by: Karol Herbst <[email protected]>
* nir/constant_folding: Get rid of a bit size switch statement | Jason Ekstrand | 2019-04-14 | 1 | -19/+1
  Now that nir_const_value is a scalar, we don't need the switch on bit size in order to swizzle them properly.

  Reviewed-by: Karol Herbst <[email protected]>
* nir: make nir_const_value scalar | Karol Herbst | 2019-04-14 | 43 | -416/+470
  v2: remove & operator in a couple of memsets
      add some memsets
  v3: fixup lima

  Signed-off-by: Karol Herbst <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]> (v2)
* spirv: reduce array size in vtn_handle_constant | Karol Herbst | 2019-04-14 | 1 | -1/+1
  We already assert above that there are no more than 3 sources, so it doesn't make sense to use an array of 4 sources.

  Signed-off-by: Karol Herbst <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir/loop_analyze: use nir_const_value.b for boolean results, not u32 | Karol Herbst | 2019-04-14 | 1 | -1/+1
  Signed-off-by: Karol Herbst <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir/print: Use nir_src_as_int for array indices | Jason Ekstrand | 2019-04-14 | 1 | -3/+2
  Reviewed-by: Karol Herbst <[email protected]>
* nir/builder: Add a nir_imm_zero helper | Jason Ekstrand | 2019-04-14 | 4 | -17/+18
  v2: replace nir_zero_vec with nir_imm_zero (Karol Herbst)

  Reviewed-by: Karol Herbst <[email protected]>
* nir/builder: Move nir_imm_vec2 from blorp into the builder | Karol Herbst | 2019-04-14 | 2 | -12/+12
  While we're here, fix a typo which caused it to actually return a vec4 with the third and fourth components zero.

  Signed-off-by: Karol Herbst <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* lima: use nir_src_as_float | Karol Herbst | 2019-04-14 | 2 | -9/+2
  Signed-off-by: Karol Herbst <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Qiang Yu <[email protected]>
* freedreno/ir3: use nir_src_as_uint in a few places | Karol Herbst | 2019-04-14 | 5 | -51/+20
  v2 (Jason Ekstrand):
   - Add even more places

  Signed-off-by: Karol Herbst <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* intel/nir: use nir_src_is_const and nir_src_as_uint | Karol Herbst | 2019-04-14 | 1 | -6/+4
  Signed-off-by: Karol Herbst <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* intel/nir: Take a nir_tex_instr and src index in brw_texture_offset | Jason Ekstrand | 2019-04-14 | 4 | -27/+21
  This makes things a bit simpler and it's also more robust because it no longer has a hard dependency on the offset being a 32-bit value.
* radv: use nir constant helpers | Karol Herbst | 2019-04-14 | 2 | -20/+10
  Signed-off-by: Karol Herbst <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* amd/nir: some cleanups | Karol Herbst | 2019-04-14 | 1 | -20/+9
  Signed-off-by: Karol Herbst <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* panfrost/midgard: Use shared nir_lower_viewport_transform | Alyssa Rosenzweig | 2019-04-14 | 1 | -101/+4
  v2: Run before lowering I/O.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
  Reviewed-by: Qiang Yu <[email protected]>
* nir: Add nir_lower_viewport_transform | Alyssa Rosenzweig | 2019-04-14 | 4 | -0/+105
  On Mali hardware (supported by Panfrost and Lima), the fixed-function transformation from world-space to screen-space coordinates is done in the vertex shader prior to writing out the gl_Position varying, rather than in dedicated hardware. This commit adds a shared NIR pass for implementing coordinate transformation and lowering gl_Position writes into screen-space gl_Position writes.

  v2: Run directly on derefs before io/vars are lowered to cleanup the code substantially. Thank you to Qiang for this suggestion!

  v3: Bikeshed continues.

  v4: Add to Makefile.sources (per Jason's comment). Bikeshed comment.

  Ian and Qiang's reviews are from v3, but no real functional changes from v4. Rob's review is from v4.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
  Suggested-by: Qiang Yu <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Qiang Yu <[email protected]>
  Reviewed-by: Rob Clark <[email protected]>
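  As a rough illustration of the coordinate transformation that nir_lower_viewport_transform is described as moving into the vertex shader, here is a minimal CPU-side sketch of the standard OpenGL viewport transform. It is not the NIR pass itself. The `vec4` and `viewport` structs and the `viewport_transform` function are hypothetical, and storing 1/w in the output's w component is an assumption rather than something the commit message states.

```c
/* Illustrative only: hypothetical types, standard GL viewport math. */
struct vec4 {
    float x, y, z, w;
};

struct viewport {
    float x, y;           /* lower-left corner, in pixels */
    float width, height;  /* size, in pixels */
    float near_z, far_z;  /* depth range */
};

/* Converts a clip-space position (w != 0) to screen-space coordinates. */
static struct vec4 viewport_transform(struct vec4 clip, struct viewport vp)
{
    /* Perspective divide: clip space to normalized device coordinates. */
    float inv_w = 1.0f / clip.w;
    float ndc_x = clip.x * inv_w;
    float ndc_y = clip.y * inv_w;
    float ndc_z = clip.z * inv_w;

    struct vec4 screen;
    /* Scale [-1, 1] to the viewport size and move to its centre. */
    screen.x = ndc_x * (vp.width * 0.5f) + (vp.x + vp.width * 0.5f);
    screen.y = ndc_y * (vp.height * 0.5f) + (vp.y + vp.height * 0.5f);
    /* Map [-1, 1] depth to [near_z, far_z]. */
    screen.z = ndc_z * ((vp.far_z - vp.near_z) * 0.5f)
             + ((vp.far_z + vp.near_z) * 0.5f);
    /* Assumption: keep 1/w for later perspective-correct interpolation. */
    screen.w = inv_w;
    return screen;
}
```

  With an 800x600 viewport at the origin and a depth range of 0 to 1, the clip-space origin (0, 0, 0, 1) lands at the centre of the screen, (400, 300), with depth 0.5.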
* panfrost: Cleanup indexed draw handling | Alyssa Rosenzweig | 2019-04-14 | 1 | -52/+28
  As part of this cleanup, we use the newly-exposed u_vbuf_get_minmax_index, deduplicating quite a bit of bookkeeping. We also centralize the draw_flags tracking to make this code cleaner / futureproofed; we have already had bugs regarding this field so we might as well get it right now.

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Drop dependence on mesa/st | Alyssa Rosenzweig | 2019-04-14 | 2 | -9/+1
  This was used as a workaround for uniform sizing which was fixed in 771adffe ("st: Lower uniforms in st in the...")

  Signed-off-by: Alyssa Rosenzweig <[email protected]>
* draw: fix building error in draw_gs_init() | Mauro Rossi | 2019-04-14 | 1 | -1/+1
  Fixes the following building error happening with Android build system:

  external/mesa/src/gallium/auxiliary/draw/draw_gs.c:740:79: error: address of array 'draw->gs.tgsi.machine->PrimitiveOffsets' will always evaluate to 'true' [-Werror,-Wpointer-bool-conversion]
      if (!draw->gs.tgsi.machine->Primitives[i] || !draw->gs.tgsi.machine->PrimitiveOffsets)
                                                   ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~
  1 error generated.

  Fixes: 7720ce3 ("draw: add support to tgsi paths for geometry streams. (v2)")
  Signed-off-by: Mauro Rossi <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
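  The -Wpointer-bool-conversion diagnostic quoted in the draw_gs_init() commit fires when an array member is used directly as a truth value: the array's address can never be NULL, so the check is always true. The sketch below shows the general pattern with a hypothetical `machine` struct. It is not the Mesa structure, and indexing the array by `i` is only an assumption about the shape of the fix, which the commit message does not show.

```c
#include <stddef.h>

#define MAX_STREAMS 4 /* hypothetical limit, for illustration only */

/* Hypothetical stand-in for a struct holding per-stream buffers. */
struct machine {
    int *primitives[MAX_STREAMS];
    int *primitive_offsets[MAX_STREAMS];
};

/* Returns 1 when both buffers for stream i have been allocated. */
static int stream_buffers_ok(const struct machine *m, unsigned i)
{
    /*
     * Writing `!m->primitive_offsets` here would test the address of the
     * array itself, which is never NULL, so the allocation check would be
     * dead code. Testing the element checks the actual allocation.
     */
    return m->primitives[i] != NULL && m->primitive_offsets[i] != NULL;
}
```

  Clang reports the array-address form as an error under -Werror, which is why it only surfaced with a build system that enables that flag.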
* lima/gpir: fix alu check miss last store slot | Qiang Yu | 2019-04-14 | 1 | -2/+2
  Fixes: 92d7ca4b1cd "gallium: add lima driver"
  Signed-off-by: Qiang Yu <[email protected]>
  Reviewed-by: Vasily Khoruzhick <[email protected]>
* lima/gpir: fix compile fail when two slot node | Qiang Yu | 2019-04-14 | 3 | -3/+25
  Come from glmark2-es2 jellyfish test.

  Fixes: 92d7ca4b1cd "gallium: add lima driver"
  Signed-off-by: Qiang Yu <[email protected]>
  Reviewed-by: Vasily Khoruzhick <[email protected]>
* lima: add support for depth/stencil fbo attachments and textures | Vasily Khoruzhick | 2019-04-14 | 7 | -24/+120
  Hardware supports writing back Z/S buffers and sampling from them, so add support for that.

  Signed-off-by: Vasily Khoruzhick <[email protected]>
  Reviewed-by: Qiang Yu <[email protected]>
  Tested-by: Icenowy Zheng <[email protected]>
* lima: use individual tile heap for each GP job. | Vasily Khoruzhick | 2019-04-14 | 5 | -19/+15
  Looks like it's somehow used by subsequent PP job, so we have to preserve its contents until PP job is done.

  Signed-off-by: Vasily Khoruzhick <[email protected]>
  Reviewed-by: Qiang Yu <[email protected]>
  Tested-by: Icenowy Zheng <[email protected]>
* nir: add lower_ftrunc | Christian Gmeiner | 2019-04-13 | 2 | -0/+3
  Port TGSI TRUNC lowering to nir

  Signed-off-by: Christian Gmeiner <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* anv: leave the top 4Gb of the high heap VMA unused | Lionel Landwerlin | 2019-04-13 | 1 | -5/+5
  In 628c9ca9089789 I forgot to apply the same -4Gb of the high address of the high heap VMA. This was previously computed in the HIGH_HEAP_MAX_ADDRESS.

  Many thanks to James for pointing this out.

  Signed-off-by: Lionel Landwerlin <[email protected]>
  Reported-by: Xiong, James <[email protected]>
  Fixes: 628c9ca9089789 ("anv: store heap address bounds when initializing physical device")
  Reviewed-by: Jason Ekstrand <[email protected]>
* v3d: Use the new lower_to_scratch implementation for indirects on temps. | Eric Anholt | 2019-04-12 | 8 | -11/+193
  We can use the same register spilling infrastructure for our loads/stores of indirect access of temp variables, instead of doing an if ladder.

  Cuts 50% of instructions and max-temps from 2 KSP shaders in shader-db. Also causes several other KSP shaders with large bodies and large loop counts to not be force-unrolled.

  The change was originally motivated by NOLTIS slightly modifying register pressure in piglit temp mat4 array read/write tests, triggering register allocation failures.
* nir: Add a pass for selectively lowering variables to scratch space | Jason Ekstrand | 2019-04-12 | 9 | -1/+216
  This commit adds new nir_load/store_scratch opcodes which read and write a virtual scratch space. It's up to the back-end to figure out what to do with it and where to put the actual scratch data.

  v2: Drop const_index comments (by anholt)

  Reviewed-by: Eric Anholt <[email protected]>
* v3d: Detect the correct number of QPUs and use it to fix the spill size. | Eric Anholt | 2019-04-12 | 3 | -4/+13
  We were missing a * 4 even if the particular hardware matched our assumption.
* v3d: Add missing dumping for the spill offset/size uniforms. | Eric Anholt | 2019-04-12 | 1 | -0/+8
* v3d: Add missing base offset to CS shared memory accesses. | Eric Anholt | 2019-04-12 | 1 | -9/+20
  This code is so touchy, trying to emit the minimum amount of address math. Some day we'll move it all to NIR, I hope.
* v3d: Add Compute Shader compilation support. | Eric Anholt | 2019-04-12 | 9 | -83/+302
  While waiting for the CSD UABI to get reviewed, I keep having to rebase the CS patch. Just land the compiler side for now to keep it from diverging.

  For now this covers just GLES 3.1 compute shaders, not CL kernels.
* v3d: Replace the old shader-db env var output with the ARB_debug_output. | Eric Anholt | 2019-04-12 | 3 | -30/+4
  We're using ARB_debug_output for the main shader-db, but I had this env var left around from the shader-db-2 support (vc4 apitrace-based). Keep the env var around since it's nice sometimes to get the stats on a shader you're optimizing without having to do a shader-db run, but drop the old formatting that's not useful and keeps tricking me when I go to add another measurement to the shader-db output.
* v3d: Include the number of max temps used in the shader-db output. | Eric Anholt | 2019-04-12 | 1 | -1/+29
  This gives us finer-grained feedback on how we're doing on register pressure than "did we trigger a new shader to spill or drop thread count?"
* v3d: Drop a note for the future about PIPE_CAP_PACKED_UNIFORMS. | Eric Anholt | 2019-04-12 | 1 | -0/+7
* v3d: Add and use a define for the number of channels in a QPU invocation. | Eric Anholt | 2019-04-12 | 3 | -3/+9
  A shader invocation always executes 16 channels together, so we often end up multiplying things by this magic 16 number. Give it a name.
* nir: Add a comment about how intrinsic definitions work. | Eric Anholt | 2019-04-12 | 1 | -0/+11
  I was thinking about a refactor, and needed to read this first.

  Reviewed-by: Jason Ekstrand <[email protected]>