path: root/src/gallium
Commit message | Author | Age | Files | Lines
* freedreno/a6xx: enable tiled images | Rob Clark | 2019-02-21 | 1 | -5/+2
    Turns out we can write to tiled images as well as read. This avoids having to linearize or do the tiling in the shader.
    Signed-off-by: Rob Clark <[email protected]>
* nir, glsl: move pixel_center_integer/origin_upper_left to shader_info.fs | Alejandro Piñeiro | 2019-02-21 | 1 | -1/+1
    On GLSL that info is set as a layout qualifier when redeclaring gl_FragCoord, so it is somehow tied to a specific variable. But in practice, they behave as a global of the shader. On ARB programs they are set using a global OPTION (defined at ARB_fragment_coord_conventions), and on SPIR-V using ExecutionModes, which are also not tied specifically to the builtin.

    This patch moves that info from nir variable and ir variable to nir shader and gl_program shader_info respectively, so the map is more similar to SPIR-V, and ARB programs, instead of more similar to GLSL.

    FWIW, shader_info.fs already had pixel_center_integer, so this change also removes some redundancy. Also, as struct gl_program also includes a shader_info, we removed gl_program::OriginUpperLeft and PixelCenterInteger, as it would be superfluous.

    This change was needed because recently spirv_to_nir changed the order in which execution modes and variables are handled, so the variables didn't get the correct values. Now the info is set on the shader itself, and we don't need to go back to the builtin variable to set it.

    Fixes: e68871f6a ("spirv: Handle constants and types before execution modes")

    v2: (Jason)
      * glsl_to_nir: get the info before glsl_to_nir, while all the rest of the info gathering is happening
      * prog_to_nir: gather the info on a general info-gathering pass, not on variable setup.

    v3: (Jason)
      * Squash with the patch that removes that info from ir variable
      * anv: assert that OriginUpperLeft is true. It should be already set by spirv_to_nir.
      * blorp: set origin_upper_left on its core "compile fragment shader", not just on some specific places (for this we added a helper on a previous patch).
      * prog_to_nir: no need to gather specifically these fragcoord modes as the full gl_program shader_info is copied.
      * spirv_to_nir: assert that we are a fragment shader when handling these execution modes.

    v4: (reported by failing gitlab pipeline #18750)
      * state_tracker: update too, due to changes on ir.h/gl_program

    v5:
      * blorp: minor change after change on previous patch
      * radeonsi: update due to this change.

    v6: (Timothy Arceri)
      * prog_to_nir: remove extra whitespace
      * shader_info: don't use :1 on origin_upper_left
      * glsl: program.fs.origin_upper_left/pixel_center_integer can be moved out of the shader list loop
* panfrost: Verify and print brx condition in disasm | Alyssa Rosenzweig | 2019-02-21 | 1 | -1/+10
    The condition code in extended branches is repeated 8 times for unclear reasons; accordingly, the code would be disassembled as "unknown5555", "unknownAAAA", etc. This patch correctly masks off the lower two bits to find the true code to print, verifying that the code is repeated as believed to be necessary (providing some assurance for compiler quality and an assert trip in case we encounter a shader in the wild that breaks the convention).
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
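The masking and verification described in this commit can be sketched as follows. This is only an illustration in Python, not the disassembler's actual code: the function name and the assumption that the field is 16 bits wide (a 2-bit code repeated 8 times, as suggested by "unknown5555" and "unknownAAAA") are mine.

```python
def decode_brx_condition(field: int, copies: int = 8, width: int = 2) -> int:
    """Extract the condition code from a field that repeats it `copies` times.

    Hypothetical helper: the name and the default sizes are assumptions.
    """
    code = field & ((1 << width) - 1)  # the lower two bits hold the true code

    # Rebuild the field as it should look if the code really is repeated.
    expected = 0
    for i in range(copies):
        expected |= code << (i * width)

    if field != expected:
        raise ValueError(f"condition code is not repeated: {field:#06x}")
    return code


print(decode_brx_condition(0x5555))  # 1
print(decode_brx_condition(0xAAAA))  # 2
```

Raising on a mismatch plays the role of the assert mentioned in the commit message: a shader that breaks the repetition convention is reported instead of being silently mis-decoded.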
* panfrost: Dynamically set discard branch targets | Alyssa Rosenzweig | 2019-02-21 | 1 | -20/+46
    discard and discard_if are both implemented with the branching pipeline on Midgard; essentially, we branch to the end of the fragment shader in a special "discard" mode, setting the condition as necessary. Previously, we hardcoded the form of this instruction, which worked for very simple shaders but was incorrect for anything remotely interesting. This patch instead emits logical branches in the IR, which are flattened to real discard ops the same way other branches are, allowing targets to be computed correctly.
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Emit extended branches | Alyssa Rosenzweig | 2019-02-21 | 2 | -17/+84
    Previously, we only emitted compact branches; however, the offset range of these branches is too small for many real world shaders. This patch implements support for emitting extended branches and switches to always using them for control flow. This incurs a code size and possibly performance penalty, but expands the range of working shaders and provides opportunity for further optimization. Support for emitting compact branches is retained but this code path is presently unused. In the future, we'll want to heuristically determine which type of branch should be emitted for optimal codegen.
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Rectify doubleplusungood extended branch | Alyssa Rosenzweig | 2019-02-21 | 2 | -8/+8
    Midgard features "compact branches" and "extended branches", corresponding to short jumps and far jumps respectively. The form of the extended branch was previously incorrect in the ISA headers; this patch corrects it and updates the disassembler (simultaneously, to preserve bisectability). Additionally, we fix a corner case in the disassembly of extended branches, and we now prefix extended branches with "brx", to visually differentiate them from compact branches prefixed with "br".
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost/midgard: Fix nested/chained if-else | Alyssa Rosenzweig | 2019-02-21 | 1 | -3/+3
    An if-else statement is compiled to a conditional branch (from the start to the second block) and an unconditional branch (from the end of the first block to the end of the else). We previously incorrectly computed the block index of the unconditional branch to be exactly one after that of the conditional branch, valid for a single if-else statement but nothing fancier. This patch correctly computes the unconditional branch target, fixing more complex if-else chains.
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
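A toy model may help show why "one after the conditional target" only holds for a single if-else. The sketch below is not the Midgard compiler's data structures: `IfElse`, `layout`, and the idea of numbering blocks sequentially are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class IfElse:
    """Hypothetical IR node; bodies hold block names or nested IfElse nodes."""
    then_body: list
    else_body: list


def layout(body: list, start: int, targets: List[Tuple[int, int]]) -> int:
    """Number blocks sequentially and record branch targets per if-else.

    Each recorded pair is (conditional target, unconditional target):
    the first block of the else, and the first block after the else.
    Returns the index of the block following `body`.
    """
    index = start
    for node in body:
        if isinstance(node, IfElse):
            else_start = layout(node.then_body, index, targets)
            end = layout(node.else_body, else_start, targets)
            targets.append((else_start, end))
            index = end
        else:
            index += 1  # a plain basic block
    return index


# if A else (if B else C): the outer else spans two blocks.
chain = [IfElse(["A"], [IfElse(["B"], ["C"])])]
targets: List[Tuple[int, int]] = []
layout(chain, 0, targets)
print(targets)  # [(2, 3), (1, 3)]
```

For the inner statement the unconditional target (3) is indeed one past the conditional target (2), but for the outer statement it is 3 rather than 2, which is the case the naive computation gets wrong.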
* panfrost/midgard: Refactor tag lookahead code | Alyssa Rosenzweig | 2019-02-21 | 1 | -27/+31
    Each Midgard instruction is scheduled to a particular instruction type ("tag"). Presumably the hardware prefetches memory based on tag, so it is required to report out the first tag to the command stream and the next tag of a branch target. This procedure was implemented in two separate parts of the compiler (one time with a slight bug relating to empty blocks); this patch refactors to unite the two routines and solve the bug when branching to empty blocks.
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
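One plausible shape for a single, shared lookahead routine is sketched below. It assumes that an empty block falls through to the next block, so the lookup keeps walking forward until it finds a tag; that behaviour, the function name, and the tag labels are all assumptions, not taken from the patch.

```python
from typing import List, Optional


def first_tag_from(blocks: List[List[str]], start: int) -> Optional[str]:
    """Return the tag of the first instruction at or after block `start`.

    `blocks` is a hypothetical representation: one list of tags per block.
    Empty blocks are skipped; None means no instruction follows.
    """
    for block in blocks[start:]:
        if block:
            return block[0]
    return None


blocks = [["ALU", "ALU"], [], ["TEX"], []]
print(first_tag_from(blocks, 0))  # ALU  (first tag of the shader)
print(first_tag_from(blocks, 1))  # TEX  (branch target is an empty block)
```

Using the same helper both for the shader's first tag and for branch targets is the point of the refactor: the empty-block case is then handled in exactly one place.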
* panfrost: Implement pantrace (command stream dump) | Alyssa Rosenzweig | 2019-02-21 | 5 | -0/+188
    Historically, Panfrost debugging entailed the use of the LD_PRELOADable `panwrap` tool. This setup is a tad fragile; Panfrost can be traced directly without the intermediate layer. pantrace implements the equivalent functionality of panwrap in Panfrost proper, allowing dumps to work regardless of the kernel layer in use.
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Add pandecode (command stream debugger) | Alyssa Rosenzweig | 2019-02-21 | 6 | -3/+2289
    The `panwrap` utility can be LD_PRELOAD'd into a GLES app, intercepting communication between the driver and the kernel. Modern panwrap versions do no processing of their own; instead, they create a trace directory. This directory contains the following files:

    - control.log: a line-by-line plain text file, denoting important syscalls (mmaps and job submits) along with their arguments
    - memory_*.bin, shader_*.bin: binary dumps of mapped memory

    Together, these files contain enough information to reconstruct the command stream and shaders of (at minimum) a single frame.

    The `pandecode` utility takes this directory structure as input, reconstructing the mapped memory and using the job submit command as an entrypoint. It then walks the descriptors as the hardware would, parsing and pretty-printing. Its final output is the pretty-printed command stream interleaved with the disassembled shaders, suitable for driver debugging. For instance, the behaviour of two driver versions (one working, one broken) can be compared by diff'ing their decoded logs.

    pandecode/decode.c was originally a part of `panwrap`; it is the oldest living code in the project. Its history is generally not worth preserving.

    panwrap itself will continue to live downstream for the foreseeable future, as it is specifically written for the vendor kernel. It is possible, however, to produce equivalent traces directly from Panfrost, bypassing the intermediate wrapping layer for well-behaved drivers.

    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Stub out separate stencil functions | Alyssa Rosenzweig | 2019-02-21 | 2 | -4/+25
    This is not yet functional, but it resolves a crash in various apps and provides a framework for further work.
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* radeonsi: use SDMA for uploading data through const_uploader | Marek Olšák | 2019-02-20 | 5 | -28/+143
    v2: use tc.stream_uploader in si buffer_transfer_map if not called from the driver thread
    Reviewed-by: Nicolai Hähnle <[email protected]> (v1)
    Tested-by: Dieter Nützel <[email protected]>
* gallium/u_upload_mgr: allow use of FLUSH_EXPLICIT with persistent mappings | Marek Olšák | 2019-02-20 | 2 | -6/+31
    for radeonsi
    Reviewed-by: Nicolai Hähnle <[email protected]>
    Tested-by: Dieter Nützel <[email protected]>
* gallium/u_threaded: always unmap const_uploader | Marek Olšák | 2019-02-20 | 1 | -0/+1
    radeonsi will require this. It's a no-op for drivers supporting persistent mappings.
    Reviewed-by: Nicolai Hähnle <[email protected]>
    Tested-by: Dieter Nützel <[email protected]>
* freedreno/a6xx: samplerBuffer fixes | Rob Clark | 2019-02-20 | 2 | -6/+6
    Use the 'UNK31' bit (which should probably be called 'BUFFER') for samplerBuffer case, which increases the size of supported buffer texture beyond 2^15 elements. Also need to fix the 2nd coord injected to handle the tex instructions that take integer coords.
    Fixes dEQP-GLES31.functional.texture.texture_buffer.render.as_fragment_texture.buffer_size_131071 and similar
    Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: 3d and cube image fixes | Rob Clark | 2019-02-20 | 1 | -4/+31
    Fixes dEQP-GLES31.functional.image_load_store.{3d,cube}.store.* and a bunch more
    Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: fix crash in compile fail case | Rob Clark | 2019-02-20 | 2 | -5/+8
    The variant will be NULL if RA failed, which isn't ideal, but at least let's not segfault and bring down the rest of the dEQP run with us.
    Signed-off-by: Rob Clark <[email protected]>
* kmsro: Add the rest of the current set of tinydrm drivers. | Eric Anholt | 2019-02-20 | 4 | -7/+33
    While I haven't tested them all, given that they're all using the same allocation paths and modifiers in the kernel they should be fine to use in the same way.
    v2: Rebase on other kmsro changes.
    v3: Skip repeated '[with_gallium_kmsro,' in the meson build.
    Acked-by: Alyssa Rosenzweig <[email protected]>
* freedreno/a6xx: Support MSAA resolve blits on blitter | Kristian H. Kristensen | 2019-02-20 | 1 | -3/+23
    This gets stencil and depth resolves working properly.
    Fixes:
      dEQP-GLES3.functional.fbo.msaa.2_samples.depth32f_stencil8
      dEQP-GLES3.functional.fbo.msaa.2_samples.depth24_stencil8
      dEQP-GLES3.functional.fbo.msaa.4_samples.depth32f_stencil8
      dEQP-GLES3.functional.fbo.msaa.4_samples.depth24_stencil8
      dEQP-GLES3.functional.fbo.invalidate.whole.unbind_blit_msaa_color
      dEQP-GLES3.functional.fbo.invalidate.sub.unbind_blit_msaa_color
    Signed-off-by: Kristian H. Kristensen <[email protected]>
* freedreno/a6xx: Copy stencil as R8_UINT | Kristian H. Kristensen | 2019-02-20 | 1 | -4/+14
    Blitter does support it after all. Previous attempt to use R8_UINT failed because we overwrote the a6xx format in emit_blit_texture(), but some of the later setup still looked at the gallium format. If we overwrite it in the pipe_blit_info before we even call into emit_blit_texture() it works properly.
    Signed-off-by: Kristian H. Kristensen <[email protected]>
* radeonsi: Go back to using llvm.pow intrinsic for nir_op_fpow | Kenneth Graunke | 2019-02-19 | 1 | -1/+0
    ARB_vertex_program and ARB_fragment_program define 0^0 = 1 (while GLSL leaves it undefined). Performing fpow lowering in NIR would break this behavior, preventing us from using prog_to_nir.

    According to llvm/lib/Target/AMDGPU/SIInstructions.td, POW_common expands to <V_LOG_F32_e32, V_EXP_F32_e32, V_MUL_LEGACY_F32_e32>, which presumably does a zero-wins multiply.

    Lowering in NIR results in a non-legacy multiply, where:

      pow(0, 0) = 2^(log2(0) * 0)
                = 2^(-INF * 0)
                = 2^(-NaN)
                = -NaN

    which isn't the desired result.

    This reverts:
    - commit d6b75392067712908bdc372f1007e085439bf9f5 (ac/nir: remove emission of nir_op_fpow)
    - commit 22430224fec31591432d4a3e65c6f457ba1c1653 (radeonsi/nir: enable lowering of fpow)

    and prevents a regression in gl-1.0-spot-light with AMD_DEBUG=nir after enabling prog_to_nir in st/mesa later in this series.

    Reviewed-by: Timothy Arceri <[email protected]>
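The difference between the two multiplies can be reproduced with ordinary IEEE-754 floats. The sketch below is only an emulation in Python: `log2_ieee`, `mul_ieee`, `mul_legacy`, and `pow_via_exp2` are hypothetical helpers, and the "zero wins" rule for the legacy multiply follows the commit's own presumption rather than any hardware documentation.

```python
import math


def log2_ieee(x: float) -> float:
    """log2 with the IEEE result at zero (math.log2 raises there instead)."""
    return -math.inf if x == 0.0 else math.log2(x)


def mul_ieee(a: float, b: float) -> float:
    """Ordinary multiply: -inf * 0 gives NaN."""
    return a * b


def mul_legacy(a: float, b: float) -> float:
    """Assumed 'zero wins' multiply: 0 times anything is 0."""
    return 0.0 if a == 0.0 or b == 0.0 else a * b


def pow_via_exp2(x: float, y: float, mul) -> float:
    """pow(x, y) lowered to 2^(log2(x) * y) with a chosen multiply."""
    return 2.0 ** mul(log2_ieee(x), y)


print(pow_via_exp2(0.0, 0.0, mul_ieee))    # nan
print(pow_via_exp2(0.0, 0.0, mul_legacy))  # 1.0
```

With the ordinary multiply the NaN propagates through the exponent, while the zero-wins multiply yields an exponent of 0 and therefore the 0^0 = 1 result that the ARB program extensions require.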
* radeonsi/nir: set shader_buffers_declared properly | Timothy Arceri | 2019-02-20 | 1 | -10/+22
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi/nir: set colors_read properly | Timothy Arceri | 2019-02-20 | 1 | -7/+10
    shader-db results for VEGA64:

    Totals from affected shaders:
    SGPRS: 1976 -> 1976 (0.00 %)
    VGPRS: 1240 -> 1144 (-7.74 %)
    Spilled SGPRs: 145 -> 145 (0.00 %)
    Spilled VGPRs: 0 -> 0 (0.00 %)
    Private memory VGPRs: 0 -> 0 (0.00 %)
    Scratch size: 0 -> 0 (0.00 %) dwords per thread
    Code Size: 34632 -> 34604 (-0.08 %) bytes
    LDS: 0 -> 0 (0.00 %) blocks
    Max Waves: 261 -> 285 (9.20 %)
    Wait states: 0 -> 0 (0.00 %)

    Reviewed-by: Marek Olšák <[email protected]>
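The percentages in the totals above are relative changes against the "before" value. A minimal sketch of that arithmetic follows; the function name is hypothetical, and reporting 0.00 % when the "before" value is 0 is an assumption chosen to match the "0 -> 0 (0.00 %)" rows, not something taken from shader-db itself.

```python
def percent_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent."""
    if before == 0:
        return 0.0  # assumed convention for the "0 -> 0" rows
    return (after - before) / before * 100.0


print(f"VGPRS:     {percent_change(1240, 1144):.2f} %")    # -7.74 %
print(f"Code Size: {percent_change(34632, 34604):.2f} %")  # -0.08 %
print(f"Max Waves: {percent_change(261, 285):.2f} %")      # 9.20 %
```

Here fewer VGPRs and more waves are both improvements, so the sign of the percentage has to be read per metric.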
* radeonsi/nir: set input_usage_mask properly | Timothy Arceri | 2019-02-20 | 1 | -11/+36
    shader-db results for VEGA64:

    Totals from affected shaders:
    SGPRS: 791528 -> 792616 (0.14 %)
    VGPRS: 421624 -> 410784 (-2.57 %)
    Spilled SGPRs: 1639 -> 1674 (2.14 %)
    Spilled VGPRs: 0 -> 0 (0.00 %)
    Private memory VGPRs: 0 -> 0 (0.00 %)
    Scratch size: 0 -> 0 (0.00 %) dwords per thread
    Code Size: 16103516 -> 16063696 (-0.25 %) bytes
    LDS: 0 -> 0 (0.00 %) blocks
    Max Waves: 136307 -> 137830 (1.12 %)
    Wait states: 0 -> 0 (0.00 %)

    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi/nir: Use uniform location when calculating const_file_max. | Timur Kristóf | 2019-02-20 | 1 | -6/+6
    The nine state tracker can produce NIR uniform variables whose location is explicitly set. radeonsi did not take that into account when calculating const_file_max, resulting in rendering glitches. This patch fixes that.
    Signed-Off-By: Timur Kristóf <[email protected]>
    Tested-by: Andre Heider <[email protected]>
    Reviewed-by: Timothy Arceri <[email protected]>
* radeonsi: add driconf option radeonsi_enable_nir | Marek Olšák | 2019-02-19 | 2 | -1/+3
    Cc: 18.3 19.0 <[email protected]>
    Reviewed-by: Timothy Arceri <[email protected]>
* tegra/autotools: add missing libdrm cflags | Eric Engestrom | 2019-02-19 | 1 | -0/+1
    Fixes: f1374805a86d0d506557 "drm-uapi: use local files, not system libdrm"
    Bug: https://bugs.freedesktop.org/show_bug.cgi?id=109647
    Signed-off-by: Eric Engestrom <[email protected]>
    Reviewed-by: Emil Velikov <[email protected]>
* tegra/meson: add missing dep_libdrm | Eric Engestrom | 2019-02-19 | 1 | -0/+1
    Fixes: f1374805a86d0d506557 "drm-uapi: use local files, not system libdrm"
    Bug: https://bugs.freedesktop.org/show_bug.cgi?id=109645
    Signed-off-by: Eric Engestrom <[email protected]>
    Reviewed-by: Emil Velikov <[email protected]>
* v3d: Stop tracking num_inputs for VPM loads. | Eric Anholt | 2019-02-18 | 2 | -2/+2
    It's unused in the VS (since we need vattr_sizes[] anyway), so move it to FS prog data.
* v3d: Use the early_fragment_tests flag for the shader's disable-EZ field. | Eric Anholt | 2019-02-18 | 1 | -2/+7
    Apparently we need disable-EZ flagged, not just "does Z writes". Fixes dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth_fbo on 7278, even though it passed in simulation.
    Signed-off-by: Eric Anholt <[email protected]>
    Fixes: 051a41d3d56e ("v3d: Add support for the early_fragment_tests flag.")
* v3d: Sync indirect draws on the last rendering. | Eric Anholt | 2019-02-18 | 1 | -2/+2
    Fixes intermittent fails in dEQP-GLES31.functional.draw_indirect.compute_interop.separate.drawelements_compute_cmd_and_data_and_indices and others (particularly when run as part of a CTS run)
* v3d: Clear the GMP on initialization of the simulator. | Eric Anholt | 2019-02-18 | 1 | -0/+1
    Otherwise, we might have pages accessible that shouldn't be and miss out on errors. This is unlikely for most tests since v3d_hw_get_mem() is big enough that it'll be a freshly zeroed mmap, but if screens are destroyed and recreated then we'd be reusing the old v3d_hw_get_mem() contents.
* freedreno/a6xx: fix helper_invocation (sampler mask/id) | Rob Clark | 2019-02-18 | 1 | -1/+6
    Since gl_HelperInvocation is lowered to:

      !((1 << sample_id) & sample_mask_in)

    not setting these enable bits was causing it to be broken. (And probably a bunch of other stuff too.)
    Fixes dEQP-GLES31.functional.shaders.helper_invocation.*
    Signed-off-by: Rob Clark <[email protected]>
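The lowered expression can be tried out directly. The sketch below just evaluates it in Python for a few inputs; the function name is hypothetical, and the note about missing enable bits is my reading of the commit message rather than a statement about the hardware.

```python
def is_helper_invocation(sample_id: int, sample_mask_in: int) -> bool:
    """Evaluate !((1 << sample_id) & sample_mask_in)."""
    return not ((1 << sample_id) & sample_mask_in)


print(is_helper_invocation(0, 0b0001))  # False: sample 0 is covered
print(is_helper_invocation(1, 0b0001))  # True: sample 1 is not covered
```

The result depends on both inputs being valid, so if the sample id or sample mask is not actually delivered to the shader, the expression has nothing meaningful to test.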
* panfrost: Fix clipping region | Alyssa Rosenzweig | 2019-02-18 | 1 | -4/+11
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Preserve w sign in perspective division | Alyssa Rosenzweig | 2019-02-18 | 1 | -2/+4
    This fixes issues where polygons that should be culled (due to negative w, for instance) may not be.
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Cleanup mali_viewport (clipping) code | Alyssa Rosenzweig | 2019-02-18 | 2 | -17/+19
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Swap order of tiled texture (de)alloc | Alyssa Rosenzweig | 2019-02-18 | 1 | -6/+6
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Free imported BOs | Alyssa Rosenzweig | 2019-02-18 | 3 | -0/+12
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* panfrost: Fix various leaks unmapping resources | Alyssa Rosenzweig | 2019-02-18 | 2 | -9/+14
    v2: Don't check for NULL before free()
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* freedreno/a6xx: cache flush harder | Rob Clark | 2019-02-16 | 5 | -7/+37
    Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: compute support | Rob Clark | 2019-02-16 | 8 | -40/+290
    Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: image/ssbo state emit | Rob Clark | 2019-02-16 | 6 | -215/+259
    Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: border-color offset helper | Rob Clark | 2019-02-16 | 2 | -13/+31
    Soon we'll need this logic to deal w/ image/SSBO case, so split out a helper rather than duplicate the logic.
    Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: add image/ssbo <-> ibo/tex mapping | Rob Clark | 2019-02-16 | 3 | -67/+32
    Images and SSBOs don't map directly to the hw. They end up being part texture and part something else. Starting with a6xx, the hack used for a5xx to smash the image tex state into hw texture state starting from MAX counting down won't work, because we start using tex state also for SSBO read.
    Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: clean up some open-coded bits | Rob Clark | 2019-02-16 | 1 | -2/+4
    Signed-off-by: Rob Clark <[email protected]>
* freedreno/a6xx: move stream-out emit to helper | Rob Clark | 2019-02-16 | 1 | -64/+72
    Split out of the main fd6_emit() code, since it was already getting to be a pretty giant function.
    Signed-off-by: Rob Clark <[email protected]>
* swr/rast: Add translation support to streamout | Alok Hota | 2019-02-15 | 12 | -37/+106
    Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: simdlib cleanup, clipper stack space fixes | Alok Hota | 2019-02-15 | 13 | -135/+127
    Reduce stack space used by clipper, which had led to crashes in some versions of MSVC
    Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: convert DWORD->uint32_t, QWORD->uint64_t | Alok Hota | 2019-02-15 | 5 | -25/+25
    Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Refactor scratch space variable names | Alok Hota | 2019-02-15 | 4 | -14/+14
    Reviewed-by: Bruce Cherniak <[email protected]>