path: root/src/gallium/drivers
Commit message | Author | Age | Files | Lines
* llvmpipe: add fragment shader image support | Dave Airlie | 2019-08-27 | 11 | -8/+334
    Reviewed-by: Roland Scheidegger
* llvmpipe: introduce image jit type to fragment shader jit. | Dave Airlie | 2019-08-27 | 2 | -2/+67
    This adds the image type to the fragment shader jit context.
    Reviewed-by: Roland Scheidegger
* llvmpipe: move the fragment shader variant key to dynamic length. | Dave Airlie | 2019-08-27 | 2 | -22/+46
    This mirrors the vs/gs keys, and will be needed when adding image support.
    The const changes also mirror how the draw code works (as is needed when we add images).
    Reviewed-by: Roland Scheidegger
* llvmpipe: handle early test property. | Dave Airlie | 2019-08-27 | 1 | -2/+6
    Also handle setting late for shaders that use stores.
    Reviewed-by: Roland Scheidegger
* gallivm: move first/last level jit texture members. | Dave Airlie | 2019-08-27 | 2 | -10/+10
    This lets us create an image structure with the same basic types as the texture one.
    Reviewed-by: Roland Scheidegger
* llvmpipe: refactor jit type creation | Dave Airlie | 2019-08-27 | 1 | -76/+87
    This just cleans the code up so the texture/sampler type creation can be reused.
    Reviewed-by: Roland Scheidegger
* virgl: fix format conversion for recent gallium changes. | Dave Airlie | 2019-08-26 | 6 | -16/+303
    The virgl formats are fixed-in-time snapshots of the gallium ones; we just need to provide a translation table between them when we enter the hardware.
    This fixes a regression since Eric renumbered the gallium table.
    Fixes: c45c33a5a2 (gallium: Remove manual defining of PIPE_FORMAT enum values.)
    Bugzilla: https://bugs.freedesktop.org/111454
    v1 by Dave Airlie
    v2: virgl: Add a number of formats to the table that are used, e.g. for vertex attributes
    v3: cover some more missing formats from a piglit run
    Signed-off-by: Gert Wollny
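The translation-table idea in the virgl commit above can be sketched as follows. This is only an illustration under stated assumptions: the enum names `PipeFormat` and `WireFormat`, the helper `to_wire`, and every numeric value are invented here and are not the real gallium or virgl definitions.

```python
from enum import IntEnum


# Hypothetical driver-side enum: its numbering may change between releases.
class PipeFormat(IntEnum):
    NONE = 0
    R8_UNORM = 1
    R32_FLOAT = 2
    B8G8R8A8_UNORM = 3


# Hypothetical wire-side enum: its numbering is frozen (values invented here).
class WireFormat(IntEnum):
    NONE = 0
    B8G8R8A8_UNORM = 10
    R32_FLOAT = 20
    R8_UNORM = 30


# Explicit table: the mapping survives any renumbering of PipeFormat.
TO_WIRE = {
    PipeFormat.NONE: WireFormat.NONE,
    PipeFormat.R8_UNORM: WireFormat.R8_UNORM,
    PipeFormat.R32_FLOAT: WireFormat.R32_FLOAT,
    PipeFormat.B8G8R8A8_UNORM: WireFormat.B8G8R8A8_UNORM,
}


def to_wire(fmt: PipeFormat) -> WireFormat:
    """Look the format up in the table; formats without an entry map to NONE."""
    return TO_WIRE.get(fmt, WireFormat.NONE)
```

Passing the raw integer through instead of going via the table would silently pick a different format as soon as the driver-side enum is renumbered, which is the kind of regression the commit describes.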
* lima/ppir: enable vectorize optimization | Erico Nunes | 2019-08-25 | 1 | -0/+5
    pp has vector units and some operations can be optimized when bundled together.
    Benchmarking this with piglit shaders shows that the instruction count can be greatly reduced on many examples with vectorize.
    Signed-off-by: Erico Nunes
    Reviewed-by: Qiang Yu
* lima/ppir: lower selects to scalars | Erico Nunes | 2019-08-25 | 1 | -0/+5
    nir vec4 fcsel assumes that each component of the condition will be used to select the same component from the options, but pp can't implement that since it only has 1 component for the condition.
    Signed-off-by: Erico Nunes
    Reviewed-by: Qiang Yu
* lima: fix ppir spill stack allocation | Erico Nunes | 2019-08-25 | 4 | -9/+25
    The previous spill stack had a fixed size that was too small, and caused instability in programs requiring spilling for roughly more than one value.
    This patch adds a dynamic calculation of the buffer size based on stack utilization and switches it to a separate allocation at flush time that will fit the shader that requires the largest buffer.
    Signed-off-by: Erico Nunes
    Reviewed-by: Vasily Khoruzhick
    Reviewed-by: Qiang Yu
* intel/fs: Drop the gl_program from fs_visitor | Jason Ekstrand | 2019-08-25 | 1 | -2/+2
    It's not used by anything anymore now that so much lowering has been moved into NIR.
    Sadly, we still need one in brw_compile_gs() for geometry shaders on Sandy Bridge. Short of a lot of pointless work, that one's probably not going away.
    Reviewed-by: Kenneth Graunke
* lima: move format handling to unified place | Qiang Yu | 2019-08-25 | 8 | -103/+190
    Create a unified table to handle pipe format to texture and render target format lookup.
    Reviewed-by: Vasily Khoruzhick
    Reviewed-by: Erico Nunes
    Signed-off-by: Qiang Yu
* lima/ppir: print register index and components number for spilled register | Vasily Khoruzhick | 2019-08-24 | 1 | -1/+3
    It can be useful for debugging purposes.
    Tested-by: Andreas Baierl
    Reviewed-by: Qiang Yu
    Reviewed-by: Erico Nunes
    Signed-off-by: Vasily Khoruzhick
* lima/ppir: add control flow support | Vasily Khoruzhick | 2019-08-24 | 6 | -23/+168
    This commit adds support for nir_jump_instr, if and loop nir_cf_nodes.
    Tested-by: Andreas Baierl
    Reviewed-by: Qiang Yu
    Reviewed-by: Erico Nunes
    Signed-off-by: Vasily Khoruzhick
* lima/ppir: add better liveness analysis | Vasily Khoruzhick | 2019-08-24 | 5 | -74/+225
    Add better liveness analysis that was modelled after the one in vc4. It uses live ranges and is aware of multiple blocks, which is a prerequisite for adding CF support.
    Tested-by: Andreas Baierl
    Reviewed-by: Qiang Yu
    Reviewed-by: Erico Nunes
    Signed-off-by: Vasily Khoruzhick
* lima/ppir: validate shader outputs | Vasily Khoruzhick | 2019-08-24 | 1 | -0/+13
    Mali4x0 supports only gl_FragColor. gl_FragDepth is not supported.
    Check that we don't get anything but gl_FragColor in shader outputs.
    Reviewed-by: Qiang Yu
    Signed-off-by: Vasily Khoruzhick
* lima/ppir: turn store_color into ALU node | Vasily Khoruzhick | 2019-08-23 | 4 | -61/+27
    We don't have a special OP to store color in PP; all we need to do is store gl_FragColor into reg0, so it's just a mov and therefore an ALU node.
    Yet we still need to indicate that it's a store_color op so regalloc ignores its destination.
    Tested-by: Andreas Baierl
    Reviewed-by: Qiang Yu
    Reviewed-by: Erico Nunes
    Signed-off-by: Vasily Khoruzhick
* lima/ppir: create ppir block for each corresponding NIR block | Vasily Khoruzhick | 2019-08-23 | 2 | -4/+49
    Create a ppir block for each corresponding NIR block and populate its successors. It will be used later in liveness analysis and in CF support.
    Tested-by: Andreas Baierl
    Reviewed-by: Qiang Yu
    Reviewed-by: Erico Nunes
    Signed-off-by: Vasily Khoruzhick
* lima/ppir: add dummy op | Vasily Khoruzhick | 2019-08-23 | 3 | -5/+21
    We can get the following from NIR:
      (1) r1 = r2
      (2) r2 = ssa1
    Note that r2 is read before it's assigned, so there's no node for it in comp->var_nodes. We need to create a dummy node in this case whose sole purpose is to hold a ppir_dest with the reg in it.
    Tested-by: Andreas Baierl
    Reviewed-by: Qiang Yu
    Reviewed-by: Erico Nunes
    Signed-off-by: Vasily Khoruzhick
* lima/ppir: add write after read deps for registers | Vasily Khoruzhick | 2019-08-23 | 1 | -2/+25
    For cases like:
      (1) r1 = r2
      (2) r2 = ssa1
    we need to add (1) as a dependency of (2), otherwise the scheduler may reorder them.
    Tested-by: Andreas Baierl
    Reviewed-by: Qiang Yu
    Reviewed-by: Erico Nunes
    Signed-off-by: Vasily Khoruzhick
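A write-after-read dependency pass of the kind described above might look like the sketch below. It is not the ppir code: the function name `add_war_deps` and the `(dest, sources)` instruction encoding are assumptions made for illustration.

```python
def add_war_deps(instrs):
    """Return a set of (later, earlier) index pairs for write-after-read hazards.

    instrs is a list of (dest, [sources]) tuples in program order
    (a hypothetical encoding, not the ppir data structures).
    """
    readers = {}  # register name -> indices of instructions that read its current value
    deps = set()
    for i, (dest, srcs) in enumerate(instrs):
        # Every earlier reader of dest must be scheduled before this write.
        for r in readers.get(dest, []):
            deps.add((i, r))
        # The write starts a new value, so the old reader list is no longer needed.
        readers[dest] = []
        # Record this instruction's own reads.
        for s in srcs:
            readers.setdefault(s, []).append(i)
    return deps


# The example from the commit message: (1) r1 = r2, (2) r2 = ssa1.
example = [("r1", ["r2"]), ("r2", ["ssa1"])]
war = add_war_deps(example)
```

Here `war` contains the single pair `(1, 0)`: the second instruction (index 1) must stay after the first (index 0), which is the constraint that stops a scheduler from swapping them.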
* lima/ppir: fix ordering deps | Vasily Khoruzhick | 2019-08-23 | 1 | -6/+8
    There can be several root nodes, e.g.:
      (1) r0 = r1
      (2) r2 = r3
      (3) branch if (ssa1)
    We need to make (3) depend on both (1) and (2). The old code added a dependency only for (2), and (1) was kept as a root node since there is no branch/discard or store color between the two movs.
    Tested-by: Andreas Baierl
    Reviewed-by: Qiang Yu
    Reviewed-by: Erico Nunes
    Signed-off-by: Vasily Khoruzhick
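The ordering rule in the commit above can be illustrated with a small sketch. The representation (node ids plus a set of `(node, depends_on)` pairs) and the name `order_before_branch` are assumptions, not the ppir implementation.

```python
def order_before_branch(nodes, deps, branch):
    """Make `branch` depend on every root node that precedes it.

    nodes: node ids in program order (hypothetical encoding).
    deps:  set of (node, depends_on) pairs.
    A root is a node that no other node depends on.
    """
    has_successor = {dep for (_, dep) in deps}
    roots = [n for n in nodes if n != branch and n not in has_successor]
    return deps | {(branch, r) for r in roots}


# (1) r0 = r1, (2) r2 = r3, (3) branch if (ssa1): both movs are roots.
result = order_before_branch([1, 2, 3], set(), 3)
```

With no other dependencies, `result` is `{(3, 1), (3, 2)}`, so neither mov can be scheduled after the branch.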
* lima/ppir: set write mask for texture loads if dest is reg | Vasily Khoruzhick | 2019-08-23 | 1 | -1/+5
    The destination for a texture load can be a reg, so we need to set the write mask in this case.
    Tested-by: Andreas Baierl
    Reviewed-by: Qiang Yu
    Reviewed-by: Erico Nunes
    Signed-off-by: Vasily Khoruzhick
* lima/ppir: add support for unconditional branches and condition negation | Vasily Khoruzhick | 2019-08-23 | 4 | -8/+34
    We need a 'negate' modifier for the branch condition to minimize branching. The idea is to generate the following:
      current_block: { ...; if (!statement) branch else_block; }
      then_block: { ...; branch after_block; }
      else_block: { ... }
      after_block: { ... }
    Tested-by: Andreas Baierl
    Reviewed-by: Qiang Yu
    Reviewed-by: Erico Nunes
    Signed-off-by: Vasily Khoruzhick
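The block layout in the commit above can be written out as a small sketch. The function `lower_if` and the string encoding of instructions are hypothetical and only mirror the layout given in the message.

```python
def lower_if(cond, then_body, else_body):
    """Return (label, instructions) pairs in emission order.

    The condition is negated so the then-path falls through from
    current_block, and the else-path falls through into after_block.
    """
    return [
        ("current_block", [f"if (!{cond}) branch else_block"]),
        ("then_block", list(then_body) + ["branch after_block"]),
        ("else_block", list(else_body)),
        ("after_block", []),
    ]


blocks = lower_if("ssa1", ["r0 = r1"], ["r0 = r2"])
```

In this layout each path through the if/else executes at most one taken branch: the then-path takes only the unconditional jump to `after_block`, and the else-path takes only the negated conditional branch.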
* lima/ppir: clone ld_{uni,tex,var} into each block | Vasily Khoruzhick | 2019-08-23 | 4 | -5/+103
    ppir_lower_load() and ppir_lower_load_texture() assume that a node is in the same block as its successors; fix it by cloning each ld_uni and ld_tex to every block.
    It also reduces register pressure since values never cross block boundaries and thus never appear in live_in or live_out of any block, so do it for varyings as well.
    Tested-by: Andreas Baierl
    Reviewed-by: Qiang Yu
    Reviewed-by: Erico Nunes
    Signed-off-by: Vasily Khoruzhick
* lima/ppir: refactor const lowering | Vasily Khoruzhick | 2019-08-23 | 6 | -128/+99
    Const nodes are now cloned for each user, i.e. a const is guaranteed to have exactly one successor, so we can use ppir_do_one_node_to_instr() and drop insert_to_each_succ_instr().
    Tested-by: Andreas Baierl
    Reviewed-by: Qiang Yu
    Reviewed-by: Erico Nunes
    Signed-off-by: Vasily Khoruzhick
* iris: Guard GEN9-only function in Iris state to avoid warning | Caio Marcelo de Oliveira Filho | 2019-08-23 | 1 | -0/+2
    Acked-by: Eric Engestrom
    Reviewed-by: Jason Ekstrand
* iris: Fix large timeout handling in rel2abs() | Kenneth Graunke | 2019-08-23 | 1 | -13/+14
    ...by copying the implementation of anv_get_absolute_timeout().
    Appears to fix a CTS test with 32-bit builds:
    GTF-GL46.gtf32.GL3Tests.sync.sync_functionality_clientwaitsync_flush
    Fixes: f459c56be6b ("iris: Add fence support using drm_syncobj")
    Reviewed-by: Lionel Landwerlin
    Reviewed-by: Eric Engestrom
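The commit message does not show the code, so the following is only a sketch of one common way to turn a relative timeout into an absolute one without overflowing a signed 64-bit value. It is not the Mesa implementation; the explicit `now_ns` parameter, the zero-timeout shortcut, and the clamp to `INT64_MAX` are assumptions made for illustration.

```python
INT64_MAX = 2**63 - 1


def rel2abs(timeout_ns, now_ns):
    """Convert a relative timeout to an absolute deadline, clamped to INT64_MAX.

    Assumes 0 <= now_ns <= INT64_MAX and timeout_ns >= 0.
    """
    if timeout_ns == 0:
        return 0  # assumed convention: zero means "poll, do not wait"
    # Largest relative timeout that still fits below INT64_MAX.
    max_timeout = INT64_MAX - now_ns
    return now_ns + min(timeout_ns, max_timeout)
```

A caller that passes a very large timeout, such as `2**64 - 1` for "wait forever", gets `INT64_MAX` back instead of a wrapped-around deadline in the past.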
* iris: Set MOCS in all STATE_BASE_ADDRESS commands | Kenneth Graunke | 2019-08-23 | 1 | -1/+14
    Rafael Antognolli tracked down a performance gap between i965 and iris in Synmark2's OglCSDof microbenchmark, noting that iris was performing substantially more memory reads and writes, with substantially fewer L3 hits. He suggested that something might be wrong with MOCS, or L3 configs, at which point I came up with a theory...
    It would appear that the STATE_BASE_ADDRESS command updates the MOCS settings for various base addresses even if you don't specify the "Modify Enable" bit for that address. Until now, we had been setting only the MOCS for bases we intended to change, leaving the others "blank", which is MOCS table entry 0, which is uncached.
    Most data access has a more specific MOCS (e.g. in SURFACE_STATE), but scratch access uses the Stateless Data Port Access MOCS from STATE_BASE_ADDRESS. So this meant all scratch access was uncached.
    Improves performance in Synmark2's OglCSDof by 2x, bringing iris on par with the existing i965 driver.
    Reviewed-by: Jason Ekstrand
    Reviewed-by: Lionel Landwerlin
* radeonsi/nir: Rewrite output scanning | Connor Abbott | 2019-08-23 | 1 | -126/+150
    Similarly to before, this didn't properly handle varying structs with doubles in them. This doesn't fix any tests, but was noticed while looking at the code.
    Reviewed-by: Marek Olšák
* radeonsi/nir: Rewrite store intrinsic gathering | Connor Abbott | 2019-08-23 | 1 | -59/+84
    The old version wasn't as accurate as it could be, and didn't handle double variables inside structs correctly. Walk the path to compute the actual components affected.
    In combination with the previous commit, fixes KHR-GL45.enhanced_layouts.varying_structure_locations.
    Reviewed-by: Marek Olšák
* radeonsi/nir: Add const_index when loading GS inputs | Connor Abbott | 2019-08-23 | 1 | -1/+1
    This fixes loading GS inputs in structures or arrays.
    Reviewed-by: Marek Olšák
* radeonsi/nir: Don't add const offset to indirect | Connor Abbott | 2019-08-23 | 1 | -19/+6
    This is already done in get_deref_offset() in the common code. We were adding it twice accidentally.
    Fixes KHR-GL45.enhanced_layouts.varying_array_locations.
    Reviewed-by: Marek Olšák
* radeonsi/nir: Don't recompute num_inputs and num_outputs | Connor Abbott | 2019-08-23 | 1 | -24/+3
    Don't repeat what mesa/st already does.
    Reviewed-by: Marek Olšák
* ac,radv,radeonsi: remove LLVM 7 support | Samuel Pitoiset | 2019-08-23 | 3 | -18/+3
    Now that LLVM 9 will be released soon, we will only support LLVM 8, 9 and master (10).
    Signed-off-by: Samuel Pitoiset
    Reviewed-by: Marek Olšák
* iris: Avoid unnecessary resolves on transfer maps | Kenneth Graunke | 2019-08-22 | 2 | -16/+31
    We were always resolving the buffer as if we were accessing it via CPU maps, which don't understand any auxiliary surfaces. But we often copy to a temporary using BLORP, which understands compression just fine. So we can avoid the resolve, and accelerate the copy as well.
    Fixes: 9d1334d2a0f ("iris: Use copy_region and staging resources to avoid transfer stalls")
    Reviewed-by: Rafael Antognolli
* iris: Drop copy format hacks from copy region based transfer path. | Kenneth Graunke | 2019-08-22 | 1 | -16/+5
    This doesn't work for compressed formats, as the source texture and temporary texture would have different block sizes. (Forcing the driver to always take the GPU path would expose the bug.)
    Instead, just use the source format for the temporary, and let blorp_copy deal with overrides.
    The one case where we can't do this is ASTC, because isl won't let us create a linear ASTC surface. Fall back to the CPU paths there for now.
    Fixes: 9d1334d2a0f ("iris: Use copy_region and staging resources to avoid transfer stalls")
    Reviewed-by: Rafael Antognolli
* iris: Update fast clear colors on Gen9 with direct immediate writes. | Kenneth Graunke | 2019-08-22 | 2 | -9/+26
    Gen11 stores the fast clear color in an "indirect clear buffer", as a packed pixel value. Gen9 hardware stores it as a float or integer value, which is interpreted via the format. We were trying to store that in a buffer, for similarity with Icelake, and MI_COPY_MEM_MEM it from there to the actual SURFACE_STATE bytes where it's stored.
    This unfortunately doesn't work for blorp_copy(), which does bit-for-bit copies, and overrides the format to a CCS-compatible UINT format. This causes the clear color to be interpreted in the overridden format.
    Normally, we provide the clear color on the CPU, and blorp_blit.c:2611 converts it to a packed pixel value in the original format, then unpacks it in the overridden format, so the clear color we use expands to the bits we originally desired.
    However, BLORP doesn't support this pack/unpack with an indirect clear buffer, as it would need to do the math on the GPU. On Gen11+, it isn't necessary, as the hardware does the right thing.
    This patch changes Gen9 to stop using an indirect clear buffer and simply do PIPE_CONTROLs with post-sync write immediate operations to store the new color over the surface states for regular drawing. BLORP continues streaming out surface states, and handles fast clear colors on the CPU.
    Fixes: 53c484ba8ac ("iris: blorp using resolve hooks")
    Reviewed-by: Rafael Antognolli
* iris: Fix broken aux.possible/sampler_usages bitmask handling | Kenneth Graunke | 2019-08-22 | 1 | -5/+6
    For renderable surfaces, we allocate SURFACE_STATEs for each bit in res->aux.possible_usages. Sampler views use res->aux.sampler_usages. When pinning buffers, we call surf_state_offset_for_aux() to calculate the offset to the desired surface state.
    surf_state_offset_for_aux() took an aux_modes parameter, which should be one of those two fields. However...it was not using that parameter. It always used the broader res->aux.possible_usages field directly.
    One of the callers, update_clear_value(), was passing incorrect masks for this parameter. It iterated through the bits in order, using u_bit_scan(), which destructively modifies the mask. So each time we called it, the count of bits before our selected mode was 0, which would cause us to always update the SURFACE_STATE for ISL_AUX_USAGE_NONE, rather than updating each in turn. This was hidden by the earlier bug where surf_state_offset_for_aux() ignored the parameter.
    Fixes: 7339660e803 ("iris: Add aux.sampler_usages.")
    Reviewed-by: Rafael Antognolli
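The second bug described above, passing a destructively scanned mask to an offset helper that counts the bits below the selected mode, can be reproduced in a few lines. This is a sketch and not the iris code: `SURFACE_STATE_SIZE`, `surf_state_offset`, and `offsets` are hypothetical names, and the 64-byte size is an assumed value.

```python
SURFACE_STATE_SIZE = 64  # assumed size in bytes, for illustration only


def surf_state_offset(aux_modes, mode):
    """Offset of the state for `mode`: one slot per set bit below it in aux_modes."""
    below = aux_modes & ((1 << mode) - 1)
    return bin(below).count("1") * SURFACE_STATE_SIZE


def offsets(mask, use_original_mask):
    """Visit each set bit in `mask`, lowest first, and compute its offset."""
    out = []
    remaining = mask
    while remaining:
        mode = (remaining & -remaining).bit_length() - 1  # index of lowest set bit
        remaining &= remaining - 1  # clear that bit (a destructive scan)
        # Buggy callers pass the partially consumed mask instead of the original.
        arg = mask if use_original_mask else remaining
        out.append(surf_state_offset(arg, mode))
    return out


fixed = offsets(0b1011, use_original_mask=True)
buggy = offsets(0b1011, use_original_mask=False)
```

With modes 0, 1 and 3 set, `fixed` is `[0, 64, 128]` while `buggy` is `[0, 0, 0]`: once the lower bits have been consumed, every mode appears to be the first one, so only the first slot is ever updated.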
* iris: Replace devinfo->gen with GEN_GEN | Kenneth Graunke | 2019-08-22 | 1 | -22/+18
    This is genxml, we can compile out this code.
    Fixes: 26606672847 ("iris/gen8: Re-emit the SURFACE_STATE if the clear color changed.")
    Reviewed-by: Rafael Antognolli
* panfrost: Implement gl_FragCoord correctly | Alyssa Rosenzweig | 2019-08-22 | 4 | -19/+19
    Rather than passing through the transformed gl_Position, we can use the hardware-level varying for this, which will correctly handle gl_FragCoord.w
    Signed-off-by: Alyssa Rosenzweig
* panfrost: Remove vertex buffer offset from its size | Alyssa Rosenzweig | 2019-08-22 | 1 | -2/+5
    The offset is added to the base address, so we need to subtract it from the size to maintain the same end address and thus prevent a buffer overflow:
      end_address = start_address + size
      start_address' = start_address + offset
      size' = size - offset
      end_address' = start_address' + size'
                   = (start_address + offset) + (size - offset)
                   = (start_address + size) + (offset - offset)
                   = start_address + size
                   = end_address
    QED.
    Signed-off-by: Alyssa Rosenzweig
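The derivation above can be checked numerically. The helper `apply_offset` and the concrete numbers are hypothetical and chosen only for illustration.

```python
def apply_offset(start_address, size, offset):
    """Move the start forward by `offset` and shrink the size by the same amount."""
    return start_address + offset, size - offset


start_address, size, offset = 0x1000, 256, 64
new_start, new_size = apply_offset(start_address, size, offset)

# The end address is unchanged, so no access can run past the original buffer.
end_before = start_address + size
end_after = new_start + new_size
```

Both `end_before` and `end_after` are `0x1100`. Leaving the size at 256 after moving the start would have put the end at `0x1140`, 64 bytes past the buffer.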
* pan/decode: Validate MFBD tags | Alyssa Rosenzweig | 2019-08-22 | 1 | -1/+1
    These tags need to match up with what's actually described by the MFBD, so check this. Once this is checked, since the type and contents of the FBD are obvious from printing above, there's no need to explicitly mark off the framebuffer line.
    Signed-off-by: Alyssa Rosenzweig
* swr: use LLVM version string instead of re-computing it | Eric Engestrom | 2019-08-22 | 1 | -2/+1
    Signed-off-by: Eric Engestrom
    Reviewed-by: Bas Nieuwenhuizen
* llvmpipe: use LLVM version string instead of re-computing it | Eric Engestrom | 2019-08-22 | 1 | -2/+1
    Signed-off-by: Eric Engestrom
    Reviewed-by: Bas Nieuwenhuizen
* iris/android: fix build and link with libmesa_intel_perf | Tapani Pälli | 2019-08-22 | 2 | -0/+2
    Fixes: 0fd4359733e "iris/perf: implement routines to return counter info"
    Signed-off-by: Tapani Pälli
    Reviewed-by: Kenneth Graunke
* panfrost: Fix PIPE_BUFFER spacing | Alyssa Rosenzweig | 2019-08-21 | 1 | -3/+3
    Signed-off-by: Alyssa Rosenzweig
* panfrost: Implement depth range clipping | Alyssa Rosenzweig | 2019-08-21 | 1 | -3/+12
    This should fix glDepthRangef issues. Eventually, something similar should allow implementing the depth bounds test.
    Signed-off-by: Alyssa Rosenzweig
* panfrost: Don't bail on PIPE_BUFFER | Alyssa Rosenzweig | 2019-08-21 | 1 | -5/+5
    We can handle some of it.
    Signed-off-by: Alyssa Rosenzweig
* panfrost: Pass stream_output_info by reference | Alyssa Rosenzweig | 2019-08-21 | 1 | -7/+7
    It's a large structure, apparently.
    Signed-off-by: Alyssa Rosenzweig
* panfrost: Guard against NULL rasterizer explicitly | Alyssa Rosenzweig | 2019-08-21 | 1 | -1/+3
    Signed-off-by: Alyssa Rosenzweig