path: root/src
Commit message | Author | Age | Files | Lines
* glsl: Add ir_demote | Caio Marcelo de Oliveira Filho | 2019-09-30 | 11 | -0/+95
    To represent the new `demote` keyword when using EXT_demote_to_helper_invocation extension.
    Most of the changes are to include it in the visitors. Demote is not considered a control flow, so also include an empty visit member function in ir_control_flow_visitor.
    Only NIR actually supports `demote`, so assert the translations for TGSI and Mesa's gl_program -- since the demote is not expected to appear for those.
    Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Extension boilerplate for EXT_demote_to_helper_invocation | Caio Marcelo de Oliveira Filho | 2019-09-30 | 4 | -0/+5
    Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Fix iris_rebind_buffer() for VBOs with non-zero offsets. | Kenneth Graunke | 2019-09-30 | 1 | -2/+6
    We can't just check for the BO base address, we need to check for the full address including any offset we may have applied. When updating the address, we need to include the offset again.
    Fixes: 5ad0c88dbe3 ("iris: Replace buffer backing storage and rebind to update addresses.")
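The address check described in this commit can be illustrated with a minimal sketch. The struct and function names below are hypothetical stand-ins chosen for illustration, not the iris driver's real types or API; the only idea taken from the commit message is that both the comparison and the update must use base address plus offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the iris driver's types. */
struct sketch_bo {
   uint64_t gpu_address;   /* base address of the buffer object */
};

struct sketch_vb_binding {
   uint64_t address;       /* full address in the state: base + offset */
   uint32_t offset;        /* byte offset applied on top of the BO base */
};

/* Compare the full address (base + offset), not just the BO base. */
static bool
sketch_binding_uses_bo(const struct sketch_vb_binding *b,
                       const struct sketch_bo *bo)
{
   assert(b && bo);
   return b->address == bo->gpu_address + b->offset;
}

/* Point the binding at the new storage, applying the offset again.
 * Returns true if the binding referenced old_bo and was updated. */
static bool
sketch_rebind(struct sketch_vb_binding *b,
              const struct sketch_bo *old_bo,
              const struct sketch_bo *new_bo)
{
   if (!sketch_binding_uses_bo(b, old_bo))
      return false;

   b->address = new_bo->gpu_address + b->offset;
   return true;
}
```

With a binding at offset 64 into a buffer based at 0x1000, a base-only comparison (0x1040 against 0x1000) would miss the binding; the sketch matches it and moves it to the new base plus 64.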
* ac/nir: fix GLSL imageSamples() | Marek Olšák | 2019-09-30 | 1 | -24/+4
    Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: add ac_build_image_get_sample_count from radeonsi | Marek Olšák | 2019-09-30 | 3 | -17/+28
    Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/surface: don't allocate FMASK if there is no graphics | Marek Olšák | 2019-09-30 | 1 | -2/+3
    Reviewed-by: Samuel Pitoiset <[email protected]>
* tgsi_to_nir: handle PIPE_FORMAT_NONE in image opcodes | Marek Olšák | 2019-09-30 | 1 | -0/+3
    radeonsi doesn't use the format and internal shaders don't set it.
    Reviewed-By: Timur Kristóf <[email protected]>
* meson: gallium media state trackers require libdrm with x11 | Dylan Baker | 2019-09-30 | 4 | -8/+14
    v2: - update copyright year in all changed files
        - rebase on master
    Cc: 19.1 19.2 <[email protected]>
    Reviewed-by: Eric Engestrom <[email protected]>
* iris: Disable CCS_E for 32-bit floating point textures. | Kenneth Graunke | 2019-09-30 | 1 | -1/+23
    A while back, Michael Larabel noticed that Paraview's Wavelet Volume case runs significantly slower on iris than i965. It turns out this is because we enable CCS_E for 32-bit floating point formats, while i965 disables it, with an oblique comment saying that we benchmarked it (on what exactly?) and determined that it was a loss.
    Paraview uses both R32_FLOAT and R32G32B32A32_FLOAT, and I observed large framerate drops when enabling CCS_E for either format. However, several other benchmarks (Aztec Ruins, many Synmark cases) use 16-bit floating point formats, with no apparent ill effects.
    So, disable compression for 32-bit float formats for now, but leave it enabled for 16-bit float formats as they seem to be working fine.
    Improves performance in Paraview's Wavelet Volume test by 62% on a Skylake GT4e.
    Fixes: 3cfc6a207bd ("iris: Fill out res->aux.possible_usages")
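The rule this commit describes (no CCS_E for 32-bit float formats, CCS_E still allowed for 16-bit float formats) can be sketched as a small predicate. The format enum and info table here are hypothetical and exist only for illustration; the real driver queries its own format descriptions and applies other eligibility checks that this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical format list for illustration; not the driver's enums. */
enum sketch_format {
   SKETCH_R8G8B8A8_UNORM,
   SKETCH_R16_FLOAT,
   SKETCH_R16G16B16A16_FLOAT,
   SKETCH_R32_FLOAT,
   SKETCH_R32G32B32A32_FLOAT,
   SKETCH_FORMAT_COUNT
};

struct sketch_format_info {
   bool is_float;
   unsigned bits_per_channel;
};

static const struct sketch_format_info sketch_formats[SKETCH_FORMAT_COUNT] = {
   [SKETCH_R8G8B8A8_UNORM]     = { false, 8 },
   [SKETCH_R16_FLOAT]          = { true, 16 },
   [SKETCH_R16G16B16A16_FLOAT] = { true, 16 },
   [SKETCH_R32_FLOAT]          = { true, 32 },
   [SKETCH_R32G32B32A32_FLOAT] = { true, 32 },
};

/* Reject compression only for 32-bit floating point formats. */
static bool
sketch_allow_ccs_e(enum sketch_format fmt)
{
   assert((unsigned)fmt < SKETCH_FORMAT_COUNT);
   const struct sketch_format_info *info = &sketch_formats[fmt];
   return !(info->is_float && info->bits_per_channel == 32);
}
```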
* ac: reorder and print all radeon_info fields | Marek Olšák | 2019-09-30 | 2 | -19/+53
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: set the number of SDPs same as the number of TCCs | Marek Olšák | 2019-09-30 | 1 | -13/+3
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: fix num_good_cu_per_sh for harvested chips | Marek Olšák | 2019-09-30 | 1 | -0/+6
    Cc: 19.2 <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: fix corruption for chips with harvested TCCs | Marek Olšák | 2019-09-30 | 1 | -2/+6
    Cc: 19.2 <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: add radeon_info::tcc_harvested | Marek Olšák | 2019-09-30 | 2 | -0/+5
    Cc: 19.2 <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac: fix incorrect vram_size reported by the kernel | Marek Olšák | 2019-09-30 | 1 | -2/+10
    Cc: 19.2 <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi/gfx10: fix L2 cache rinse programming | Marek Olšák | 2019-09-30 | 1 | -5/+17
    Cc: 19.2 <[email protected]>
    Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* etnaviv: fix bitmask typo | Eric Engestrom | 2019-09-30 | 1 | -1/+1
    Fixes: d92689c46f0d2da05ae6 ("etnaviv: nir: add native integers (HALTI2+)")
    Signed-off-by: Eric Engestrom <[email protected]>
    Reviewed-by: Jonathan Marek <[email protected]>
* glx: Log the filename of the drm device if we fail to open it | Adam Jackson | 2019-09-30 | 1 | -1/+1
    Helps point the user to the specific device that's having issues, since you're increasingly likely to have more than one.
    Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/107
    Reviewed-by: Eric Anholt <[email protected]>
* pan/midgard: Allow scheduling conditions with constants | Alyssa Rosenzweig | 2019-09-30 | 1 | -4/+10
    Now that we have constant adjustment logic abstracted, we can do this safely. Along with the csel inversion patch, this allows many more common csel ops to inline their condition in the bundle.
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Add csel invert optimization | Alyssa Rosenzweig | 2019-09-30 | 3 | -0/+27
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Add mir_flip helper | Alyssa Rosenzweig | 2019-09-30 | 3 | -10/+21
    Useful for various operations on both commutative and anticommutative ops.
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Tightly pack 32-bit constants | Alyssa Rosenzweig | 2019-09-30 | 1 | -16/+113
    If we can reuse constant slots from other instructions, we would like to do so to include more instructions per bundle.
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
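The slot reuse mentioned in this commit can be sketched as follows. This is a minimal illustration under stated assumptions: the slot count of four 32-bit values per bundle, the struct layout, and the function name are all hypothetical, and the real scheduler also has to rewrite swizzles, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed capacity for illustration: four 32-bit constants per bundle. */
#define SKETCH_CONST_SLOTS 4

struct sketch_bundle_consts {
   uint32_t value[SKETCH_CONST_SLOTS];
   unsigned count;   /* number of slots currently in use */
};

/* Try to place the n constants of one instruction into the bundle,
 * reusing any slot that already holds the same value. On success the
 * chosen slot for each constant is written to slot_out and the bundle
 * is updated; on failure the bundle is left unchanged. */
static bool
sketch_pack_constants(struct sketch_bundle_consts *bc,
                      const uint32_t *vals, unsigned n,
                      unsigned *slot_out)
{
   assert(bc && vals && slot_out);
   struct sketch_bundle_consts tmp = *bc;   /* work on a copy */

   for (unsigned i = 0; i < n; ++i) {
      unsigned s;

      /* Look for an existing slot with the same value */
      for (s = 0; s < tmp.count; ++s) {
         if (tmp.value[s] == vals[i])
            break;
      }

      if (s == tmp.count) {
         /* No match: allocate a fresh slot if one is left */
         if (tmp.count == SKETCH_CONST_SLOTS)
            return false;
         tmp.value[tmp.count++] = vals[i];
      }

      slot_out[i] = s;
   }

   *bc = tmp;   /* commit only once everything fit */
   return true;
}
```

Working on a copy and committing at the end means a failed attempt never leaves a half-packed bundle, so the caller can simply try the next candidate instruction.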
* pan/midgard: Allow writeout to see into the future | Alyssa Rosenzweig | 2019-09-30 | 1 | -1/+40
    If an instruction could be scheduled to vmul to satisfy the writeout conditions, let's do that and save an instruction+cycle per fragment shader.
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Allow 6 instructions per bundle | Alyssa Rosenzweig | 2019-09-30 | 1 | -2/+3
    We never had a scheduler good enough to hit this case before! :)
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Only one conditional per bundle allowed | Alyssa Rosenzweig | 2019-09-30 | 1 | -0/+16
    There's no r32 to save ya after you use up r31 :)
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Schedule to smul/sadd | Alyssa Rosenzweig | 2019-09-30 | 1 | -0/+5
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Extend choose_instruction for scalar units | Alyssa Rosenzweig | 2019-09-30 | 1 | -0/+4
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Don't double check SCALAR units | Alyssa Rosenzweig | 2019-09-30 | 1 | -4/+0
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Use new scheduler | Alyssa Rosenzweig | 2019-09-30 | 3 | -678/+130
    We still emit in-order but we switch to using the bundles created from the new scheduler, which will allow greater flexibility and room for out-of-order optimization.
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Add distance metric to choose_instruction | Alyssa Rosenzweig | 2019-09-30 | 1 | -0/+14
    We require chosen instructions to be "close", to avoid ballooning register pressure. This is a kludge that will go away once we have proper liveness tracking in the scheduler, but for now it prevents a lot of needless spilling.
    v2: Lower threshold to 6 (from 8). Schedule is hurt, but a few shaders that spilled excessively are fixed.
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
    Derp
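One way to read the "close" requirement is as a cutoff on how far back from the latest ready instruction the scheduler may reach. The sketch below assumes exactly that interpretation; the function name, the flat arrays, and the definition of distance as a difference of program-order indices are assumptions for illustration. Only the threshold of 6 comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Threshold taken from the v2 note in the commit message. */
#define SKETCH_MAX_DISTANCE 6

/* Pick a ready instruction that satisfies the caller's predicate,
 * scanning backwards from the latest ready instruction and refusing
 * to look further than SKETCH_MAX_DISTANCE away from it.
 *
 * ready[i]   - instruction i is on the worklist
 * matches[i] - instruction i satisfies the predicate (unit, tag, ...)
 *
 * Returns the chosen index, or -1 if nothing close enough matches. */
static int
sketch_choose_close(const bool *ready, const bool *matches, int count)
{
   assert(ready && matches && count >= 0);
   int latest = -1;

   for (int i = count - 1; i >= 0; --i) {
      if (ready[i]) {
         latest = i;
         break;
      }
   }

   if (latest < 0)
      return -1;

   for (int i = latest; i >= 0; --i) {
      if (latest - i > SKETCH_MAX_DISTANCE)
         break;   /* too far away: would stretch live ranges */

      if (ready[i] && matches[i])
         return i;
   }

   return -1;
}
```

A smaller threshold rejects more candidates, which fits the trade-off the commit message reports: a worse schedule in exchange for less spilling.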
* pan/midgard: Add mir_choose_alu helper | Alyssa Rosenzweig | 2019-09-30 | 1 | -0/+24
    Based on a given unit.
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Implement load/store pairing | Alyssa Rosenzweig | 2019-09-30 | 1 | -55/+12
    We can bundle two load/store together. This eliminates the need for explicit load/store pairing in a prepass, as well.
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Extend csel_swizzle to branches | Alyssa Rosenzweig | 2019-09-30 | 3 | -5/+10
    Conditions for branches don't have a swizzle explicitly in the emitted binary, but they do implicitly get swizzled in whatever instruction wrote r31, so we need to handle that.
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Add helpers for scheduling conditionals | Alyssa Rosenzweig | 2019-09-30 | 1 | -0/+146
    Conditional instructions (csel and conditional branches) require their condition to be written to a special condition pipeline register (r31.w for scalar, r31.xyzw for vector). However, pipeline registers are live only for the duration of a single bundle. As such, the logic to schedule conditionals correct is surprisingly complex. Essentially, we see if we could stuff the conditional within the same bundle as the csel/branch without breaking anything; if we can, we do that. If we can't, we add a dummy move to make room.
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Implement predicate->unit | Alyssa Rosenzweig | 2019-09-30 | 1 | -0/+9
    This allows ALUs to select for each unit of the bundle separately.
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Add predicate->exclude | Alyssa Rosenzweig | 2019-09-30 | 1 | -4/+14
    A bit of a kludge but allows setting an implicit dependency of synthetic conditional moves on the actual condition, fixing code generated like:
        vmul.feq r0, ..
        sadd.imov r31, .., r0
        vadd.fcsel [...]
    The imov runs simultaneous with feq so it gets garbage results, but it's too late to add an actual dependency practically speaking, since the new synthetic imov doesn't have a node associated.
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Add constant intersection filters | Alyssa Rosenzweig | 2019-09-30 | 1 | -0/+55
    In the future, we will want to keep track of which components of constants of various sizes correspond to which parts of the bundle constants, like in the old scheduler. For now, let's just stub it out for a simple rule of one instruction with embedded constants per bundle. We can eventually do better, of course.
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Remove csel constant unit force | Alyssa Rosenzweig | 2019-09-30 | 1 | -3/+0
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Add mir_schedule_texture/ldst/alu helpers | Alyssa Rosenzweig | 2019-09-30 | 1 | -0/+190
    We don't actually do any scheduling here yet, but add per-tag helpers to consume an instruction, print it, pop it off the worklist.
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Add mir_choose_bundle helper | Alyssa Rosenzweig | 2019-09-30 | 1 | -0/+25
    It's not always obvious what the optimal bundle type should be. Let's break out the logic to decide.
    Currently set for purely in-order operation.
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Add mir_update_worklist helper | Alyssa Rosenzweig | 2019-09-30 | 1 | -0/+39
    After we've chosen an instruction, popped it off, and processed it, it's time to update the worklist, removing that instruction from the dependency graph to allow its dependents to be put onto the worklist.
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
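The worklist update described here follows the usual pattern for list scheduling over a dependency graph. The sketch below is a generic illustration, not the Midgard implementation: the bitmask representation, the 32-node limit, and every name are assumptions, and the direction of the edges (which end of the program the scheduler starts from) is left abstract.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_NODES 32   /* assumed limit so a uint32_t mask suffices */

struct sketch_dep_graph {
   unsigned count;
   unsigned pending[SKETCH_MAX_NODES];     /* unprocessed dependencies of node i */
   uint32_t dependents[SKETCH_MAX_NODES];  /* mask of nodes that wait on node i */
   uint32_t worklist;                      /* mask of nodes ready to schedule */
};

/* Initially, every node with no pending dependencies is ready. */
static void
sketch_init_worklist(struct sketch_dep_graph *g)
{
   assert(g && g->count <= SKETCH_MAX_NODES);
   g->worklist = 0;

   for (unsigned i = 0; i < g->count; ++i) {
      if (g->pending[i] == 0)
         g->worklist |= UINT32_C(1) << i;
   }
}

/* After node `done` has been processed: take it off the worklist and
 * release each dependent whose last outstanding dependency it was. */
static void
sketch_update_worklist(struct sketch_dep_graph *g, unsigned done)
{
   assert(g && done < g->count);
   g->worklist &= ~(UINT32_C(1) << done);

   for (unsigned i = 0; i < g->count; ++i) {
      if (!((g->dependents[done] >> i) & 1u))
         continue;

      assert(g->pending[i] > 0);
      if (--g->pending[i] == 0)
         g->worklist |= UINT32_C(1) << i;
   }
}
```

A node only joins the worklist when its pending count reaches zero, so a node with two dependencies stays off the list until both have been processed.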
* pan/midgard: Add mir_choose_instruction stub | Alyssa Rosenzweig | 2019-09-30 | 1 | -0/+55
    In the future, this routine will implement the core scheduling logic to decide which instruction out of the worklist will be scheduled next, in a way that minimizes cycle count and register pressure.
    In the present, we are more interested in replicating in-order scheduling with the much-more-powerful out-of-order model. So rather than discriminating by a register pressure estimate, we simply choose the latest possible instruction in the worklist.
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Initialize worklist | Alyssa Rosenzweig | 2019-09-30 | 1 | -0/+17
    This flows naturally from the dependency graph
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Calculate dependency graph | Alyssa Rosenzweig | 2019-09-30 | 2 | -0/+131
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Add flatten_mir helper | Alyssa Rosenzweig | 2019-09-30 | 1 | -0/+22
    We would like to flatten a linked list of midgard_instructions into an array of midgard_instruction pointers on the heap.
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
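Flattening a linked list into a heap array of pointers can be sketched in two passes: count, then fill. The node type and function name below are hypothetical; Mesa uses its own intrusive list macros and allocator, which this sketch replaces with a plain `next` pointer and `malloc`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical instruction node with a plain singly linked list. */
struct sketch_ins {
   int opcode;
   struct sketch_ins *next;
};

/* Build a heap-allocated array of pointers to the list's nodes, in
 * list order. The nodes are not copied. The length is written to
 * len_out; the caller frees the returned array. Returns NULL for an
 * empty list or on allocation failure (len_out is then 0). */
static struct sketch_ins **
sketch_flatten(struct sketch_ins *head, size_t *len_out)
{
   assert(len_out);
   size_t n = 0;

   for (struct sketch_ins *i = head; i; i = i->next)
      ++n;

   *len_out = 0;
   if (n == 0)
      return NULL;

   struct sketch_ins **arr = malloc(n * sizeof(*arr));
   if (!arr)
      return NULL;

   size_t k = 0;
   for (struct sketch_ins *i = head; i; i = i->next)
      arr[k++] = i;

   *len_out = n;
   return arr;
}
```

An array gives the scheduler constant-time access by index, which is convenient when instructions are tracked through bitsets or index-based worklists.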
* pan/midgard: Squeeze indices before scheduling | Alyssa Rosenzweig | 2019-09-30 | 1 | -0/+1
    This allows node_count to be correct while scheduling.
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Fix component count handling for ldst | Alyssa Rosenzweig | 2019-09-30 | 2 | -37/+37
    It's not based on the writemask and it can't be inferred; it's just intrinsic to the op itself.
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
* pan/midgard: Add missing parans in SWIZZLE definition | Alyssa Rosenzweig | 2019-09-30 | 1 | -1/+1
    Signed-off-by: Alyssa Rosenzweig <[email protected]>
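The log does not show which parentheses were missing from SWIZZLE, so the following is only a sketch of the general hazard with unparenthesized macro bodies. Both macros are hypothetical and are not the Midgard definition. Compilers may warn about the unsafe variant, which is the point of the example.

```c
/* Hypothetical macros packing four 2-bit fields; NOT the real SWIZZLE. */

/* Body is not wrapped in parentheses: operators next to the expansion
 * can bind to the last term only. */
#define SKETCH_PACK4_UNSAFE(a, b, c, d) \
   ((d) << 6) | ((c) << 4) | ((b) << 2) | ((a) << 0)

/* Body wrapped in parentheses: the expansion behaves as one value. */
#define SKETCH_PACK4_SAFE(a, b, c, d) \
   (((d) << 6) | ((c) << 4) | ((b) << 2) | ((a) << 0))

/* `&` binds tighter than `|`, so the mask applies only to ((a) << 0). */
static unsigned
sketch_low_bits_unsafe(void)
{
   return SKETCH_PACK4_UNSAFE(0u, 1u, 2u, 3u) & 0x3u;
}

/* Here the mask applies to the whole packed value, as intended. */
static unsigned
sketch_low_bits_safe(void)
{
   return SKETCH_PACK4_SAFE(0u, 1u, 2u, 3u) & 0x3u;
}
```

With the arguments 0, 1, 2, 3 the packed value is 0xE4. The safe variant masks it down to 0, while the unsafe variant leaves 0xE4 untouched because only the lowest field is masked.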
* nouveau: set lower_sub = true | Daniel Schürmann | 2019-09-30 | 3 | -6/+2
    Subtractions are already implemented as additions anyway.
    Reviewed-by: Connor Abbott <[email protected]>
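The identity behind lowering subtraction can be shown in plain C: a subtraction is an addition of the negated operand. The function names are hypothetical and this is not NIR code; it only illustrates why a backend that already implements subtraction as addition loses nothing when the compiler performs the rewrite up front.

```c
#include <stdint.h>

/* a - b expressed as a + (-b) for floats; exact under IEEE 754. */
static float
sketch_lowered_fsub(float a, float b)
{
   return a + (-b);
}

/* a - b expressed as a + (two's complement negation of b), using
 * unsigned arithmetic so wraparound is well defined in C. */
static uint32_t
sketch_lowered_isub(uint32_t a, uint32_t b)
{
   return a + (~b + 1u);
}
```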
* v3d: Enable the late algebraic optimizations to get real subs. | Eric Anholt | 2019-09-30 | 1 | -0/+16
    This worked better than my original v3d-local pass for just subs, and is a huge win over not producing subs.
    total instructions in shared programs: 6408469 -> 6167932 (-3.75%)
    total threads in shared programs: 153784 -> 154104 (0.21%)
    total uniforms in shared programs: 2157078 -> 1905823 (-11.65%)
    total max-temps in shared programs: 904546 -> 895796 (-0.97%)
    total spills in shared programs: 4959 -> 4993 (0.69%)
    total fills in shared programs: 6558 -> 6670 (1.71%)
    total sfu-stalls in shared programs: 25845 -> 25175 (-2.59%)
    total inst-and-stalls in shared programs: 6434314 -> 6193107 (-3.75%)
    Reviewed-by: Daniel Schürmann <[email protected]>
    Reviewed-by: Connor Abbott <[email protected]>