mesa.git

	Commit message	Author	Age	Files	Lines
*	glsl: Add ir_demote	Caio Marcelo de Oliveira Filho	2019-09-30	11	-0/+95
	To represent the new `demote` keyword when using EXT_demote_to_helper_invocation extension. Most of the changes are to include it in the visitors.
	Demote is not considered a control flow, so also include an empty visit member function in ir_control_flow_visitor.
	Only NIR actually supports `demote`, so assert the translations for TGSI and Mesa's gl_program -- since the demote is not expected to appear for those.
	Reviewed-by: Kenneth Graunke <[email protected]>
*	mesa: Extension boilerplate for EXT_demote_to_helper_invocation	Caio Marcelo de Oliveira Filho	2019-09-30	4	-0/+5
	Reviewed-by: Kenneth Graunke <[email protected]>
*	iris: Fix iris_rebind_buffer() for VBOs with non-zero offsets.	Kenneth Graunke	2019-09-30	1	-2/+6
	We can't just check for the BO base address, we need to check for the full address including any offset we may have applied. When updating the address, we need to include the offset again.
	Fixes: 5ad0c88dbe3 ("iris: Replace buffer backing storage and rebind to update addresses.")
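
A minimal sketch of the check described in the message above, assuming a binding that stores the full programmed address (BO base plus offset). The struct and function names (`sketch_bo`, `sketch_vb_binding`, `sketch_rebind`) are hypothetical and are not the real iris types.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the real iris structures. */
struct sketch_bo {
    uint64_t base_address;
};

struct sketch_vb_binding {
    uint64_t address; /* address written into the state: BO base + offset */
    uint32_t offset;  /* byte offset of the binding inside the buffer */
};

/* Returns true if the binding referred to old_bo and was updated. */
static bool
sketch_rebind(struct sketch_vb_binding *vb,
              const struct sketch_bo *old_bo,
              const struct sketch_bo *new_bo)
{
    /* Comparing against the bare base address would miss bindings
     * with a non-zero offset, so compare the full address. */
    if (vb->address != old_bo->base_address + vb->offset)
        return false;

    /* Re-apply the offset when writing the new address. */
    vb->address = new_bo->base_address + vb->offset;
    return true;
}
```

With a binding at offset 0x40 into a buffer based at 0x1000, a rebind to a buffer based at 0x8000 leaves the binding at 0x8040; a binding that points elsewhere is left untouched.
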
*	ac/nir: fix GLSL imageSamples()	Marek Olšák	2019-09-30	1	-24/+4
	Reviewed-by: Samuel Pitoiset <[email protected]>
*	ac: add ac_build_image_get_sample_count from radeonsi	Marek Olšák	2019-09-30	3	-17/+28
	Reviewed-by: Samuel Pitoiset <[email protected]>
*	ac/surface: don't allocate FMASK if there is no graphics	Marek Olšák	2019-09-30	1	-2/+3
	Reviewed-by: Samuel Pitoiset <[email protected]>
*	tgsi_to_nir: handle PIPE_FORMAT_NONE in image opcodes	Marek Olšák	2019-09-30	1	-0/+3
	radeonsi doesn't use the format and internal shaders don't set it.
	Reviewed-By: Timur Kristóf <[email protected]>
*	meson: gallium media state trackers require libdrm with x11	Dylan Baker	2019-09-30	4	-8/+14
	v2:
	- update copyright year in all changed files
	- rebase on master
	Cc: 19.1 19.2 <[email protected]>
	Reviewed-by: Eric Engestrom <[email protected]>
*	iris: Disable CCS_E for 32-bit floating point textures.	Kenneth Graunke	2019-09-30	1	-1/+23
	A while back, Michael Larabel noticed that Paraview's Wavelet Volume case runs significantly slower on iris than i965. It turns out this is because we enable CCS_E for 32-bit floating point formats, while i965 disables it, with an oblique comment saying that we benchmarked it (on what exactly?) and determined that it was a loss.
	Paraview uses both R32_FLOAT and R32G32B32A32_FLOAT, and I observed large framerate drops when enabling CCS_E for either format. However, several other benchmarks (Aztec Ruins, many Synmark cases) use 16-bit floating point formats, with no apparent ill effects.
	So, disable compression for 32-bit float formats for now, but leave it enabled for 16-bit float formats as they seem to be working fine.
	Improves performance in Paraview's Wavelet Volume test by 62% on a Skylake GT4e.
	Fixes: 3cfc6a207bd ("iris: Fill out res->aux.possible_usages")
*	ac: reorder and print all radeon_info fields	Marek Olšák	2019-09-30	2	-19/+53
	Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	ac: set the number of SDPs same as the number of TCCs	Marek Olšák	2019-09-30	1	-13/+3
	Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	ac: fix num_good_cu_per_sh for harvested chips	Marek Olšák	2019-09-30	1	-0/+6
	Cc: 19.2 <[email protected]>
	Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radeonsi/gfx10: fix corruption for chips with harvested TCCs	Marek Olšák	2019-09-30	1	-2/+6
	Cc: 19.2 <[email protected]>
	Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	ac: add radeon_info::tcc_harvested	Marek Olšák	2019-09-30	2	-0/+5
	Cc: 19.2 <[email protected]>
	Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	ac: fix incorrect vram_size reported by the kernel	Marek Olšák	2019-09-30	1	-2/+10
	Cc: 19.2 <[email protected]>
	Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	radeonsi/gfx10: fix L2 cache rinse programming	Marek Olšák	2019-09-30	1	-5/+17
	Cc: 19.2 <[email protected]>
	Reviewed-by: Bas Nieuwenhuizen <[email protected]>
*	etnaviv: fix bitmask typo	Eric Engestrom	2019-09-30	1	-1/+1
	Fixes: d92689c46f0d2da05ae6 ("etnaviv: nir: add native integers (HALTI2+)")
	Signed-off-by: Eric Engestrom <[email protected]>
	Reviewed-by: Jonathan Marek <[email protected]>
*	glx: Log the filename of the drm device if we fail to open it	Adam Jackson	2019-09-30	1	-1/+1
	Helps point the user to the specific device that's having issues, since you're increasingly likely to have more than one.
	Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/107
	Reviewed-by: Eric Anholt <[email protected]>
*	pan/midgard: Allow scheduling conditions with constants	Alyssa Rosenzweig	2019-09-30	1	-4/+10
	Now that we have constant adjustment logic abstracted, we can do this safely. Along with the csel inversion patch, this allows many more common csel ops to inline their condition in the bundle.
	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Add csel invert optimization	Alyssa Rosenzweig	2019-09-30	3	-0/+27
	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Add mir_flip helper	Alyssa Rosenzweig	2019-09-30	3	-10/+21
	Useful for various operations on both commutative and anticommutative ops.
	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Tightly pack 32-bit constants	Alyssa Rosenzweig	2019-09-30	1	-16/+113
	If we can reuse constant slots from other instructions, we would like to do so to include more instructions per bundle.
	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Allow writeout to see into the future	Alyssa Rosenzweig	2019-09-30	1	-1/+40
	If an instruction could be scheduled to vmul to satisfy the writeout conditions, let's do that and save an instruction+cycle per fragment shader.
	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Allow 6 instructions per bundle	Alyssa Rosenzweig	2019-09-30	1	-2/+3
	We never had a scheduler good enough to hit this case before! :)
	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Only one conditional per bundle allowed	Alyssa Rosenzweig	2019-09-30	1	-0/+16
	There's no r32 to save ya after you use up r31 :)
	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Schedule to smul/sadd	Alyssa Rosenzweig	2019-09-30	1	-0/+5
	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Extend choose_instruction for scalar units	Alyssa Rosenzweig	2019-09-30	1	-0/+4
	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Don't double check SCALAR units	Alyssa Rosenzweig	2019-09-30	1	-4/+0
	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Use new scheduler	Alyssa Rosenzweig	2019-09-30	3	-678/+130
	We still emit in-order but we switch to using the bundles created from the new scheduler, which will allow greater flexibility and room for out-of-order optimization.
	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Add distance metric to choose_instruction	Alyssa Rosenzweig	2019-09-30	1	-0/+14
	We require chosen instructions to be "close", to avoid ballooning register pressure. This is a kludge that will go away once we have proper liveness tracking in the scheduler, but for now it prevents a lot of needless spilling.
	v2: Lower threshold to 6 (from 8). Schedule is hurt, but a few shaders that spilled excessively are fixed.
	Signed-off-by: Alyssa Rosenzweig <[email protected]>
	Derp
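
The "close" requirement above could look roughly like the following sketch. It assumes that distance is measured in instruction indices from the latest ready instruction, and that a candidate is rejected once that distance exceeds the threshold of 6 named in the message; the exact comparison used in the commit is not shown in this log. The `eligible` array and all `sketch_`/`SKETCH_` names are hypothetical.

```c
#include <stdbool.h>

#define SKETCH_NONE (~0u)
#define SKETCH_MAX_DISTANCE 6 /* threshold from the v2 note; comparison is an assumption */

/* Pick the latest ready instruction that is eligible (for example, one that
 * fits the unit being filled), but never look further than
 * SKETCH_MAX_DISTANCE indices back from the latest ready instruction. */
static unsigned
sketch_choose_close(const bool *worklist, const bool *eligible, unsigned count)
{
    unsigned latest = SKETCH_NONE;

    for (unsigned i = count; i-- > 0; ) {
        if (!worklist[i])
            continue;

        if (latest == SKETCH_NONE)
            latest = i;

        /* Too far from the latest ready instruction: give up rather than
         * pull in an instruction that would stretch live ranges. */
        if (latest - i > SKETCH_MAX_DISTANCE)
            break;

        if (eligible[i])
            return i;
    }

    return SKETCH_NONE;
}
```

Returning `SKETCH_NONE` here means "schedule nothing for this slot", which trades a less dense schedule for lower register pressure, matching the trade-off the v2 note describes.
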
*	pan/midgard: Add mir_choose_alu helper	Alyssa Rosenzweig	2019-09-30	1	-0/+24
	Based on a given unit.
	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Implement load/store pairing	Alyssa Rosenzweig	2019-09-30	1	-55/+12
	We can bundle two load/store together. This eliminates the need for explicit load/store pairing in a prepass, as well.
	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Extend csel_swizzle to branches	Alyssa Rosenzweig	2019-09-30	3	-5/+10
	Conditions for branches don't have a swizzle explicitly in the emitted binary, but they do implicitly get swizzled in whatever instruction wrote r31, so we need to handle that.
	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Add helpers for scheduling conditionals	Alyssa Rosenzweig	2019-09-30	1	-0/+146
	Conditional instructions (csel and conditional branches) require their condition to be written to a special condition pipeline register (r31.w for scalar, r31.xyzw for vector). However, pipeline registers are live only for the duration of a single bundle. As such, the logic to schedule conditionals correctly is surprisingly complex.
	Essentially, we see if we could stuff the conditional within the same bundle as the csel/branch without breaking anything; if we can, we do that. If we can't, we add a dummy move to make room.
	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Implement predicate->unit	Alyssa Rosenzweig	2019-09-30	1	-0/+9
	This allows ALUs to select for each unit of the bundle separately.
	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Add predicate->exclude	Alyssa Rosenzweig	2019-09-30	1	-4/+14
	A bit of a kludge but allows setting an implicit dependency of synthetic conditional moves on the actual condition, fixing code generated like:
	    vmul.feq r0, ..
	    sadd.imov r31, .., r0
	    vadd.fcsel [...]
	The imov runs simultaneous with feq so it gets garbage results, but it's too late to add an actual dependency practically speaking, since the new synthetic imov doesn't have a node associated.
	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Add constant intersection filters	Alyssa Rosenzweig	2019-09-30	1	-0/+55
	In the future, we will want to keep track of which components of constants of various sizes correspond to which parts of the bundle constants, like in the old scheduler. For now, let's just stub it out for a simple rule of one instruction with embedded constants per bundle. We can eventually do better, of course.
	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Remove csel constant unit force	Alyssa Rosenzweig	2019-09-30	1	-3/+0
	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Add mir_schedule_texture/ldst/alu helpers	Alyssa Rosenzweig	2019-09-30	1	-0/+190
	We don't actually do any scheduling here yet, but add per-tag helpers to consume an instruction, print it, pop it off the worklist.
	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Add mir_choose_bundle helper	Alyssa Rosenzweig	2019-09-30	1	-0/+25
	It's not always obvious what the optimal bundle type should be. Let's break out the logic to decide. Currently set for purely in-order operation.
	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Add mir_update_worklist helper	Alyssa Rosenzweig	2019-09-30	1	-0/+39
	After we've chosen an instruction, popped it off, and processed it, it's time to update the worklist, removing that instruction from the dependency graph to allow its dependents to be put onto the worklist.
	Signed-off-by: Alyssa Rosenzweig <[email protected]>
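
The worklist update described above can be sketched as follows, assuming each node keeps a count of unscheduled dependencies and a list of dependents, and that the worklist is a plain boolean array. The layout and every `sketch_` name are hypothetical; the commit's actual data structures are not shown in this log.

```c
#include <stdbool.h>

#define SKETCH_MAX_DEPENDENTS 8

/* Hypothetical dependency-graph node for illustration. */
struct sketch_node {
    unsigned nr_dependencies; /* unscheduled nodes this node still waits on */
    unsigned nr_dependents;   /* number of valid entries in dependents[] */
    unsigned dependents[SKETCH_MAX_DEPENDENTS]; /* indices of waiting nodes */
};

/* Remove `done` from the graph. Any dependent whose last outstanding
 * dependency was `done` becomes ready and joins the worklist.
 * Assumes a consistent graph (counts match the dependents lists). */
static void
sketch_update_worklist(struct sketch_node *nodes, bool *worklist, unsigned done)
{
    worklist[done] = false;

    for (unsigned i = 0; i < nodes[done].nr_dependents; i++) {
        unsigned dep = nodes[done].dependents[i];

        if (--nodes[dep].nr_dependencies == 0)
            worklist[dep] = true;
    }
}
```
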
*	pan/midgard: Add mir_choose_instruction stub	Alyssa Rosenzweig	2019-09-30	1	-0/+55
	In the future, this routine will implement the core scheduling logic to decide which instruction out of the worklist will be scheduled next, in a way that minimizes cycle count and register pressure.
	In the present, we are more interested in replicating in-order scheduling with the much-more-powerful out-of-order model. So rather than discriminating by a register pressure estimate, we simply choose the latest possible instruction in the worklist.
	Signed-off-by: Alyssa Rosenzweig <[email protected]>
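
"Choose the latest possible instruction in the worklist" can be illustrated with a short sketch, assuming the worklist is a boolean array indexed by instruction position. The function name and the `SKETCH_NONE` sentinel are hypothetical.

```c
#include <stdbool.h>

#define SKETCH_NONE (~0u)

/* Scan from the end so the highest-indexed ready instruction wins. */
static unsigned
sketch_choose_latest(const bool *worklist, unsigned count)
{
    for (unsigned i = count; i-- > 0; ) {
        if (worklist[i])
            return i;
    }

    return SKETCH_NONE; /* nothing is ready */
}
```
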
*	pan/midgard: Initialize worklist	Alyssa Rosenzweig	2019-09-30	1	-0/+17
	This flows naturally from the dependency graph
	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Calculate dependency graph	Alyssa Rosenzweig	2019-09-30	2	-0/+131
	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Add flatten_mir helper	Alyssa Rosenzweig	2019-09-30	1	-0/+22
	We would like to flatten a linked list of midgard_instructions into an array of midgard_instruction pointers on the heap.
	Signed-off-by: Alyssa Rosenzweig <[email protected]>
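
A minimal sketch of flattening a linked list into a heap-allocated array of pointers, as the message above describes. It uses a hypothetical singly linked `sketch_ins` type rather than the real midgard_instruction and Mesa's list helpers.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical instruction type for illustration. */
struct sketch_ins {
    int opcode;
    struct sketch_ins *next;
};

/* Returns a malloc'd array of pointers to the list's nodes, in list order,
 * and stores its length in *count. Returns NULL (with *count == 0) for an
 * empty list or on allocation failure. The caller frees the array. */
static struct sketch_ins **
sketch_flatten(struct sketch_ins *head, size_t *count)
{
    size_t n = 0;
    for (struct sketch_ins *it = head; it; it = it->next)
        n++;

    *count = 0;
    if (n == 0)
        return NULL;

    struct sketch_ins **flat = malloc(n * sizeof(*flat));
    if (!flat)
        return NULL;

    size_t i = 0;
    for (struct sketch_ins *it = head; it; it = it->next)
        flat[i++] = it;

    *count = n;
    return flat;
}
```

An array gives the scheduler stable integer indices for each instruction, which is what bitset-style worklists and dependency counts need.
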
*	pan/midgard: Squeeze indices before scheduling	Alyssa Rosenzweig	2019-09-30	1	-0/+1
	This allows node_count to be correct while scheduling.
	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Fix component count handling for ldst	Alyssa Rosenzweig	2019-09-30	2	-37/+37
	It's not based on the writemask and it can't be inferred; it's just intrinsic to the op itself.
	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	pan/midgard: Add missing parans in SWIZZLE definition	Alyssa Rosenzweig	2019-09-30	1	-1/+1
	Signed-off-by: Alyssa Rosenzweig <[email protected]>
*	nouveau: set lower_sub = true	Daniel Schürmann	2019-09-30	3	-6/+2
	Subtractions are already implemented as additions anyway.
	Reviewed-by: Connor Abbott <[email protected]>
*	v3d: Enable the late algebraic optimizations to get real subs.	Eric Anholt	2019-09-30	1	-0/+16
	This worked better than my original v3d-local pass for just subs, and is a huge win over not producing subs.
	total instructions in shared programs: 6408469 -> 6167932 (-3.75%)
	total threads in shared programs: 153784 -> 154104 (0.21%)
	total uniforms in shared programs: 2157078 -> 1905823 (-11.65%)
	total max-temps in shared programs: 904546 -> 895796 (-0.97%)
	total spills in shared programs: 4959 -> 4993 (0.69%)
	total fills in shared programs: 6558 -> 6670 (1.71%)
	total sfu-stalls in shared programs: 25845 -> 25175 (-2.59%)
	total inst-and-stalls in shared programs: 6434314 -> 6193107 (-3.75%)
	Reviewed-by: Daniel Schürmann <[email protected]>
	Reviewed-by: Connor Abbott <[email protected]>