Commit message | Author | Age | Files | Lines
* gallium/radeon: assign the highest priority to scratch; make rings second | Marek Olšák | 2016-08-17 | 2 | -4/+6
    just FYI, the kernel receives priority/4

    Acked-by: Edward O'Callaghan <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/winsys: re-number winsys priority flags | Marek Olšák | 2016-08-17 | 1 | -16/+13
    free 60..63, move CP_DMA up

    Acked-by: Edward O'Callaghan <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: mark shader rings as highest-priority buffers | Marek Olšák | 2016-08-17 | 5 | -7/+7
    and rename the enum

    Acked-by: Edward O'Callaghan <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: set SHADER_RW_BUFFER priority for streamout buffers | Marek Olšák | 2016-08-17 | 2 | -4/+6
    Acked-by: Edward O'Callaghan <[email protected]>
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use current context for DCC feedback-loop decompress, fixes Elemental | Marek Olšák | 2016-08-17 | 4 | -16/+38
    This is just a workaround. The problem is described in the code.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96541

    v2: say that it's only between the current context and aux_context

    Reviewed-by: Nicolai Hähnle <[email protected]> (v1)
* radeonsi: simplify CB_TARGET_MASK logic | Marek Olšák | 2016-08-17 | 1 | -14/+7
    We can now rely on CB_COLORn_INFO to disable empty slots.

    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't set CB_COLOR1_INFO for dual src blending | Marek Olšák | 2016-08-17 | 1 | -7/+0
    Vulkan doesn't do this. The reason may be that
    CB_COLOR1_INFO.SOURCE_FORMAT from NI was moved to
    SPI_SHADER_COL_FORMAT for SI. I asked CB guys about this 2 days ago
    and they still haven't replied.

    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: eliminate PS OUT[1] if dual src blending is off and CB1 is not bound | Marek Olšák | 2016-08-17 | 2 | -11/+7
    All VP DX9 ports benefit from this.

    Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: use unflushed fences for PIPE_QUERY_GPU_FINISHED | Marek Olšák | 2016-08-17 | 1 | -2/+2
    Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: use lp_build_alloca_undef | Nicolai Hähnle | 2016-08-17 | 1 | -13/+4
    Avoid building all those store 0 / store undef instruction pairs
    that end up getting removed anyway.

    Reviewed-by: Roland Scheidegger <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* gallivm: add lp_build_alloca_undef | Nicolai Hähnle | 2016-08-17 | 2 | -0/+24
    Reviewed-by: Roland Scheidegger <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* gallivm: add create_builder_at_entry helper function | Nicolai Hähnle | 2016-08-17 | 1 | -23/+22
    Reduces code duplication.

    Reviewed-by: Roland Scheidegger <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: protect against out of bounds temporary array accesses | Nicolai Hähnle | 2016-08-17 | 1 | -0/+15
    They can lead to VM faults and worse, which goes against the GL
    robustness promises.

    Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: add radeon_llvm_bound_index for bounds checking | Nicolai Hähnle | 2016-08-17 | 3 | -18/+34
    Reviewed-by: Marek Olšák <[email protected]>
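The two bounds-checking commits above describe forcing an indirect index into a temporary array's valid range before it is used. The following Python sketch shows one common way to do that. It is not taken from the Mesa sources: `bound_index` and `is_power_of_two` are hypothetical names, and the choice between a bit mask and an unsigned minimum is an assumption made for illustration.

```python
def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ..."""
    return n > 0 and (n & (n - 1)) == 0


def bound_index(index: int, num: int) -> int:
    """Force an index into [0, num), treating it as an unsigned 32-bit value.

    Hypothetical helper: a stand-in for the idea of bounding an index,
    not the driver's actual implementation.
    """
    index &= 0xFFFFFFFF              # reinterpret negative values as unsigned
    last = num - 1
    if is_power_of_two(num):
        return index & last          # a single AND is enough for 2^n sizes
    return index if index <= last else last   # unsigned min(index, num - 1)


temps = [10, 20, 30]
print(temps[bound_index(1, len(temps))])    # in-range index is unchanged
print(temps[bound_index(-1, len(temps))])   # out-of-range index is clamped
```

Either branch guarantees that the access stays inside the array, which is the property the robustness commit asks for. Which element an out-of-range access lands on is unspecified in this sketch.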
* gallium/radeon: reduce alloca of temporaries based on usagemask | Nicolai Hähnle | 2016-08-17 | 2 | -10/+54
    v2: take actual writemasks into account

    Reviewed-by: Marek Olšák <[email protected]>
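The usagemask commit above can be pictured as allocating storage only for the channels of each temporary register that are actually written. The Python sketch below illustrates that idea under stated assumptions: `plan_temp_allocas` is a hypothetical name, the masks are made-up sample data, and bit 0 is taken to mean the x channel.

```python
CHANNELS = "xyzw"


def plan_temp_allocas(writemasks):
    """Return the (register, channel) pairs that need backing storage.

    writemasks[r] is a 4-bit mask; bit 0 = x, bit 1 = y, bit 2 = z, bit 3 = w
    (the convention assumed by this sketch).
    """
    slots = []
    for reg, mask in enumerate(writemasks):
        for chan in range(4):
            if mask & (1 << chan):
                slots.append((reg, CHANNELS[chan]))
    return slots


# Made-up example: four temporaries, only some channels written.
masks = [0b0001, 0b1111, 0b0000, 0b0101]
slots = plan_temp_allocas(masks)
print(len(slots), "slots instead of", 4 * len(masks))
```

A register whose mask is zero gets no storage at all, and a partially written register gets only the channels it uses.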
* gallium/radeon: use tgsi_scan_arrays for temp arrays | Nicolai Hähnle | 2016-08-17 | 3 | -5/+10
    Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: allocate temps array info in radeon_llvm_context_init | Nicolai Hähnle | 2016-08-17 | 3 | -36/+47
    Also, prepare for using tgsi_array_info.

    This also opens the door for properly handling allocation failures,
    but I'm leaving that for a separate change.

    Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: always do the full store in store_value_to_array | Nicolai Hähnle | 2016-08-17 | 1 | -49/+28
    Doing the write-back of the temporary vector in
    radeon_llvm_emit_store makes no sense.

    This also allows us to get rid of get_alloca_for_array.

    Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: extract common getelementptr logic into get_pointer_into_array | Nicolai Hähnle | 2016-08-17 | 1 | -39/+66
    Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: pass indirect register info into get_alloca_for_array | Nicolai Hähnle | 2016-08-17 | 1 | -5/+6
    To have the same signature as get_array_range.

    Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: extract common lookup code into get_temp_array function | Nicolai Hähnle | 2016-08-17 | 1 | -33/+40
    Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: clarify the comment on the array alloca heuristic | Nicolai Hähnle | 2016-08-17 | 1 | -10/+19
    Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: more descriptive names for LLVM temporaries in debug builds | Nicolai Hähnle | 2016-08-17 | 1 | -2/+12
    Reviewed-by: Tom Stellard <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: simplify radeon_llvm_emit_store for direct array addressing | Nicolai Hähnle | 2016-08-17 | 1 | -7/+0
    We can use the pointer stored in the temps array directly.

    Reviewed-by: Tom Stellard <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: simplify radeon_llvm_emit_fetch for direct array addressing | Nicolai Hähnle | 2016-08-17 | 1 | -5/+0
    We can use the pointer stored in the temps array directly.

    Reviewed-by: Tom Stellard <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: clean up emit_declaration for temporaries | Nicolai Hähnle | 2016-08-17 | 1 | -9/+18
    In the alloca'd array case, no longer create redundant and unused
    allocas for the individual elements; create getelementptrs instead.

    Reviewed-by: Tom Stellard <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* st_glsl_to_tgsi: use calloc the way it's meant to be used | Nicolai Hähnle | 2016-08-17 | 1 | -1/+1
    Reviewed-by: Marek Olšák <[email protected]>
* tgsi/scan: add tgsi_scan_arrays | Nicolai Hähnle | 2016-08-17 | 2 | -0/+93
    Reviewed-by: Marek Olšák <[email protected]>
* glsl: Add missing ir_quadop_vector constant evaluation for Boolean types | Ian Romanick | 2016-08-17 | 1 | -0/+3
    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* glsl: Fix typo in ir_unop_f2u implementation | Ian Romanick | 2016-08-17 | 1 | -1/+1
    This won't affect the output, but it was, technically, wrong.

    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* glsl: Fix typo in ir_unop_b2i implementation | Ian Romanick | 2016-08-17 | 1 | -1/+1
    This won't affect the output, but it was, technically, wrong.

    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* glsl: Don't support integer types for operations that can't handle them | Ian Romanick | 2016-08-17 | 2 | -14/+2
    ir_unop_fract already forbade integer types in ir_validate.
    ir_unop_rcp, ir_unop_rsq, and ir_unop_sqrt should also forbid them
    in ir_validate.

    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* glsl: Don't support ir_unop_abs or ir_unop_sign for unsigned integers | Ian Romanick | 2016-08-17 | 2 | -6/+9
    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Matt Turner <[email protected]>
* nir/algebraic: Optimize common array indexing sequence | Ian Romanick | 2016-08-17 | 1 | -0/+11
    Some shaders include code that looks like:

        uniform int i;
        uniform vec4 bones[...];

        foo(bones[i * 3], bones[i * 3 + 1], bones[i * 3 + 2]);

    CSE would do some work on this:

        x = i * 3
        foo(bones[x], bones[x + 1], bones[x + 2]);

    The compiler may then add '<< 4 + base' to the index calculations.
    This results in expressions like

        x = i * 3
        foo(bones[x << 4], bones[(x + 1) << 4], bones[(x + 2) << 4]);

    Just rearranging the math to produce (i * 48) + 16 saves an
    instruction, and it allows CSE to do more work.

        x = i * 48;
        foo(bones[x], bones[x + 16], bones[x + 32]);

    So, ~6 instructions becomes ~3.

    Some individual shader-db results look pretty bad. However, I have a
    really, really hard time believing the change in estimated cycles in,
    for example, 3dmmes-taiji/51.shader_test after looking at that change
    in the generated code.

    G45
    total instructions in shared programs: 4020840 -> 4010070 (-0.27%)
    instructions in affected programs: 177460 -> 166690 (-6.07%)
    helped: 894
    HURT: 0

    total cycles in shared programs: 98829000 -> 98784990 (-0.04%)
    cycles in affected programs: 3936648 -> 3892638 (-1.12%)
    helped: 894
    HURT: 0

    Ironlake
    total instructions in shared programs: 6418887 -> 6408117 (-0.17%)
    instructions in affected programs: 177460 -> 166690 (-6.07%)
    helped: 894
    HURT: 0

    total cycles in shared programs: 143504542 -> 143460532 (-0.03%)
    cycles in affected programs: 3936648 -> 3892638 (-1.12%)
    helped: 894
    HURT: 0

    Sandy Bridge
    total instructions in shared programs: 8357887 -> 8339251 (-0.22%)
    instructions in affected programs: 432715 -> 414079 (-4.31%)
    helped: 2795
    HURT: 0

    total cycles in shared programs: 118284184 -> 118207412 (-0.06%)
    cycles in affected programs: 6114626 -> 6037854 (-1.26%)
    helped: 2478
    HURT: 317

    Ivy Bridge
    total instructions in shared programs: 7669390 -> 7653822 (-0.20%)
    instructions in affected programs: 388234 -> 372666 (-4.01%)
    helped: 2795
    HURT: 0

    total cycles in shared programs: 68381982 -> 68263684 (-0.17%)
    cycles in affected programs: 1972658 -> 1854360 (-6.00%)
    helped: 2458
    HURT: 307

    Haswell
    total instructions in shared programs: 7082636 -> 7067068 (-0.22%)
    instructions in affected programs: 388234 -> 372666 (-4.01%)
    helped: 2795
    HURT: 0

    total cycles in shared programs: 68282020 -> 68164158 (-0.17%)
    cycles in affected programs: 1891820 -> 1773958 (-6.23%)
    helped: 2459
    HURT: 261

    Broadwell
    total instructions in shared programs: 9002466 -> 8985875 (-0.18%)
    instructions in affected programs: 658784 -> 642193 (-2.52%)
    helped: 2795
    HURT: 5

    total cycles in shared programs: 78503092 -> 78450404 (-0.07%)
    cycles in affected programs: 2873304 -> 2820616 (-1.83%)
    helped: 2275
    HURT: 415

    Skylake
    total instructions in shared programs: 9156978 -> 9140387 (-0.18%)
    instructions in affected programs: 682625 -> 666034 (-2.43%)
    helped: 2795
    HURT: 5

    total cycles in shared programs: 75591392 -> 75550574 (-0.05%)
    cycles in affected programs: 3192120 -> 3151302 (-1.28%)
    helped: 2271
    HURT: 425

    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Thomas Helland <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
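The rearrangement described in the nir/algebraic commit rests on the identity ((i * 3 + k) << 4) = i * 48 + (k << 4). The Python sketch below only checks that arithmetic, including under 32-bit wraparound. It is not the pattern added to the optimizer, and `index_before` and `index_after` are hypothetical names.

```python
MASK32 = 0xFFFFFFFF  # emulate 32-bit integer wraparound


def index_before(i: int, k: int, shift: int = 4) -> int:
    """Index as originally computed: multiply, add the offset, then shift."""
    return (((i * 3 + k) & MASK32) << shift) & MASK32


def index_after(i: int, k: int, shift: int = 4) -> int:
    """Rearranged form: the shift is folded into both constants."""
    return (i * (3 << shift) + (k << shift)) & MASK32


# The two forms agree for small indices and for values that overflow 32 bits.
for i in list(range(64)) + [0x7FFFFFFF, 0xFFFFFFFF]:
    for k in range(3):
        assert index_before(i, k) == index_after(i, k)

print(index_after(5, 0), index_after(5, 1), index_after(5, 2))
```

In the rearranged form the three indices share the single product i * 48 and differ only by the constants 0, 16, and 32, which is what lets CSE remove the repeated work.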
* glx: Don't use current context in __glXSendError | Michel Dänzer | 2016-08-17 | 1 | -3/+1
    There's no guarantee that there is one, and we don't need one anyway.

    Fixes piglit tests:

        glx@glx-fbconfig-bad
        glx@glx_ext_import_context@import context, multi process
        glx@glx_ext_import_context@import context, single process

    Fixes: 2e3f067458e4 ("glx: fix error code when there is no context bound")
    Cc: "11.2" <[email protected]>
    Reviewed-by: Tapani Pälli <[email protected]>
* nv50/ir: fix bb positions after exit instructions | Ilia Mirkin | 2016-08-16 | 1 | -3/+10
    It's fairly rare that the BB layout puts BBs after the exit block,
    which is likely the reason these issues lingered for so long.

    This fixes a fraction of issues with the giant pixmark piano shader.

    Signed-off-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Samuel Pitoiset <[email protected]>
    Cc: <[email protected]>
* nv50/ir: properly clear upper bits of a bitset fill | Ilia Mirkin | 2016-08-16 | 1 | -2/+2
    Found by inspection. In practice, val is always == 0, so this never
    got triggered.

    Signed-off-by: Ilia Mirkin <[email protected]>
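The bitset fix above concerns the bits of the last storage word that lie beyond the set's logical size. The Python sketch below models that situation. `bitset_fill` is a hypothetical stand-in rather than the nv50/ir code, and the 32-bit word size is an assumption.

```python
WORD_BITS = 32          # assumed storage word width
WORD_MASK = 0xFFFFFFFF


def bitset_fill(size: int, val: bool) -> list:
    """Return the word array of a bitset of `size` bits, all set to `val`.

    Bits in the last word at positions >= size are always left clear, so
    later word-wise operations (population count, comparison) stay correct.
    """
    num_words = (size + WORD_BITS - 1) // WORD_BITS
    words = [WORD_MASK if val else 0] * num_words
    used = size % WORD_BITS                 # valid bits in the last word
    if val and used:
        words[-1] &= (1 << used) - 1        # clear the unused upper bits
    return words


print([hex(w) for w in bitset_fill(40, True)])
```

The masking step only matters when filling with ones, which fits the commit's remark that the bug never triggered while `val` was always 0.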
* i965/fs: Estimate maximum sampler message execution size more accurately. | Francisco Jerez | 2016-08-16 | 1 | -37/+72
    The current logic used to determine the execution size of sampler
    messages was based on special-casing several argument and opcode
    combinations, which unsurprisingly missed the possibility that some
    messages could exceed the payload size limit or not depending on the
    number of coordinate components present. In particular:

     - The TXL, TXB and TEX messages (the latter on non-FS stages only)
       would attempt to use SIMD16 on Gen7+ hardware even if a shadow
       reference was present and the texture was a cubemap array, causing
       it to overflow the maximum supported sampler payload size and
       crash.

     - The TG4_OFFSET message with shadow comparison was falling back to
       SIMD8 regardless of the number of coordinate components, which is
       unnecessary when two coordinates or less are present.

    Both cases have been handled incorrectly ever since cubemap arrays
    and texture gather were respectively enabled (the current logic used
    by the SIMD lowering pass is almost unchanged from the previous no16
    fall-back logic used pre-SIMD lowering times).

    Fixes the following GL4.5 conformance test on Gen7-8 (the bug also
    affects Gen9+ in principle, but SKL passes the test by luck because
    it manages to use the TXL_LZ message instead of TXL):

        GL45-CTS.texture_cube_map_array.sampling

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97267
    Reviewed-by: Kenneth Graunke <[email protected]>
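The sampler commit above describes choosing the execution size from the number of payload components instead of from special-cased opcodes. The Python sketch below illustrates that kind of calculation. The register limit of 11, the cost of one register per component per 8 channels, and the name `max_sampler_exec_size` are all assumptions made for this example, not values quoted from the commit.

```python
MAX_SAMPLER_MESSAGE_REGS = 11   # assumed payload limit, in registers


def max_sampler_exec_size(num_components: int, requested: int = 16) -> int:
    """Pick the widest SIMD size whose payload fits the assumed limit.

    Assumes each payload component occupies one register per 8 channels,
    so SIMD16 costs twice as many registers as SIMD8.
    """
    for width in (16, 8):
        if width > requested:
            continue
        if num_components * (width // 8) <= MAX_SAMPLER_MESSAGE_REGS:
            return width
    return 8   # SIMD8 is the narrowest size this sketch considers


# Assumed component count for a shadow cube map array lookup with LOD:
# four coordinates (x, y, z, layer), one shadow reference, one LOD.
print(max_sampler_exec_size(6))
# With fewer components the message can stay at SIMD16.
print(max_sampler_exec_size(5))
```

Under these assumptions six components need 12 registers at SIMD16, one more than the limit, so the message drops to SIMD8. Five components need 10 and stay at SIMD16.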
* i965/fs: Return zero from fs_inst::components_read for non-present sources. | Francisco Jerez | 2016-08-16 | 1 | -2/+5
    This makes it easier for the caller to find out how many scalar
    components are actually read by the instruction. As a bonus we no
    longer need to special-case BAD_FILE in the implementation of
    fs_inst::regs_read.

    Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Lower TEX to TXL during NIR translation. | Francisco Jerez | 2016-08-16 | 2 | -14/+6
    This simplifies the code slightly and will allow the SIMD lowering
    pass to find out easily what the actual texturing opcode is in order
    to determine the maximum execution size of texturing instructions.

    Reviewed-by: Kenneth Graunke <[email protected]>
* freedreno/a3xx: fix generic clear path | Rob Clark | 2016-08-16 | 1 | -0/+1
    Signed-off-by: Rob Clark <[email protected]>
* st/mesa: use pipe var instead of st->pipe in st_create_context_priv() | Brian Paul | 2016-08-16 | 1 | -4/+4
    As is done in most other places in the function.

    Reviewed-by: Marek Olšák <[email protected]>
* gallium: remove unused u_clear.h file | Brian Paul | 2016-08-16 | 2 | -65/+0
    Reviewed-by: Marek Olšák <[email protected]>
* gallium/i915: inline the util_clear() code into i915_clear_blitter() | Brian Paul | 2016-08-16 | 1 | -3/+21
    This is the only place the util_clear() function was used.

    Reviewed-by: Marek Olšák <[email protected]>
* gallium/util: minor reformatting in u_box.h | Brian Paul | 2016-08-16 | 1 | -29/+13
    Reviewed-by: Marek Olšák <[email protected]>
* svga: remove unused var in svga_mark_surfaces_dirty() | Brian Paul | 2016-08-16 | 1 | -1/+0
    Signed-off-by: Brian Paul <[email protected]>
* svga: avoid a calloc in svga_buffer_transfer_map() | Brian Paul | 2016-08-16 | 1 | -1/+3
    Just initialize the two other pipe_transfer fields explicitly.

    Reviewed-by: Charmaine Lee <[email protected]>
* svga: don't call os_get_time() when not needed by Gallium HUD | Brian Paul | 2016-08-16 | 5 | -11/+26
    The calls to os_get_time() were showing up higher than expected in
    profiles.

    Reviewed-by: Charmaine Lee <[email protected]>
* svga: remove unneeded memset() call in draw_vgpu10() | Brian Paul | 2016-08-16 | 1 | -2/+1
    All three fields of the vbuffer_attrs[] array are assigned in the
    following loop. The remaining elements of the array are not used.

    Tested with full Piglit run, Heaven 4.0, etc.

    Reviewed-by: Charmaine Lee <[email protected]>
* svga: reduce looping in svga_mark_surfaces_dirty() | Brian Paul | 2016-08-16 | 1 | -1/+1
    We don't need to loop over the max number of color buffers, just the
    current number (which is usually one).

    Tested with full Piglit run, Heaven 4.0, etc.

    Reviewed-by: Charmaine Lee <[email protected]>