summaryrefslogtreecommitdiffstats
path: root/src/gallium
Commit message (Collapse)AuthorAgeFilesLines
* gallium/r600: Replace ALIGN_DIVUP with DIV_ROUND_UPKrzysztof Sobiecki2016-01-063-3/+2
| | | | | | | | ALIGN_DIVUP is a driver specific(r600g) macro that duplicates DIV_ROUND_UP functionality. Replacing it with DIV_ROUND_UP eliminates this problems. Signed-off-by: Krzysztof A. Sobiecki <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* vc4: Fix driver build from last minute rebase fix.Eric Anholt2016-01-061-21/+20
| | | | | | | I had the driver all tested for the last series, and in my last build I noticed that get_swizzled_channel was unused now, and removed it... apparently without testing to find that I removed the wrong channel swizzle function.
* vc4: Optimize out a comparison for bcsel based on an ALU comparisonEric Anholt2016-01-061-14/+59
| | | | | | | | | | | | | | | | | We routinely have code like: vec1 ssa_220 = fge ssa_104, ssa_61 vec1 ssa_199 = bcsel ssa_220, ssa_106, ssa_105 and we would compare fge's args and choose between ~0 and 0 to generate ssa_220, then compare ssa_220 to 0 and choose between bcsel's args. Instead, try to notice the pattern and compare between fge's args to select between bcsel's args. total instructions in shared programs: 88019 -> 87574 (-0.51%) instructions in affected programs: 9985 -> 9540 (-4.46%) total estimated cycles in shared programs: 245752 -> 245237 (-0.21%) estimated cycles in affected programs: 17232 -> 16717 (-2.99%)
* vc4: Add missing sRGB decode to texel fetches.Eric Anholt2016-01-061-0/+5
| | | | | We only see txf on MSAA textures, currently, and apparently this didn't impact any of our piglit tests.
* vc4: Add support for GL_ARB_texture_swizzle.Eric Anholt2016-01-061-1/+1
| | | | | We already had the code supporting it, since it's needed for the depth mode when doing shadow comparisons.
* vc4: Use NIR texture lowering for texture swizzling.Eric Anholt2016-01-062-57/+63
| | | | | | | | | | | | | | We can't use its other features currently (mostly because we don't want Newton-Raphson on rcps for texture coordinates), but it gets us started. This eliminates some comparisons with constants in GLB2.7 and ETQW traces at the QIR level by moving the comparisons into NIR, where they get constant-folded out. instructions in affected programs: 165 -> 156 (-5.45%) total uniforms in shared programs: 32087 -> 32085 (-0.01%) total estimated cycles in shared programs: 245762 -> 245752 (-0.00%) estimated cycles in affected programs: 461 -> 451 (-2.17%)
* vc4: Replace the SSA-style SEL operators with conditional MOVs.Eric Anholt2016-01-066-201/+128
| | | | | | | | | | | | | I'm moving away from QIR being SSA (since NIR is doing lots of SSA optimization for us now) and instead having QIR just be QPU operations with virtual registers. By making our SELs be composed of two MOVs, we could potentially coalesce the registers for the MOV's src and dst and eliminate the MOV. total instructions in shared programs: 88448 -> 88028 (-0.47%) instructions in affected programs: 39845 -> 39425 (-1.05%) total estimated cycles in shared programs: 246306 -> 245762 (-0.22%) estimated cycles in affected programs: 162887 -> 162343 (-0.33%)
* vc4: Don't try the SF coalescing unless it's on a def.Eric Anholt2016-01-061-3/+3
| | | | | | If you want the SF of the value of a register produced from a series of packing MOVs or conditional MOVs, we can't just SF on the last MOV into the register.
* gallium/drivers/svga: Use unsigned for loop indexEdward O'Callaghan2016-01-062-7/+7
| | | | | | | | Fix a 's/unsigned int/unsigned/' consistency case while here. Found-by: Coccinelle Signed-off-by: Edward O'Callaghan <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* gallium/drivers/r600: Use unsigned for loop indexEdward O'Callaghan2016-01-061-9/+9
| | | | | | Found-by: Coccinelle Signed-off-by: Edward O'Callaghan <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* gallium/drivers/ilo: Use unsigned for loop indexEdward O'Callaghan2016-01-064-16/+16
| | | | | | Found-by: Coccinelle Signed-off-by: Edward O'Callaghan <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* gallium: Use unsigned for loop indexEdward O'Callaghan2016-01-061-3/+3
| | | | | | Found-by: Coccinelle Signed-off-by: Edward O'Callaghan <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* gallium/drivers: Remove unnecessary semicolonsEdward O'Callaghan2016-01-0610-11/+11
| | | | | | Found-by: Coccinelle Signed-off-by: Edward O'Callaghan <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* gallium: Remove unnecessary semicolonsEdward O'Callaghan2016-01-068-8/+9
| | | | | | | | | Fix silly issue with MSVC case fall-though support to need a extra 'break;' Found-by: Coccinelle Signed-off-by: Edward O'Callaghan <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* llvmpipe: Optimize lp_rast_triangle_32_3_16 for POWER8Oded Gabbay2016-01-061-1/+141
| | | | | | | | | | | | | | | | | | | | | This patch converts the SSE-optimized lp_rast_triangle_32_3_16() to VMX/VSX. I measured the results on POWER8 machine with 32 cores at 3.4GHz and 16GB of RAM. FPS/Score Name Before After Delta ------------------------------------------------ openarena 16.35 16.7 2.14% xonotic 4.707 4.97 5.57% glmark2 didn't show a significant (more than 1%) difference. v2: Make sure code is build only on POWER8 LE machine Signed-off-by: Oded Gabbay <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: Optimize BUILD_MASK(_LINEAR) for POWER8Oded Gabbay2016-01-061-40/+110
| | | | | | | | | | | | | | | | | | | | | This patch converts the SSE-optimized build_mask_32() and build_mask_linear_32() to VMX/VSX. I measured the results on POWER8 machine with 32 cores at 3.4GHz and 16GB of RAM. FPS/Score Name Before After Delta ------------------------------------------------ glmark2 (score) 139.8 142.7 2.07% openarena and xonotic didn't show a significant (more than 1%) difference. v2: Make sure code is build only on POWER8 LE machine Signed-off-by: Oded Gabbay <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: Optimize do_triangle_ccw for POWER8Oded Gabbay2016-01-061-0/+100
| | | | | | | | | | | | | | | | | | | | | | | This patch converts the SSE optimization done in do_triangle_ccw to VMX/VSX. I measured the results on POWER8 machine with 32 cores at 3.4GHz and 16GB of RAM. FPS/Score Name Before After Delta ------------------------------------------------ glmark2 (score) 136.6 139.8 2.34% openarena 16.14 16.35 1.30% xonotic 4.655 4.707 1.11% v2: - Convert loads to use aligned loads - Make sure code is build only on POWER8 LE machine Signed-off-by: Oded Gabbay <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* llvmpipe: add POWER8 portability file - u_pwr8.hOded Gabbay2016-01-061-0/+310
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This file provides a portability layer that will make it easier to convert SSE-based functions to VMX/VSX-based functions. All the functions implemented in this file are prefixed using "vec_". Therefore, when converting from SSE-based function, one needs to simply replace the "_mm_" prefix of the SSE function being called to "vec_". Having said that, not all functions could be converted as such, due to the differences between the architectures. So, when doing such conversion hurt the performance, I preferred to implement a more ad-hoc solution. For example, converting the _mm_shuffle_epi32 needed to be done using ad-hoc masks instead of a generic function. All the functions in this file support both little-endian and big-endian but currently the file is build only on POWER8 LE machine. All of the functions are implemented using the Altivec/VMX intrinsics, except one where I needed to use inline assembly (due to missing intrinsic). v2: - Use vec_vgbbd instead of __builtin_vec_vgbbd - Add an aligned load function - Don't use typeof() - Make file build only on POWER8 LE machine Signed-off-by: Oded Gabbay <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* draw: minor indentation fixBrian Paul2016-01-051-1/+1
|
* util: add debug_dump_ubyte_rgba_bmp()Brian Paul2016-01-052-0/+63
| | | | | | Like debug_dump_float_rgba_bmp() but takes ubyte values. Reviewed-by: Charmaine Lee <[email protected]>
* svga: fix test for SVGA_NEW_STIPPLEBrian Paul2016-01-052-4/+8
| | | | | | | | | | | | | We only want to set the SVGA_NEW_STIPPLE dirty flag when the polygon stipple state changes. Before, we only set the flag when we were enabling stipple, but not disabling. We don't really have to add SVGA_NEW_STIPPLE to the dirty FS state set since it's a subset of SVGA_NEW_RAST, but let's be explicit. This doesn't fix any known bugs. Reviewed-by: Charmaine Lee <[email protected]>
* svga: add some comments in svga_state_vs.cBrian Paul2016-01-051-0/+3
| | | | Reviewed-by: Charmaine Lee <[email protected]>
* svga: change svga_hw_view_state::dirty to booleanBrian Paul2016-01-051-1/+1
| | | | | | Since it's a true/false value. Reviewed-by: Charmaine Lee <[email protected]>
* svga: avoid emitting redundant SetVertexBuffers() commandsBrian Paul2016-01-052-5/+26
| | | | Reviewed-by: Charmaine Lee <[email protected]>
* svga: check for no-ops in svga_bind_sampler_states()Brian Paul2016-01-051-1/+15
| | | | | | | and svga_set_sampler_views(). If there's no change, return early and don't set a SVGA_NEW_x dirty state flag. Reviewed-by: Charmaine Lee <[email protected]>
* build: enable st/va with nouveau driverJulien Isorce2016-01-051-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vainfo fails in vaDriverInit because "dd_create_screen" does not reach strcmp(driver_name, "nouveau") code. Indeed when compiling the va target.c, the macro GALLIUM_NOUVEAU is not defined. This patch define the macro the same it is done for dri and vdpau targets. Tested with: ./autogen.sh --enable-glx --enable-gles2 --enable-egl --enable-vdpau --enable-glx-tls=yes --enable-va --with-gallium-drivers=swrast,nouveau --with-dri-drivers=swrast,nouveau --with-egl-platforms=x11 LIBVA_DRIVER_NAME=gallium vainfo Output: vainfo: Driver version: mesa gallium vaapi vainfo: Supported profile and entrypoints VAProfileMPEG2Simple : VAEntrypointVLD VAProfileMPEG2Main : VAEntrypointVLD VAProfileMPEG4Simple : VAEntrypointVLD VAProfileMPEG4AdvancedSimple : VAEntrypointVLD VAProfileVC1Simple : VAEntrypointVLD VAProfileVC1Main : VAEntrypointVLD VAProfileVC1Advanced : VAEntrypointVLD VAProfileH264Baseline : VAEntrypointVLD VAProfileH264Main : VAEntrypointVLD VAProfileH264High : VAEntrypointVLD VAProfileNone : VAEntrypointVideoProc Signed-off-by: Julien Isorce <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: add support for st/vaJulien Isorce2016-01-053-50/+133
| | | | | | | | | | | - split nvc0_decoder_bsp in begin/next/end - preserve content buffer when calling nvc0_decoder_bsp_next - implement pipe_video_codec::begin_frame/end_frame https://bugs.freedesktop.org/show_bug.cgi?id=89969 Signed-off-by: Julien Isorce <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nouveau: split nouveau_vp3_bsp in begin/next/endJulien Isorce2016-01-054-41/+77
| | | | | | | | | | | | | It allows to call nouveau_vp3_bsp_next multiple times between one begin/end. It is required to support st/va. https://bugs.freedesktop.org/show_bug.cgi?id=89969 Signed-off-by: Julien Isorce <[email protected]> [imirkin: create strparm_bsp function, simplified w0 calculation] Signed-off-by: Ilia Mirkin <[email protected]>
* st/va: count number of slicesJulien Isorce2016-01-055-0/+25
| | | | | | | | The counter was not set but used by the nouveau driver. It is required otherwise visual output is garbage. Signed-off-by: Julien Isorce <[email protected]> Reviewed-by: Christian Koenig <[email protected]>
* nvc0: scale up inter_bo size so that it's 16M for a 4K videoIlia Mirkin2016-01-041-2/+5
| | | | | | | | Experimentally, 4M causes corruption and slowness, try to ramp it up with size instead. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0 11.1" <[email protected]>
* nv50,nvc0: fix crash when increasing bsp bo size for h264Ilia Mirkin2016-01-042-4/+4
| | | | | | | | H264 doesn't have a bitplane bo. We just need a device reference, so use the one from the client. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0 11.1" <[email protected]>
* radeonsi: remove unused parameter from si_shader_binary_read_configMarek Olšák2016-01-033-10/+7
| | | | | Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move si_shader_binary_upload out of si_shader_binary_readMarek Olšák2016-01-033-11/+10
| | | | | Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: dump LLVM module outside of radeon_llvm_compileMarek Olšák2016-01-034-9/+12
| | | | | Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: always add +DumpCode to the LLVM target machine for LLVM <= 3.5Marek Olšák2016-01-034-6/+5
| | | | | | | It's the same behavior that we use for later LLVM. Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* gallium/radeon: r600_can_dump_shader should get TGSI processor type directlyMarek Olšák2016-01-034-15/+10
| | | | | Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: pass TGSI processor type to si_shader_binary_read for dumpingMarek Olšák2016-01-033-4/+5
| | | | | | | the parameter will be used later Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: pass TGSI processor type to si_compile_llvm for dumpingMarek Olšák2016-01-033-5/+5
| | | | | | | the parameter will be used later Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: rename shader parameter definitions and variables for more clarityMarek Olšák2016-01-033-43/+43
| | | | | Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* nvc0/ir: add support for PK2H/UP2HIlia Mirkin2016-01-034-2/+28
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* gallium: add PIPE_CAP_TGSI_PACK_HALF_FLOAT to indicate UP2H/PK2H supportIlia Mirkin2016-01-0316-0/+17
| | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* tgsi: update PK2H/UP2H channel behavior infoIlia Mirkin2016-01-031-8/+8
| | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallium: document PK2H/UP2HIlia Mirkin2016-01-031-2/+14
| | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* freedreno/ir3: use NIR_PASS helper macrosRob Clark2016-01-031-19/+28
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: we require block_index metadataRob Clark2016-01-031-0/+2
| | | | | | | | Found during NIR_TEST_CLONE=1 piglit run. We were using block->index but forgetting to require it. Causing things to not work with a cloned shader which didn't preserve block_index. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: refactor NIR IR handlingRob Clark2016-01-037-111/+202
| | | | | | | | | Immediately convert into NIR and do an initial key-agnostic lowering/ optimization pass. This should let us share most of the per-variant transformations between each variant, and hopefully minimize the draw- time variant creation part of the compilation process. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: drop unnecessary unreachable() caseRob Clark2016-01-031-2/+0
| | | | | | | It will still hit a compile_assert() in emit_tex, which has the advantage of dumping out the offending shader. Signed-off-by: Rob Clark <[email protected]>
* gallium/tests: fix build with clang compilerSamuel Pitoiset2016-01-031-273/+330
| | | | | | | | | | | | Nested functions are supported as an extension in GNU C, but Clang don't support them. This fixes compilation errors when (manually) building compute.c, or by setting --enable-gallium-tests to the configure script. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75165 Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* nv50,nvc0: optimize coherent buffer checking at draw timeSamuel Pitoiset2016-01-036-68/+82
| | | | | | | | | | Instead of iterating over all the buffer resources looking for coherent buffers, we keep track of a context-wide count. This will save some iterations (and CPU cycles) in 99.99% case because usually coherent buffers are not so used. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* vc4: Fix build from upload changes.Eric Anholt2016-01-021-1/+1
|