summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* glsl: fix warning in release buildGrazvydas Ignotas2016-04-251-1/+1
| | | | | | | | | Mark variable MAYBE_UNUSED to avoid unused-but-set-variable warning in release build. Signed-off-by: Grazvydas Ignotas <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* util: add MAYBE_UNUSED for config dependent variablesGrazvydas Ignotas2016-04-251-0/+2
| | | | | | | | | | | | | | | | | This is mostly for variables that are only used in asserts and cause unused-but-set-variable warnings in release builds. Could just use UNUSED directly, but MAYBE_UNUSED should be less confusing and is similar to what the Linux kernel has. And yes __attribute__((unused)) can be used on variables on both GCC 4.2 (oldest supported by mesa) and clang 3.0 (just some random old version, not sure what's the minimum for mesa). Signed-off-by: Grazvydas Ignotas <[email protected]> Reviewed-by: Francisco Jerez <[email protected]> Reviewed-by: Chad Versace <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* nouveau: codegen: combineLd/St do not combine indirect loadsHans de Goede2016-04-251-0/+7
| | | | | | | | | | | | | | | | | | | | | combineLd/St would combine, i.e. : st u32 # g[$r2+0x0] $r2 st u32 # g[$r2+0x4] $r3 into: st u64 # g[$r2+0x0] $r2d But this is only valid if r2 contains an 8 byte aligned address, which is not guaranteed for compute shaders This commit checks for src0 dim 0 not being indirect when combining loads / stores as combining indirect loads / stores may break alignment rules. Signed-off-by: Hans de Goede <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* freedreno/ir3: relax restriction in groupingRob Clark2016-04-241-3/+5
| | | | | | | | | | Currently we were two restrictive, and would insert an output move in cases like: MOV OUT[0], IN[0].xyzw Loosen the restriction to allow the current instruction to appear in the neighbor list but only at it's current possition. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: fix small memory leakRob Clark2016-04-241-0/+2
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: fix small RA bugRob Clark2016-04-241-1/+2
| | | | | | | Normally the offset in the group would be the same, but not always. For example, in a sam(w) which only writes the 4th component. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a4xx: better workaround for astc+srgbRob Clark2016-04-2410-22/+195
| | | | | | | | | | | | | | | This *seems* like a hw bug, and maybe only applies to certain a4xx variants/revisions. But setting the SRGB bit in sampler view state (texconst0) causes invalid alpha for ASTC textures. Work around this setting up a second texture state and using that to sample alpha separately. This way, srgb->linear conversion happens in hw *prior* to interpolation. This fixes 546 dEQP tests: dEQP-GLES3.functional.texture.*astc*srgb* Signed-off-by: Rob Clark <[email protected]>
* Revert "freedreno/a4xx: lower srgb in shader for astc textures"Rob Clark2016-04-247-62/+6
| | | | | | Better workaround in the following patch. This reverts commit 899bd63acefd49a668e11c42d2ad92fa55aa157d.
* freedreno/a4xx: blend state no longer depends on fb stateRob Clark2016-04-241-4/+1
| | | | Signed-off-by: Rob Clark <[email protected]>
* Revert "st/dri: add 32-bit RGBX/RGBA formats"Marek Olšák2016-04-242-10/+0
| | | | | | | | | This reverts commit ccdcf91104a5f07127b5b8d8570b5c4bbcf86647. It breaks most KDE apps, because DRI doesn't support the RGBA component ordering. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95071
* genxml: use PYTHON3Jonathan Gray2016-04-232-1/+3
| | | | | | | | | | | Allows the build to work when the python3 binary is not "python3". v2: remove x bit from the script at Emil's suggestion Signed-off-by: Jonathan Gray <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/tex_image: Flush certain subnormal ASTC channel valuesNanley Chery2016-04-231-0/+87
| | | | | | | | | | | | | | | When uploading a linear, void-extent, ASTC LDR block on Skylake, we are required to flush to zero the UNORM16 channel values that would be denormalized. This is specifically required for the values: 1, 2, and 3. Fixes the 14 failing tests in: dEQP-GLES3.functional.texture.compressed.astc.void_extent_ldr.* v2: Split out flushing function (Kristian Høgsberg) v3: Map with READ instead of INVALIDATE (Kenneth Graunke) Signed-off-by: Nanley Chery <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* configure.ac: search for and set PYTHON3Jonathan Gray2016-04-231-0/+2
| | | | | | | | | | src/intel/genxml/gen_pack_header.py requires python3. v2: check for python3.5 as well Signed-off-by: Jonathan Gray <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/blorp: Enable for buffer resolvesTopi Pohjolainen2016-04-231-1/+1
| | | | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94181 Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/blorp: Enable for normal color clearsTopi Pohjolainen2016-04-231-0/+9
| | | | | Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/blorp: Fix clear code for ignoring colormask for XRGB formats on Gen9+Topi Pohjolainen2016-04-231-7/+26
| | | | | | | | | | | | | This is equivalent of 73b01e2711ff45a1f313d5372d6c8fa4fe55d4d2 for blorp. v2 (Ken): No need to call _mesa_format_has_color_component() now that the number of components is gotten from _mesa_base_format_component_count(). Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* mesa/formats: Take luminance into account in component countTopi Pohjolainen2016-04-231-0/+1
| | | | | | Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/blorp: Do not trigger re-emission of base state addressTopi Pohjolainen2016-04-232-2/+0
| | | | | | | | In case blorp needs to configure it will be just as if render or compute pipeline had configured it. Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/blorp: Reconfigure base state address only if neededTopi Pohjolainen2016-04-233-3/+7
| | | | | Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/blorp: Use BRW_NEW_BLORP instead of trashing all state bitsTopi Pohjolainen2016-04-232-5/+2
| | | | | Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Make all atoms to track BRW_NEW_BLORP by defaultKenneth Graunke2016-04-2362-46/+179
| | | | Reviewed-by: Topi Pohjolainen <[email protected]
* i965: Introduce state flag for blorpTopi Pohjolainen2016-04-232-0/+3
| | | | | | | | | | | | | | | | | | | | | | | In the past, BLORP has clobbered all BRW_NEW_* state flags, to trigger re-emission of the entire 3D pipeline on the next draw. However, there are some packets BLORP simply leaves alone, so there's no need to re-emit them. Trying to reduce the set of dirty bits flagged after BLORP runs is tricky. Instead, we introduce a BRW_NEW_BLORP flag. This should be set on any atom which emits a packet that BLORP also emits. When BLORP runs, it will flag BRW_NEW_BLORP, causing those packets to get re-emitted. This also makes it easy to avoid re-emitting specific atoms - we can simply drop the BRW_NEW_BLORP flag on those. To start, we assume that all packets need to be re-emitted. This is the safest approach and closest to the existing code's behavior. Many of these are obviously not required, and can be dropped in subsequent patches. Signed-off-by: Topi Pohjolainen <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]>
* i965/blorp/gen6: Use normal base state address setupTopi Pohjolainen2016-04-233-54/+5
| | | | | | | | | | | | This is identical to the blorp version which only differs in case fragment shader isn't used. In that case blorp would reset batch buffer address to zero. This is not really needed, and having blorp to use base state address setup that is compatible with normal upload allows one to skip resetting it. Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove pointers to non-existing atomsTopi Pohjolainen2016-04-231-8/+0
| | | | | Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* radeonsi: Implement ddx/ddy on VI using ds_bpermuteTom Stellard2016-04-221-12/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | The ds_bpermute instruction allows threads to transfer data directly to or from the vgprs of other threads. These instructions use the LDS hardware to transfer data, but do not read or write LDS memory. DDX BEFORE: | DDX AFTER: | v_mbcnt_lo_u32_b32_e64 v2, -1, 0 | v_mbcnt_lo_u32_b32_e64 v2, -1, 0 v_mbcnt_hi_u32_b32_e64 v2, -1, v2 | v_mbcnt_hi_u32_b32_e64 v2, -1, v2 v_lshlrev_b32_e32 v4, 2, v2 | v_and_b32_e32 v2, 60, v2 v_and_b32_e32 v2, 60, v2 | v_lshlrev_b32_e32 v2, 2, v2 v_lshlrev_b32_e32 v3, 2, v2 | ds_bpermute_b32 v3, v2, v0 s_mov_b32 m0, -1 | ds_bpermute_b32 v0, v2, v0 offset:4 ds_write_b32 v4, v0 | s_waitcnt lgkmcnt(0) s_waitcnt lgkmcnt(0) | v_or_b32_e32 v0, 1, v2 | v_lshlrev_b32_e32 v0, 2, v0 | ds_read_b32 v1, v3 | ds_read_b32 v0, v0 | s_waitcnt lgkmcnt(0) | | LDS: 1 blocks | LDS: 0 blocks Reviewed-by: Michel Dänzer <[email protected]> Acked-by: Marek Olšák <[email protected]>
* radeonsi: Use llvm.amdgcn.mbcnt.* intrinsics instead of llvm.SI.tidTom Stellard2016-04-221-1/+16
| | | | | | | | | | | We're trying to move to more of the new style intrinsics with include the correct target name, and map directly to ISA instructions. v2: - Only do this with LLVM 3.8 and newer. Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Set range metadata on calls to llvm.SI.tidTom Stellard2016-04-221-3/+26
| | | | | | | | The range metadata tells LLVM the range of expected values for this intrinsic, so it can do some additional optimizations on the result. Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Create a helper function for computing the thread idTom Stellard2016-04-221-6/+11
| | | | | Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* i965: Disable KHR_texture_compression_astc_hdr on Gen9Nanley Chery2016-04-222-4/+3
| | | | | | | | | | | | Although Gen9 samples from most HDR ASTC surfaces of correctly, there currently are no software workarounds to fix the incorrect sampling that occurs in others of certain color endpoint modes. With this change, we are no longer failing the 14 tests from: dEQP-GLES3.functional.texture.compressed.astc.endpoint_value_hdr_cem_15.* Signed-off-by: Nanley Chery <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* swr: [rasterizer memory] Constify load tilesTim Rowley2016-04-222-6/+8
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] CompleteDrawContext changes for gccTim Rowley2016-04-221-4/+11
| | | | | | | Add explicit inline and non-inline versions of CompleteDrawContext to make gcc happy. Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer] Small cleanupsTim Rowley2016-04-227-20/+29
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer scripts] Knob scripts tweaksTim Rowley2016-04-222-2/+27
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer] Interpolation utility functionsTim Rowley2016-04-223-6/+55
| | | | | | v2: use _mm_cmpunord_ps for vIsNaN Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] TemplateArgUnrollerTim Rowley2016-04-225-109/+101
| | | | | | | | | Switch boolean template arguments to typename template arguments of type std::integral_constant<bool, VALUE>. This allows the template argument unroller to easily be extended to enums. Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] Arena: make most allocated blocks the same sizeTim Rowley2016-04-221-16/+52
| | | | | | Reduces sorting cost Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] Fix global arena allocator bugTim Rowley2016-04-222-42/+51
| | | | | | - Plus some minor code refactoring Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] Fix thread binding for 32-bit windowsTim Rowley2016-04-221-1/+15
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer fetch] Add support for fetching non-uniform component formatsTim Rowley2016-04-221-1/+189
| | | | | | For example, R10G10B10A2_UNORM. Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] Use CS spill/fill size in coreTim Rowley2016-04-224-5/+9
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr: fix memory leaks from vs/fs compilationTim Rowley2016-04-223-23/+41
| | | | | | v2: varient -> variant Reviewed by: George Kyriazis <[email protected]>
* swr: fix clang warningsTim Rowley2016-04-222-5/+5
| | | | | | v2: use alternate logic version in swr_check_render_cond Reviewed-by: Bruce Cherniak <[email protected]>
* freedreno/a4xx: fix encoding of blend color stateRob Clark2016-04-221-35/+14
| | | | | | | Fixes a whole bunch of dEQP-GLES3.functional.fragment_ops.random.* (now they all pass) Signed-off-by: Rob Clark <[email protected]>
* freedreno: update generated headersRob Clark2016-04-225-9/+33
| | | | | | Pull in RB_BLEND_* fixes. Signed-off-by: Rob Clark <[email protected]>
* vc4: Make sure we recompile when sample_mask changes.Eric Anholt2016-04-221-0/+1
| | | | | | | Part of fixing piglit EXT_framebuffer_multisample/sample-coverage inverted (there is also a bug with RCL tiled blits) Cc: "11.1 11.2" <[email protected]>
* vc4: Fix validation of full res tile offset if used for non-MSAA.Eric Anholt2016-04-223-2/+14
| | | | | | There's no reason we couldn't do non-MSAA full resolution tile buffer load/stores, but we would have claimed buffer overflow was being attempted. Nothing does this currently.
* vc4: Only do MSAA FB operations if the FB is MSAA.Eric Anholt2016-04-221-5/+8
| | | | | I noticed this as a problem with ET:QW traces emitting coverage code when the framebuffer was supposed to be single sampled.
* vc4: Fix tests for format supported with nr_samples == 1.Eric Anholt2016-04-221-3/+4
| | | | | | | | | | This was a bug from the MSAA enabling. Tests for surfaces with nr_samples==1 instead of 0 (generally GL renderbuffers) would incorrectly fail out. Fixes the ARB_framebuffer_sRGB piglit tests other than srgb_conformance. Cc: "11.1 11.2" <[email protected]>
* vc4: Don't try to blit from MSAA surfaces with mismatched width to dst.Eric Anholt2016-04-221-11/+14
| | | | | | | | | I had made the previous blit fix non-MSAA only because I was thinking about how the hardware infers stride from the RENDERING_CONFIG packet. However, I'm also inferring the stride for both MSAA src and dst in vc4_render_cl.c from the width argument in the ioctl. Fixes 15 EXT_framebuffer_multisample piglit tests.
* i965: Disable channel expressions for scalar GS, TCS, TES.Kenneth Graunke2016-04-221-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On Broadwell, I get the following shader-db statistics: Tessellation Control Shaders: total instructions in shared programs: 57327 -> 57012 (-0.55%) instructions in affected programs: 27334 -> 27019 (-1.15%) helped: 45 HURT: 0 total cycles in shared programs: 265692 -> 255188 (-3.95%) cycles in affected programs: 263122 -> 252618 (-3.99%) helped: 184 HURT: 26 Tessellation Evaluation Shaders: total instructions in shared programs: 23236 -> 23157 (-0.34%) instructions in affected programs: 2791 -> 2712 (-2.83%) helped: 27 HURT: 0 total cycles in shared programs: 151858 -> 149704 (-1.42%) cycles in affected programs: 151858 -> 149704 (-1.42%) helped: 101 HURT: 114 Geometry Shaders: Orbital Explorer goes from 6442 -> 6356 instructions. Two Shadow of Mordor shaders increase by a single instruction. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>