Commit message | Author | Age | Files | Lines
* i965/blorp: Do not trigger re-emission of base state address | Topi Pohjolainen | 2016-04-23 | 2 | -2/+0
| | | | | | | | If blorp needs to configure it, the result is the same as if the render or compute pipeline had configured it. Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/blorp: Reconfigure base state address only if needed | Topi Pohjolainen | 2016-04-23 | 3 | -3/+7
| | | | | Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/blorp: Use BRW_NEW_BLORP instead of trashing all state bits | Topi Pohjolainen | 2016-04-23 | 2 | -5/+2
| | | | | Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Make all atoms to track BRW_NEW_BLORP by default | Kenneth Graunke | 2016-04-23 | 62 | -46/+179
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Introduce state flag for blorp | Topi Pohjolainen | 2016-04-23 | 2 | -0/+3
| In the past, BLORP has clobbered all BRW_NEW_* state flags, to trigger re-emission of the entire 3D pipeline on the next draw. However, there are some packets BLORP simply leaves alone, so there's no need to re-emit them.
|
| Trying to reduce the set of dirty bits flagged after BLORP runs is tricky. Instead, we introduce a BRW_NEW_BLORP flag. This should be set on any atom which emits a packet that BLORP also emits. When BLORP runs, it will flag BRW_NEW_BLORP, causing those packets to get re-emitted. This also makes it easy to avoid re-emitting specific atoms - we can simply drop the BRW_NEW_BLORP flag on those.
|
| To start, we assume that all packets need to be re-emitted. This is the safest approach and closest to the existing code's behavior. Many of these are obviously not required, and can be dropped in subsequent patches.
|
| Signed-off-by: Topi Pohjolainen <[email protected]>
| Signed-off-by: Kenneth Graunke <[email protected]>
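The dirty-flag scheme described in this commit message can be illustrated with a small model. This is a sketch only, not the i965 driver's code: the `Atom` struct, the `DIRTY_*` bit names, and both functions are hypothetical, and only the idea of a dedicated BLORP bit that selected atoms listen to comes from the message.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical dirty bits; only the notion of a dedicated BLORP bit
// is taken from the commit message.
enum : uint64_t {
    DIRTY_VIEWPORT = 1ull << 0,
    DIRTY_BLEND    = 1ull << 1,
    DIRTY_BLORP    = 1ull << 2,
};

// Hypothetical state atom: a named packet emitter plus the bits it listens to.
struct Atom {
    std::string name;
    uint64_t dirty_mask;
};

// Return the names of the atoms whose mask intersects the pending dirty bits.
std::vector<std::string> atoms_to_emit(const std::vector<Atom>& atoms,
                                       uint64_t dirty) {
    std::vector<std::string> out;
    for (const Atom& a : atoms) {
        assert(a.dirty_mask != 0);  // an atom that listens to nothing is a bug
        if (a.dirty_mask & dirty) out.push_back(a.name);
    }
    return out;
}

// Assumed example: "viewport" emits a packet that BLORP also emits, so it
// listens to DIRTY_BLORP; "blend" is assumed to be left alone by BLORP.
std::vector<Atom> example_atoms() {
    return {
        {"viewport", DIRTY_VIEWPORT | DIRTY_BLORP},
        {"blend", DIRTY_BLEND},
    };
}
```

In this model, `atoms_to_emit(example_atoms(), DIRTY_BLORP)` selects only the viewport atom. Dropping `DIRTY_BLORP` from an atom's mask is all it takes to stop re-emitting that atom after a BLORP operation.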
* i965/blorp/gen6: Use normal base state address setup | Topi Pohjolainen | 2016-04-23 | 3 | -54/+5
| | | | | | | | | | | | This is identical to the blorp version except when the fragment shader isn't used. In that case blorp would reset the batch buffer address to zero. This is not really needed, and having blorp use a base state address setup that is compatible with the normal upload allows one to skip resetting it. Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove pointers to non-existing atoms | Topi Pohjolainen | 2016-04-23 | 1 | -8/+0
| | | | | Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* radeonsi: Implement ddx/ddy on VI using ds_bpermute | Tom Stellard | 2016-04-22 | 1 | -12/+30
| The ds_bpermute instruction allows threads to transfer data directly to or from the vgprs of other threads. These instructions use the LDS hardware to transfer data, but do not read or write LDS memory.
|
| DDX BEFORE:                       | DDX AFTER:
|                                   |
| v_mbcnt_lo_u32_b32_e64 v2, -1, 0  | v_mbcnt_lo_u32_b32_e64 v2, -1, 0
| v_mbcnt_hi_u32_b32_e64 v2, -1, v2 | v_mbcnt_hi_u32_b32_e64 v2, -1, v2
| v_lshlrev_b32_e32 v4, 2, v2       | v_and_b32_e32 v2, 60, v2
| v_and_b32_e32 v2, 60, v2          | v_lshlrev_b32_e32 v2, 2, v2
| v_lshlrev_b32_e32 v3, 2, v2       | ds_bpermute_b32 v3, v2, v0
| s_mov_b32 m0, -1                  | ds_bpermute_b32 v0, v2, v0 offset:4
| ds_write_b32 v4, v0               | s_waitcnt lgkmcnt(0)
| s_waitcnt lgkmcnt(0)              |
| v_or_b32_e32 v0, 1, v2            |
| v_lshlrev_b32_e32 v0, 2, v0       |
| ds_read_b32 v1, v3                |
| ds_read_b32 v0, v0                |
| s_waitcnt lgkmcnt(0)              |
|                                   |
| LDS: 1 blocks                     | LDS: 0 blocks
|
| Reviewed-by: Michel Dänzer <[email protected]>
| Acked-by: Marek Olšák <[email protected]>
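The data movement in the "after" listing can be modelled on the CPU. This is a simplified sketch, assuming a 64-lane wavefront and that the derivative is the difference between the two fetched values (the subtraction is not part of the listing). `bpermute` and `ddx` are hypothetical helper names, not driver or LLVM functions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kWaveSize = 64;  // assumed wavefront width
using Wave = std::array<float, kWaveSize>;
using Addrs = std::array<uint32_t, kWaveSize>;

// Simplified model of a backward permute: lane i receives the value held by
// lane (byte_addr[i] + offset) / 4, wrapped to the wave size.
Wave bpermute(const Addrs& byte_addr, const Wave& src, uint32_t offset = 0) {
    assert(offset % 4 == 0);  // this model only handles dword-aligned offsets
    Wave dst{};
    for (std::size_t i = 0; i < kWaveSize; ++i)
        dst[i] = src[((byte_addr[i] + offset) / 4) % kWaveSize];
    return dst;
}

// Every lane of a 2x2 quad fetches the quad's first lane and the lane next
// to it, then takes the difference (assumed final step).
Wave ddx(const Wave& value) {
    Addrs addr{};
    for (std::size_t i = 0; i < kWaveSize; ++i)
        addr[i] = static_cast<uint32_t>(i & 60u) << 2;  // and 60, then shl 2
    const Wave left = bpermute(addr, value);       // ds_bpermute_b32
    const Wave right = bpermute(addr, value, 4);   // ds_bpermute_b32 offset:4
    Wave out{};
    for (std::size_t i = 0; i < kWaveSize; ++i)
        out[i] = right[i] - left[i];
    return out;
}
```

Masking the lane id with 60 clears its low two bits, so all four lanes of a quad compute the same address and therefore the same derivative.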
* radeonsi: Use llvm.amdgcn.mbcnt.* intrinsics instead of llvm.SI.tid | Tom Stellard | 2016-04-22 | 1 | -1/+16
| | | | | | | | | | | We're trying to move to more of the new style intrinsics, which include the correct target name and map directly to ISA instructions. v2: - Only do this with LLVM 3.8 and newer. Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Set range metadata on calls to llvm.SI.tid | Tom Stellard | 2016-04-22 | 1 | -3/+26
| | | | | | | | The range metadata tells LLVM the range of expected values for this intrinsic, so it can do some additional optimizations on the result. Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Create a helper function for computing the thread id | Tom Stellard | 2016-04-22 | 1 | -6/+11
| | | | | Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* i965: Disable KHR_texture_compression_astc_hdr on Gen9 | Nanley Chery | 2016-04-22 | 2 | -4/+3
| | | | | | | | | | | | Although Gen9 samples from most HDR ASTC surfaces correctly, there are currently no software workarounds to fix the incorrect sampling that occurs in others with certain color endpoint modes. With this change, we are no longer failing the 14 tests from: dEQP-GLES3.functional.texture.compressed.astc.endpoint_value_hdr_cem_15.* Signed-off-by: Nanley Chery <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* swr: [rasterizer memory] Constify load tiles | Tim Rowley | 2016-04-22 | 2 | -6/+8
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] CompleteDrawContext changes for gcc | Tim Rowley | 2016-04-22 | 1 | -4/+11
| | | | | | | Add explicit inline and non-inline versions of CompleteDrawContext to make gcc happy. Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer] Small cleanups | Tim Rowley | 2016-04-22 | 7 | -20/+29
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer scripts] Knob scripts tweaks | Tim Rowley | 2016-04-22 | 2 | -2/+27
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer] Interpolation utility functions | Tim Rowley | 2016-04-22 | 3 | -6/+55
| | | | | | v2: use _mm_cmpunord_ps for vIsNaN Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] TemplateArgUnroller | Tim Rowley | 2016-04-22 | 5 | -109/+101
| | | | | | | | | Switch boolean template arguments to typename template arguments of type std::integral_constant<bool, VALUE>. This allows the template argument unroller to easily be extended to enums. Reviewed-by: Bruce Cherniak <[email protected]>
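The switch from boolean template arguments to `std::integral_constant` types can be sketched as follows. This is an illustration under assumptions, not the swr implementation: `Kernel` and `Unroller` are hypothetical names, and the dispatch scheme shown is one common way to turn runtime flags into type arguments.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical kernel: each template argument is a type carrying a
// compile-time bool (std::true_type or std::false_type).
template <typename EnableA, typename EnableB>
struct Kernel {
    static int run() {
        return (EnableA::value ? 1 : 0) + (EnableB::value ? 2 : 0);
    }
};

// Hypothetical unroller: peels runtime bools one at a time and appends the
// matching std::integral_constant<bool, VALUE> type to the argument list.
template <template <typename...> class K, typename... Chosen>
struct Unroller {
    // All flags consumed: instantiate the kernel with the collected types.
    static int dispatch() { return K<Chosen...>::run(); }

    template <typename... Rest>
    static int dispatch(bool first, Rest... rest) {
        return first
            ? Unroller<K, Chosen..., std::true_type>::dispatch(rest...)
            : Unroller<K, Chosen..., std::false_type>::dispatch(rest...);
    }
};
```

A call such as `Unroller<Kernel>::dispatch(true, false)` ends up in `Kernel<std::true_type, std::false_type>::run()`. Because every argument is a type, the same parameter pack could also carry `std::integral_constant` values of an enum type, which fits the extension the commit message mentions.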
* swr: [rasterizer core] Arena: make most allocated blocks the same size | Tim Rowley | 2016-04-22 | 1 | -16/+52
| | | | | | Reduces sorting cost Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] Fix global arena allocator bug | Tim Rowley | 2016-04-22 | 2 | -42/+51
| | | | | | - Plus some minor code refactoring Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] Fix thread binding for 32-bit windows | Tim Rowley | 2016-04-22 | 1 | -1/+15
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer fetch] Add support for fetching non-uniform component formats | Tim Rowley | 2016-04-22 | 1 | -1/+189
| | | | | | For example, R10G10B10A2_UNORM. Reviewed-by: Bruce Cherniak <[email protected]>
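A non-uniform component format is one whose channels have different bit widths, so the fetch path cannot treat all components alike. A minimal sketch of unpacking one R10G10B10A2_UNORM texel, assuming R sits in the low 10 bits and A in the top 2; the function name is hypothetical and this is not the swr fetch code:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Unpack one 32-bit texel into normalized floats in [0, 1].
// Assumed layout: R = bits 0-9, G = bits 10-19, B = bits 20-29, A = bits 30-31.
std::array<float, 4> unpack_r10g10b10a2_unorm(uint32_t texel) {
    const uint32_t r = texel & 0x3ffu;
    const uint32_t g = (texel >> 10) & 0x3ffu;
    const uint32_t b = (texel >> 20) & 0x3ffu;
    const uint32_t a = (texel >> 30) & 0x3u;
    // Each channel is divided by its own maximum: 1023 for 10 bits, 3 for 2.
    return {r / 1023.0f, g / 1023.0f, b / 1023.0f, a / 3.0f};
}
```

The per-channel divisor is what makes the format non-uniform: the 10-bit channels and the 2-bit alpha need different masks, shifts, and scale factors.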
* swr: [rasterizer core] Use CS spill/fill size in core | Tim Rowley | 2016-04-22 | 4 | -5/+9
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr: fix memory leaks from vs/fs compilation | Tim Rowley | 2016-04-22 | 3 | -23/+41
| | | | | | v2: varient -> variant Reviewed-by: George Kyriazis <[email protected]>
* swr: fix clang warnings | Tim Rowley | 2016-04-22 | 2 | -5/+5
| | | | | | v2: use alternate logic version in swr_check_render_cond Reviewed-by: Bruce Cherniak <[email protected]>
* freedreno/a4xx: fix encoding of blend color state | Rob Clark | 2016-04-22 | 1 | -35/+14
| | | | | | | Fixes a whole bunch of dEQP-GLES3.functional.fragment_ops.random.* (now they all pass) Signed-off-by: Rob Clark <[email protected]>
* freedreno: update generated headers | Rob Clark | 2016-04-22 | 5 | -9/+33
| | | | | | Pull in RB_BLEND_* fixes. Signed-off-by: Rob Clark <[email protected]>
* vc4: Make sure we recompile when sample_mask changes. | Eric Anholt | 2016-04-22 | 1 | -0/+1
| | | | | | | Part of fixing piglit EXT_framebuffer_multisample/sample-coverage inverted (there is also a bug with RCL tiled blits) Cc: "11.1 11.2" <[email protected]>
* vc4: Fix validation of full res tile offset if used for non-MSAA. | Eric Anholt | 2016-04-22 | 3 | -2/+14
| | | | | | There's no reason we couldn't do non-MSAA full resolution tile buffer load/stores, but we would have claimed buffer overflow was being attempted. Nothing does this currently.
* vc4: Only do MSAA FB operations if the FB is MSAA. | Eric Anholt | 2016-04-22 | 1 | -5/+8
| | | | | I noticed this as a problem with ET:QW traces emitting coverage code when the framebuffer was supposed to be single sampled.
* vc4: Fix tests for format supported with nr_samples == 1. | Eric Anholt | 2016-04-22 | 1 | -3/+4
| | | | | | | | | | This was a bug from the MSAA enabling. Tests for surfaces with nr_samples==1 instead of 0 (generally GL renderbuffers) would incorrectly fail out. Fixes the ARB_framebuffer_sRGB piglit tests other than srgb_conformance. Cc: "11.1 11.2" <[email protected]>
* vc4: Don't try to blit from MSAA surfaces with mismatched width to dst. | Eric Anholt | 2016-04-22 | 1 | -11/+14
| | | | | | | | | I had made the previous blit fix non-MSAA only because I was thinking about how the hardware infers stride from the RENDERING_CONFIG packet. However, I'm also inferring the stride for both MSAA src and dst in vc4_render_cl.c from the width argument in the ioctl. Fixes 15 EXT_framebuffer_multisample piglit tests.
* i965: Disable channel expressions for scalar GS, TCS, TES. | Kenneth Graunke | 2016-04-22 | 1 | -1/+3
| On Broadwell, I get the following shader-db statistics:
|
| Tessellation Control Shaders:
|
| total instructions in shared programs: 57327 -> 57012 (-0.55%)
| instructions in affected programs: 27334 -> 27019 (-1.15%)
| helped: 45
| HURT: 0
|
| total cycles in shared programs: 265692 -> 255188 (-3.95%)
| cycles in affected programs: 263122 -> 252618 (-3.99%)
| helped: 184
| HURT: 26
|
| Tessellation Evaluation Shaders:
|
| total instructions in shared programs: 23236 -> 23157 (-0.34%)
| instructions in affected programs: 2791 -> 2712 (-2.83%)
| helped: 27
| HURT: 0
|
| total cycles in shared programs: 151858 -> 149704 (-1.42%)
| cycles in affected programs: 151858 -> 149704 (-1.42%)
| helped: 101
| HURT: 114
|
| Geometry Shaders:
|
| Orbital Explorer goes from 6442 -> 6356 instructions.
| Two Shadow of Mordor shaders increase by a single instruction.
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* i965/blorp: Add support for 2x msaa | Topi Pohjolainen | 2016-04-22 | 2 | -10/+9
| | | | | Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/blorp: Add support for encoding/decoding interleaved 2x msaa | Topi Pohjolainen | 2016-04-22 | 1 | -8/+36
| | | | | Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: don't lower mod() in glsl ir | Samuel Iglesias Gonsálvez | 2016-04-22 | 1 | -1/+0
| | | | | | | | | NIR will lower it in nir_opt_algebraic. No change in shader-db. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
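For reference, the GLSL specification defines `mod(x, y)` as `x - y * floor(x / y)`, which is the usual expansion for a lowering pass. The sketch below writes that expansion out on the CPU; the exact rule in nir_opt_algebraic is not shown in this log, so treat the match as an assumption. `glsl_mod` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// GLSL-style mod expressed with floor, as a lowering pass might expand it.
// The result takes the sign of y, unlike std::fmod, which follows x.
float glsl_mod(float x, float y) {
    return x - y * std::floor(x / y);
}
```

The difference from C's `fmod` shows up for negative operands: `glsl_mod(-1.0f, 4.0f)` yields 3, while `std::fmod(-1.0f, 4.0f)` yields -1.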
* glsl: fix cross validation for explicit locations on structs and arrays | Timothy Arceri | 2016-04-22 | 1 | -13/+30
| | | | | | Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* radeonsi: implement TGSI_SEMANTIC_HELPER_INVOCATION | Nicolai Hähnle | 2016-04-21 | 2 | -1/+12
| | | | | | Depends on LLVM support introduced in r267102. Reviewed-by: Marek Olšák <[email protected]>
* swr: ignore generated files in rasterizer | Ilia Mirkin | 2016-04-22 | 1 | -0/+7
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Tim Rowley <[email protected]>
* nvc0: fix retrieving query results into buffer for timestamps | Ilia Mirkin | 2016-04-22 | 1 | -5/+15
| | | | | | | | | The timestamps are stored in a funny place, and even though they are a 64-bit result, are not stored with is64bit. Account for that when retrieving the query result into a resource. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.2" <[email protected]>
* i965/surface_state: Use libisl functions for image format lowering | Jason Ekstrand | 2016-04-21 | 3 | -120/+12
| | | | | | | This lets us delete some redundant code and keep all of the image_load_store format lowering logic in one place: libisl. Reviewed-by: Chad Versace <[email protected]>
* i965/fs_surface_builder: Use isl instead of mesa for format info | Jason Ekstrand | 2016-04-21 | 1 | -66/+52
| | | | Reviewed-by: Chad Versace <[email protected]>
* i965/fs_surface_builder: Add a helper for converting GL to ISL formats | Jason Ekstrand | 2016-04-21 | 1 | -0/+55
| | | | Reviewed-by: Chad Versace <[email protected]>
* i965/fs_surface_builder: Explicitly handle FORMAT_NONE in num_image_coordinates | Jason Ekstrand | 2016-04-21 | 1 | -0/+1
| | | | | | | | | Previously, we were relying on has_matching_typed_format returning true for MESA_FORMAT_NONE which, in turn, relied on _mesa_get_format_bytes returning 1 for MESA_FORMAT_NONE. When we switch to ISL, this behaviour will no longer be something we can rely on. Reviewed-by: Chad Versace <[email protected]>
* i965/fs_surface_builder: Take a GL format enum instead of mesa_format | Jason Ekstrand | 2016-04-21 | 3 | -9/+10
| | | | Reviewed-by: Chad Versace <[email protected]>
* isl/format: Add a get_num_channels helper | Jason Ekstrand | 2016-04-21 | 2 | -0/+17
| | | | Reviewed-by: Chad Versace <[email protected]>
* isl/format: Add more isl_format_has_type_channel functions | Jason Ekstrand | 2016-04-21 | 2 | -4/+43
| | | | Reviewed-by: Chad Versace <[email protected]>
* isl/format: Break the guts of has_[us]int_channel into a helper | Jason Ekstrand | 2016-04-21 | 1 | -18/+16
| | | | Reviewed-by: Chad Versace <[email protected]>
* anv/image: Use the has_matching_typed_storage_image_format helper from isl | Jason Ekstrand | 2016-04-21 | 1 | -12/+3
| | | | Reviewed-by: Chad Versace <[email protected]>
* isl: Add a helper for determining when a typed load/store can be used | Jason Ekstrand | 2016-04-21 | 2 | -0/+20
| | | | Reviewed-by: Chad Versace <[email protected]>