aboutsummaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* i965: Add B8G8R8X8_SRGB to the alpha format overrideNeil Roberts2015-12-131-0/+4
| | | | | | | | | | | brw_init_surface_formats overrides the render format for RGBX formats which aren't supported for rendering so that they internally use RGBA instead. However, B8G8R8X8_SRGB was missing so it wasn't marked as a renderable format. This patch just adds it. Cc: "11.0 11.1" <[email protected]> Cc: Ilia Mirkin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add MESA_FORMAT_B8G8R8X8_SRGB to brw_format_for_mesa_formatNeil Roberts2015-12-131-0/+1
| | | | | | | | | This will be used in a subsequent patch as the format for RGB visuals. Cc: "11.0 11.1" <[email protected]> Cc: Ilia Mirkin <[email protected]> Suggested-by: Ilia Mirkin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* gk104/ir: simplify and fool-proof texbar algorithmIlia Mirkin2015-12-122-83/+56
| | | | | | | | | | | | | | With the current algorithm, we only look at tex uses. However there's a write-after-write hazard where we might decide to, on some path, not use a texture's output at all, but instead to write a different value to that register. However without the barrier, the texture might complete later and overwrite that value. This fixes Unreal Elemental demo on GK110/GK208, flightgear on GK10x, and likely other random-looking failures. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.1" <[email protected]>
* nv50/ir: combine sequences of conversionsIlia Mirkin2015-12-121-0/+43
| | | | | | | | | | In some cases shaders want non-default rounding when converting float to integer. This can be done in one go, so merge the two ops. This comes up in the packUnorm4x8 & co functions, as well as a few random shaders. Overall shader-db impact is minimal, helping a handful of witcher2 and other misc shaders. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: manually optimize multiplication expansion logicIlia Mirkin2015-12-121-6/+25
| | | | | | | | | | | | | | | | | | | | | | | | | The conversion of 32-bit integer multiplies into 16-bit ones happens after the regular optimization loop. However it's fairly common to multiply by a small integer, rendering some of the expansion pointless. Firstly, propagate immediates when possible into mul ops, secondly just remove the ops when they are unnecessary. Including the change to generate imad immediates, the effect is: total instructions in shared programs : 6365463 -> 6351898 (-0.21%) total gprs used in shared programs : 728684 -> 728684 (0.00%) total local used in shared programs : 9904 -> 9904 (0.00%) total bytes used in shared programs : 44001576 -> 44036120 (0.08%) local gpr inst bytes helped 0 0 3288 4 hurt 0 0 0 842 It's easy for this to hurt bytes since we end up always generating the 8-byte form, while we can't always get rid of the immediate in question. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: fix imul emission in the presence of an immediateIlia Mirkin2015-12-121-4/+7
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: teach post-ra immediate folding into mad about integersIlia Mirkin2015-12-121-3/+31
| | | | | | | There will usually be a split before the mad op, peer through that and pick out the right word of the immediate. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: add short imad supportIlia Mirkin2015-12-123-22/+40
| | | | | | | Support emission of the short imad, but also include it in the various logic that tries to make it possible to emit. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: can't have predication and immediatesIlia Mirkin2015-12-121-0/+3
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0 11.1" <[email protected]>
* nv50/ir: fix texture grad for cubemapsIlia Mirkin2015-12-124-7/+6
| | | | | | We were ignoring the partial derivatives on the last dim. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: fix assumption that prog->maxGPR is in 32-bit reg unitsIlia Mirkin2015-12-122-4/+21
| | | | | | | On NV50, we use 16-bit reg units (to make it all work with half-regs). A few places assumed that it was always in 32-bit units. Signed-off-by: Ilia Mirkin <[email protected]>
* gallium/ddebug: regularly log the total number of draw callsNicolai Hähnle2015-12-121-0/+3
| | | | | | | | | | This helps in the use of GALLIUM_DDEBUG_SKIP: first run a target application with skip set to a very large number and note how many draw calls happen before the bug. Then re-run, skipping the corresponding number of calls. Despite the additional run, this can still be much faster than not skipping anything. Reviewed-by: Marek Olšák <[email protected]>
* gallium/ddebug: add GALLIUM_DDEBUG_SKIP optionNicolai Hähnle2015-12-123-15/+36
| | | | | | | | When we know that hangs occur only very late in a reproducible run (e.g. apitrace), we can save a lot of debugging time by skipping the flush and hang detection for earlier draw calls. Reviewed-by: Marek Olšák <[email protected]>
* llvmpipe: fix layer/vp input into fs when not written by prior stagesRoland Scheidegger2015-12-128-53/+96
| | | | | | | | | | | | | | | | | | | | | | | | | ARB_fragment_layer_viewport requires that if a fs reads layer or viewport index but it wasn't output by gs (or vs with other extensions), then it reads 0. This never worked for llvmpipe, and is surprisingly non-trivial to fix. The problem is the mechanism to handle non-existing outputs in draw is rather crude, it will simply redirect them to whatever is at output 0, thus later stages will just get garbage. So, rather than trying to fix this up (which looks non-trivial), fix this up in llvmpipe setup by detecting this case there and output a fixed zero directly. While here, also optimize the hw vertex layout a bit - previously if the gs outputted layer (or vp) and the fs read those inputs, we'd add them twice to the vertex layout, which is unnecessary. And do some minor cleanup, slots don't require that many bits, there was some bogus (but harmless) float/int mixup for psize slot too, make the slots all unsigned (we always put pos at pos zero thus everything else has to be positive if it exists), and make sure they are properly initialized (layer and vp index slot were not which looked fishy as they might not have got set back to zero when changing from a gs which outputs them to one which does not). This fixes the failures in piglit's arb_fragment_layer_viewport group (3 each for layer and vp). Reviewed-by: Jose Fonseca <[email protected]>
* svga: avoid emitting redundant SetSamplers() commandsBrian Paul2015-12-112-7/+18
| | | | | | | | This greatly reduces the number of SetSamplers() commands for some applications. Reviewed-by: José Fonseca <[email protected]> Reviewed-by: Charmaine Lee <[email protected]>
* svga: avoid emitting redundant SetIndexBuffer commandsBrian Paul2015-12-112-5/+16
| | | | | Reviewed-by: Charmaine Lee <[email protected]> Reviewed-by: José Fonseca <[email protected]>
* st/mesa: trivial indentation fixBrian Paul2015-12-111-1/+1
|
* util/blitter: minor formatting fixesBrian Paul2015-12-111-5/+4
|
* i965/fs: Use the correct source for local memory load offsetsJason Ekstrand2015-12-111-1/+1
| | | | | | | | | | The offset for loads is in src[0]. This was a copy+paste error in the nir_intrinsic_load/store refactoring. This commit fixes a segfault in ES31-CTS.compute_shader.work-group-size. I have no idea how piglit failed to catch this... Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93348 Reviewed-by: Kristian Høgsberg <[email protected]>
* i965: Add Gen8+ tessellation control shader state (3DSTATE_HS).Kenneth Graunke2015-12-111-13/+51
| | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Add Gen7+ tessellation engine state (3DSTATE_TE).Kenneth Graunke2015-12-111-8/+28
| | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Add Gen8+ tessellation evaluation shader state (3DSTATE_DS).Kenneth Graunke2015-12-111-7/+59
| | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Add tessellation shader push constant support.Kenneth Graunke2015-12-117-29/+63
| | | | | | | Based on a patch by Chris Forbes. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Add tessellation shader sampler support.Kenneth Graunke2015-12-114-1/+51
| | | | | | | Based on code by Chris Forbes and Fabian Bieler. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Add tessellation shader surface support.Kenneth Graunke2015-12-1110-11/+389
| | | | | | | | | | | | This is brw_gs_surface_state.c copy and pasted twice with search and replace. brw_binding_table.c code is similarly copy and pasted. v2: Drop dword_pitch related fields. Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Jason Ekstrand <[email protected]>
* i965: Make fs_visitor::emit_urb_writes set EOT for TES as well.Kenneth Graunke2015-12-111-1/+1
| | | | | | | | | | | | | Tessellation evaluation shaders work almost identically to vertex shaders - we have a set of URB writes at the end of the program, and the last one should terminate it. Geometry shaders really are the special case, where multiple EmitVertex() calls trigger URB writes in the middle of the program. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965: Don't hardcode g1 for URB handles in fs_visitor::emit_urb_writes().Kenneth Graunke2015-12-111-4/+5
| | | | | | | | | Tessellation evaluation shaders will use g4 instead. For now, make an fs_reg called urb_handle and use that in place of hardcoding g1. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965: Make brw_set_message_descriptor() non-static.Kenneth Graunke2015-12-112-1/+9
| | | | | | | | I want to use this directly from brw_vec4_generator.cpp. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965: Move brw_cs_fill_local_id_payload() to libi965_compilerKristian Høgsberg Kristensen2015-12-114-40/+43
| | | | | | This is a helper function for setting up the local invocation ID payload according to the cs_prog_data generated by the compiler. It's intended to be available to users of libi965_compiler so move it there.
* vc4: Add quick algebraic optimization for clamping of unpacked values.Eric Anholt2015-12-111-0/+18
| | | | | | | | | | | | | | | | GL likes to saturate your incoming color, but if that color's coming from unpacking from unorms, there's no point. Ideally we'd have a range propagation pass that cleans these up in NIR, but that doesn't seem to be going to land soon. It seems like we could do a one-off optimization in nir_opt_algebraic, except that doesn't want to operate on expressions involving unpack_unorm_4x8, since it's sized. total instructions in shared programs: 87879 -> 87761 (-0.13%) instructions in affected programs: 6044 -> 5926 (-1.95%) total estimated cycles in shared programs: 349457 -> 349252 (-0.06%) estimated cycles in affected programs: 6172 -> 5967 (-3.32%) No SSPD on openarena (which had the biggest gains, in its VS/CSes), n=15.
* vc4: When doing algebraic optimization into a MOV, use the right MOV.Eric Anholt2015-12-111-1/+6
| | | | | If there were src unpacks, changing to the integer MOV instead of float (for example) would change the unpack operation.
* vc4: Fix handling of src packs on in qir_follow_movs().Eric Anholt2015-12-111-2/+8
| | | | | | The caller isn't going to expect it from a return, so it would probably get misinterpreted. If the caller had an unpack in its reg, that's fine, but don't lose track of it.
* vc4: Add missing progress note in opt_algebraic.Eric Anholt2015-12-111-0/+1
|
* vc4: Add debugging of the estimated time to run the shader to shader-db.Eric Anholt2015-12-113-17/+50
|
* vc4: Fix handling of sample_mask output.Eric Anholt2015-12-112-6/+6
| | | | | | | I apparently broke this in a late refactor, in such a way that I decided its tests were some of those interminable ones that I should just blacklist from my testing. As a result, the refactors related to it were totally wrong.
* softpipe: enable GL_ARB_viewport_array support, update GL3.txt docEdward O'Callaghan2015-12-111-1/+1
| | | | | Signed-off-by: Edward O'Callaghan <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* softpipe: implement some support for multiple viewportsEdward O'Callaghan2015-12-118-52/+112
| | | | | | | | Mostly related to making sure the rasterizer can correctly pick out the correct scissor box for the current viewport. Signed-off-by: Edward O'Callaghan <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* draw: don't assume fixed offset for data in struct vertex_infoRoland Scheidegger2015-12-111-5/+3
| | | | | | Otherwise, if struct vertex_info is changed, you're in for some surprises... Reviewed-by: Jose Fonseca <[email protected]>
* i965/gen9: Don't do fast clears when GL_FRAMEBUFFER_SRGB is enabledNeil Roberts2015-12-111-0/+11
| | | | | | | | | When GL_FRAMEBUFFER_SRGB is enabled any single-sampled renderbuffers are resolved in intel_update_state because the hardware can't cope with fast clears on SRGB buffers. In that case it's pointless to do a fast clear because it will just be immediately resolved. Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/gen9: Allow fast clears for non-MSRT SRGB buffersNeil Roberts2015-12-111-1/+2
| | | | | | | | | | | | | | | SRGB buffers are not marked as losslessly compressible so previously they would not be used for fast clears. However in practice the hardware will never actually see that we are using SRGB buffers for fast clears if we use the linear equivalent format when clearing and make sure to resolve the buffer as a linear format before sampling from it. This is an important use case because by default the window system framebuffers are created as SRGB so without this fast clears won't be used there. Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/gen9: Resolve SRGB color buffers when GL_FRAMEBUFFER_SRGB enabledNeil Roberts2015-12-111-0/+27
| | | | | | | | | | SKL can't cope with the CCS buffer for SRGB buffers. Normally the hardware won't see the SRGB formats because when GL_FRAMEBUFFER_SRGB is disabled these get mapped to their linear equivalents. In order to avoid relying on the CCS buffer when it is enabled this patch now makes it flush the renderbuffers. Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/gen8+: Don't upload the MCS buffer for single-sampled texturesNeil Roberts2015-12-111-1/+5
| | | | | | | | | | | For single-sampled textures the MCS buffer is only used to implement fast clears. However the surface always needs to be resolved before being used as a texture anyway so the the MCS buffer doesn't actually achieve anything. This is important for Gen9 because in that case SRGB surfaces are not supported for fast clears and we don't want the hardware to see the MCS buffer in that case. Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/meta-fast-clear: Disable GL_FRAMEBUFFER_SRGB during clearNeil Roberts2015-12-111-0/+16
| | | | | | | | | | | | | | | Adds MESA_META_FRAMEBUFFER_SRGB to the meta save state so that GL_FRAMEBUFFER_SRGB will be disabled when performing the fast clear. That way the render surface state will be programmed with the linear equivalent format during the clear. This is important for Gen9 because the SRGB formats are not marked as losslessly compressible so in theory they aren't support for fast clears. It shouldn't make any difference whether GL_FRAMEBUFFER_SRGB is enabled for the fast clear operation because the color is not actually written to the framebuffer so there is no chance for the hardware to apply the SRGB conversion on it anyway. Reviewed-by: Topi Pohjolainen <[email protected]>
* winsys/amdgpu: clear the buffer cache on mmap failure and try againMarek Olšák2015-12-111-0/+5
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* winsys/radeon: clear the buffer cache on mmap failure and try againMarek Olšák2015-12-111-3/+10
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* winsys/amdgpu: clear the buffer cache on allocation failure and try againMarek Olšák2015-12-111-2/+7
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* winsys/radeon: clear the buffer cache on allocation failure and try againMarek Olšák2015-12-111-2/+7
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* gallium/radeon: remove radeon_winsys_cs_handleMarek Olšák2015-12-1139-173/+130
| | | | | | | | "radeon_winsys_cs_handle *cs_buf" is now equivalent to "pb_buffer *buf". Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Michel Dänzer <[email protected]>
* winsys/radeon: use pb_cache instead of pb_cache_managerMarek Olšák2015-12-114-178/+75
| | | | | | | | This is a prerequisite for the removal of radeon_winsys_cs_handle. Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Michel Dänzer <[email protected]>
* winsys/radeon: use radeon_bomgr lessMarek Olšák2015-12-112-16/+8
| | | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Michel Dänzer <[email protected]>