path: root/src/gallium/drivers
Commit message | Author | Age | Files | Lines
* nvc0: support MP performance counters on Maxwell | Samuel Pitoiset | 2016-11-10 | 3 | -3/+721
  This adds some performance counters/metrics for SM50/SM52.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Tested-by: Pierre Moreau <[email protected]>
* radeonsi: fix r600_texture::tc_compatible_htile | Marek Olšák | 2016-11-10 | 1 | -3/+3
  htile_size is now always non-zero if HTILE is allocated. It seems to have caused no issues.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: accept is_store in image_fetch_rsrc instead of dcc_off | Marek Olšák | 2016-11-10 | 1 | -4/+4
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't rely on tgsi_scan::images_buffers | Marek Olšák | 2016-11-10 | 1 | -8/+11
  The instruction knows the target.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: re-order cases in si_get_shader_param | Marek Olšák | 2016-11-10 | 1 | -28/+28
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: increase MAX_CONTROL_FLOW_DEPTH AKA MaxIfDepth | Marek Olšák | 2016-11-10 | 1 | -2/+1
  We don't want to lower deep IFs unconditionally.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix/silence unused variable warnings in optimized builds | Nicolai Hähnle | 2016-11-10 | 2 | -3/+3
  I'm leaving num_out_sgpr around since it's not in a fast path, and besides the compiler should be able to optimize it away easily. The alternative with #if/#endif would be extremely ugly.
  Reviewed-by: Marek Olšák <[email protected]>
* swr: correct setting of independentAlphaBlendEnable | Ilia Mirkin | 2016-11-09 | 1 | -1/+6
  This setting is for whether color and alpha have different blend settings, not for whether blending is enabled on a per-RT basis.
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tim Rowley <[email protected]>
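To illustrate the distinction drawn above, here is a minimal C sketch. The struct and function names are hypothetical (this is not gallium's `pipe_rt_blend_state` or the SWR state layout): the flag is derived from whether the alpha blend equation differs from the color one, not from whether blending varies per render target. Treating the flag as irrelevant when blending is off is an assumption of the sketch.

```c
#include <stdbool.h>

/* Hypothetical per-render-target blend state (illustrative only). */
struct rt_blend {
   bool blend_enable;
   int rgb_func, rgb_src_factor, rgb_dst_factor;
   int alpha_func, alpha_src_factor, alpha_dst_factor;
};

/* True when alpha is blended with different settings than color,
 * which is what an "independent alpha blend" flag describes.
 * Assumption: the flag does not matter when blending is disabled. */
bool needs_independent_alpha(const struct rt_blend *rt)
{
   return rt->blend_enable &&
          (rt->rgb_func != rt->alpha_func ||
           rt->rgb_src_factor != rt->alpha_src_factor ||
           rt->rgb_dst_factor != rt->alpha_dst_factor);
}
```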
* swr: [rasterizer] add a .dir-locals.el to support 4-space indents | Ilia Mirkin | 2016-11-09 | 1 | -0/+8
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tim Rowley <[email protected]>
* swr: set halfz rasterizer setting | Ilia Mirkin | 2016-11-09 | 1 | -0/+1
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] allow an OpenGL driver to specify halfz clipping | Ilia Mirkin | 2016-11-09 | 2 | -7/+7
  With ARB_clip_control, GL may also do 0..1 depth clipping, not just -1..1. This removes clip's reliance on driver type. DX users will need to be updated to set the new clipHalfZ flag to get proper clipping functionality.
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tim Rowley <[email protected]>
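A minimal sketch of the two clip-space depth conventions, assuming a hypothetical helper that receives the clipHalfZ setting as a boolean; it is not the SWR clipper code.

```c
#include <stdbool.h>

/* Clip-space z test for a vertex with coordinates (z, w).
 * Default GL convention: -w <= z <= w (NDC depth -1..1).
 * With ARB_clip_control's GL_ZERO_TO_ONE (and in D3D): 0 <= z <= w. */
bool z_inside_clip_volume(float z, float w, bool clip_half_z)
{
   float near_bound = clip_half_z ? 0.0f : -w;
   return z >= near_bound && z <= w;
}
```

A vertex at z = -0.5w is inside the volume under the -1..1 convention but is clipped when halfz clipping is enabled, which is why the clipper can no longer infer the range from the driver type alone.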
* swr: fix support for inverted depth scales | Ilia Mirkin | 2016-11-09 | 1 | -7/+3
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tim Rowley <[email protected]>
* swr: [rasterizer jitter] fix logic op to work with unorm/snorm | Ilia Mirkin | 2016-11-09 | 1 | -17/+65
  Most logic op usage is probably going to end up with normalized textures. Scale the floating point values and convert to integer before performing the logic operations.
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tim Rowley <[email protected]>
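The conversion described above can be sketched for the XOR case on an n-bit UNORM channel. The helper names are hypothetical, only UNORM is shown (SNORM would need signed scaling), and round-to-nearest is an assumption of the sketch rather than a statement about the SWR jitter code.

```c
#include <stdint.h>

/* Convert a float in [0,1] to an n-bit UNORM integer (round to nearest). */
uint32_t float_to_unorm(float f, unsigned bits)
{
   uint32_t max = (1u << bits) - 1u;
   if (!(f > 0.0f))
      return 0;                 /* also maps NaN to 0 */
   if (f >= 1.0f)
      return max;
   return (uint32_t)(f * (float)max + 0.5f);
}

/* Convert an n-bit UNORM integer back to a float in [0,1]. */
float unorm_to_float(uint32_t v, unsigned bits)
{
   return (float)v / (float)((1u << bits) - 1u);
}

/* Apply XOR to two normalized color components by going through the
 * integer representation of the render target format. */
float logicop_xor_unorm(float src, float dst, unsigned bits)
{
   uint32_t s = float_to_unorm(src, bits);
   uint32_t d = float_to_unorm(dst, bits);
   return unorm_to_float(s ^ d, bits);
}
```

Operating on the raw float bit patterns instead would produce meaningless results, which is why the values are scaled to the format's integer range first.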
* vc4: Clamp the shadow comparison value. | Eric Anholt | 2016-11-09 | 1 | -0/+9
  Fixes piglit glsl-fs-shadow2D-clamp-z.
  Cc: <[email protected]>
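For context, GL clamps the shadow reference value to [0, 1] before comparing it against a fixed-point depth texture. A minimal sketch of a GL_LEQUAL comparison with that clamp (a hypothetical helper, not the vc4 shader code):

```c
/* Shadow compare for GL_LEQUAL: returns 1.0 if the reference passes
 * against the stored depth, else 0.0. The reference is clamped to
 * [0,1] first, as GL requires for fixed-point depth formats. */
float shadow_compare_lequal(float dref, float stored_depth)
{
   if (dref < 0.0f)
      dref = 0.0f;
   if (dref > 1.0f)
      dref = 1.0f;
   return dref <= stored_depth ? 1.0f : 0.0f;
}
```

Without the clamp, a reference of 1.5 would fail against a stored depth of 1.0; with it, the comparison passes.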
* vc4: Don't pair up TLB scoreboard locking instructions early in QPU sched. | Eric Anholt | 2016-11-09 | 1 | -0/+12
  Jonas Pfeil noticed that we were putting passthrough tlb_z writes early in the shader, despite QIR and QPU scheduling both trying to delay scoreboard locking for as long as possible.
  The problem was that when trying to pair up QPU instructions, at some point the passthrough tlb_z would be the last one available and it would get paired, even if the other half would open up other instructions to be scheduled and we could have paired tlb_z with something later in the program. Also, since passthrough z is just a mov, it pairs up really easily.
  The proper fix would probably be to flip the order of scheduling instructions so we went from bottom to top (also relevant for branch delay slot scheduling). However, we can do a quick fix here to just not schedule a TLB lock until there's nothing but TLB left in the program, at a slight instruction cost (est. 0.61% cycle count in shader-db) but a major fragment shader parallelism win.
  glmark2 results:
    texture:texture-filter=linear: +1.24481% +/- 0.626117% (n=15)
    bump:bump-render=height: 1.24991% +/- 0.154793% (n=136,133 -- screensaver outliers removed)
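The quick fix can be sketched as a selection rule over a list of instructions. This is a simplified, hypothetical model (the struct and `choose_next()` are not vc4's scheduler): a TLB-locking instruction is only picked once everything left unscheduled is TLB work.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a schedulable instruction. */
struct sched_inst {
   bool ready;      /* all dependencies satisfied */
   bool locks_tlb;  /* e.g. a tlb_z or TLB color write */
   bool scheduled;
};

/* Return the index of the next instruction to schedule, or -1 if none
 * may be picked now. TLB-locking instructions are only chosen once every
 * remaining unscheduled instruction is a TLB one, so the scoreboard lock
 * is taken as late as possible. */
int choose_next(const struct sched_inst *insts, size_t n)
{
   bool only_tlb_left = true;
   for (size_t i = 0; i < n; i++) {
      if (!insts[i].scheduled && !insts[i].locks_tlb)
         only_tlb_left = false;
   }

   for (size_t i = 0; i < n; i++) {
      const struct sched_inst *inst = &insts[i];
      if (inst->scheduled || !inst->ready)
         continue;
      if (inst->locks_tlb && !only_tlb_left)
         continue;
      return (int)i;
   }
   return -1;
}
```

When -1 is returned, a real scheduler would have to emit a NOP or otherwise wait; the sketch leaves that to the caller.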
* vc4: Print a reg pressure estimate in our reg allocation failure dump. | Eric Anholt | 2016-11-09 | 1 | -0/+5
* vc4: Don't abort when a shader compile fails. | Eric Anholt | 2016-11-09 | 6 | -8/+32
  It's much better to just skip the draw call entirely. Getting this information out of register allocation will also be useful for implementing threaded fragment shaders, which will need to retry non-threaded if RA fails.
  Cc: <[email protected]>
* llvmpipe: Fix build after removal of deprecated attribute API v2 | Aaron Watry | 2016-11-09 | 2 | -3/+2
  Applies on top of v3 of Tom's gallivm change.
  v2:
    - Tom Stellard: Use enums instead of strings.
  Reviewed-by: Nicolai Hähnle <[email protected]>
  Signed-off-by: Aaron Watry <[email protected]>
  CC: Tom Stellard <[email protected]>
  CC: Jan Vesely <[email protected]>
* gallivm: Fix build after removal of deprecated attribute API v3 | Tom Stellard | 2016-11-09 | 2 | -44/+49
  v2: Fix adding parameter attributes with LLVM < 4.0.
  v3: Fix typo. Fix parameter index. Add a gallivm enum for function attributes.
  Reviewed-by: Nicolai Hähnle <[email protected]>
* swr: disable logic op when the rt format is float or srgb | Ilia Mirkin | 2016-11-08 | 1 | -0/+6
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tim Rowley <[email protected]>
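As we understand the GL specification, logical operations have no effect on floating-point color buffers, nor on sRGB-encoded ones when sRGB writes are enabled. A minimal sketch of the resulting check, using a hypothetical two-field format description rather than the real SWR or gallium format tables:

```c
#include <stdbool.h>

/* Hypothetical format description: only the properties that matter here. */
struct rt_format_info {
   bool is_float;
   bool is_srgb;
};

/* Logic ops only apply to integer and normalized fixed-point targets,
 * so they are turned off for float and sRGB render targets. */
bool logicop_effective(bool logicop_enable, const struct rt_format_info *fmt)
{
   return logicop_enable && !fmt->is_float && !fmt->is_srgb;
}
```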
* swr: fix AND_INVERTED logic op conversion | Ilia Mirkin | 2016-11-08 | 1 | -1/+1
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tim Rowley <[email protected]>
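The commit does not describe the bug itself, but the GL definitions show why AND_INVERTED is easy to confuse with AND_REVERSE: they differ only in which operand is inverted. The function names here are illustrative, with s as the incoming fragment color and d as the framebuffer color.

```c
#include <stdint.h>

/* GL logic ops built on AND (s = source/fragment, d = destination). */
uint32_t logic_and(uint32_t s, uint32_t d)          { return s & d; }
uint32_t logic_and_reverse(uint32_t s, uint32_t d)  { return s & ~d; }
uint32_t logic_and_inverted(uint32_t s, uint32_t d) { return ~s & d; }
```

For s = 0xF0 and d = 0xCC, AND gives 0xC0, AND_REVERSE gives 0x30, and AND_INVERTED gives 0x0C.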
* swr: add support for EXT_depth_bounds_test | Ilia Mirkin | 2016-11-08 | 2 | -1/+7
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] set depth hottile when depth bounds test enabled | Ilia Mirkin | 2016-11-08 | 1 | -1/+3
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Tim Rowley <[email protected]>
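Per the EXT_depth_bounds_test specification, the test compares the depth value already stored in the depth buffer at the fragment's pixel (not the fragment's own depth) against [zmin, zmax]. That is presumably why the rasterizer has to load the depth hot tile whenever the test is enabled. A minimal sketch of the comparison, with a hypothetical helper name:

```c
#include <stdbool.h>

/* EXT_depth_bounds_test: the fragment survives only if the depth value
 * already stored in the depth buffer at its pixel lies in [zmin, zmax].
 * The incoming fragment's own depth is not used. */
bool depth_bounds_pass(float stored_depth, float zmin, float zmax)
{
   return stored_depth >= zmin && stored_depth <= zmax;
}
```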
* swr: allow alphatest without blend or logicop | Tim Rowley | 2016-11-08 | 1 | -1/+2
  We need to compile a blend function when alphatest is enabled.
  Reviewed-by: Bruce Cherniak <[email protected]>
* nvc0: simplify draw parameters upload for vertex shaders | Samuel Pitoiset | 2016-11-07 | 1 | -8/+6
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: get rid of NVE4_COMPUTE_MP_PM_{A,B}_SIGSEL_XXX | Samuel Pitoiset | 2016-11-05 | 1 | -56/+56
  Instead, hardcode group sigsel because there are a bunch of unknown groups, especially on SM50/SM52.
  Signed-off-by: Samuel Pitoiset <[email protected]>
* gm107/ir: emit RED instead of ATOM when no dst | Samuel Pitoiset | 2016-11-05 | 1 | -1/+28
  This is similar to the NVC0 and GK110 emitters, where we emit reduction operations instead of atomic operations when the destination is not used.
  Found after writing some tests which check if performance counters return the expected value. In that case, gred_count returned 0 on gm107, while at least gk106 returned the correct value.
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
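A sketch of the selection rule, with hypothetical enum and function names rather than the nv50_ir ones. It additionally assumes that compare-and-swap and exchange are never turned into reductions; that restriction is an assumption of the sketch, not something the commit states.

```c
#include <stdbool.h>

/* Hypothetical atomic sub-operations (not the nv50_ir enum). */
enum atom_subop {
   SUBOP_ADD, SUBOP_MIN, SUBOP_MAX, SUBOP_AND, SUBOP_OR, SUBOP_XOR,
   SUBOP_CAS, SUBOP_EXCH
};

/* An atomic whose returned value is never read can be emitted as a
 * reduction (RED), which writes no result register.
 * Assumption: CAS and EXCH always stay atomics (ATOM). */
bool can_emit_red(enum atom_subop op, bool dst_used)
{
   return !dst_used && op != SUBOP_CAS && op != SUBOP_EXCH;
}
```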
* vc4: Use Newton-Raphson on the 1/W write to fix glmark2 terrain. | Eric Anholt | 2016-11-04 | 1 | -1/+1
  The 1/W was apparently not accurate enough, and we were getting sparklies in the distance. The closed driver also did a N-R step here.
  Cc: <[email protected]>
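One Newton-Raphson step for a reciprocal refines an estimate x0 of 1/w as x1 = x0 * (2 - w * x0), which exactly squares the relative error of the estimate. A minimal C sketch with a hypothetical helper name; on vc4 the initial estimate would presumably come from the SFU reciprocal.

```c
/* One Newton-Raphson step refining an approximate reciprocal x0 of w:
 *   x1 = x0 * (2 - w * x0)
 * If x0 = (1 - e) / w, then x1 = (1 - e * e) / w. */
float refine_reciprocal(float w, float x0)
{
   return x0 * (2.0f - w * x0);
}
```

For w = 4 and an estimate of 0.24 (4% low), one step gives about 0.2496 (0.16% low).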
* vc4: Make sure that vertex shader texture2D() calls use LOD 0. | Eric Anholt | 2016-11-04 | 1 | -0/+10
  I noticed this while trying to debug glmark2 terrain (which does vertex shader texturing, but no mipmaps on its textures sampled from the VS).
* radeonsi: fix vertex fetches for 2_10_10_10 formats | Nicolai Hähnle | 2016-11-04 | 5 | -6/+78
  The hardware always treats the alpha channel as unsigned, so add a shader workaround. This is rare enough that we'll just build a monolithic vertex shader.
  The SINT case cannot actually happen in OpenGL, but I've included it for completeness since it's just a mix of the other cases.
  Reviewed-by: Marek Olšák <[email protected]>
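The alpha fix-up amounts to sign-extending a 2-bit field the hardware fetched as unsigned, then applying the SNORM rule max(c / (2^(b-1) - 1), -1) with b = 2. A minimal sketch, assuming alpha occupies the top two bits as in the GL `*_2_10_10_10_REV` layouts; the helper names are hypothetical, and this is CPU-side C rather than the shader code radeonsi generates.

```c
#include <stdint.h>

/* Extract the 2-bit alpha field (bits 30..31) of a packed 2_10_10_10
 * value and sign-extend it: 0, 1, 2, 3 map to 0, 1, -2, -1. */
int32_t alpha2_sint(uint32_t packed)
{
   uint32_t a = packed >> 30;       /* 0..3, as fetched unsigned */
   return (int32_t)(a ^ 2u) - 2;
}

/* SNORM conversion for a 2-bit field: c / (2^(2-1) - 1) = c,
 * clamped so that -2 also maps to -1.0. */
float alpha2_snorm(uint32_t packed)
{
   float f = (float)alpha2_sint(packed);
   return f < -1.0f ? -1.0f : f;
}
```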
* radeonsi: fix an assertion failure in si_decompress_sampler_color_textures | Marek Olšák | 2016-11-04 | 1 | -1/+3
  This fixes a crash in Deus Ex: Mankind Divided. Release builds were unaffected, so it's not too serious.
  Cc: 11.2 12.0 13.0 <[email protected]>
  Reviewed-by: Bas Nieuwenhuizen <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: enable GLSL 4.50 | Nicolai Hähnle | 2016-11-04 | 1 | -1/+1
  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
* vc4: Add miptree/texture state support for ETC1 compressed textures. | Eric Anholt | 2016-11-03 | 5 | -1/+33
  The format isn't flagged as enabled at runtime yet, because we need kernel validation support.
* vc4: Fix use of undefined values since the ralloc zeroing changes. | Eric Anholt | 2016-11-03 | 1 | -6/+11
  reralloc() no longer zeroes the new contents, so switch to using rzalloc_array() instead.
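The same pitfall exists with plain libc: realloc() leaves newly added bytes uninitialized, so code that relied on zeroed growth must clear them itself (the commit instead switches to rzalloc_array(), which allocates zeroed memory). A minimal libc-only sketch with a hypothetical helper name; it does not use Mesa's ralloc API.

```c
#include <stdlib.h>
#include <string.h>

/* Grow an int array and zero only the newly added elements, since
 * realloc() leaves them uninitialized. Returns NULL on failure, in which
 * case the old block is still valid and owned by the caller. */
int *grow_zeroed(int *arr, size_t old_count, size_t new_count)
{
   int *p = realloc(arr, new_count * sizeof *p);
   if (p && new_count > old_count)
      memset(p + old_count, 0, (new_count - old_count) * sizeof *p);
   return p;
}
```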
* svga: move svga_mark_surfaces_dirty() prototype to svga_surface.h | Brian Paul | 2016-11-03 | 3 | -10/+4
  Trivial.
* svga: whitespace / formatting clean-up in svga_context.c | Brian Paul | 2016-11-03 | 1 | -28/+34
  Trivial.
* svga: collect stats for time spent in svga_context_finish() | Brian Paul | 2016-11-03 | 1 | -0/+4
  This should have appeared with commit "svga: add guest statistic gathering interface" from August 4, but was somehow lost.
* svga: invalidate new surface before it is bound to a render target view | Charmaine Lee | 2016-11-03 | 6 | -3/+42
  Invalidate a "new" surface before it is bound to a render target view or depth stencil view in order to avoid the unnecessary host-side copy of the surface data before it is rendered to.
  Note that a recycled surface is already invalidated before it is reused.
  Reviewed-by: Brian Paul <[email protected]>
* Revert "svga: use untyped surface formats in most cases" | Charmaine Lee | 2016-11-03 | 1 | -7/+4
  Using untyped surface formats causes huge performance degradation on Fusion.
  This reverts commit eb0ced74f6decd1bf1e111b162e1389bede89af6 until the backend has a better solution to address typeless surface formats.
* svga: allow quad blit for more formats | Charmaine Lee | 2016-11-03 | 1 | -1/+136
  Currently the blitter will fail if the blit format is different from, and view-incompatible with, the resource format. Instead of punting to a software blit, which will stall the pipeline, we will create a temporary resource to allow the blitter to work.
  Fixes piglit test arb_copy_image-formats. Also tested with MTT piglit, glretrace.
  Reviewed-by: Brian Paul <[email protected]>
* svga: create BGRX render target view for BGRX_UNORM surface | Charmaine Lee | 2016-11-03 | 1 | -1/+2
  Currently we adjust the view format when we are asked to create a BGRA render target view for a BGRX surface. But we only look for the SVGA3D_B8G8R8X8_TYPELESS surface format. With this patch, we will also check for the SVGA3D_B8G8R8X8_UNORM surface format, and use SVGA3D_B8G8R8X8_UNORM as the view format for that case.
  Reviewed-by: Brian Paul <[email protected]>
* svga: add a helper function to check for typeless format | Charmaine Lee | 2016-11-03 | 2 | -0/+34
  This patch adds a helper function svga_format_is_typeless() which returns TRUE if the specified format is typeless.
  Reviewed-by: Brian Paul <[email protected]>
* svga: add SVGA_NEW_FRAME_BUFFER to svga_hw_tss_binding state atom | Brian Paul | 2016-11-03 | 1 | -0/+1
  We may need to re-emit texture bindings when the framebuffer state changes. In particular, emitting the texture binding can also involve updating a texture from its backing copy during sampler view validation. The backing copy is made during framebuffer validation.
  This helps to fix an issue with Photoshop on VGPU9 (VMware bug 1723971).
  Reviewed-by: Charmaine Lee <[email protected]>
* svga: allow copy_region if sample counts match | Charmaine Lee | 2016-11-03 | 1 | -4/+10
  With this patch, we will allow blit with copy_region if the source and destination textures have the same sample counts.
  Fixes failures with piglit tests:
    spec@arb_texture_float@multisample-formats 2 gl_arb_texture_float
    spec@arb_texture_rg@multisample-formats 2 gl_arb_texture_rg-float
  Reviewed-by: Brian Paul <[email protected]>
* svga: set rendered-to flag after updating the texture using PredCopyRegion | Charmaine Lee | 2016-11-03 | 1 | -0/+4
  This patch sets the rendered-to flag for the subresource after it is updated using the PredCopyRegion command. This is to ensure that the GB surface will be synced up properly before it is directly mapped.
  Tested with MTT piglit, glretrace.
  Reviewed-by: Brian Paul <[email protected]>
* svga: add can_use_upload flag | Charmaine Lee | 2016-11-03 | 2 | -31/+37
  This patch adds a flag "can_use_upload" to the svga_texture structure to avoid re-checking upload availability at every transfer map.
  Tested with Lightsmark2008, Tropics, MTT glretrace, piglit.
  Reviewed-by: Brian Paul <[email protected]>
* svga: fix texture upload path condition | Charmaine Lee | 2016-11-03 | 1 | -30/+60
  As Thomas suggested, we'll first try to map directly to a GB surface. If it is blocked, then we'll use the texture upload buffer. Also, if a texture is already "rendered to", that is, the GB surface is already out of sync, then we'll use the texture upload buffer to avoid syncing the GB surface.
  Tested with Lightsmark2008, Tropics, MTT piglit, glretrace.
  Reviewed-by: Brian Paul <[email protected]>
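The path selection described above can be sketched as a small decision function. The enum, the function, and the `direct_map_would_block` input are hypothetical; only the idea of a `can_use_upload` flag and a "rendered to" state comes from these commits, and returning a direct map when the upload buffer is unavailable is an assumption of the sketch.

```c
#include <stdbool.h>

enum map_path { MAP_DIRECT, MAP_UPLOAD_BUFFER };

/* Hypothetical transfer-map path choice:
 *  - if the subresource was already rendered to (GB surface out of sync),
 *    use the upload buffer to avoid syncing the GB surface;
 *  - otherwise try a direct map, falling back to the upload buffer when
 *    the direct map would block. */
enum map_path choose_map_path(bool can_use_upload, bool rendered_to,
                              bool direct_map_would_block)
{
   if (!can_use_upload)
      return MAP_DIRECT;   /* assumed: no alternative but to map (and maybe wait) */
   if (rendered_to || direct_map_would_block)
      return MAP_UPLOAD_BUFFER;
   return MAP_DIRECT;
}
```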
* svga: set rendered_to flag with texture uploaded using TransferFromBuffer command | Charmaine Lee | 2016-11-03 | 1 | -0/+4
  This patch sets the rendered_to flag for the texture subresource that is uploaded using the TransferFromBuffer command. This is to ensure that the subresource will be read back or invalidated before it is directly mapped. This makes sure that the content of the GB surface will not be accidentally overwritten by the device at suspend/resume time.
  Reviewed-by: Brian Paul <[email protected]>
* svga: Add render_condition boolean flag in struct svga_context | Neha Bhende | 2016-11-03 | 3 | -1/+6
  Set the render_condition flag when the driver performs conditional rendering. A blit using the DXPredCopyRegion command is affected by conditional rendering, so we should check this flag while performing the blit operation.
  Tested with piglit tests.
  v2: As per Charmaine's comment, set the render_condition flag if svga_query is valid. Tested with piglit tests.
  Reviewed-by: Brian Paul <[email protected]>
  Reviewed-by: Charmaine Lee <[email protected]>
* svga: Allow DXPredCopyRegion for depth_and_stencil formats. | Neha Bhende | 2016-11-03 | 1 | -4/+5
  DXPredCopyRegion supports copying between src and dst for depth_and_stencil formats if src and dst have the same format.
  Tested with piglit.
  v2: As per Brian's comment, allow DXPredCopyRegion for depth+stencil buffers if the blit mask is PIPE_MASK_ZS. Tested with piglit tests and added a new piglit test, arb_framebuffer_object-depth-stencil-blit, to test this particular case.
  Reviewed-by: Brian Paul <[email protected]>