Commit message | Author | Age | Files | Lines
* radeonsi: fix passing gl_ClipVertex for GS and tess | Marek Olšák | 2018-05-25 | 3 | -4/+8
| | | | | | Also add the fprintf call. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix color inputs/outputs for GS and tess | Marek Olšák | 2018-05-25 | 3 | -20/+34
| | | | | | | | | | GS is tested, tessellation is untested. Have outputs_written_before_ps for HW VS and outputs_written for other stages. The reason is that COLOR and BCOLOR alias for HW VS, which drives elimination of VS outputs based on PS inputs. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: fix incorrect parentheses around VS-PS varying elimination | Marek Olšák | 2018-05-25 | 1 | -2/+2
| | | | | | | I don't know if it caused issues. Cc: 18.0 18.1 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* st/mesa: simplify lastLevel determination in st_finalize_texture | Marek Olšák | 2018-05-25 | 1 | -13/+4
| | | | | | | | | | | | | This fixes shader images where we always bind stObj->pt and not individual gl_texture_images. Roughly based on i965 commit 845ad2667ab2466752f06ea30bdb9c837116c308 which does a similar thing but for a different reason. This fixes GL CTS assertion failures introduced by Ilia. Cc: 18.0 18.1 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* i965/tiled_memcpy: inline movntdqa loads in tiled_to_linear | Scott D Phillips | 2018-05-25 | 4 | -5/+88
| | | | | | | | | | | | | | | | | | | | The reference for MOVNTDQA says: For WC memory type, the nontemporal hint may be implemented by loading a temporary internal buffer with the equivalent of an aligned cache line without filling this data to the cache. [...] Subsequent MOVNTDQA reads to unread portions of the WC cache line will receive data from the temporary internal buffer if data is available. This hidden cache line sized temporary buffer can improve the read performance from wc maps. v2: Add mfence at start of tiled_to_linear for streaming loads (Chris) Reviewed-by: Chris Wilson <[email protected]> Reviewed-by: Matt Turner <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* swr/rast: Adjusted avx512 primitive assembly for msvc codegen | Alok Hota | 2018-05-25 | 1 | -49/+90
| | | | | | | | | Optimize AVX-512 PA Assemble (PA_STATE_OPT). Reduced generated code by about 4x, MSVC compiler was going crazy making temporaries and split-loading inputs onto the stack unless explicit AVX-512 load ops were added Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Moved memory init out of core swr init | Alok Hota | 2018-05-25 | 7 | -7/+86
| | | | | | | | Added two new files for a wrapper function for initialization v2: added missing include for single architecture builds Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Removed superfluous JitManager argument from passes | Alok Hota | 2018-05-25 | 6 | -14/+13
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Renamed MetaData calls | Alok Hota | 2018-05-25 | 2 | -87/+87
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Use metadata to communicate between passes | Alok Hota | 2018-05-25 | 1 | -0/+28
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Check gCoreBuckets/CORE_BUCKETS equal length at compile time | Alok Hota | 2018-05-25 | 1 | -0/+1
| | | | Reviewed-by: Bruce Cherniak <[email protected]>
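A compile-time length check of this kind can be sketched in C11 as follows. This is only an illustration under stated assumptions: the enum, the table, and every `demo_`/`DEMO_` name are hypothetical and are not taken from the swr sources, which this log does not show.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>

/* Hypothetical bucket IDs; the final enumerator doubles as the count. */
enum demo_bucket {
    DEMO_BUCKET_DRAW,
    DEMO_BUCKET_CLEAR,
    DEMO_BUCKET_SYNC,
    DEMO_NUM_BUCKETS
};

/* Hypothetical description table that must stay in step with the enum. */
static const char *const demo_bucket_names[] = {
    "Draw",
    "Clear",
    "Sync",
};

#define DEMO_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Fails the build if an enumerator is added without a matching table entry. */
static_assert(DEMO_ARRAY_SIZE(demo_bucket_names) == DEMO_NUM_BUCKETS,
              "demo_bucket_names and enum demo_bucket differ in length");

/* Look up the description for a bucket ID. */
static const char *demo_bucket_name(enum demo_bucket b)
{
    return demo_bucket_names[b];
}
```

The point of the pattern is that a mismatch between the two lists becomes a build error rather than an out-of-bounds read at run time.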
* swr/rast: Added in-place building to SCATTERPS | Alok Hota | 2018-05-25 | 1 | -9/+20
| | | | | | | SCATTERPS previously assumed it was being used with an existing basic block Reviewed-by: Bruce Cherniak <[email protected]>
* radv: run the EarlyCSEMemSSA LLVM pass | Samuel Pitoiset | 2018-05-25 | 1 | -0/+2
| It's recommended by the instruction combining pass, and RadeonSI also runs it. This pass used to segfault with one shader of F12017 in the past, but it no longer crashes. Maybe the LLVM IR generated by RADV has changed.
|
| Polaris10:
| Totals from affected shaders:
| SGPRS: 441352 -> 441648 (0.07 %)
| VGPRS: 310888 -> 300784 (-3.25 %)
| Spilled SGPRs: 13576 -> 12983 (-4.37 %)
| Code Size: 22560328 -> 22420544 (-0.62 %) bytes
| Max Waves: 40755 -> 41366 (1.50 %)
|
| Vega10:
| Totals from affected shaders:
| SGPRS: 442848 -> 442000 (-0.19 %)
| VGPRS: 310396 -> 300460 (-3.20 %)
| Spilled SGPRs: 13708 -> 12906 (-5.85 %)
| Code Size: 22479428 -> 22336216 (-0.64 %) bytes
| Max Waves: 45783 -> 46506 (1.58 %)
|
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: fix dumping compute shader on the graphics queue | Samuel Pitoiset | 2018-05-25 | 1 | -5/+8
| | | | | | | The graphics pipeline can be NULL. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add radv_dump_pipeline_state() helper | Samuel Pitoiset | 2018-05-25 | 1 | -6/+11
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: rework how shaders are dumped when generating a hang report | Samuel Pitoiset | 2018-05-25 | 1 | -26/+15
| | | | | | | Use a flag for the active stages instead. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: remove unused parameter in radv_dump_annotated_shader() | Samuel Pitoiset | 2018-05-25 | 1 | -8/+6
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* mesa: do not leak ctx->Shader.ReferencedProgram references | Jose Dapena Paz | 2018-05-25 | 1 | -0/+3
| | | | | | | | | | | | When glUseProgram is used, references to the included shaders are added in ctx->Shader.ReferencedProgram. But those references are not decreased when the shader data is deallocated. Thus, those shaders are leaked. Explicitly remove the pending references to these shaders. Fixes: e6506b3cd23 ("mesa: retain gl_shader_programs after glDeleteProgram if they are in use") Reviewed-by: Timothy Arceri <[email protected]>
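The leak pattern described above can be illustrated with a small reference-counting sketch. It is not Mesa's code: `demo_program`, `demo_context`, and the helper functions are hypothetical stand-ins, and the real fix is only known here from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_NUM_STAGES 3

/* Hypothetical refcounted program object. */
struct demo_program {
    int refcount;
};

/* Hypothetical context holding one referenced program per stage. */
struct demo_context {
    struct demo_program *referenced[DEMO_NUM_STAGES];
};

/* Allocate a program owned by the caller (refcount starts at 1). */
static struct demo_program *demo_program_create(void)
{
    struct demo_program *prog = calloc(1, sizeof(*prog));
    if (prog)
        prog->refcount = 1;
    return prog;
}

/* Make *slot point at prog, dropping the old reference and taking a new one.
 * The old object is freed when its last reference goes away. */
static void demo_reference_program(struct demo_program **slot,
                                   struct demo_program *prog)
{
    if (*slot == prog)
        return;
    if (*slot) {
        assert((*slot)->refcount > 0);
        if (--(*slot)->refcount == 0)
            free(*slot);
    }
    if (prog)
        prog->refcount++;
    *slot = prog;
}

/* Teardown: release every per-stage reference. Skipping this loop is the
 * kind of omission that leaves programs alive forever. */
static void demo_free_shader_state(struct demo_context *ctx)
{
    for (size_t i = 0; i < DEMO_NUM_STAGES; i++)
        demo_reference_program(&ctx->referenced[i], NULL);
}
```

Routing both the "take" and the "drop" through one helper keeps the counts balanced, so teardown only has to assign NULL to each slot.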
* radeonsi: set DB_EQAA.MAX_ANCHOR_SAMPLES correctly | Marek Olšák | 2018-05-24 | 1 | -4/+12
| | | | | Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: round ps_iter_samples in set_min_samples | Marek Olšák | 2018-05-24 | 2 | -3/+5
| | | | | Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove redundant ps_iter_samples clamp | Marek Olšák | 2018-05-24 | 1 | -1/+0
| | | | | | | si_get_ps_iter_samples already does this. Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove some old gfx 9.x registers | Marek Olšák | 2018-05-24 | 1 | -48/+0
| | | | | | | Leftover from bring up. Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: disable primitive binning for all blitter ops | Marek Olšák | 2018-05-24 | 3 | -2/+12
| | | | | | | same as amdvlk. Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* ac/surface/gfx6: don't overallocate mipmapped HTILE | Marek Olšák | 2018-05-24 | 1 | -2/+11
| | | | | Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* egl/x11: deduplicate depth-to-format logic | Eric Engestrom | 2018-05-24 | 3 | -33/+26
| | | | | | Suggested-by: Emil Velikov <[email protected]> Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* i965: enable OES_texture_view for gen8+ | Tapani Pälli | 2018-05-24 | 1 | -1/+2
| | | | | Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: changes to expose OES_texture_view extension | Tapani Pälli | 2018-05-24 | 6 | -6/+32
| | | | | | | | | | | Functionality already covered by ARB_texture_view, patch also adds missing 'gles guard' for enums (added in f1563e6392). Tested via arb_texture_view.*_gles3 tests and individual app utilizing texture view with ETC2. Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* docs: update release calendar for 18.1 series | Juan A. Suarez Romero | 2018-05-24 | 1 | -26/+19
| | | | | | | | | | | v2: extend 18.1 series (Andres) v3: fix copy/paste typo (Engestrom) CC: Andres Gomez <[email protected]> CC: Emil Velikov <[email protected]> CC: Dylan Baker <[email protected]> Reviewed-by: Andres Gomez <[email protected]> Reviewed-by: Dylan Baker <[email protected]>
* radv: call nir_lower_io_to_temporaries for VS, GS, TES and FS | Samuel Pitoiset | 2018-05-24 | 1 | -0/+10
| Do not lower FS inputs because this moves all load_var instructions at beginning of shaders and because interp_var_at_sample (and friends) seem broken. That might be eventually enabled later on if we really want to preload all FS inputs at beginning.
|
| Polaris10:
| Totals from affected shaders:
| SGPRS: 54072 -> 54264 (0.36 %)
| VGPRS: 38580 -> 38124 (-1.18 %)
| Spilled SGPRs: 652 -> 652 (0.00 %)
| Spilled VGPRs: 0 -> 0 (0.00 %)
| Code Size: 2128116 -> 2127380 (-0.03 %) bytes
| Max Waves: 8048 -> 8086 (0.47 %)
|
| Vega10:
| Totals from affected shaders:
| SGPRS: 52616 -> 52656 (0.08 %)
| VGPRS: 37536 -> 37116 (-1.12 %)
| Spilled SGPRs: 828 -> 828 (0.00 %)
| Code Size: 2043756 -> 2042672 (-0.05 %) bytes
| Max Waves: 9176 -> 9254 (0.85 %)
|
| Signed-off-by: Samuel Pitoiset <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: call nir_split_var_copies() before nir_lower_var_copies() | Samuel Pitoiset | 2018-05-24 | 1 | -0/+3
| | | | | | | | This does nothing special currently because we don't create any copy_var instructions, but this is needed for the next patch. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* i965: Use intel_bufferobj_buffer() wrapper in image surface state setup. | Francisco Jerez | 2018-05-23 | 1 | -3/+5
| | | | | | | | | | | | | | | Instead of directly using intel_obj->buffer. Among other things intel_bufferobj_buffer() will update intel_buffer_object:: gpu_active_start/end, which are used by glBufferSubData() to decide which path to take. Fixes a failure in the Piglit ARB_shader_image_load_store-host-mem-barrier Buffer Update/WaW tests, which could be reproduced with a non-standard glGetTexSubImage implementation (see bug report). Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105351 Reported-by: Nanley Chery <[email protected]> Cc: [email protected] Reviewed-by: Nanley Chery <[email protected]>
* i965: Handle non-zero texture buffer offsets in buffer object range calculation. | Francisco Jerez | 2018-05-23 | 1 | -1/+3
| | | | | | | | | | Otherwise the specified surface state will allow the GPU to access memory up to BufferOffset bytes past the end of the buffer. Found by inspection. v2: Protect against out-of-range BufferOffset (Nanley). Cc: [email protected] Reviewed-by: Nanley Chery <[email protected]>
* i965: Move buffer texture size calculation into a common helper function. | Francisco Jerez | 2018-05-23 | 1 | -23/+32
| | | | | | | | | | | | | The buffer texture size calculations (should be easy enough, right?) are repeated in three different places, each of them subtly broken in a different way. E.g. the image load/store path was never fixed to clamp to MaxTextureBufferSize, and none of them are taking into account the buffer offset correctly. It's easier to fix it all in one place. Cc: [email protected] Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106481 Reviewed-by: Nanley Chery <[email protected]>
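The two pitfalls named above, ignoring the buffer offset and not clamping to the maximum texture buffer size, can be shown in one small helper. This is a minimal sketch, not the i965 helper itself: the function name, its parameters, and the convention that a negative `range` means "whole buffer" are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of texels a buffer texture may expose.
 *   bo_size     size of the backing buffer object in bytes
 *   offset      byte offset of the texture within the buffer
 *   range       requested byte range, or negative for "to end of buffer"
 *   texel_size  bytes per texel of the texture format (must be > 0)
 *   max_texels  implementation limit on texture buffer size, in texels
 */
static unsigned demo_buffer_texture_texels(size_t bo_size, size_t offset,
                                           ptrdiff_t range,
                                           unsigned texel_size,
                                           unsigned max_texels)
{
    assert(texel_size > 0);

    /* An offset at or past the end leaves nothing addressable. */
    if (offset >= bo_size)
        return 0;

    /* Subtract the offset so the GPU cannot read past the buffer end. */
    size_t bytes = bo_size - offset;
    if (range >= 0 && (size_t)range < bytes)
        bytes = (size_t)range;

    /* Only whole texels count; then clamp to the implementation limit. */
    size_t texels = bytes / texel_size;
    return texels > max_texels ? max_texels : (unsigned)texels;
}
```

For example, a 1024-byte buffer bound at offset 256 with 4-byte texels yields (1024 - 256) / 4 = 192 texels, where a calculation that ignored the offset would report 256 and expose 256 bytes beyond the buffer.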
* Revert "mesa: simplify _mesa_is_image_unit_valid for buffers" | Francisco Jerez | 2018-05-23 | 1 | -13/+12
| | | | | | | | | | | | | | | | | | This reverts commit c0ed52f6146c7e24e1275451773bd47c1eda3145. It was preventing the image format validation from being done on buffer textures, which is required to ensure that the application doesn't attempt to bind a buffer texture with an internal format incompatible with the image unit format (e.g. of different texel size), which is not allowed by the spec (it's not allowed for *any* texture target, whether or not there is spec wording restricting this behavior specifically for buffer textures) and will cause the driver to calculate texel bounds incorrectly and potentially crash instead of the expected behavior. Cc: [email protected] Reviewed-by: Marek Olšák <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106465 Reviewed-by: Nanley Chery <[email protected]>
* ac: Use DPP for build_ddxy where possible. | Bas Nieuwenhuizen | 2018-05-23 | 1 | -1/+15
| | | | | | | | | | | | WQM is pretty reliable now on LLVM 7, so let us just use DPP + WQM. This gives approximately a 1.5% performance increase on the vrcompositor built-in benchmark. v2: Use ac_build_quad_swizzle. Reviewed-by: Nicolai Hähnle <[email protected]>
* i965: add {X,A}BGR2101010 to 'intel_image_formats' | Miguel Casas | 2018-05-23 | 1 | -0/+6
| | | | | | | | | This patch adds {X,A}BGR2101010 entries to the list of supported 'intel_image_formats'. Bug: https://crbug.com/776093 Reviewed-by: Chad Versace <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* dri_util: Add R10G10B10{A,X}2 translation between DRI and mesa_format. | Miguel Casas | 2018-05-23 | 1 | -0/+8
| | | | | | | | | Add R10G10B10{A,X}2 translation between mesa_format and DRI format to driGLFormatToImageFormat() and driImageFormatToGLFormat(). Bug: https://crbug.com/776093 Reviewed-by: Chad Versace <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* bin/get-pick-listh.sh: force git --pretty=medium | Dylan Baker | 2018-05-23 | 1 | -1/+1
| | | | | | Signed-off-by: Dylan Baker <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Andres Gomez <[email protected]>
* bin/bugzilla_mesa.sh: explicitly set the --pretty argument | Dylan Baker | 2018-05-23 | 1 | -1/+1
| | | | | | Signed-off-by: Dylan Baker <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Andres Gomez <[email protected]>
* docs: drop unnecessary out-of-frame target | Eric Engestrom | 2018-05-23 | 1 | -12/+9
| | | | | | | | | I'm guessing an earlier version of the website used to have the page contents in <frames>, but this isn't the case anymore so just drop the unnecessary `target="_main"` :) Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* docs: fix various html tags mistakes | Eric Engestrom | 2018-05-23 | 3 | -1/+4
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* docs: fix `<` & `>` used in html code | Eric Engestrom | 2018-05-23 | 2 | -5/+5
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* docs: add news notes to 18.1.0 | Juan A. Suarez Romero | 2018-05-23 | 1 | -0/+7
| | | | | | CC: Dylan Baker <[email protected]> Reviewed-by: Dylan Baker <[email protected]> Reviewed-by: Andres Gomez <[email protected]>
* tgsi/scan: add hw atomic to the list of memory accessing files | Dave Airlie | 2018-05-23 | 1 | -1/+2
| | | | | | | | This fixes 4 out of 5 cases in: arb_framebuffer_no_attachments-atomic on cayman. Reviewed-by: Marek Olšák <[email protected]> Cc: "18.0 18.1" <[email protected]>
* llvmpipe: improve rasterization discard logic | Roland Scheidegger | 2018-05-23 | 15 | -89/+118
| | | | | | | | | | | | | | | | | | | | | | This unifies the explicit rasterization discard as well as the implicit rasterization disabled logic (which we need for another state tracker), which really should do the exact same thing. We'll now toss out the prims early on in setup with (implicit or explicit) discard, rather than do setup and binning with them, which was entirely pointless. (We should eventually get rid of implicit discard, which should also enable us to discard stuff already in draw, hence draw would be able to skip the pointless clip and fallback stages in this case.) We still need separate logic for only null ps - this is not the same as rasterization discard. But simplify the logic there and don't count primitives simply when there's an empty fs, regardless of depth/stencil tests, which seems perfectly acceptable by d3d10. While here, also fix statistics for primitives if face culling is enabled. No piglit changes. Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* ac/surface/gfx6: Don't force a tile index for fmask. | Bas Nieuwenhuizen | 2018-05-23 | 1 | -1/+1
| | | | | | | | | | | | | | The bpe of the fmask often differs from the bpe of the main surface. On SI that means it has to get a different tile index. addrlib is capable of figuring this out itself, so just pass -1 instead to let it know that it is not preset. Fixes: 9bf3570fed0 "ac/surface/gfx6: compute FMASK together with the color surface" Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106511 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106499 Reviewed-by: Marek Olšák <[email protected]>
* i965: Remove ring switching entirely | Jason Ekstrand | 2018-05-22 | 11 | -105/+61
| | | | | Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/miptree: Move the access_raw call to the individual map functions | Jason Ekstrand | 2018-05-22 | 1 | -3/+13
| | | | | | | | | | | The only function that doesn't need to call access_raw is map_blit. If it takes the blitter path, it will happen as part of intel_miptree_copy. If map_blit takes the blorp path, brw_blorp_copy_miptrees will handle doing whatever resolves are needed. This should save us resolves in quite a few cases and will probably help performance a bit. Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove support for the BLT ring | Jason Ekstrand | 2018-05-22 | 1 | -9/+3
| | | | | | | We still support the blitter on gen4-5 but it's on the same ring as 3D. Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/miptree: Use blorp for blit maps on gen6+ | Jason Ekstrand | 2018-05-22 | 1 | -11/+25
| | | | | Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>