summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* vc4: Rework scheduling of thread switch to cut one more NOP.Eric Anholt2016-12-291-46/+75
| | | | | | | | | | | | | | Jonas's patch got us most of the benefit of scheduling instructions into the delay slots of thread switch, but if there had been nothing to pair the thrsw with, it would move the thrsw up and leave a NOP where the thrsw was. Instead, don't pair anything with thrsw through the normal scheduling path, and have a separate helper function that inserts the thrsw earlier if possible and inserts any necessary NOPs. total instructions in shared programs: 93027 -> 92643 (-0.41%) instructions in affected programs: 14952 -> 14568 (-2.57%)
* vc4: Fill thread switching delay slotsJonas Pfeil2016-12-291-7/+38
| | | | | | | | | | | | | | | Scan for instructions without a signal set in front of the switching instruction and move the signal up there. shader-db results: total instructions in shared programs: 94494 -> 93027 (-1.55%) instructions in affected programs: 23545 -> 22078 (-6.23%) v2: Fix re-emitting of the instruction in the loop trying to emit NOPs, drop a scheduling change from branch delay slots. (by anholt) Signed-off-by: Jonas Pfeil <[email protected]>
* vc4: Enable NIR-based loop unrolling.Eric Anholt2016-12-291-0/+5
| | | | | This successfully unrolls a new shader in GLB2.7, which also gets that shader to successfully compile in multithreaded mode.
* nir: stop gcc warning about uninitialised variablesTimothy Arceri2016-12-291-1/+1
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* radv: denote support for extended storage image formats.Dave Airlie2016-12-281-2/+4
| | | | | | | | | I'm sure anv has support for these as well, but this is just a first use of the interface to allow different supported spir-v features. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* spirv: add interface for drivers to define support extensions.Dave Airlie2016-12-285-4/+24
| | | | | | | | | | | I expect over time the struct contents will change as all drivers support stuff etc, but for now this should be a good starting point. Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* mesa/shaderobj: Fix races on refcountsChad Versace2016-12-281-10/+4
| | | | | | | | | | | | | | | | | | | | | | Use atomic ops when updating gl_shader::RefCount. Fixes intermittent failures and crashes in 'dEQP-EGL.functional.sharing.gles2.multithread.*'. All tests in that group now pass except 'dEQP-EGL.functional.sharing.gles2.multithread.simple_egl_server_sync.textures.copyteximage2d_texsubimage2d_render'. Tested with: mesa: branch 'master' at d6545f2 deqp: branch 'nougat-cts-dev' at 4acf725 with additional local fixes DEQP_TARGET: x11_egl hw: Intel Broadwell 0x1616 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99085 Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Tapani Pälli <[email protected]> Cc: [email protected] Cc: Mark Janes <[email protected]> Cc: Haixia Shi <[email protected]>
* freedreno/ir3: fix linkage::var sizeRob Clark2016-12-271-1/+1
| | | | | | | It should actually be 32 for a4xx/a5xx.. we still only advertise 16 but for a5xx the linkage map includes position/psize. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: treat clipvertex like a normal varyingRob Clark2016-12-271-3/+1
| | | | | | | | We need this in case it is streamed out. Not sure why we were treating it specially before. Having it as a VS out is harmless if FS doesn't have a matching input. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: transform-feedback supportRob Clark2016-12-277-38/+209
| | | | | | | | | | | We'll need to revisit when adding hw binning pass support, whether we can still do this in main draw step, as we do w/ a3xx/a4xx, or if we needed to move it to the binning stage. Still some failing piglits but most tests pass and the common cases seem to work. Signed-off-by: Rob Clark <[email protected]>
* freedreno: update generated headersRob Clark2016-12-277-43/+81
| | | | | | | Pull in a5xx streamout related regs. Also fixes a couple incorrect register definitions. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: UBO support for 64b GPUs (a5xx)Rob Clark2016-12-271-3/+24
| | | | | | Update address calculation to support 64b addresses. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: rework location of driver constantsRob Clark2016-12-276-53/+75
| | | | | | | | | | | Rework how we lay out driver constants (driver-params, UBO/TFBO buffer addresses, immediates) for more flexibility. For a5xx+ we need to deal with the fact that gpu ptrs are 64b instead of 32b, which makes the fixed offset scheme not work so well. While we are dealing with that we might also make the layout more dynamic to account for varying # of UBOs, etc. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: fix emit for bo addressesRob Clark2016-12-271-3/+9
| | | | | | Reloc for the buffer address is two dwords on 64b devices (a5xx+) Signed-off-by: Rob Clark <[email protected]>
* freedreno/a5xx: texture layoutRob Clark2016-12-272-2/+2
| | | | | | | Seems to be imilar to a4xx, and sampler state "array-pitch" needs to be aligned to page size. Signed-off-by: Rob Clark <[email protected]>
* ttn: set ->info->num_ubosRob Clark2016-12-271-1/+4
| | | | | | | | | For dealing w/ 32b vs 64b gpu addresses, I need to rework how we pass UBO buffer addresses to shader, and knowing up front the # of UBOs is useful. But I noticed ttn wasn't setting this. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* anv: Handle vkGetPhysicalDeviceQueueFamilyProperties with count == 0Chad Versace2016-12-271-1/+8
| | | | | | | | | | | | The spec implicitly allows the incoming count to be 0. From the Vulkan 1.0.38 spec, Section 4.1 Physical Devices: If the value referenced by pQueueFamilyPropertyCount is not 0 [then do stuff]. Cc: [email protected] Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* egl: Emit correct error when robust context creation failsChad Versace2016-12-271-12/+26
| | | | | | | | | | | | | | | | | Fixes dEQP-EGL.functional.create_context_ext.robust_* on Intel with GBM. If the user sets the EGL_CONTEXT_OPENGL_ROBUST_ACCESS_BIT_KHR in EGL_CONTEXT_FLAGS_KHR when creating an OpenGL ES context, then EGL_KHR_create_context spec requires that we unconditionally emit EGL_BAD_ATTRIBUTE because that flag does not exist for OpenGL ES. When creating an OpenGL context, the spec requires that we emit EGL_BAD_MATCH if we can't support the request; that error is generated in the egl_dri2 layer where the driver capability is actually checked. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99188 Cc: [email protected] Reviewed-by: Tapani Pälli <[email protected]>
* anv: return count of queue families writtenDamien Grassart2016-12-271-0/+2
| | | | | | | | | | | The Vulkan spec indicates that vkGetPhysicalDeviceQueueFamilyProperties() should overwrite pQueueFamilyPropertyCount with the number of structures actually written to pQueueFamilyProperties. Signed-off-by: Damien Grassart <[email protected]> Reviewed-by: Chad Versace <[email protected]> Cc: [email protected]
* i965: Allow import/export of ARGB1555 imagesChad Versace2016-12-271-0/+3
| | | | | | | To my knowledge, this fixes no tests. I simply wrote the patch for completeness as a follow-up to the previous two patches. Reviewed-by: Tapani Pälli <[email protected]>
* mesa/texformat: Handle GL_RGBA + GL_UNSIGNED_SHORT_5_5_5_1Chad Versace2016-12-271-0/+2
| | | | | | | | | | | | | | | | _mesa_choose_tex_format() already handles GL_RGBA + GL_UNSIGNED_SHORT_1_5_5_5_REV by converting it to MESA_FORMAT_B5G5R5A1_UNORM. Teach it do the same for the non-reversed type. Otherwise, the switch's fallthrough converts it to an 8888 format, which has incompatible precision in the alpha channel. Patch 2/2 to fix dEQP-EGL.functional.image.modify.tex_rgb5_a1_tex_subimage_rgba8 on Intel. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99185 Cc: Haixia Shi <[email protected]> Reviewed-by: Tapani Pälli <[email protected]> Cc: "13.0" <[email protected]>
* dri: Add __DRI_IMAGE_FORMAT_ARGB1555Chad Versace2016-12-272-0/+6
| | | | | | | | | | | | | This allows eglCreateImage() to accept textures of said format. Patch 1/2 to fix dEQP-EGL.functional.image.modify.tex_rgb5_a1_tex_subimage_rgba8 on Intel. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99185 Cc: Haixia Shi <[email protected]> Reviewed-by: Tapani Pälli <[email protected]> Cc: "13.0" <[email protected]>
* egl/dri2: implement query surface hookTapani Pälli2016-12-271-0/+36
| | | | | | | | | | | | | | | | This makes better guarantee that the values we return are in sync what the underlying drawable currently has. Together with dEQP change in bug #98327 this fixes following test: dEQP-EGL.functional.resize.surface_size.grow v2: avoid unnecessary x11 roundtrips (Chad Versace) Signed-off-by: Tapani Pälli <[email protected]> Tested-by: Mark Janes <[email protected]> Reviewed-by: Chad Versace <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98327
* radv: add some asserts for operations on general queueDave Airlie2016-12-272-0/+3
| | | | | | These might be useful in the future, or not. Signed-off-by: Dave Airlie <[email protected]>
* radv: Also skip DCC clear flushes for compute.Bas Nieuwenhuizen2016-12-274-12/+16
| | | | | | (airlied: fixes DOOM hang with compute queue enabled) Reviewed-by: Dave Airlie <[email protected]> Signed-off-by: Bas Nieuwenhuizen <[email protected]>
* radv: handle queue present directly to winsysDave Airlie2016-12-261-1/+9
| | | | | | | | | | Don't call the QueueSubmit interface, just call direct to the winsys, so we can pass the wait semaphores. Noticed while debugging doom, doesn't fix anything. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* intel/blorp_blit: Fix max blit size for gen6Jordan Justen2016-12-261-2/+3
| | | | | | | | Fixes ES3-CTS.gtf.GL3Tests.framebuffer_blit.framebuffer_blit_functionality_stencil_blit Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* radv: fix rendering to b10g11r11_ufloat_pack32Dave Airlie2016-12-261-1/+1
| | | | | | | | | | doom was causing a printf about an illegal color, it was due the non-void returning -1, and the other function checking for 4, align these. Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv: handle multi-component shared load/stores.Dave Airlie2016-12-261-12/+29
| | | | | | | This was seen in doom shaders, so handle it properly. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave AIrlie <[email protected]>
* clover: Use Clang's diagnosticsVedran Miletić2016-12-241-1/+6
| | | | | | | | | | | | | | | | Presently errors from frontend are handled only if they occur in clang::CompilerInvocation::CreateFromArgs(). This patch uses clang::DiagnosticsEngine to detect errors such as invalid values for Clang frontend arguments. Fixes Piglit's cl/program/build/fail/invalid-version-declaration.cl test. v2: fix inconsistent code formatting Signed-off-by: Vedran Miletić <[email protected]> Reviewed-by: Francisco Jerez <[email protected]> Tested-by: Aaron Watry <[email protected]>
* radv: return count of queue families writtenDamien Grassart2016-12-251-1/+4
| | | | | | | | | | The Vulkan spec indicates that vkGetPhysicalDeviceQueueFamilyProperties() should overwrite pQueueFamilyPropertyCount with the number of structures actually written to pQueueFamilyProperties. Signed-off-by: Damien Grassart <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* i965/generator/tex: Handle an immediate sampler with an indirect textureJason Ekstrand2016-12-232-4/+12
| | | | | | | | | In this case we were dying when we tried to do SHL addr sampler imm(8) because that puts an immediate in src0 of a two source instruction. This fixes 2704 of the new separate sampler Vulkan CTS tests on Sky Lake. Reviewed-by: Eduardo Lima Mitev <[email protected]> Cc: "13.0" <[email protected]>
* swr: fix icc compile errorBruce Cherniak2016-12-231-1/+1
| | | | | | | | ICC doesn't like the use of nullptr (std::nullptr_t) argument in p_atomic_set. GCC and clang don't complain. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99119 Reviewed-by: Tim Rowley <[email protected]>
* radv: set some proper values for interp offset limits.Dave Airlie2016-12-231-3/+3
| | | | | | | | These are taken from the amdgpu-pro driver, and cause no CTS change. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* radv: bump texel offsets to align with radeonsiDave Airlie2016-12-231-4/+4
| | | | | | | | | | it appears from the amdgpu-pro results the hw can do more, but let's just align with radeonsi for now. No CTS regressions. Reviewed-by: Bas Nieuwenhuizen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* nir/algebraic: Add optimizations for "a == a && a CMP b"Jason Ekstrand2016-12-221-0/+8
| | | | | | | | This sequence shows up The Talos Principal, at least under Vulkan, and prevents loop analysis from properly computing trip counts in a few loops. Reviewed-by: Ian Romanick <[email protected]>
* i965: Use nir_opt_trivial_continues and nir_opt_ifJason Ekstrand2016-12-221-0/+9
| | | | Reviewed-by: Timothy Arceri <[email protected]>
* nir: Add a pass for moving SPIR-V continue blocks to the ends of loopsJason Ekstrand2016-12-223-0/+259
| | | | | | | | | | | | | | | When shaders come in from SPIR-V, we handle continue blocks by placing the contents of the continue inside of a "if (!first_iteration)". We do this so that we can properly handle the fact that continues in SPIR-V jump to the continue block at the end of the loop rather than jumping directly to the top of the loop like they do in NIR. In particular, the increment step of a simple for loop ends up in the continue block. This pass looks for this case in loops that don't actually have any continues and moves the continue contents to the end of the loop instead. We need this because loop unrolling doesn't work if the increment is inside of a condition. Reviewed-by: Timothy Arceri <[email protected]>
* nir: Add an optimization pass to remove trivial continuesJason Ekstrand2016-12-223-0/+140
| | | | Reviewed-by: Timothy Arceri <[email protected]>
* nir: Correctly handle blocks in cf_node_cf_tree_nextJason Ekstrand2016-12-221-1/+1
| | | | Reviewed-by: Timothy Arceri <[email protected]>
* i965: make use of nir_lower_returns() for GLTimothy Arceri2016-12-232-6/+2
| | | | | | | | | | | | | | | | | | | | | Fixes two new piglit tests: spec/glsl-1.10/execution/vs-nested-return-sibling-loop.shader_test spec/glsl-1.10/execution/vs-nested-return-sibling-loop2.shader_test shader-db results for BDW: total instructions in shared programs: 12903158 -> 12903134 (-0.00%) instructions in affected programs: 27100 -> 27076 (-0.09%) helped: 32 HURT: 6 total cycles in shared programs: 294922518 -> 294922804 (0.00%) cycles in affected programs: 4372828 -> 4373114 (0.01%) helped: 31 HURT: 8 Reviewed-by: Kenneth Graunke <[email protected]>
* nir: update nir_lower_returns to only predicate instructions when neededTimothy Arceri2016-12-231-6/+41
| | | | | | | | | | | | | | Unless an if statement contains nested returns we can simply add any following instructions to the branch without the return. V2: fix handling if_nested_return value when there is a sibling if/loop that doesn't contain a return. (Spotted by Ken) V3: - add a better comment to the new variable - remove instructions after if when both branches return Reviewed-by: Jason Ekstrand <[email protected]>
* i965: disable loop unrolling in GLSL IRTimothy Arceri2016-12-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a single regression in loop unrolling which is: loops HURT: shaders/orbital_explorer.shader_test GS SIMD8: 0 -> 1 However the loop is huge so it seems reasonable not to unroll it. It's surprising that GLSL IR does unroll it. shader-db results BDW: total instructions in shared programs: 13037455 -> 13036947 (-0.00%) instructions in affected programs: 17982 -> 17474 (-2.83%) helped: 63 HURT: 25 total cycles in shared programs: 262217870 -> 262227990 (0.00%) cycles in affected programs: 2287046 -> 2297166 (0.44%) helped: 969 HURT: 844 total loops in shared programs: 2951 -> 2952 (0.03%) loops in affected programs: 0 -> 1 helped: 0 HURT: 1 LOST: 0 GAINED: 1 Reviewed-by: Jason Ekstrand <[email protected]>
* i965: use nir loop unrolling passTimothy Arceri2016-12-232-5/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | shader-db results for BDW: total instructions in shared programs: 12589614 -> 12590119 (0.00%) instructions in affected programs: 50525 -> 51030 (1.00%) helped: 7 HURT: 145 total cycles in shared programs: 241524604 -> 241490502 (-0.01%) cycles in affected programs: 1941404 -> 1907302 (-1.76%) helped: 302 HURT: 449 total loops in shared programs: 4245 -> 2947 (-30.58%) loops in affected programs: 1535 -> 237 (-84.56%) helped: 1142 HURT: 0 total spills in shared programs: 14453 -> 14453 (0.00%) spills in affected programs: 0 -> 0 helped: 0 HURT: 0 total fills in shared programs: 18984 -> 18984 (0.00%) fills in affected programs: 0 -> 0 helped: 0 HURT: 0 LOST: 26 GAINED: 15 Reviewed-by: Jason Ekstrand <[email protected]>
* nir: pass compiler rather than devinfo to functions that call nir_optimizeTimothy Arceri2016-12-237-21/+18
| | | | | | | Later we will pass compiler to nir_optimise to be used by the loop unroll pass. Reviewed-by: Jason Ekstrand <[email protected]>
* nir: add a loop unrolling passTimothy Arceri2016-12-233-0/+578
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | V2: - tidy ups suggested by Connor. - tidy up cloning logic and handle copy propagation based of suggestion by Connor. - use nir_ssa_def_rewrite_uses to fix up lcssa phis suggested by Connor. - add support for complex loop unrolling (two terminators) - handle case were the ssa defs use outside the loop is already a phi - support unrolling loops with multiple terminators when trip count is know for each terminator V3: - set correct num_components when creating phi in complex unroll - rewrite update remap table based on Jasons suggestions. - remove unrequired extract_loop_body() helper as suggested by Jason. - simplify the lcssa phi fix up code for simple loops as per Jasons suggestions. - use mem context to keep track of hash table memory as suggested by Jason. - move is_{complex,simple}_loop helpers to the unroll code - require nir_metadata_block_index - partially rewrote complex unroll to be simpler and easier to follow. V4: - use rzalloc() when creating nir_phi_src but not setting pred right away fixes regression cause by ralloc() no longer zeroing memory. V5: - simplify calling of complex_unroll() - use new loop terminator fields to get the break/continue from blocks and simplify loop unrolling code - handle slightly less trivial loop terminators. if branches can now have instructions but can only contain a single block. - use nir print type IR snippets in unroll function descriptions - add better explanation and variable for why we need to clone additional times when the second terminator it the limiting terminator. - partially convert out of ssa before unrolling loops (suggested by Jason) v6: - remove unused nir_builder - use Jasons new from ssa helper - tidy/fixup cursor use - unroll terminators that contain control flow correctly - unroll complex loops with control flow before the terminators correctly Reviewed-by: Jason Ekstrand <[email protected]>
* nir: add helper for cloning nir_cf_listTimothy Arceri2016-12-232-9/+56
| | | | | | | | | V2: - updated to create a generic list clone helper nir_cf_list_clone() - continue to assert on clone when fallback flag not set as suggested by Jason. Reviewed-by: Jason Ekstrand <[email protected]>
* nir: update fixup_phi_srcs() to handle registersTimothy Arceri2016-12-231-4/+9
| | | | | | | We need to do this because we partially get out of SSA when unrolling and cloning loops. Reviewed-by: Jason Ekstrand <[email protected]>
* nir: create helper for fixing phi srcs when cloningTimothy Arceri2016-12-231-15/+21
| | | | | | | This will be useful for fixing phi srcs when cloning a loop body during loop unrolling. Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add a LCSAA-passThomas Helland2016-12-233-0/+206
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | V2: Do a "depth first search" to convert to LCSSA V3: Small comment fixup V4: Rebase, adapt to removal of function overloads V5: Rebase, adapt to relocation of nir to compiler/nir Still need to adapt to potential if-uses Work around nir_validate issue V6 (Timothy): - tidy lcssa and stop leaking memory - dont rewrite the src for the lcssa phi node - validate lcssa phi srcs to avoid postvalidate assert - don't add new phi if one already exists - more lcssa phi validation fixes - Rather than marking ssa defs inside a loop just mark blocks inside a loop. This is simpler and fixes lcssa for intrinsics which do not have a destination. - don't create LCSSA phis for loops we won't unroll - require loop metadata for lcssa pass - handle case were the ssa defs use outside the loop is already a phi V7: (Timothy) - pass indirect mask to metadata call v8: (Timothy) - make convert to lcssa a helper function rather than a nir pass - replace inside loop bitset with on the fly block index logic. - remove lcssa phi validation special cases - inline code from useless helpers, suggested by Jason. - always do lcssa on loops, suggested by Jason. - stop making lcssa phis special. Add as many source as the block has predecessors, suggested by Jason. V9: (Timothy) - fix regression with the is_lcssa_phi field not being initialised to false now that ralloc() doesn't zero out memory. V10: (Timothy) - remove extra braces in SSA example, pointed out by Topi V11: (Timothy) - add missing support for LCSSA phis in if conditions. V12: (Timothy) - small tidy up suggested by Jason. - always create lcssa phi even if it just points to an lcssa phi from an inner loop Reviewed-by: Jason Ekstrand <[email protected]>