Commit message | Author | Age | Files | Lines
* draw: finally optimize bool clip mask generation | Roland Scheidegger | 2016-11-18 | 3 | -23/+26
  lp_build_any_true_range is just what we need, though it will only produce optimal code with sse41 (ptest + set). Even without it, on 64bit x86 the code is still better (1 unpack, 2 movq + or + set); on 32bit x86 it's going to be roughly the same as before.
  While here, also make it a "real" 8bit boolean: this cuts one instruction but, more importantly, is similar to ordinary booleans.
  Reviewed-by: Jose Fonseca <[email protected]>
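The "2 movq + or + set" sequence mentioned above can be pictured in plain C. This is only a scalar sketch of the idea, not the LLVM IR that gallivm emits: `any_true4` is a hypothetical name, the four 32-bit lanes are an assumed vector shape, and encoding the result as 0 or 1 is likewise an assumption.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: reduce four 32-bit mask lanes (each 0 or ~0)
 * to a single 8-bit boolean.
 */
uint8_t any_true4(uint32_t l0, uint32_t l1, uint32_t l2, uint32_t l3)
{
    /* Pack pairs of lanes into two 64-bit halves. */
    uint64_t lo = ((uint64_t)l1 << 32) | l0;
    uint64_t hi = ((uint64_t)l3 << 32) | l2;

    /* One OR plus one compare-and-set gives 0 or 1 in a byte. */
    return (uint8_t)((lo | hi) != 0);
}
```

The point of the sketch is that the reduction needs no per-lane branching: any set bit in any lane survives the OR.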
* draw: use vectorized calculations for fetch (v2) | Roland Scheidegger | 2016-11-18 | 1 | -131/+265
  Instead of doing all the math with scalars, use vectors. This means the overflow math needs to be done manually, albeit that's only really problematic for the stride/index mul; the rest has been pretty much moved outside the shader loop (albeit the mul could actually be optimized away too), where things are still scalar. To eliminate control flow in the main shader loop fetch, provide fake buffers (so index 0 is always valid to fetch).
  Still uses aos fetch though in the end, mostly because some more code would be needed to handle unaligned fetches in that path, and because for most formats it won't make a difference anyway (we generate some truly horrendous code for things like R16G16_something for instance).
  Instanced fetch however stays roughly the same as before, except that the same element is no longer fetched multiple times (I've seen a reduction of ~3 times in main shader loop size, because llvm did not recognize it's all the same fetch: some of the fetches could have been replaced with zeros in case the vector size exceeds the remaining fetch count, though the values of such fetches don't matter at all).
  Also, for elts gathering, use vectorized code as well.
  The generated shaders are smaller and faster to compile (not entirely sure about execution speed, but generally unless there are just single vertices to handle I would expect it to be faster; there are more opportunities for future improvements by using soa fetch).
  v3: skip the fake index buffer, not needed due to the jit code never seeing the real index buffer in the first place. Fix a bug with mask expansion (needs SExt, not ZExt). Also, be really really careful to keep the behavior the same, even in cases where it looks wrong, and add comments why the code is doing the seemingly wrong stuff... Fortunately it's not actually more complex in the end...
  Also change function order slightly just to make the diff more readable.
  No piglit change. Passes some internal testing with another api too...
  Reviewed-by: Jose Fonseca <[email protected]>
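The "needs SExt, not ZExt" remark can be illustrated with a small C sketch. The lane widths (8 bits widened to 32) and all function names here are assumptions chosen for illustration; the commit does not state which widths were involved.

```c
#include <assert.h>
#include <stdint.h>

/* Widen an 8-bit lane mask (0x00 or 0xFF) by sign extension. */
uint32_t expand_mask_sext(uint8_t m)
{
    /* Map the bit pattern to a signed value by hand (0xFF -> -1),
     * avoiding any implementation-defined conversion. */
    int32_t s = (m & 0x80) ? (int32_t)m - 256 : (int32_t)m;
    return (uint32_t)s;          /* -1 becomes 0xFFFFFFFF */
}

/* Widen the same mask by zero extension. */
uint32_t expand_mask_zext(uint8_t m)
{
    return (uint32_t)m;          /* 0xFF stays 0x000000FF */
}

/* Bitwise select: take bits of a where mask is set, else bits of b. */
uint32_t select_lane(uint32_t mask, uint32_t a, uint32_t b)
{
    return (a & mask) | (b & ~mask);
}
```

With a zero-extended mask, only the low byte of the selected value survives, which is why an all-ones mask has to be widened with sign extension before it is used in a bitwise select.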
* i965/gen7: Minify blit size for stencil tree copy | Jordan Justen | 2016-11-17 | 1 | -2/+4
  Found by the piglit 'fbo-depth-array stencil-clear' test when implementing blorp blit splitting for gen7.
  Signed-off-by: Jordan Justen <[email protected]>
  Reviewed-by: Ben Widawsky <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* mesa: Drop PATH_MAX usage. | Kenneth Graunke | 2016-11-17 | 2 | -34/+15
  GNU/Hurd does not define PATH_MAX since it doesn't have such arbitrary limitation, so this failed to compile. Apparently glibc does not enforce PATH_MAX restrictions anyway, so it's kind of a hoax:
  https://www.gnu.org/software/libc/manual/html_node/Limits-for-Files.html
  MSVC uses a different name (_MAX_PATH) as well, which is annoying.
  We don't really need it. We can simply asprintf() the filenames. If the filename exceeds an OS path limit, presumably fopen() will fail, and we already check that.
  (We actually use ralloc_asprintf because Mesa provides that everywhere, and it doesn't look like we've provided an implementation of GNU's asprintf() for all platforms.)
  Fixes the build on GNU/Hurd.
  Cc: "13.0" <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98632
  Signed-off-by: Samuel Thibault <[email protected]>
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
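The approach described above, sizing the filename buffer from the formatted length instead of from PATH_MAX, might look roughly like this in standard C. The commit itself uses ralloc_asprintf; this sketch swaps in a two-pass snprintf plus malloc so that it is self-contained, and `build_filename` and its path layout are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns a heap-allocated "<dir>/<name>.<ext>"
 * string, or NULL on failure. The caller frees the result.
 */
char *build_filename(const char *dir, const char *name, const char *ext)
{
    /* First pass: ask snprintf how many characters are needed. */
    int len = snprintf(NULL, 0, "%s/%s.%s", dir, name, ext);
    if (len < 0)
        return NULL;

    char *path = malloc((size_t)len + 1);
    if (path == NULL)
        return NULL;

    /* Second pass: format into the exactly sized buffer. */
    snprintf(path, (size_t)len + 1, "%s/%s.%s", dir, name, ext);
    return path;
}
```

No fixed-size buffer is involved, so an overlong path simply reaches fopen(), which reports the failure.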
* i965: Fix compute shader crash. | Kenneth Graunke | 2016-11-17 | 1 | -1/+1
  Fixes crashes when starting Deus Ex: Mankind Divided.
  Cc: [email protected]
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
* anv/TODO: Check off render buffer compression | Jason Ekstrand | 2016-11-17 | 1 | -1/+0
  There's still a tiny bit of work to do for storage images but it's otherwise pretty much done at this point.
* anv: Enable "permanent" compression for immutable format images | Jason Ekstrand | 2016-11-17 | 2 | -1/+26
  This commit extends our support of color compression to surfaces without the VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT set. These images will never have an image view created with a different format than the one set at image creation time, so it's safe to always use compression. We still bail if the image is used as a storage image because that sometimes ends up using a different format.
  Reviewed-by: Topi Pohjolainen <[email protected]>
* intel/blorp: Properly handle color compression in blorp_copy | Jason Ekstrand | 2016-11-17 | 3 | -4/+183
  Previously, blorp copy operations were CCS-unaware so you had to perform resolves on the source and destination before performing the copy. This commit makes blorp_copy capable of handling CCS-compressed images without any resolves.
  Reviewed-by: Topi Pohjolainen <[email protected]>
* intel/blorp: Always use UINT formats on SKL+ | Jason Ekstrand | 2016-11-17 | 1 | -22/+44
  Many of these UINT formats aren't available prior to Sky Lake so we used UNORM formats. Using UINT formats is a bit nicer because it guarantees we don't run into rounding issues. Also, we will need it in the next commit for handling copies with CCS enabled.
  Reviewed-by: Topi Pohjolainen <[email protected]>
* i965/blorp: Rework resolve handling | Jason Ekstrand | 2016-11-17 | 1 | -92/+88
  This commit moves the handling of resolves into blorp_surf_for_miptree(). Instead of each helper doing resolves and checks itself, it simply tells blorp_surf_for_miptree which aux modes are supported by the given blorp operation, and blorp_surf_for_miptree will resolve as needed.
  Reviewed-by: Topi Pohjolainen <[email protected]>
* anv/image: Add an aux_usage field for "default" aux | Jason Ekstrand | 2016-11-17 | 4 | -19/+45
  Initially, the field is set to ISL_AUX_USAGE_NONE so this commit shouldn't bring any functional changes. Setting this field to something else will cause all sampled and storage image views to be created with AUX, and blorp will start trying to respect it, so set it with care.
* anv: Add initial support for Sky Lake color compression | Jason Ekstrand | 2016-11-17 | 4 | -34/+169
  This commit adds basic support for color compression. For the moment, color compression is only enabled within a render pass and a full resolve is done before the render pass finishes. All texturing operations still happen with CCS disabled.
* anv/pass: Precompute some subpass usage information | Jason Ekstrand | 2016-11-17 | 2 | -7/+43
* util/vk_alloc: Add a vk_zalloc2 helper | Jason Ekstrand | 2016-11-17 | 1 | -0/+16
  Reviewed-by: Topi Pohjolainen <[email protected]>
* anv/image: Memset all aux surfaces (not just HiZ) to 0 | Jason Ekstrand | 2016-11-17 | 1 | -4/+6
  Reviewed-by: Topi Pohjolainen <[email protected]>
* anv/image: Rename hiz_surface to aux_surface | Jason Ekstrand | 2016-11-17 | 3 | -17/+18
* anv/blorp: Ignore clears for attachments first used as resolve destinations | Jason Ekstrand | 2016-11-17 | 1 | -9/+11
  Otherwise, we'll try to clear it the first time it's used as a draw. So if you do some multisampled rendering, resolve to an attachment, and then draw on top of the single-sampled attachment, we might accidentally clear it.
  Cc: "13.0" <[email protected]>
* intel/blorp: Take a fast_clear_op in ccs_resolve | Jason Ekstrand | 2016-11-17 | 4 | -15/+29
  Eventually, we may want to just have a single blorp_ccs_op function that does both clears and resolves. For now we'll stick to just making the ccs_resolve function we have now a bit more configurable.
  Reviewed-by: Topi Pohjolainen <[email protected]>
* intel/blorp: Add plumbing for color resolve slice details | Pohjolainen, Topi | 2016-11-17 | 3 | -4/+10
  Signed-off-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* intel/isl: Allow non-2D CCS surfaces | Jason Ekstrand | 2016-11-17 | 1 | -2/+2
  The CCS calculations in ISL are already correct for 1-D and 3-D CCS surfaces since they have exactly the same layout as 2-D array surfaces (at least on Sky Lake). The only problem was that we weren't passing in the right dimensionality and we weren't passing in the depth.
  Reviewed-by: Topi Pohjolainen <[email protected]>
* intel/isl: Rework the asserts and fails in isl_surf_get_ccs | Jason Ekstrand | 2016-11-17 | 1 | -2/+7
  There are some invariants, such as the number of samples, on which we should assert. However, most other things should silently return false since they're much easier for isl_surf_get_ccs to check than the caller. We also update the checking to be a bit more complete.
* anv/cmd_buffer: Refactor surface state relocation handling | Jason Ekstrand | 2016-11-17 | 1 | -13/+22
  Reviewed-by: Topi Pohjolainen <[email protected]>
* anv/cmd_buffer: Pull add_surface_state_reloc into genX_cmd_buffer.c | Jason Ekstrand | 2016-11-17 | 2 | -16/+14
  Reviewed-by: Topi Pohjolainen <[email protected]>
* anv/image: Stop force-disabling AUX | Jason Ekstrand | 2016-11-17 | 1 | -4/+0
  Auxiliary surfaces have to be created manually anyway, so force-disabling them does nothing whatsoever at the moment.
  Reviewed-by: Topi Pohjolainen <[email protected]>
* mesa: Add missing call to _mesa_unlock_debug_state(ctx); v2 | Tom Stellard | 2016-11-17 | 1 | -1/+3
  cd724208d3e1e3307f84a794f2c1fc83b69ccf8a added a call to _mesa_lock_debug_state(ctx) but wasn't unlocking the debug state. This fixes a hang in the glsl-fs-loop piglit test with MESA_DEBUG=context.
  v2:
  - Remove unrelated changes.
  Reviewed-by: Brian Paul <[email protected]>
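The class of bug fixed here, a lock taken without a matching unlock, can be sketched with a toy lock in C. None of the names below come from Mesa; `toy_lock` and `log_if_enabled` are hypothetical, and the assert in `toy_lock_acquire` stands in for the hang a second acquire would cause on a real non-recursive mutex.

```c
#include <assert.h>

/* Toy non-recursive lock: 'held' is 1 while the lock is taken. */
struct toy_lock { int held; };

void toy_lock_acquire(struct toy_lock *l)
{
    /* A real mutex would block forever here if already held. */
    assert(!l->held);
    l->held = 1;
}

void toy_lock_release(struct toy_lock *l)
{
    assert(l->held);
    l->held = 0;
}

/*
 * Hypothetical logging helper. It has a single exit point, so the
 * release runs whether or not the message is actually logged.
 * Returns 1 if the message was counted, 0 if it was filtered out.
 */
int log_if_enabled(struct toy_lock *l, int enabled, int *counter)
{
    int logged = 0;

    toy_lock_acquire(l);
    if (enabled) {
        (*counter)++;
        logged = 1;
    }
    toy_lock_release(l);

    return logged;
}
```

Keeping the release on the one path every caller takes is what makes the second call safe; an early return between acquire and release would leave the lock held.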
* egl: fix helper function name | Eric Engestrom | 2016-11-17 | 1 | -4/+4
  I introduced this code last month, but didn't follow the naming convention. Fix this.
  Fixes: 0a606a400fe382a9bc72 ("egl: add eglSwapBuffersWithDamageKHR")
  Reviewed-by: Tapani Pälli <[email protected]>
  Signed-off-by: Eric Engestrom <[email protected]>
* egl/x11: misc style fixes | Eric Engestrom | 2016-11-17 | 2 | -2/+2
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Tapani Pälli <[email protected]>
* egl: fix function name in debug string | Eric Engestrom | 2016-11-17 | 1 | -1/+1
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Tapani Pälli <[email protected]>
* nir/spirv: Fix handling of gl_PrimitiveId | Jason Ekstrand | 2016-11-16 | 1 | -2/+6
  Before, we were always treating it as an output, which is bogus. The only stage in which it can be an output is the geometry stage. In all other stages, it's an input which, in the back-end, we actually want to be a system value.
  Cc: "13.0" <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
* anv/fence: Handle ANV_FENCE_CREATE_SIGNALED_BIT | Jason Ekstrand | 2016-11-16 | 1 | -1/+5
  Cc: "13.0" <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
* anv: Handle null in all destructors | Jason Ekstrand | 2016-11-16 | 9 | -1/+65
  This fixes a bunch of new CTS tests which look for exactly this. Even in the cases where we just call vk_free to free a CPU data structure, we still handle NULL explicitly. This way we're less likely to forget to handle NULL later should we actually do something less trivial.
  Cc: "13.0" <[email protected]>
  Reviewed-by: Dave Airlie <[email protected]>
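The explicit NULL handling described above follows a simple pattern, sketched here in plain C. The `toy_fence` type and its functions are hypothetical stand-ins, not anv code, and malloc/free replace the driver's allocator.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object that owns a second allocation. */
struct toy_fence {
    int *payload;
};

struct toy_fence *toy_fence_create(void)
{
    struct toy_fence *f = malloc(sizeof *f);
    if (f == NULL)
        return NULL;

    f->payload = calloc(1, sizeof *f->payload);
    if (f->payload == NULL) {
        free(f);
        return NULL;
    }
    return f;
}

void toy_fence_destroy(struct toy_fence *f)
{
    /* Explicit early-out: destroying a null handle is a no-op.
     * Without it, the f->payload access below would dereference NULL. */
    if (f == NULL)
        return;

    free(f->payload);
    free(f);
}
```

The early-out costs nothing when the destructor only frees memory, but it keeps the function correct once it starts touching members of the object, which is the situation the commit message anticipates.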
* util/vk_alloc: Ensure NULL is handled correctly in vk_free | Jason Ekstrand | 2016-11-16 | 1 | -0/+3
  Reviewed-by: Dave Airlie <[email protected]>
* anv/device: Silence a 32-bit warning | Jason Ekstrand | 2016-11-16 | 1 | -1/+2
* nir: Avoid an extra NIR op in integer divide lowering. | Eric Anholt | 2016-11-16 | 1 | -2/+1
  NIR bools are ~0 for true, so ((unsigned)a >> 31) != 0 -> ((int)a >> 31).
  Reviewed-by: Kenneth Graunke <[email protected]>
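The identity in this commit can be checked with a small C sketch. The function names are hypothetical, and the one-op version is written as `0u - (a >> 31)` because right-shifting a negative signed value is implementation-defined in C; that expression produces the same 0 or 0xFFFFFFFF result an arithmetic shift right by 31 would.

```c
#include <assert.h>
#include <stdint.h>

/* Two steps: logical shift to isolate the sign bit, then a compare
 * whose result is encoded as a ~0-for-true boolean. */
uint32_t sign_bool_two_ops(uint32_t a)
{
    uint32_t bit = a >> 31;                 /* 0 or 1 */
    return (bit != 0) ? 0xFFFFFFFFu : 0u;
}

/* One step: replicate the sign bit across all 32 bits, which is what
 * an arithmetic shift right by 31 does. Spelled portably here. */
uint32_t sign_bool_one_op(uint32_t a)
{
    return 0u - (a >> 31);                  /* 0 or 0xFFFFFFFF */
}
```

Because true is already all ones, the shifted value is directly usable as the boolean and the separate compare can be dropped.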
* vc4: Try compiling our FSes in multithreaded mode on new kernels. | Eric Anholt | 2016-11-16 | 5 | -2/+20
  Multithreaded fragment shaders let us hide texturing latency by a hyperthreading-style switch to another fragment shader. This gets us up to 20% framerate improvements on glmark2 tests.
* vc4: Add support for ETC1 textures if the kernel is new enough. | Eric Anholt | 2016-11-16 | 4 | -5/+18
  The kernel changes for exposing the param have now been merged, so we can expose it here.
* vc4: Fix simulator mode missing-GETPARAM debug info. | Eric Anholt | 2016-11-16 | 1 | -1/+1
  The value is 0 since we didn't set it; we wanted to see the param.
* vc4: Fix resource leak in register allocation failure path. | Mun Gwan-gyeong | 2016-11-16 | 1 | -0/+2
  CID 1394322
  Signed-off-by: Mun Gwan-gyeong <[email protected]>
* glsl: stub out _mesa_reference_program() in standalone compiler | Timothy Arceri | 2016-11-17 | 2 | -0/+12
  The following patch will call this directly from the linker; the shader cache will also start calling these from the compiler.
* st/mesa/r200/i915/i965: move ARB program fields into a union | Timothy Arceri | 2016-11-17 | 25 | -307/+319
  It's common for games to compile 2000 programs or more, so at 32bits x 2000 programs x 22 fields x 2 (at least) stages this should give us something like 352 kilobytes in savings once we add some more glsl only fields.
  Reviewed-by: Emil Velikov <[email protected]>
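The savings estimate above is straightforward to reproduce. In this sketch the struct and function names are hypothetical, and the assumption that each side of the union holds 22 fields of 32 bits is taken from the figures in the commit message rather than from the real gl_program layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field groups: 22 x 32-bit fields on each side. */
struct arb_only  { uint32_t f[22]; };
struct glsl_only { uint32_t f[22]; };

/* Layout with both groups stored side by side. */
struct toy_program_separate {
    struct arb_only  arb;
    struct glsl_only glsl;
};

/* Layout with the groups overlapping in a union. */
struct toy_program_union {
    union {
        struct arb_only  arb;
        struct glsl_only glsl;
    } u;
};

/* 32 bits x programs x fields x stages, expressed in bytes. */
size_t estimated_savings_bytes(size_t programs, size_t fields, size_t stages)
{
    return sizeof(uint32_t) * programs * fields * stages;
}
```

With 2000 programs, 22 fields and 2 stages, the function returns 352000 bytes, which matches the "something like 352 kilobytes" figure when a kilobyte is taken as 1000 bytes.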
* st/mesa: stop initialing Instructions and NumInstructions | Timothy Arceri | 2016-11-17 | 2 | -6/+0
  Since gl_program is now created with rzalloc() they should already be initialised.
  Reviewed-by: Emil Velikov <[email protected]>
* mesa: make use of ralloc when creating ARB asm gl_program fields | Timothy Arceri | 2016-11-17 | 14 | -74/+60
  This will allow us to move the ARB asm fields in gl_program into a union as we will be able to call ralloc_free() on the entire struct when destroying the context.
  In this change we switch over to using ralloc for the Instructions, String and LocalParams fields of gl_program.
  Reviewed-by: Emil Velikov <[email protected]>
* mesa: remove unused Comment field in prog_instruction | Timothy Arceri | 2016-11-17 | 3 | -38/+12
  Reviewed-by: Emil Velikov <[email protected]>
* i965: get num_abos from shader_info rather than gl_linked_shader | Timothy Arceri | 2016-11-17 | 7 | -15/+24
  This is a step towards freeing gl_linked_shader after linking.
  Reviewed-by: Emil Velikov <[email protected]>
* mesa/glsl: copy num_abos to gl_program | Timothy Arceri | 2016-11-17 | 2 | -1/+1
  We should be able to free gl_linked_shader after linking. In order to do so, we need to switch to getting values from gl_program instead.
  Reviewed-by: Emil Velikov <[email protected]>
* i965: get num_images from shader_info rather than gl_linked_shader | Timothy Arceri | 2016-11-17 | 14 | -35/+46
  This is a step towards freeing gl_linked_shader after linking.
  Reviewed-by: Emil Velikov <[email protected]>
* mesa/glsl: copy num_images to gl_program | Timothy Arceri | 2016-11-17 | 2 | -1/+2
  We should be able to free gl_linked_shader after linking. In order to do so, we need to switch to getting values from gl_program instead.
  Reviewed-by: Emil Velikov <[email protected]>
* nir: add support for counting AoA uniforms in nir_shader_gather_info() | Timothy Arceri | 2016-11-17 | 1 | -2/+2
  Reviewed-by: Emil Velikov <[email protected]>
* i965: only try print GLSL IR once when using INTEL_DEBUG to dump ir | Timothy Arceri | 2016-11-17 | 9 | -37/+21
  Since we started releasing GLSL IR after linking, the only time we can print GLSL IR is during linking. When regenerating variants only NIR will be available.
  Reviewed-by: Emil Velikov <[email protected]>
* anv/descriptor_set: Put the whole state in the state free list | Jason Ekstrand | 2016-11-16 | 1 | -5/+4
  We're not really saving much by just putting the offset in there.
  Reviewed-by: Iago Toral Quiroga <[email protected]>