path: root/src
Commit message | Author | Age | Files | Lines
* llvmpipe: use simple coeffs calc for 128bit vectors | Oded Gabbay | 2015-11-18 | 1 | -1/+6

  There are currently two methods in llvmpipe code to calculate coeffs to be used as inputs for the fragment shader. The two methods use slightly different ways to do the floating point calculations and thus produce slightly different results.

  The decision which method to use is determined by the size of the vector that is used by the platform.

  For vectors with size of more than 128bit, a single-step method is used, in which coeffs_init_simple() + attribs_update_simple() are called.

  For vectors with size of 128bit or less, a two-step method is used, in which coeffs_init() + attribs_update() are called.

  This causes some piglit tests (clip-distance-bulk-copy, interface-vs-unnamed-to-fs-unnamed) to fail when using platforms with 128bit vectors (such as ppc64le or x86-64 without AVX).

  This patch makes platforms with 128bit vectors use the single-step method (aka "simple" method) instead of the two-step method. This would make the resulting coeffs identical between more platforms, make sure the piglit tests pass, and make debugging and maintainability a bit easier as the generated LLVM IR will be the same for more platforms.

  The performance impact is negligible for x86-64 without AVX, and basically non-existent for ppc64le, as can be seen from the following benchmarking results:

  - glxspheres, on ppc64le:
    - original code: 4.892745317 frames/sec 5.460303857 Mpixels/sec
    - with the patch: 4.932083873 frames/sec 5.504205571 Mpixels/sec
    - Additional 0.8% performance boost

  - glxspheres, on x86-64 without AVX:
    - original code: 20.16418809 frames/sec 22.50323395 Mpixels/sec
    - with the patch: 20.31328989 frames/sec 22.66963152 Mpixels/sec
    - Additional 0.74% performance boost

  - glmark2, on ppc64le:
    - original code: score of 58
    - with my change: score of 57

  - glmark2, on x86-64 without AVX:
    - original code: score of 175
    - with the patch: score of 167
    - Impact of -4.5% on performance

  - OpenArena, on ppc64le:
    - original code: 3398 frames 1719.0 seconds 2.0 fps 255.0/505.9/2773.0/0.0 ms
    - with the patch: 3398 frames 1690.4 seconds 2.0 fps 241.0/497.5/2563.0/0.2 ms
    - 29 seconds faster with the patch, which is about 2%

  - OpenArena, on x86-64 without AVX:
    - original code: 3398 frames 239.6 seconds 14.2 fps 38.0/70.5/719.0/14.6 ms
    - with the patch: 3398 frames 244.4 seconds 13.9 fps 38.0/71.9/697.0/14.3 ms
    - 0.3 fps slower with the patch (about 2%)

  Additional details can be found at:
  http://lists.freedesktop.org/archives/mesa-dev/2015-October/098635.html

  Signed-off-by: Oded Gabbay <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
  (cherry picked from commit 39b4dfe6ab1003863778a25c091c080e098833ec)
* vc4: Add support for nir_op_uge, using the carry bit on QPU_A_SUB. | Eric Anholt | 2015-11-18 | 5 | -0/+26

  It looks like nir_lower_idiv is going to use it soon, so add support. With Ilia's change, this fixes one case in fs-op-div-large-uint-uint (with GL 3.0 forced on).

  Cc: "11.0" <[email protected]>
  (cherry picked from commit a4bf28178f064082d3b818d2cd48abf9075cc459)
  [Emil Velikov: Resolve trivial conflicts]
  Signed-off-by: Emil Velikov <[email protected]>

  Conflicts:
    src/gallium/drivers/vc4/vc4_qpu_emit.c
* r200: fix bgrx8/xrgb8 blits | Roland Scheidegger | 2015-11-18 | 1 | -0/+4

  Since 779cabfc7d022de8b7b9bc7fdac0caffa8646c51 the same txformat table entries are used for "normal" texturing as well as for blits. However, I forgot to put in an entry for the bgrx8 (le) and xrgb8 (be) formats - the normal texturing path can't hit them because the radeon tex format chooser will never choose them, but we get that format from the dri buffers (at least I assume we got it from there).

  This is untested but essentially addressing the same bug as for radeon. (I don't think that the second entry per le/be table is actually necessary, but shouldn't hurt...)

  Tested-by: Ian Romanick <[email protected]>
  Acked-by: Alex Deucher <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
  Cc: "11.0" <[email protected]>
  (cherry picked from commit a2611ffe4b5f1852c59301f086b988233a1c62f3)
* radeon: fix bgrx8/xrgb8 blits | Roland Scheidegger | 2015-11-18 | 1 | -0/+2

  Since d21320f6258b2e1780a15c1ca718963d8a15ca18 the same txformat table entries are used for "normal" texturing as well as for blits. However, I forgot to put in an entry for the bgrx8 (le) and xrgb8 (be) formats - the normal texturing path can't hit them because the radeon tex format chooser will never choose them, but we get that format from the dri buffers (at least I assume we got it from there).

  This caused lots of piglit regressions (and probably lots of trouble outside piglit too).

  This fixes bug https://bugs.freedesktop.org/show_bug.cgi?id=92900.

  Tested-by: Ian Romanick <[email protected]>
  Acked-by: Alex Deucher <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
  Cc: "11.0" <[email protected]>
  (cherry picked from commit 983614dbede7b94cba1bad9f3e8627fc5e14bb91)
* meta/generate_mipmap: Only modify the draw framebuffer binding in fallback_required | Ian Romanick | 2015-11-18 | 1 | -4/+4

  Previously GL_FRAMEBUFFER was used. However, if GL_EXT_framebuffer_blit is supported (note: it is supported by every Mesa driver), this is *sometimes* an alias for GL_DRAW_FRAMEBUFFER (getters) and *sometimes* an alias for *both* GL_DRAW_FRAMEBUFFER and GL_READ_FRAMEBUFFER (setters). As a result, the code saved one binding but modified both. If the bindings were different, the GL_READ_FRAMEBUFFER would be incorrect on exit.

  Fixes the piglit fbo-generatemipmap-versus-READ_FRAMEBUFFER test.

  Ideally this function would use DSA functions and not modify the binding at all. However, that would be a much more intrusive change because _mesa_meta_bind_fbo_image would also need to be modified. _mesa_meta_bind_fbo_image has a lot of callers. Much of this code is about to get a major rework due to bug #92363, so I don't think it matters too much. In fact, I discovered this bug while working on the other bug. Le bon temps!

  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
  Cc: "10.6 11.0" <[email protected]>
  (cherry picked from commit c40a88b6c5a698e5297957e28cccf2ce23820caa)
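  The save-one, modify-both asymmetry described in this commit can be illustrated with a toy model. This is only a sketch: the struct, enum and function names are hypothetical stand-ins and are not Mesa or OpenGL code. FB_BOTH plays the role of GL_FRAMEBUFFER.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the two framebuffer binding points (hypothetical, not Mesa). */
enum fb_target { FB_BOTH, FB_DRAW, FB_READ };

struct fb_bindings {
    unsigned draw;
    unsigned read;
};

/* Setter semantics: FB_BOTH writes both binding points. */
static void bind_fb(struct fb_bindings *b, enum fb_target target, unsigned fbo)
{
    assert(b != NULL);
    if (target == FB_BOTH || target == FB_DRAW)
        b->draw = fbo;
    if (target == FB_BOTH || target == FB_READ)
        b->read = fbo;
}

/* Getter semantics: FB_BOTH reports only the draw binding. */
static unsigned get_fb(const struct fb_bindings *b, enum fb_target target)
{
    assert(b != NULL);
    return target == FB_READ ? b->read : b->draw;
}

/* Save the binding, do temporary work on tmp_fbo, then restore. */
static void with_temp_fbo(struct fb_bindings *b, enum fb_target target,
                          unsigned tmp_fbo)
{
    unsigned saved = get_fb(b, target);
    bind_fb(b, target, tmp_fbo);
    /* ... temporary rendering would happen here ... */
    bind_fb(b, target, saved);
}
```

  In this model, going through FB_BOTH overwrites the read binding with the saved draw binding, while going through FB_DRAW leaves the read binding untouched.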
* radeonsi: enable optimal raster config setting for fiji (v2) | Alex Deucher | 2015-11-18 | 1 | -3/+9

  Requires proper kernel tiling configuration so check the tiling config registers.

  v2: send the right version of the patch

  Reviewed-by: Marek Olšák <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
  Cc: [email protected]
  (cherry picked from commit 00f554abba8c0f3b65af94365c15109c3b858486)
* nouveau: don't expose HEVC decoding support | Ilia Mirkin | 2015-11-18 | 1 | -0/+1

  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: [email protected]
  (cherry picked from commit f94e1d97381ec787c2abbbcd5265252596217e33)
* glsl: Allow implicit int -> uint conversions for the % operator. | Kenneth Graunke | 2015-11-18 | 1 | -9/+28

  GLSL 4.00 and GL_ARB_gpu_shader5 introduced a new int -> uint implicit conversion rule and updated the rules for modulus to use them. (In earlier languages, none of the implicit conversion rules did anything relevant, so there was no point in applying them.)

  This allows expressions such as:

      int foo;
      uint bar;
      uint mod = foo % bar;

  Cc: [email protected]
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  (cherry picked from commit 511de1a80cedc0add386dad79cce56dd68d2f611)
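  As a rough analogy only, the same expression can be written in C, where the signed operand is likewise converted to unsigned before the modulus is taken. The helper name is hypothetical, and C's modulo-2^32 conversion is used here as an assumption about how a negative value would be reinterpreted; the GLSL specification remains the authority for the shader-side semantics.

```c
#include <assert.h>
#include <stdint.h>

/* Models `uint mod = foo % bar;` with the int operand converted to uint
 * before the operation (C semantics, shown as an analogy). */
static uint32_t mod_int_uint(int32_t foo, uint32_t bar)
{
    assert(bar != 0);             /* modulus by zero is undefined */
    return (uint32_t)foo % bar;   /* conversion wraps modulo 2^32 */
}
```

  For non-negative inputs the result matches the signed modulus; for negative inputs it is computed on the wrapped unsigned value.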
* meta/generate_mipmap: Don't leak the sampler object | Ian Romanick | 2015-11-18 | 1 | -0/+2

  Signed-off-by: Ian Romanick <[email protected]>
  Cc: "10.6 11.0" <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
  (cherry picked from commit 758f12fd98dea9a9682becf2d496bd38ef3959e5)
* radeonsi: initialize SX_PS_DOWNCONVERT to 0 on Stoney | Marek Olšák | 2015-11-18 | 1 | -0/+3

  otherwise the SX or CB blocks can go bananas

  Reviewed-by: Nicolai Hähnle <[email protected]>
  Cc: [email protected]
  (cherry picked from commit 40912dd91e96376517fb41bb4dc228b45fd1a01c)
  [Emil Velikov: resolve trivial conflicts]
  Signed-off-by: Emil Velikov <[email protected]>

  Conflicts:
    src/gallium/drivers/radeonsi/si_state.c
* nir/vars_to_ssa: Rework copy set handling in lower_copies_to_load_store | Jason Ekstrand | 2015-11-18 | 1 | -1/+4

  Previously, we walked through a given deref_node's copies and, after lowering the copy away, removed it from both the source and destination copy sets. This commit changes this to only remove it from the other node's copy set (not the one we're lowering). At the end of the loop, we just throw away the copy set for the node we're lowering since that node no longer has any copies. This has two advantages:

  1) It's more efficient because we're doing potentially half as many set search operations.

  2) It now properly handles copies from a node to itself. Previously, it would delete the copy from the set when processing the destination and then assert-fail when we couldn't find it for the source.

  Cc: "11.0" <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92588
  Reviewed-by: Timothy Arceri <[email protected]>
  Reviewed-by: Connor Abbott <[email protected]>
  (cherry picked from commit 226ba889a0f820b9f4b1132e379620d2688c96e7)
* i965/skl/gt4: Fix URB programming restriction. | Ben Widawsky | 2015-11-18 | 1 | -0/+9

  The comment in the code details the restriction. Thanks to Ken for having a very helpful conversation with me, and spotting the blurb in the link I sent him :P.

  There are still stability problems for me on GT4, but this definitely helps with some of the failures.

  v2: Comment fixes

  Cc: [email protected]
  Reviewed-by: Jordan Justen <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  (cherry picked from commit 55314c5be4cbf933ab7fbd20f6aa49207e04c946)
* r600: initialised PGM_RESOURCES_2 for ES/GS | Dave Airlie | 2015-11-18 | 2 | -0/+6

  This fixes the corruption on rendering that we are seeing in certain geometry shaders.

  Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=91780

  Reviewed-by: Alex Deucher <[email protected]>
  Tested / Reviewed-by: Glenn Kennard <[email protected]>
  Cc: "10.6" "11.0" <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
  (cherry picked from commit df8af7d75155845d12d5a14a3a5ca644f07cb3b1)
* mesa/copyimage: allow width/height to not be multiples of block | Ilia Mirkin | 2015-11-18 | 1 | -3/+37

  For compressed textures, the image size is not necessarily a multiple of the block size (e.g. the last mip levels). Section 18.3.2 (Copying Between Images) of the OpenGL 4.5 Core Profile spec says:

      An INVALID_VALUE error is generated if the dimensions of either subregion exceeds the boundaries of the corresponding image object, or if the image format is compressed and the dimensions of the subregion fail to meet the alignment constraints of the format.

  and Section 8.7 (Compressed Texture Images) says:

      An INVALID_OPERATION error is generated if any of the following conditions occurs:

      * width is not a multiple of four, and width + xoffset is not equal to the value of TEXTURE_WIDTH.
      * height is not a multiple of four, and height + yoffset is not equal to the value of TEXTURE_HEIGHT.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92860
  Signed-off-by: Ilia Mirkin <[email protected]>
  Acked-by: Alex Deucher <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Cc: [email protected]
  (cherry picked from commit 912babba7bf1abd3caa49f6372d581ae1afe7e84)
  [Emil Velikov: resolve trivial conflicts]
  Signed-off-by: Emil Velikov <[email protected]>

  Conflicts:
    src/mesa/main/copyimage.c
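  The two quoted conditions reduce to one per-dimension predicate. The sketch below is not the code from src/mesa/main/copyimage.c; the function name is hypothetical, and taking the block size as a parameter (rather than the literal four in the spec text) is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when a subregion extent is acceptable along one dimension:
 * either it is a whole number of blocks, or it runs exactly to the edge of
 * the image. `block` generalizes the "four" of the quoted spec text. */
static bool region_dim_ok(int offset, int size, int block, int image_size)
{
    assert(block > 0);
    return size % block == 0 || offset + size == image_size;
}
```

  For a 10-texel-wide image with 4-texel blocks, a 2-texel region starting at offset 8 is accepted because it ends at the image edge, while the same region at offset 4 is rejected.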
* vc4: Return NULL when we can't make our shadow for a sampler view. | Eric Anholt | 2015-11-18 | 1 | -0/+4

  I'm not sure what the caller does is appropriate (just have a NULL sampler at this slot), but it fixes the immediate crash.

  Cc: "11.0" <[email protected]>
  (cherry picked from commit 5980389bbf98b8186ba6a06392d92b82fa9efad3)
* vc4: Return GL_OUT_OF_MEMORY when buffer allocation fails. | Eric Anholt | 2015-11-18 | 2 | -19/+32

  I was afraid our callers weren't prepared for this, but it looks like at least for resource creation, mesa/st throws an error appropriately.

  Cc: "11.0" <[email protected]>
  (cherry picked from commit eb8fb0064dbde7a363c2f99466a51b346b09a029)
* winsys/radeon: Use CPU page size instead of hardcoding 4096 bytes v3 | Michel Dänzer | 2015-11-18 | 1 | -11/+19

  Fixes GPUVM conflicts with non-4K page size.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92738

  v2: Replace sanitization of VM base address alignment with comment why that's not necessary.
  v3: Use unsigned instead of long as the type for the size_align member. (Marek)

  Cc: [email protected]
  Reviewed-by: Christian König <[email protected]> (v1)
  Reviewed-by: Marek Olšák <[email protected]>
  (cherry picked from commit 24abbaff9ad177624c2b4906c7d94f5d91ac3cc0)
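  A minimal sketch of the general idea, assuming a POSIX system: query the page size at run time with sysconf() and align sizes to it instead of to a literal 4096. The helper names are hypothetical and this is not the winsys code; the 4096 fallback is this sketch's own choice.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Query the CPU page size at run time (POSIX sysconf). */
static size_t cpu_page_size(void)
{
    long sz = sysconf(_SC_PAGESIZE);
    return sz > 0 ? (size_t)sz : 4096;  /* fallback chosen for this sketch */
}

/* Round value up to a power-of-two alignment. */
static size_t align_up(size_t value, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}
```

  On a system with 64 KiB pages, align_up(5000, cpu_page_size()) would yield 65536 rather than the 8192 that a hardcoded 4 KiB alignment produces.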
* radeon/uvd: fix VC-1 simple/main profile decode v2 | Boyuan Zhang | 2015-11-18 | 2 | -2/+7

  We just needed to set the extra width/height fields to get this working.

  v2 (chk): rebased, CC stable added, commit message added, fixed coding style

  Signed-off-by: Boyuan Zhang <[email protected]>
  Signed-off-by: Christian König <[email protected]>
  Reviewed-by: Alex Deucher <[email protected]>
  Cc: "10.6 11.0" <[email protected]>
  (cherry picked from commit 6bad554d98004e6c8ab46e8cbe73f3b3024e55c5)
* st/vaapi: fix vaapi VC-1 simple/main corruption v2 | Boyuan Zhang | 2015-11-18 | 1 | -0/+2

  Apply the start code fix only to advanced profile.

  v2 (chk): add commit message

  Signed-off-by: Boyuan Zhang <[email protected]>
  Reviewed-by: Christian König <[email protected]>
  Reviewed-by: Alex Deucher <[email protected]>
  Cc: "10.6 11.0" <[email protected]>
  (cherry picked from commit ed55def44febbe1662ddcc0c33a23308899ce488)
* radeonsi: add register definitions for Stoney | Marek Olšák | 2015-11-11 | 1 | -0/+322

  There are a few non-stoney changes too.

  Reviewed-by: Alex Deucher <[email protected]>
  (cherry picked from commit d57ede92b7832f01df2aa5755c8c34b4de4866d4)
  Nominated-by: Emil Velikov <[email protected]>
* Revert "mesa/glformats: Undo code changes from _mesa_base_tex_format() move" | Emil Velikov | 2015-11-10 | 1 | -6/+142

  This reverts commit 2294f6f3112f34c4685586f95cd567ce130ee1ab.

  It introduces a regression in the following test:

      piglit.spec.oes_compressed_paletted_texture.basic api

  In general this commit is needed to prevent regressions in GL_KHR_texture_compression_astc_ldr, which... isn't in 11.0

  Reported-by: Mark Janes <[email protected]>
* st/va: add more errors checks in vlVaBufferSetNumElements and vlVaMapBuffer | Julien Isorce | 2015-11-07 | 1 | -0/+6

  Signed-off-by: Julien Isorce <[email protected]>
  Reviewed-by: Christian König <[email protected]>
  (cherry picked from commit 5e763aaa21654d0591b7da14c573fc03d4a60205)
  Nominated-by: Emil Velikov <[email protected]>
* st/va: do not destroy old buffer when new one failed | Julien Isorce | 2015-11-07 | 1 | -6/+13

  If formats are not the same, vlVaPutImage re-creates the video buffer with the right format. But if the creation of this new video buffer fails then the surface loses its current buffer. Let's just destroy the previous buffer on success.

  Signed-off-by: Julien Isorce <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
  Reviewed-by: Christian König <[email protected]>
  (cherry picked from commit d42029d2d9bc6b65ccf847dc9ba2e70b496d0299)
  Nominated-by: Emil Velikov <[email protected]>
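  The "destroy the previous buffer only on success" pattern can be sketched generically. All names below are hypothetical and do not come from the st/va code; allocation failure is simulated by requesting zero elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct surface {
    int *buffer;   /* stands in for the surface's video buffer */
};

/* Hypothetical allocator; a count of 0 simulates a creation failure. */
static int *create_buffer(size_t count)
{
    return count ? calloc(count, sizeof(int)) : NULL;
}

/* Replace the surface's buffer, keeping the old one if creation fails. */
static int surface_recreate_buffer(struct surface *surf, size_t count)
{
    assert(surf != NULL);
    int *fresh = create_buffer(count);
    if (!fresh)
        return -1;            /* old buffer is still attached and usable */
    free(surf->buffer);       /* destroy the previous buffer only now */
    surf->buffer = fresh;
    return 0;
}
```

  The ordering is the point: the old buffer is released only after its replacement exists, so a failed creation leaves the surface in its previous state.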
* nvc0: fix crash when nv50_miptree_from_handle fails | Julien Isorce | 2015-11-07 | 1 | -1/+2

  Signed-off-by: Julien Isorce <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
  (cherry picked from commit 3bbb8715acd1cb85ea7aa7763c06cd12347a1a9a)
  Nominated-by: Emil Velikov <[email protected]>
* st/va: pass picture desc to begin and decode | Julien Isorce | 2015-11-07 | 1 | -2/+2

  At least vl_mpeg12_decoder uses the picture desc in begin_frame and decode_bitstream.

  https://bugs.freedesktop.org/show_bug.cgi?id=92634

  Signed-off-by: Julien Isorce <[email protected]>
  Reviewed-by: Christian König <[email protected]>
  (cherry picked from commit a61be1a79897931e3efb5b9119c48e1fb1257db4)
  Nominated-by: Emil Velikov <[email protected]>
* nouveau: relax fence emit space assert | Ilia Mirkin | 2015-11-07 | 3 | -3/+3

  We also have the "reserved for kick" space available. Some of my earlier changes can probably be removed, but this is a quick fix for some of the rarer fallout.

  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: <[email protected]>
  (cherry picked from commit bb73fc4cb82c1abdf47aa373c78c2a85fe29b3ec)
* vc4: When the create ioctl fails, free our cache and try again. | Eric Anholt | 2015-11-07 | 1 | -5/+24

  This greatly increases the pressure you can put on the driver before create fails. Ultimately we need to let the kernel take control of our cached BOs and just take them from us (and other clients) directly, but this is a very easy patch for the moment.

  Cc: "11.0" <[email protected]>
  (cherry picked from commit 6d3a24bce80a32063aedfe568efd5532aea4c875)
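  The retry strategy can be sketched with a toy allocator. Everything here is hypothetical (no ioctl is issued and none of the names come from vc4); the model only shows the control flow of trying once, releasing cached memory, then trying again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct allocator {
    size_t free_bytes;    /* memory the "kernel" can still hand out */
    size_t cached_bytes;  /* memory held by our idle-buffer cache */
};

/* Stand-in for the create call: fails when not enough memory is free. */
static bool try_create(struct allocator *a, size_t size)
{
    if (size > a->free_bytes)
        return false;
    a->free_bytes -= size;
    return true;
}

/* Give all cached buffers back to the allocator. */
static void flush_cache(struct allocator *a)
{
    a->free_bytes += a->cached_bytes;
    a->cached_bytes = 0;
}

/* Try once; on failure, free the cache and try exactly once more. */
static bool create_with_retry(struct allocator *a, size_t size)
{
    assert(a != NULL);
    if (try_create(a, size))
        return true;
    flush_cache(a);
    return try_create(a, size);
}
```

  A request that exceeds free memory but fits once the cache is released succeeds on the second attempt; a request larger than both combined still fails.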
* nir: Properly invalidate metadata in nir_opt_remove_phis(). | Kenneth Graunke | 2015-11-07 | 1 | -0/+5

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Eduardo Lima Mitev <[email protected]>
  Cc: [email protected]
  (cherry picked from commit 59bbe2681b73c3795b7298e2486d5fde7c464ed5)
* nir: Properly invalidate metadata in nir_lower_vec_to_movs(). | Kenneth Graunke | 2015-11-07 | 1 | -0/+5

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Eduardo Lima Mitev <[email protected]>
  Cc: [email protected]
  (cherry picked from commit bc3942e2970c60a816cf954b1fa4d416d0852bd9)
* nir: Report progress from lower_vec_to_movs(). | Jason Ekstrand | 2015-11-07 | 2 | -7/+23

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  (cherry picked from commit 9f5e7ae9d83ce6de761936b95cd0b7ba4c1219c4)
  [Emil Velikov] Correctly derive nir_shader from vec_to_movs_state
  Signed-off-by: Emil Velikov <[email protected]>

  Conflicts:
    src/glsl/nir/nir.h
    src/glsl/nir/nir_lower_vec_to_movs.c
* nir/lower_vec_to_movs: Pass the shader around directly | Jason Ekstrand | 2015-11-07 | 1 | -6/+8

  Previously, we were passing the shader around; we were just calling it "mem_ctx". However, the nir_shader is (and must be for the purposes of mark-and-sweep) the mem_ctx so we might as well pass it around explicitly.

  Reviewed-by: Eduardo Lima Mitev <[email protected]>
  (cherry picked from commit b7eeced3c724bf5de05290551ced8621ce2c7c52)
* nir: Properly invalidate metadata in nir_opt_copy_prop(). | Kenneth Graunke | 2015-11-07 | 1 | -0/+6

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Eduardo Lima Mitev <[email protected]>
  Cc: [email protected]
  (cherry picked from commit 0f037bd71ffe083c05cd0867ef54bce91ff84243)
* nir: Properly invalidate metadata in nir_split_var_copies(). | Kenneth Graunke | 2015-11-07 | 1 | -0/+5

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Eduardo Lima Mitev <[email protected]>
  Cc: [email protected]
  (cherry picked from commit 8bb44510fca5315bbdd61502c72c22c7198c0daf)
* nir: Report progress from nir_split_var_copies(). | Kenneth Graunke | 2015-11-07 | 2 | -4/+13

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  (cherry picked from commit dc18b9357b553a972ea439facfbc55e376f1179f)
* i965/skl: Add GT4 PCI IDs | Ben Widawsky | 2015-11-07 | 1 | -1/+5

  Like other gen8+ hardware, the hardware automatically scales up thread counts.

  We must be careful about the URB sizes since GT4 adds another slice. One of the existing PCI IDs is actually mislabeled as GT3. Arguably this is a real bug since the URB size will be wrong. Because this patch is simply meant to add the missing IDs, that will be fixed in a later patch.

  v2: No longer relevant.

  v3: Update the wm thread count to support GT4. The WM thread count is used to determine the maximum scratch space required. Currently the code always allocates the maximum amount even though lower GT SKUs require less. The formula is threads_per_psd * subslices_per_slice * slices

  Cc: [email protected]
  Reviewed-by: Jordan Justen <[email protected]>
  Signed-off-by: Ben Widawsky <[email protected]>
  (cherry picked from commit 7cbd6608f544591bc6aadf48877608b30a78ccb8)
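  The formula quoted in v3 is a plain product. The helper below is a hypothetical illustration, not the i965 device-info code, and any numbers passed to it are placeholders rather than confirmed hardware figures.

```c
#include <assert.h>

/* Maximum WM thread count per the formula in the commit message:
 * threads_per_psd * subslices_per_slice * slices. */
static unsigned max_wm_threads(unsigned threads_per_psd,
                               unsigned subslices_per_slice,
                               unsigned slices)
{
    assert(threads_per_psd > 0 && subslices_per_slice > 0 && slices > 0);
    return threads_per_psd * subslices_per_slice * slices;
}
```

  Because the result scales linearly with the slice count, adding a slice raises the maximum scratch space requirement proportionally.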
* nouveau: set MaxDrawBuffers to the same value as MaxColorAttachments | Ilia Mirkin | 2015-11-07 | 1 | -1/+1

  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: [email protected]
  (cherry picked from commit 985b51551a9bafec86604714d5faf3065dad4812)
* gbm.h: Add a missing stddef.h include for size_t. | Emmanuel Gil Peyrot | 2015-11-05 | 1 | -0/+1

  This was causing compilation issues when one of its providers wasn’t already included before gbm.h.

  Cc: "11.0" <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
  (cherry picked from commit f3d4d10a1d483cff7b3fbb6db4d6d752dd002243)
* r600g: Fix special negative immediate constants when using ABS modifier. | Ivan Kalvachev | 2015-11-05 | 3 | -6/+6

  Some constants (like 1.0 and 0.5) could be inlined as immediate inputs without using their literal value. The r600_bytecode_special_constants() function emulates the negative of these constants by using NEG modifier.

  However some shaders define -1.0 constant and want to use it as 1.0. They do so by using ABS modifier. But r600_bytecode_special_constants() set NEG in addition to ABS. Since the NEG modifier has priority over the ABS one, we get -|1.0| as result, instead of |1.0|.

  The patch simply prevents the additional switching of NEG when ABS is set.

  [According to Ivan Kalvachev, this bug was found via https://github.com/iXit/Mesa-3D/issues/126 and https://github.com/iXit/Mesa-3D/issues/127]

  Signed-off-by: Ivan Kalvachev <[email protected]>
  Reviewed-by: Nicolai Hähnle <[email protected]>
  CC: <[email protected]>
  (cherry picked from commit f75f21a24ae2dd83507f3d4d8007f0fcfe6db802)
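  The modifier interaction can be modelled in a few lines. This is a hypothetical sketch, not r600_bytecode_special_constants(): the struct and function names are invented, and the model only assumes what the message states, namely that ABS is applied before NEG.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct src {
    float value;   /* literal value of the operand */
    bool neg;      /* NEG source modifier */
    bool abs;      /* ABS source modifier */
};

/* Evaluate an operand: ABS is applied first, then NEG. */
static float eval_src(const struct src *s)
{
    assert(s != NULL);
    float v = s->value;
    if (s->abs && v < 0.0f)
        v = -v;
    return s->neg ? -v : v;
}

/* Replace a negative literal by its positive inline constant. With the fix,
 * NEG is toggled only when ABS is not set, because ABS already discards the
 * sign of the original literal. */
static struct src inline_special(struct src s)
{
    if (s.value < 0.0f) {
        s.value = -s.value;
        if (!s.abs)
            s.neg = !s.neg;
    }
    return s;
}
```

  With this rule the inlined operand evaluates to the same value as the original literal in every modifier combination, including |-1.0|, which stays 1.0.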
* st/mesa: fix mipmap generation for immutable textures with incomplete pyramids | Nicolai Hähnle | 2015-11-05 | 1 | -32/+36

  Without the clamping by NumLevels, the state tracker would reallocate the texture storage (incorrect) and even fail to copy the base level image after reallocation, leading to the graphical glitch of https://bugs.freedesktop.org/show_bug.cgi?id=91993 .

  A piglit test has been submitted for review as well (subtest of arb_texture_storage-texture-storage).

  v2: also bypass all calls to st_finalize_texture (suggested by Marek Olšák)

  Cc: [email protected]
  Reviewed-by: Marek Olšák <[email protected]>
  (cherry picked from commit 24c90888aeaf90b13700389b91b74bf63ee9f28d)
* i965: Fix missing BRW_NEW_*_PROG_DATA flagging caused by cache reuse. | Kenneth Graunke | 2015-11-05 | 2 | -4/+5

  Consider the case of two nearly identical GLSL fragment shaders:

      out vec4 color;
      void main() { color = vec4(1); }

  and

      layout(early_fragment_tests) in;
      out vec4 color;
      void main() { color = vec4(1); }

  These shaders compile to the exact same assembly, but have distinct values for brw_wm_prog_data::early_fragment_tests.

  Since these are two independent GLSL shaders, they have different program keys - notably, brw_wm_prog_key::program_string_id differs.

  When uploading the second, brw_upload_cache will find an existing copy of the assembly in the cache BO, which means matching_data will be non-NULL. Although we create a second cache item (with the new key and prog_data), we set item->offset to the existing copy and avoid re-uploading duplicate assembly.

  However, brw_search_cache() would only flag BRW_NEW_*_PROG_DATA if item->offset differed from the supplied offset. With reuse, both programs have the same offset, but prog_data changed. We have to flag it, but failed to.

  To fix this, we simply need to check if the aux (prog_data) pointer changed. If either the assembly or the prog_data differs, flag it.

  This fixes a regression since 1bba29ed403e735ba0bf04ed8aa2e571884f, where Topi fixed brw_upload_cache() to actually reuse identical assembly. Prior to that, reuse basically never happened due to bugs. Unfortunately, this code apparently wasn't prepared to handle reuse!

  Fixes GPU hangs in Dolphin on Broadwell.

  Huge thanks to Pierre Bourdon and Ilia Mirkin for debugging this and helping track down the real issue.

  Cc: "11.0" <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92623
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Kristian Høgsberg <[email protected]>
  Tested-by: Pierre Bourdon <[email protected]>
  (cherry picked from commit bf05af3f0e8769f417bbd995470dc1b8083a0df9)
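  The corrected condition, "flag if either the assembly offset or the prog_data pointer differs", can be sketched as a predicate. The struct and function below are hypothetical simplifications and are not the brw_search_cache() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct cache_item {
    uint32_t offset;   /* where the assembly lives in the cache BO */
    const void *aux;   /* the prog_data associated with this key */
};

/* Returns true when state must be flagged dirty: either the assembly moved
 * or the prog_data pointer changed, even if the assembly is shared. */
static bool needs_prog_data_flag(const struct cache_item *item,
                                 uint32_t cur_offset, const void *cur_aux)
{
    assert(item != NULL);
    return item->offset != cur_offset || item->aux != cur_aux;
}
```

  Checking only the offset would miss the case from the commit message, where two shaders share one copy of the assembly but carry different prog_data.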
* i965: Fix is-renderable check in intel_image_target_renderbuffer_storage | Ian Romanick | 2015-11-05 | 1 | -5/+1

  Previously we could create a renderbuffer with format MESA_FORMAT_R8G8B8A8_UNORM, convert that renderbuffer to an EGLImage, then FAIL to convert the EGLImage back to a renderbuffer because reasons. Just use the same check in intel_image_target_renderbuffer_storage that brw_render_target_supported uses.

  There are more checks in brw_render_target_supported, but I don't think they are necessary here. A different approach would be to refactor brw_render_target_supported to take rb->Format and rb->NumSamples as parameters (instead of a gl_renderbuffer) and use the new function here.

  Fixes: ES2-CTS.gtf.GL2ExtensionTests.egl_image.egl_image

  Signed-off-by: Ian Romanick <[email protected]>
  Reviewed-by: Anuj Phogat <[email protected]>
  Tested-by: Tapani Pälli <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92476
  Cc: "10.3 10.4 10.5 10.6 11.0" <[email protected]>
  (cherry picked from commit 7070c8879adff2a1204d7473f119d8194eff919b)
* radeonsi: add support for Stoney asics (v3) | Samuel Li | 2015-11-05 | 5 | -3/+19

  v2 (agd): rebase on mesa master, split pci ids to separate commit
  v3 (agd): use carrizo for llvm processor name for llvm 3.7 and older

  Reviewed-by: Marek Olšák <[email protected]>
  Signed-off-by: Samuel Li <[email protected]>
  Cc: [email protected]
  (cherry picked from commit bf0d0ce0d57dce5df8195942d2eda6389d341fea)
* nvc0: respect edgeflag attribute width | Ilia Mirkin | 2015-11-05 | 1 | -7/+33

  The edgeflag comes in as ubyte with glEdgeFlagPointer but as float with plain immediate glEdgeFlag. Avoid reading bytes that weren't meant for the edgeflag in the pointer case.

  Fixes intermittent failures with gl-2.0-edgeflag piglit (and valgrind complaints about reading uninitialized memory).

  Signed-off-by: Ilia Mirkin <[email protected]>
  Cc: [email protected]
  (cherry picked from commit e05021ff72abb7de6506c90dd70a9f7ab490bf90)
* gallivm: disable f16c when not using AVX | Roland Scheidegger | 2015-11-05 | 1 | -0/+3

  f16c intrinsic can only be emitted when AVX is used. So when we disable AVX due to forcing 128bit vectors we must not use this intrinsic (depending on llvm version, this worked previously because llvm used AVX even when we didn't tell it to, however I've seen this fail with llvm 3.3 since 718249843b915decf8fccec92e466ac1a6219934 which seems to have the side effect of disabling avx in llvm albeit it only touches sse flags really, but with ea421e919ae6e72e1319fb205c42a6fb53ca2f82 it's now really disabled).

  Albeit being able to use AVX with 128bit vectors also would have its uses, the code as is really was meant to emulate jit code creation for less capable cpus.

  v2: add some (ifdefed out) missing de-featuring options for simulating less capable cpus.

  Reviewed-by: Nicolai Hähnle <[email protected]>
  Reviewed-by: Jose Fonseca <[email protected]>
  (cherry picked from commit 711489648bcce5cd8fcf14e73e5affe069010c01)
  Nominated-by: Roland Scheidegger <[email protected]>
* gallivm: Explicitly disable unsupported CPU features. | Jose Fonseca | 2015-11-05 | 1 | -38/+34

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92214
  CC: "10.6 11.0" <[email protected]>
  Reviewed-by: Roland Scheidegger <[email protected]>
  (cherry picked from commit ea421e919ae6e72e1319fb205c42a6fb53ca2f82)
* radeon/uvd: don't expose HEVC on old UVD hw (v3) | Alex Deucher | 2015-11-05 | 1 | -32/+18

  The section for UVD 2 and older was not updated when HEVC support was added. Reported by Kano on irc.

  v2: integrate the UVD2 and older checks into the main switch statement.
  v3: handle encode checking as well. Encode is already checked in the top case statement, so drop encode checks in the lower case statement.

  Reviewed-by: Christian König <[email protected]>
  Signed-off-by: Alex Deucher <[email protected]>
  Cc: [email protected]
  (cherry picked from commit 7b636581253fe858ac883e3d3eec21173ac069d4)
* gallivm: Translate all util_cpu_caps bits to LLVM attributes. | Jose Fonseca | 2015-11-05 | 1 | -2/+34

  This should prevent disparity between features Mesa and LLVM believe are supported by the CPU.

  http://lists.freedesktop.org/archives/mesa-dev/2015-October/thread.html#96990

  Tested on a i7-3720QM w/ LLVM 3.3 and 3.6.

  v2: Increase SmallVector initial size as suggested by Gustaw Smolarczyk.

  Reviewed-by: Roland Scheidegger <[email protected]>
  CC: "10.6 11.0" <[email protected]>
  (cherry picked from commit 718249843b915decf8fccec92e466ac1a6219934)
* mesa/glformats: Undo code changes from _mesa_base_tex_format() move | Nanley Chery | 2015-11-05 | 1 | -142/+6

  The refactoring commit, c6bf1cd, accidentally reverted cd49b97 and 99b1f47. These changes caused more code to be added to the function and removed the existing support for ASTC. This patch reverts those modifications.

  v2. Actually include ASTC support again.

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92221
  Cc: "11.0" <[email protected]>
  Signed-off-by: Nanley Chery <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
  (cherry picked from commit f1147a238ab35a56fa7d1c64f6025ff3b909dad8)
  [Emil Velikov]
  - Drop the KHR_texture_compression_astc_ldr check
  - Add texcompress.h include.
  Signed-off-by: Emil Velikov <[email protected]>
* osmesa: Expose GL entry points for Windows build via DEF file. | Nigel Stewart | 2015-11-05 | 2 | -0/+674

  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92437
  CC: "10.6 11.0" <[email protected]>
  Signed-off-by: Jose Fonseca <[email protected]>
  (cherry picked from commit 04703762e544bc732f6f8b07033221dfbd58159f)
* mesa: fix ARRAY_SIZE query for GetProgramResourceiv | Tapani Pälli | 2015-10-21 | 3 | -43/+62

  Patch also refactors name length queries which were using array size in computation; this has to be done at the same time to avoid a regression in the arb_program_interface_query-resource-query Piglit test.

  Fixes rest of the failures with ES31-CTS.program_interface_query.no-locations

  v2: make additional check only for GS inputs
  v3: create helper function for resource name length so that it gets calculated only in one place

  Signed-off-by: Tapani Pälli <[email protected]>
  Reviewed-by: Martin Peres <[email protected]>
  (cherry picked from commit c0722be9f58ef89dae98d8c459ec4f9589f97748)