summaryrefslogtreecommitdiffstats
path: root/src/mesa
Commit message (Collapse)AuthorAgeFilesLines
* r200: fix bgrx8/xrgb8 blitsRoland Scheidegger2015-11-181-0/+4
| | | | | | | | | | | | | | | | | | Since 779cabfc7d022de8b7b9bc7fdac0caffa8646c51 the same txformat table entries are used for "normal" texturing as well as for blits. However, I forgot to put in an entry for the bgrx8 (le) and xrgb8 (be) formats - the normal texturing path can't hit them because the radeon tex format chooser will never chose them, but we get that format from the dri buffers (at least I assume we got it from there). This is untested but essentially addressing the same bug as for radeon. (I don't think that the second entry per le/be table is actually necessary, but shouldn't hurt...) Tested-by: Ian Romanick <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Cc: "11.0" <[email protected]> (cherry picked from commit a2611ffe4b5f1852c59301f086b988233a1c62f3)
* radeon: fix bgrx8/xrgb8 blitsRoland Scheidegger2015-11-181-0/+2
| | | | | | | | | | | | | | | | | Since d21320f6258b2e1780a15c1ca718963d8a15ca18 the same txformat table entries are used for "normal" texturing as well as for blits. However, I forgot to put in an entry for the bgrx8 (le) and xrgb8 (be) formats - the normal texturing path can't hit them because the radeon tex format chooser will never chose them, but we get that format from the dri buffers (at least I assume we got it from there). This caused lots of piglit regressions (and probably lots of trouble outside piglit too). This fixes bug https://bugs.freedesktop.org/show_bug.cgi?id=92900. Tested-by: Ian Romanick <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Cc: "11.0" <[email protected]> (cherry picked from commit 983614dbede7b94cba1bad9f3e8627fc5e14bb91)
* meta/generate_mipmap: Only modify the draw framebuffer binding in ↵Ian Romanick2015-11-181-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | fallback_required Previously GL_FRAMEBUFFER was used. However, if GL_EXT_framebuffer_blit is supported (note: it is supported by every Mesa driver), this is *sometimes* an alias for GL_DRAW_FRAMEBUFFER (getters) and *sometimes* an alias for *both* GL_DRAW_FRAMEBUFFER and GL_READ_FRAMEBUFFER (setters). As a result, the code saved one binding but modified both. If the bindings were different, the GL_READ_FRAMEBUFFER would be incorrect on exit. Fixes the piglit fbo-generatemipmap-versus-READ_FRAMEBUFFER test. Ideally this function would use DSA functions and not modify the binding at all. However, that would be a much more intrusive change because _mesa_meta_bind_fbo_image would also need to be modified. _mesa_meta_bind_fbo_image has a lot of callers. Much of this code is about to get a major rework due to bug #92363, so I don't think it matters too much. In fact, I discovered this bug while working on the other bug. Le bon temps! Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Cc: "10.6 11.0" <[email protected]> (cherry picked from commit c40a88b6c5a698e5297957e28cccf2ce23820caa)
* meta/generate_mipmap: Don't leak the sampler objectIan Romanick2015-11-181-0/+2
| | | | | | | Signed-off-by: Ian Romanick <[email protected]> Cc: "10.6 11.0" <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> (cherry picked from commit 758f12fd98dea9a9682becf2d496bd38ef3959e5)
* i965/skl/gt4: Fix URB programming restriction.Ben Widawsky2015-11-181-0/+9
| | | | | | | | | | | | | | | The comment in the code details the restriction. Thanks to Ken for having a very helpful conversation with me, and spotting the blurb in the link I sent him :P. There are still stability problems for me on GT4, but this definitely helps with some of the failures. v2: Comment fixes Cc: [email protected] Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> (cherry picked from commit 55314c5be4cbf933ab7fbd20f6aa49207e04c946)
* mesa/copyimage: allow width/height to not be multiples of blockIlia Mirkin2015-11-181-3/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For compressed textures, the image size is not necessarily a multiple of the block size (e.g. the last mip levels). Section 18.3.2 (Copying Between Images) of the OpenGL 4.5 Core Profile spec says: An INVALID_VALUE error is generated if the dimensions of either subregion exceeds the boundaries of the corresponding image object, or if the image format is compressed and the dimensions of the subregion fail to meet the alignment constraints of the format. and Section 8.7 (Compressed Texture Images) says: An INVALID_OPERATION error is generated if any of the following conditions occurs: * width is not a multiple of four, and width + xoffset is not equal to the value of TEXTURE_WIDTH. * height is not a multiple of four, and height + yoffset is not equal to the value of TEXTURE_HEIGHT. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92860 Signed-off-by: Ilia Mirkin <[email protected]> Acked-by: Alex Deucher <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Cc: [email protected] (cherry picked from commit 912babba7bf1abd3caa49f6372d581ae1afe7e84) [Emil Velikov: resolve trivial conflicts] Signed-off-by: Emil Velikov <[email protected]> Conflicts: src/mesa/main/copyimage.c
* Revert "mesa/glformats: Undo code changes from _mesa_base_tex_format() move"Emil Velikov2015-11-101-6/+142
| | | | | | | | | | | | This reverts commit 2294f6f3112f34c4685586f95cd567ce130ee1ab. It introduces a regression in the following test piglit.spec.oes_compressed_paletted_texture.basic api In general this commit is needed to prevent regressions in GL_KHR_texture_compression_astc_ldr, which... isn't in 11.0 Reported-by: Mark Janes <[email protected]>
* i965/skl: Add GT4 PCI IDsBen Widawsky2015-11-071-1/+5
| | | | | | | | | | | | | | | | | | | | | Like other gen8+ hardware, the hardware automatically scales up thread counts. We must be careful about the URB sizes since GT4 adds another slice. One of the existing PCI IDs is actually mislabeled as GT3. Arguably this is a real bug since the URB size will be wrong. Because this patch is simply meant to add the missing IDs, that will be fixed in a later patch. v2: No longer relevant. v3: Update the wm thread count to support GT4. The WM thread count is used to determine the maximum scratch space required. Currently the code always allocates the maximum amount even though lower GT SKUs require less. The formula is threads_per_psd * subslices_per_slice * slices Cc: [email protected] Reviewed-by: Jordan Justen <[email protected]> Signed-off-by: Ben Widawsky <[email protected]> (cherry picked from commit 7cbd6608f544591bc6aadf48877608b30a78ccb8)
* nouveau: set MaxDrawBuffers to the same value as MaxColorAttachmentsIlia Mirkin2015-11-071-1/+1
| | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Cc: [email protected] (cherry picked from commit 985b51551a9bafec86604714d5faf3065dad4812)
* st/mesa: fix mipmap generation for immutable textures with incomplete pyramidsNicolai Hähnle2015-11-051-32/+36
| | | | | | | | | | | | | | | | Without the clamping by NumLevels, the state tracker would reallocate the texture storage (incorrect) and even fail to copy the base level image after reallocation, leading to the graphical glitch of https://bugs.freedesktop.org/show_bug.cgi?id=91993 . A piglit test has been submitted for review as well (subtest of arb_texture_storage-texture-storage). v2: also bypass all calls to st_finalize_texture (suggested by Marek Olšák) Cc: [email protected] Reviewed-by: Marek Olšák <[email protected]> (cherry picked from commit 24c90888aeaf90b13700389b91b74bf63ee9f28d)
* i965: Fix missing BRW_NEW_*_PROG_DATA flagging caused by cache reuse.Kenneth Graunke2015-11-052-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Consider the case of two nearly identical GLSL fragment shaders: out vec4 color; void main() { color = vec4(1); } and layout(early_fragment_tests) in; out vec4 color; void main() { color = vec4(1); } These shaders compile to the exact same assembly, but have distinct values for brw_wm_prog_data::early_fragment_tests. Since these are two independent GLSL shaders, they have different program keys - notably, brw_wm_prog_key::program_string_id differs. When uploading the second, brw_upload_cache will find an existing copy of the assembly in the cache BO, which means matching_data will be non-NULL. Although we create a second cache item (with the new key and prog_data), we set item->offset to the existing copy and avoid re-uploading duplicate assembly. However, brw_search_cache() would only flag BRW_NEW_*_PROG_DATA if item->offset differed from the supplied offset. With reuse, both programs have the same offset, but prog_data changed. We have to flag it, but failed to. To fix this, we simply need to check if the aux (prog_data) pointer changed. If either the assembly or the prog_data differs, flag it. This fixes a regression since 1bba29ed403e735ba0bf04ed8aa2e571884f, where Topi fixed brw_upload_cache() to actually reuse identical assembly. Prior to that, reuse basically never happened due to bugs. Unfortunately, this code apparently wasn't prepared to handle reuse! Fixes GPU hangs in Dolphin on Broadwell. Huge thanks to Pierre Bourdon and Ilia Mirkin for debugging this and helping track down the real issue. Cc: "11.0" <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92623 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Tested-by: Pierre Bourdon <[email protected]> (cherry picked from commit bf05af3f0e8769f417bbd995470dc1b8083a0df9)
* i965: Fix is-renderable check in intel_image_target_renderbuffer_storageIan Romanick2015-11-051-5/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Previously we could create a renderbuffer with format MESA_FORMAT_R8G8B8A8_UNORM, convert that renderbuffer to an EGLImage, then FAIL to convert the EGLImage back to a renderbuffer because reasons. Just use the same check in intel_image_target_renderbuffer_storage that brw_render_target_supported uses. There are more checks in brw_render_target_supported, but I don't think they are necessary here. A different approach would be to refactor brw_render_target_supported to take rb->Format and rb->NumSamples as parameters (instead of a gl_renderbuffer) and use the new function here. Fixes: ES2-CTS.gtf.GL2ExtensionTests.egl_image.egl_image Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Tested-by: Tapani Pälli <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92476 Cc: "10.3 10.4 10.5 10.6 11.0" <[email protected]> (cherry picked from commit 7070c8879adff2a1204d7473f119d8194eff919b)
* mesa/glformats: Undo code changes from _mesa_base_tex_format() moveNanley Chery2015-11-051-142/+6
| | | | | | | | | | | | | | | | | | | The refactoring commit, c6bf1cd, accidentally reverted cd49b97 and 99b1f47. These changes caused more code to be added to the function and removed the existing support for ASTC. This patch reverts those modifications. v2. Actually include ASTC support again. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92221 Cc: "11.0" <[email protected]> Signed-off-by: Nanley Chery <[email protected]> Reviewed-by: Emil Velikov <[email protected]> (cherry picked from commit f1147a238ab35a56fa7d1c64f6025ff3b909dad8) [Emil Velikov] - Drop the KHR_texture_compression_astc_ldr check - Add texcompress.h include. Signed-off-by: Emil Velikov <[email protected]>
* mesa: fix ARRAY_SIZE query for GetProgramResourceivTapani Pälli2015-10-213-43/+62
| | | | | | | | | | | | | | | | | Patch also refactors name length queries which were using array size in computation, this has to be done in same time to avoid regression in arb_program_interface_query-resource-query Piglit test. Fixes rest of the failures with ES31-CTS.program_interface_query.no-locations v2: make additional check only for GS inputs v3: create helper function for resource name length so that it gets calculated only in one place Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Martin Peres <[email protected]> (cherry picked from commit c0722be9f58ef89dae98d8c459ec4f9589f97748)
* i965: Remove early release of DRI2 miptreeChris Wilson2015-10-211-1/+0
| | | | | | | | | | | | | | | | intel_update_winsys_renderbuffer_miptree() will release the existing miptree when wrapping a new DRI2 buffer, so we can remove the early release and so prevent a NULL mt dereference should importing the new DRI2 name fail for any reason. (Reusing the old DRI2 name will result in the rendering going astray, to a stale buffer, and not shown on the screen, but it allows us to issue a warning and not crash much later in innocent code.) Signed-off-by: Chris Wilson <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86281 Reviewed-by: Martin Peres <[email protected]> Reviewed-by: Chad Versace <[email protected]> (cherry picked from commit 70e91d61fde239e8ae58148cacd4ff891126e2aa)
* i965/vec4: fill src_reg type using the constructor type parameterAlejandro Piñeiro2015-10-211-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | The src_reg constructor that received the glsl_type was using it only to build the swizzle, but not to fill this->type as dst_reg is doing. This caused some type mismatch between movs and alu operations on the NIR path, so copy propagation optimization was not applied to remove unneeded movs if negate modifier was involved. This was first detected on minus (negate+add) operations. Shader DB results (taking into account only vec4): total instructions in shared programs: 20019 -> 19934 (-0.42%) instructions in affected programs: 2918 -> 2833 (-2.91%) helped: 79 HURT: 0 GAINED: 0 LOST: 0 Reviewed-by: Matt Turner <[email protected]> (cherry picked from commit 4de86e1371b0d59a5b9a787b726be3d373024647) Nominated-by: Christoph Brill <[email protected]>
* i965/vec4: check writemask when bailing out at register coalesceAlejandro Piñeiro2015-10-211-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | opt_register_coalesce stopped to check previous instructions to coalesce with if somebody else was writing on the same destination. This can be optimized to check if somebody else was writing to the same channels of the same destination using the writemask. Shader DB results (taking into account only vec4): total instructions in shared programs: 1781593 -> 1734957 (-2.62%) instructions in affected programs: 1238390 -> 1191754 (-3.77%) helped: 12782 HURT: 0 GAINED: 0 LOST: 0 v2: removed some parenthesis, fixed indentation, as suggested by Matt Turner v3: added brackets, for consistency, as suggested by Eduardo Lima Reviewed-by: Matt Turner <[email protected]> (cherry picked from commit d4e29af2344c06490913efc35430f93a966061bb) Nominated-by: Jason Ekstrand <[email protected]>
* mesa: fix incorrect opcode in save_BlendFunci()Brian Paul2015-10-211-1/+1
| | | | | | | | | | Fixes assertion failure with new piglit arb_draw_buffers_blend-state_set_get test. Cc: [email protected] Reviewed-by: Jose Fonseca <[email protected]> (cherry picked from commit e24d04e436ed48d4a0aac90590cbaa40da936208)
* gallium: add PIPE_SHADER_CAP_MAX_UNROLL_ITERATIONS_HINTMarek Olšák2015-10-211-0/+3
| | | | | | | | | | | | | | | This avoids a serious r600g bug leading to a GPU hang. The chances this bug will get fixed are pretty low now. I deeply regret listening to others and not pushing this patch, leaving other users with a GPU-crashing driver. Yes, it should be fixed in the compiler and it's ugly, but users couldn't care less about that. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86720 Cc: 11.0 10.6 <[email protected]> Reviewed-by: Brian Paul <[email protected]> (cherry picked from commit 814f31457e9ae83d4f1e39236f704721b279b73d)
* st/mesa: fix clip state dependenciesMarek Olšák2015-10-211-1/+4
| | | | | | | | This allows removing FLUSH_VERTICES in MatrixMode. Cc: [email protected] Reviewed-by: Brian Paul <[email protected]> (cherry picked from commit 3c6156a4a7b647cc55cbe3a4c13d53b5ffe505e6)
* mesa: Set api prefix to version string when overriding versionTapani Pälli2015-10-211-1/+18
| | | | | | | | | | | | | | | | | | | | | | | Otherwise there are problems when user overrides version and application such as Piglit wants to detect used api with glGetString(GL_VERSION). This makes it currently impossible to run glslparsertest tests for OpenGL ES when using version override. Below is example when using MESA_GLES_VERSION_OVERRIDE=3.1. Before: "3.1 Mesa 11.1.0-devel (git-24a1a15)" After: "OpenGL ES 3.1 Mesa 11.1.0-devel (git-78042ff)" v2: only include api prefix for OpenGL ES (Boyan Ding) Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]> Cc: "11.0" <[email protected]> (cherry picked from commit dc8c221e2890cc9913dfc99e1e0fcb73c89af52c)
* mesa: android: Fix the incorrect path of sse_minmax.cChih-Wei Huang2015-10-211-1/+1
| | | | | | | | | | Cc: "10.6 11.0" <[email protected]> Fixes: 669cfc267a1 (android: mesa: fix the path of the SSE4_1 optimisations) Signed-off-by: Chih-Wei Huang <[email protected]> Reviewed-by: Emil Velikov <[email protected]> (cherry picked from commit 67d8518a0e5a3df400a6e70de667d69e4b6ce9c5)
* st/fbo: use pipe_surface_release instead of pipe_surface_referenceKrzysztof Sobiecki2015-10-211-1/+1
| | | | | | | | | | | | pipe_surface_reference have problems with deleted contexts, so use of pipe_surface_release might be more appropriate. Fixes Wasteland 2 Director's Cut crash on start. Cc: [email protected] Reviewed-by: Brian Paul <[email protected]> (cherry picked from commit 14f7ce42484c31a45fcb6aabdf503f7496a9a94c)
* vbo: fix incorrect switch statement in init_mat_currval()Brian Paul2015-10-211-1/+1
| | | | | | | | | | | | | The variable 'i' is a value in [0, MAT_ATTRIB_MAX-1] so subtracting VERT_ATTRIB_GENERIC0 gave a bogus value and we executed the default switch clause for all loop iterations. This doesn't fix any known issues but was clearly incorrect. Cc: [email protected] Reviewed-by: Marek Olšák <[email protected]> (cherry picked from commit dd293d8aae324ac7b9d5297e33a1e732e1f3f4d3)
* ff_fragment_shader: Use binding to set the sampler unitIan Romanick2015-10-211-6/+4
| | | | | | | | | | | | | This is the way layout(binding=xxx) works from GLSL. The old method just happened to work (and significantly predated support for layout(binding=xxx)), but future changes will break this. v2: Remove some stale comments. Suggested by Matt and Chris Forbes. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: "10.6 11.0" <[email protected]> (cherry picked from commit 8acce5d53af44a3d1d05a26e69559fd35f835de4)
* mesa/uniforms: fix get_uniform for doubles (v2)Dave Airlie2015-10-211-16/+37
| | | | | | | | | | | | | | | | The initial glGetUniformdv support didn't cover all the casting cases that are apparantly legal, and cts seems to test for them. I've updated the piglit test to cover these cases now. v2: fix indentation - it's all broken in this file (Ilia) fix src/dst index tracking in light of fp64 support (Ilia) cc: "11.0" <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Signed-off-by: Dave Airlie <[email protected]> (cherry picked from commit bcfaab38858fdcfbd8ffeaf6b0e3da8a726f02e6)
* mesa: Get rid of texture-dependent image unit derived state.Francisco Jerez2015-10-214-33/+0
| | | | | | | | | | | | | | | The point is to avoid having to re-validate all image units when _NEW_TEXTURE is flagged, which can be expensive if the driver exposes a large number of image units. This has been reported to fix a 36% performance regression in the Synmark2 Multithread benchmark on the i965 driver which exposes 192 image units. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91788 Reported-by: Wendy Wang <[email protected]> Tested-by: Ye Tian <[email protected]> CC: "11.0" <[email protected]> Reviewed-by: Ian Romanick <[email protected]> (cherry picked from commit 7e441bf025cf8c5d088430d546acb4c0ed58d27b)
* i965: Use _mesa_is_image_unit_valid() instead of gl_image_unit::_Valid.Francisco Jerez2015-10-213-6/+10
| | | | | | | | | gl_image_unit::_Valid will be removed in a future commit. Tested-by: Ye Tian <[email protected]> CC: "11.0" <[email protected]> Reviewed-by: Ian Romanick <[email protected]> (cherry picked from commit 2d97a78b37ddf325d90e056f5eefee0548092530)
* mesa: Skip redundant texture completeness checking during image validation.Francisco Jerez2015-10-211-1/+2
| | | | | | | | | | | | | | The call to _mesa_test_texobj_completeness() is unnecessary if the texture is already known to be complete. If the texture object is dirtied in the meantime _BaseComplete and _MipmapComplete will be reset to false. _mesa_is_image_unit_valid() will start to be called more frequently in a future commit, so it seems desirable to avoid the unnecessary work. Tested-by: Ye Tian <[email protected]> CC: "11.0" <[email protected]> Reviewed-by: Ian Romanick <[email protected]> (cherry picked from commit 25d3338be37ddbfe676716034ec5f29e27323704)
* mesa: Expose function to calculate whether a shader image unit is valid.Francisco Jerez2015-10-212-4/+15
| | | | | | | | | | | | | | A future commit will remove all texture object-dependent derived state from the image unit struct to make validation unnecessary on texture state changes. Instead of checking gl_image_unit::_Valid drivers will be required to call this function when needed to find out whether an image unit is in a valid state and whether access from the shader is allowed. Tested-by: Ye Tian <[email protected]> CC: "11.0" <[email protected]> Reviewed-by: Ian Romanick <[email protected]> (cherry picked from commit 5152db415f4047569822d648fda09bdde4171d6d)
* i965: Don't tell the hardware about our UAV access.Francisco Jerez2015-10-216-19/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The hardware documentation relating to the UAV HW-assisted coherency mechanism and UAV access enable bits is scarce and sometimes contradictory, and there's quite some guesswork behind this commit, so let me summarize the background first: HSW and later hardware have infrastructure to support a stricter form of data coherency between shader invocations from separate primitives. The mechanism is controlled by the "Accesses UAV" bits on 3DSTATE_VS, _HS, _DS, _GS and _PS (or _PS_EXTRA on BDW+), and the "UAV Coherency Required" bit on the 3DPRIMITIVE command. Regardless of whether "UAV Coherency Required" is set, the hardware fixed-function units will increment a per-stage semaphore for each request received if "Accesses UAV" is set for the same or any lower stage. An implicit DC flush is emitted by the lowermost stage with "Accesses UAV" set once it's done processing the request, this also happens regardless of the value of "UAV Coherency Required". The completion of the DC flush will cause the same stage and all previous ones to decrement the semaphore, marking the UAV accesses for the primitive as coherent with L3. The "UAV Coherency Required" 3DPRIMITIVE bit will cause a pipeline stall before any threads are dispatched for the first FF stage with "Accesses UAV" set until the semaphore is cleared for the same stage. Effectively this guarantees that UAV memory accesses performed by previous primitives from any stage will be strictly ordered (and thanks to the implicit DC flush visible in memory) with UAV accesses from the following primitives. None of this is required by the usual image, atomic counter and SSBO GL APIs which have very relaxed cross-primitive coherency and ordering requirements, so we don't actually ever set the "UAV Coherency Required" bit -- Ordering with respect to shader invocations from previous stages on the same primitive where there is a data dependency is of course already guaranteed as the spec requires, regardless of this mechanism being enabled. We do set the "Accesses UAV" bits though since my commit ac7664e493655e290783c23a0412b9c70936da50 (which this patch partially reverts), mainly because of comments like the following from the BDW PRM: > 3DSTATE_GS >[...] > 12 Accesses UAV > Format: Enable > This field must be set when GS has a UAV access. There are similar comments in the documentation for the other 3DSTATE_*S commands. The "must" part is misleading and unjustified AFAIK. Most of the "Accesses UAV" bits don't seem to have any side effects other than the implicit DC flushes and the related book-keeping in anticipation for a subsequent primitive with "UAV Coherency Required" set, so in most cases they are unnecessary and may incur a performance penalty. There is an exception though. On Gen8+ the PS_EXTRA UAV access bit influences the calculation of the PS UAV-only and ThreadDispatchEnable signals which on previous generations were set explicitly by the driver, so we cannot always avoid enabling it on the PS stage. The primary motivation for this change is that in fact the hardware coherency mechanism is buggy and will cause a rather non-deterministic hang on Gen8 when VS is the only stage with "Accesses UAV" set and the processing of a request terminates immediately after the implicit DC flush is sent for a previous primitive with no additional vertices being emitted for the second primitive, what will cause the hardware to skip sending a second DC flush and cause the VS to stall indefinitely waiting for a response from the DC (BDWGFX HSD 1912017). This hardware bug can be reproduced on current master with the spec@arb_shader_image_load_store@host-mem-barrier@Indirect/RaW piglit subtest (if you have the patience to run it a few dozen times). The proposed workaround is to insert CS STALLs speculatively between 3DPRIMITIVE commands when "Accesses UAV" is enabled for the VS stage only. Because this would affect one of the hottest paths in the driver and likely decrease performance even further due to the unnecessary serialization, and because we don't actually need the implicit DC flushes, it seems better to just disable them. Cc: 11.0 <[email protected]> (cherry picked from commit 5346c1167064d6429c6338974c6342f8346fd34b)
* mesa: add GL_UNSIGNED_INT_24_8 to _mesa_pack_depth_spanTapani Pälli2015-10-211-0/+15
| | | | | | | | | | | | Patch adds missing type (used with NV_read_depth) so that it gets handled correctly. This fixes errors seen with following CTS test: ES3-CTS.gtf.GL3Tests.packed_pixels.packed_pixels Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]> Cc: "11.0" <[email protected]> (cherry picked from commit d8d0e4a81e42678cc8c8b876dfee24d5c2f4ba38)
* mesa: Correctly handle GL_BGRA_EXT in ES3 format_and_type checksJason Ekstrand2015-10-101-2/+19
| | | | | | | | | | | | | | | | | | | | | | The EXT_texture_format_BGRA8888 extension (which mesa supports unconditionally) adds a new format and internal format called GL_BGRA_EXT. Previously, this was not really handled at all in _mesa_ex3_error_check_format_and_type. When the checks were tightened in commit f15a7f3c, we accidentally tightened things too far and GL_BGRA_EXT would always cause an error to be thrown. There were two primary issues here. First, is that _mesa_es3_effective_internal_format_for_format_and_type didn't handle the GL_BGRA_EXT format. Second is that it blindly uses _mesa_base_tex_format which returns GL_RGBA for GL_BGRA_EXT. This commit fixes both of these issues as well as adds explicit checks that GL_BGRA_EXT is only ever used with GL_BGRA_EXT and GL_UNSIGNED_BYTE. Signed-off-by: Jason Ekstrand <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92265 Reviewed-by: Ian Romanick <[email protected]> Cc: "11.0" <[email protected]> (cherry picked from commit 6ad9ebb073fc4ed245ef8e9db4479a52e818cb92)
* mesa: Add abs input modifier to base for POW in ffvertex_progDaniel Scharrer2015-10-071-3/+14
| | | | | | | | | | | | The result of POW for a negative base is undefined. Even when the result is multiplied by zero (which is the case here whenever the base is negative), the Inf and NaNs can propagate past that. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91342 Signed-off-by: Daniel Scharrer <[email protected]> Cc: "10.6 11.0" <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> (cherry picked from commit b3f9c5cc0fab6d770d220efd87ba3746c6673875)
* meta: Handle array textures in scaled MSAA blitsIan Romanick2015-10-071-15/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The old code had some significant problems with respect to sampler2DArray textures. The biggest problem was that some of the code would use vec3 for the texture coordinate type, and other parts of the code would use vec2. The resulting shader would not even compile. Since there were not tests for this path, nobody noticed. The input to the fragment shader is always treated as a vec3. If the source data is only vec2, the vertex puller will supply 0 for the .z component. The texture coordinate passed to the fragment shader is always a vec2 that comes from the .xy part of the vertex shader input. The layer, taken from the .z of the vertex shader input is passed separately as a flat integer. If the generated fragment shader does not use the layer integer, the GLSL linker will eliminate all the dead code in the vertex shader. Fixes the new piglit tests "blit-scaled samples=2 with gl_texture_2d_multisample_array", etc. on i965. Note for stable maintainer: This patch may depend on 46037237, and that patch should be safe for stable. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Cc: Topi Pohjolainen <[email protected]> Cc: Jordan Justen <[email protected]> Cc: "10.6 11.0" <[email protected]> (cherry picked from commit 9bd9cf1fa402bf948020ee5d560259a5cfd2a739)
* i915: Remember to call intel_prepare_render() before blittingVille Syrjälä2015-10-071-0/+5
| | | | | | | | | | | | | | | | | | Bring over the following fix from i965: commit fb3d62fe3d4fc40ba4ad9804d8b6f451316c9ae2 Author: Kenneth Graunke <[email protected]> Date: Tue Aug 6 14:36:09 2013 -0700 i965: Remember to call intel_prepare_render() before blitting. Fixes a crash in the following piglit tests: bin/fbo-sys-blit -auto bin/fbo-sys-sub-blit -auto Signed-off-by: Ville Syrjälä <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Cc: "11.0" <[email protected]> (cherry picked from commit a1a3f0961b20907b6948959c1f224bb055bd4f3d)
* i915: Fix texcoord vs. varying collision in fragment programsVille Syrjälä2015-10-072-26/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | i915 fragment programs utilize the texture coordinate registers for both texture coordinates and varyings. Unfortunately the code doesn't check if the same index might be in use for both. It just naively uses the index to pick a texture unit, which could lead to collisions. Add an extra mapping step to allocate non conflicting texture units for both uses. The issue can be reproduced with a pair of simple shaders like these: attribute vec4 in_mod; varying vec4 mod; void main() { mod = in_mod; gl_TexCoord[0] = gl_MultiTexCoord0; gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex; } varying vec4 mod; uniform sampler2D tex; void main() { gl_FragColor = texture2D(tex, vec2(gl_TexCoord[0])) * mod; } Fixes many piglit tests on i915: glsl-link-varyings-2 glsl-orangebook-ch06-bump interpolation-none-gl_frontcolor-smooth-fixed interpolation-none-gl_frontcolor-smooth-none interpolation-none-gl_frontcolor-smooth-vertex interpolation-none-gl_frontsecondarycolor-smooth-fixed interpolation-none-gl_frontsecondarycolor-smooth-vertex interpolation-none-gl_frontsecondarycolor-smooth-none interpolation-none-other-flat-fixed interpolation-none-other-flat-none interpolation-none-other-flat-vertex interpolation-none-other-smooth-fixed interpolation-none-other-smooth-none interpolation-none-other-smooth-vertex v2 [idr]: Minor formatting tweaks. Signed-off-by: Ville Syrjälä <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Cc: "11.0" <[email protected]> (cherry picked from commit c349031c27b7f66151f07d785625c585e10a92c2)
* i830: Fix collision between I830_UPLOAD_RASTER_RULES and I830_UPLOAD_TEX(0)Ville Syrjälä2015-10-071-4/+4
| | | | | | | | | | | | | | | I830_UPLOAD_RASTER_RULES and I830_UPLOAD_TEX(0) are trying to occupy the same bit. Move the texture bits upwards a bit to make room for I830_UPLOAD_RASTER_RULES. Now the driver will actually upload the raster rules which is rather important to get the provoking vertex right. Fixes the appearance of glxgears teeth on gen2. Signed-off-by: Ville Syrjälä <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Cc: "10.6 11.0" <[email protected]> (cherry picked from commit 9504740f3e6698d860ac93310a33f51f01c10c4f)
* st/mesa: try PIPE_BIND_RENDER_TARGET when choosing float texture formatsBrian Paul2015-10-071-1/+5
| | | | | | | | | | | | | | For 8-bit RGB(A) texture formats we set the PIPE_BIND_RENDER_TARGET flag to try to get a hardware format which also supports rendering (for FBO textures). Do the same thing for floating point formats. This allows the Redway3D Flat demo to run. Cc: 10.6 11.0 <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> (cherry picked from commit cb758b892a7e62ff1f6187f2ca9ac543ff70a096)
* i965/fs: Fix hang on IVB and VLV with image format mismatch.Francisco Jerez2015-10-071-4/+38
| | | | | | | | | | | | | | | | | | | | IVB and VLV hang sporadically when an untyped surface read or write message is used to access a surface of format other than RAW, as may happen when there is a mismatch between the format qualifier of the image uniform and the format of the actual image bound to the pipeline. According to the spec this condition gives undefined results but may not lead to program termination (which is one of the possible outcomes of the hang). Fix it by checking at runtime whether the surface is of the right type. Fixes the "arb_shader_image_load_store.invalid/format mismatch" piglit subtest. Reported-by: Mark Janes <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91718 CC: [email protected] Reviewed-by: Ian Romanick <[email protected]> (cherry picked from commit b61292296bd7e1876fdb64725a783a7e96f6c4c1)
* mesa: fix mipmap generation for immutable, compressed texturesRoland Scheidegger2015-10-071-21/+15
| | | | | | | | | | | | | | | | | | | | | | If the immutable compressed texture didn't have the full mip pyramid, this didn't work, because it tried to generate mip levels for non-existing levels. _mesa_prepare_mipmap_level() would correctly handle this by returning FALSE if the mip level didn't exist, however we actually created the non-existing mip level right before that because we used _mesa_get_tex_image() before calling _mesa_prepare_mipmap_level(). It would then proceed to crash (we allocated the mip level, which is a bad idea on an immutable texture, but didn't initialize the values, leading to assertion failures or segfaults). Fix this by using _mesa_select_tex_image() instead and call it after _mesa_prepare_mipmap_level(), as that function will allocate missing mip levels for non-immutable textures already. This fixes a (2 year old) crash with astromenace which was hack-fixed in ubuntu packages instead: http://bugs.debian.org/718680 (I guess most apps do full mip chains - I believe this app not doing it is actually unintentional, always one level less than full mip chain...). Cc: "10.6 11.0" <[email protected]> Reviewed-by: Brian Paul <[email protected]> (cherry picked from commit 19604d30e1351868f7f54847c91ffec7b3fcd27e)
* st/mesa: fix front buffer regression after dropping st_validate_state in BlitMarek Olšák2015-10-071-0/+2
| | | | | | | | | Broken by: d082c5324914212f76e45be497229c7a0681f706 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92072 Cc: 10.6 11.0 <[email protected]> Tested-by: Ilia Mirkin <[email protected]> (cherry picked from commit f3a081953393c7d40bd8df9ec22a2551d01098f5)
* mesa: Use the effective internal format instead for validationEduardo Lima Mitev2015-09-281-0/+151
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When validating format+type+internalFormat for texture pixel operations on GLES3, the effective internal format should be used if the one specified is an unsized internal format. Page 127, section "3.8 Texturing" of the GLES 3.0.4 spec says: "if internalformat is a base internal format, the effective internal format is a sized internal format that is derived from the format and type for internal use by the GL. Table 3.12 specifies the mapping of format and type to effective internal formats. The effective internal format is used by the GL for purposes such as texture completeness or type checks for CopyTex* commands. In these cases, the GL is required to operate as if the effective internal format was used as the internalformat when specifying the texture data." v2: Per the spec, Luminance8Alpha8, Luminance8 and Alpha8 should not be considered sized internal formats. Return the corresponding unsize format instead. v4: * Improved comments in _mesa_es3_effective_internal_format_for_format_and_type(). * Splitted patch to separate chunk about reordering of error_check_subtexture_dimensions() error check, which is not directly related with this patch. v5: Dropped the splitted patch because it was actually a work around 3 dEQP tests that are buggy: dEQP-GLES2.functional.negative_api.texture.texsubimage2d_neg_offset dEQP-GLES2.functional.negative_api.texture.texsubimage2d_offset_allowed dEQP-GLES2.functional.negative_api.texture.texsubimage2d_neg_wdt_hgt Cc: "11.0" <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Tested-by: Mark Janes <[email protected]> (cherry picked from commit 5edd9961c15a80d557ba42f48c97a471b23d9c5e) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91582
* mesa: Move _mesa_base_tex_format() from teximage to glformats filesEduardo Lima Mitev2015-09-284-511/+507
| | | | | | | | | | | | | | | | | | | | | | | This function will be needed as part of validating the combination of format, type and internal format of texture pixel operations, which happens in glformats files. Specifically, we want to be able to obtain the base format of a resolved effective internal format, to compare it with the original internal format passed. Also, since this function deals solely with GL formats, it fits better in glformats where the rest of similar format functionality rests. The function is moved as-is, without any modification. Cc: "11.0" <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Tested-by: Mark Janes <[email protected]> (cherry picked from commit c6bf1cd1467ea5d5370394ba99366dd8a59a385c) Signed-off-by: Emil Velikov <[email protected]> Conflicts: src/mesa/main/teximage.c src/mesa/main/teximage.h
* mesa: Fix order of format+type and internal format checks for glTexImageXD opsEduardo Lima Mitev2015-09-281-16/+25
| | | | | | | | | | | | | | | | | | | | The more specific GLES constrains should be checked after the general validation performed by _mesa_error_check_format_and_type(). This is also for consistency with the error checks order of glTexSubImage ops. v3: The change of order uncovered a bug that regresses a couple of piglit tests written against OpenGL-ES 1.1 spec, which expects an INVALID_VALUE instead of the INVALID_ENUM returned by _mesa_error_check_format_and_type() when an invalid format is passed to glTexImage2D. This version of the patch accounts for those cases. Fixes 1 dEQP test: * dEQP-GLES3.functional.negative_api.texture.teximage2d Cc: "11.0" <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Tested-by: Mark Janes <[email protected]> (cherry picked from commit 15ab968f62dd322ecda6d70b1069f52616fe39bb)
* i965: Respect stride and subreg_offset for ATTR registersKristian Høgsberg Kristensen2015-09-281-1/+4
| | | | | | | | | | | | | When we assign hw regs to attributes, we don't incorporate the stride and subreg_offset from the fs_reg. It's rarely used, but the integer multiplication lowering uses unusual stride and subreg_offset combination breaks when one source is an attribute. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91970 Cc: "10.6 11.0" <[email protected]> Signed-off-by: Kristian Høgsberg Kristensen <[email protected]> Reviewed-by: Matt Turner <[email protected]> (cherry picked from commit 2ea16966ae66d4dd5c5dcb996d7996d9c734bbee)
* t_dd_dmatmp: Use addition instead of subtraction in loop boundsIan Romanick2015-09-231-1/+1
| | | | | | | | | | | | | | This is used everywhere else in this file because it avoids problems when count is zero (due to trimming). No piglit regressions on i915 (G33) or radeon (Radeon 7500). Signed-off-by: Ian Romanick <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38109 Reviewed-by: Brian Paul <[email protected]> Cc: Marius Predut <[email protected]> Cc: "10.6 11.0" <[email protected]> (cherry picked from commit 25543d8ec506ef32599af6f5e0dd735e01b39909)
* t_dd_dmatmp: Pull out common 'count -= count & 3' codeIan Romanick2015-09-231-9/+6
| | | | | | | | | | | | | | This was missing in the HAVE_TRIANGLES path, and that could cause incorrect rendering. No piglit regressions on i915 (G33) or radeon (Radeon 7500). Signed-off-by: Ian Romanick <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38109 Reviewed-by: Brian Paul <[email protected]> Cc: Marius Predut <[email protected]> Cc: "10.6 11.0" <[email protected]> (cherry picked from commit c0b3b2f7603eab210acdb2e654e5411fe912ca34)
* t_dd_dmatmp: Use '& 3' instead of '% 4' everywhereIan Romanick2015-09-231-2/+2
| | | | | | | | | No piglit regressions on i915 (G33) or radeon (Radeon 7500). Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Brian Paul <[email protected]> Cc: "10.6 11.0" <[email protected]> (cherry picked from commit 0d475ee2b989ac1697720ca391913e9158156bdc)
* t_dd_dmatmp: Clean up improper code formatting from previous patchIan Romanick2015-09-231-12/+6
| | | | | | | | | No piglit regressions on i915 (G33) or radeon (Radeon 7500). Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Brian Paul <[email protected]> Cc: "10.6 11.0" <[email protected]> (cherry picked from commit fad8d54de7e7f908cb0d06f0b54af8440e689928)