aboutsummaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* mesa: rename gl_shader_program's NumUniformBlocks to NumBufferInterfaceBlocksSamuel Iglesias Gonsalvez2015-09-2911-26/+26
| | | | | | | | | | | Because it counts shader storage blocks too. v2: - Use NumBufferInterfaceBlocks instead (Jordan). Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* main: fix ACTIVE_UNIFORM_BLOCKS valueSamuel Iglesias Gonsalvez2015-09-291-1/+5
| | | | | | | | | | | | | | NumUniformBlocks also counts shader storage blocks. NumUniformBlocks variable will be renamed in a later patch to avoid misunderstandings. v2: - Modify the condition to use !IsShaderStorage and the list of uniform blocks (Timothy) Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* i965/gen9: Add a condition for starting pixel in fast copy blitAnuj Phogat2015-09-281-0/+4
| | | | | | | | | | | | This condition restricts the use of fast copy blit to cases where starting pixel of src and dst is oword (16 byte) aligned. Many piglit tests (if using fast copy blit in Mesa) failed earlier because I missed adding this condition.Fast copy blit is currently enabled for use only with Yf/Ys tiling. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* nouveau: wait to unref the transfer's bo until it's no longer usedIlia Mirkin2015-09-281-2/+3
| | | | | | | | | | The bo will often come from a slab in which case it doesn't matter. But for larger allocations this will be in its own bo, and we have to make sure to wait until it's no longer used in order for it to be freed. Signed-off-by: Ilia Mirkin <[email protected]> Cc: [email protected] Tested-by: Marcin Ślusarz <[email protected]>
* nouveau: delay deleting buffer with unflushed fenceIlia Mirkin2015-09-282-2/+10
| | | | | | | | | | | If there is an unflushed fence on the bo, then the resource may still be used in commands built up in the local pushbuf. Flushing can cause all sorts of unwanted effects, so just free the bo when the relevant fence is hit. Signed-off-by: Ilia Mirkin <[email protected]> Cc: [email protected] Tested-by: Marcin Ślusarz <[email protected]>
* nouveau: be more careful about freeing temporary transfer buffersIlia Mirkin2015-09-285-4/+30
| | | | | | | | | Deleting a buffer does not flush the command stream. Make sure that we wait for the copies to finish before deleting the temporary bo. Signed-off-by: Ilia Mirkin <[email protected]> Cc: [email protected] Tested-by: Marcin Ślusarz <[email protected]>
* i965: Rename intel_miptree_get_dimensions_for_image()Anuj Phogat2015-09-285-10/+14
| | | | | | | | | | | This function isn't specific to miptrees. So, drop the "miptree" from function name. V3: Add a comment explaining how the 1D Array texture height and depth is interpreted by Intel hardware. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965/gen9: Fix {src, dst}_pitch alignment check for XY_FAST_COPY_BLTAnuj Phogat2015-09-281-11/+7
| | | | | | | | | I misinterpreted the alignmnet restriction in XY_FAST_COPY_BLT earlier. Instead of checking pitch for 64KB alignmnet we need to check it for tile widh alignment. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965: Fix {src, dst}_pitch alignment check for XY_SRC_COPY_BLTAnuj Phogat2015-09-281-2/+7
| | | | | | | | | | | | | | Current code checks the alignment restrictions only for Y tiling. From Broadwell PRM vol 10: "pitch is of 512Byte granularity for Tile-X: This means the tiled-x surface pitch can be (512, 1024, 1536, 2048...)/4 (in Dwords)." This patch adds the restriction for X tiling as well. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Ben Widawsky <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965: Move conversion of {src, dst}_pitch to dwords outside if/elseAnuj Phogat2015-09-281-16/+9
| | | | | Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965: Delete temporary variable 'src_pitch'Anuj Phogat2015-09-281-5/+1
| | | | | Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965: Use helper function intel_get_tile_dims() in surface setupAnuj Phogat2015-09-281-2/+12
| | | | | | | | | | It takes care of using the correct tile width if we later use other tiling patterns for aux miptree. V2: Remove the comment about using Yf for aux miptree. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965: Use intel_get_tile_dims() to get tile masksAnuj Phogat2015-09-284-33/+28
| | | | | | | | | | | | | | | This will require change in the parameters passed to intel_miptree_get_tile_masks(). V2: Rearrange the order of parameters. (Ben) Change the name to intel_get_tile_masks(). (Topi) V3: Use temporary variables in intel_get_tile_masks() for clarity. Fix mask_y computation. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965: Add a helper function intel_get_tile_dims()Anuj Phogat2015-09-282-22/+64
| | | | | | | | | | | | | | V2: - Do the tile width/height computations in the new helper function and use it later in intel_miptree_get_tile_masks(). - Change the name to intel_get_tile_dims(). V3: Return the tile_h in number of rows in place of bytes. Document the units of tile_w, tile_h parameters. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* mesa: Use the effective internal format instead for validationEduardo Lima Mitev2015-09-281-0/+151
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When validating format+type+internalFormat for texture pixel operations on GLES3, the effective internal format should be used if the one specified is an unsized internal format. Page 127, section "3.8 Texturing" of the GLES 3.0.4 spec says: "if internalformat is a base internal format, the effective internal format is a sized internal format that is derived from the format and type for internal use by the GL. Table 3.12 specifies the mapping of format and type to effective internal formats. The effective internal format is used by the GL for purposes such as texture completeness or type checks for CopyTex* commands. In these cases, the GL is required to operate as if the effective internal format was used as the internalformat when specifying the texture data." v2: Per the spec, Luminance8Alpha8, Luminance8 and Alpha8 should not be considered sized internal formats. Return the corresponding unsize format instead. v4: * Improved comments in _mesa_es3_effective_internal_format_for_format_and_type(). * Splitted patch to separate chunk about reordering of error_check_subtexture_dimensions() error check, which is not directly related with this patch. v5: Dropped the splitted patch because it was actually a work around 3 dEQP tests that are buggy: dEQP-GLES2.functional.negative_api.texture.texsubimage2d_neg_offset dEQP-GLES2.functional.negative_api.texture.texsubimage2d_offset_allowed dEQP-GLES2.functional.negative_api.texture.texsubimage2d_neg_wdt_hgt Cc: "11.0" <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Tested-by: Mark Janes <[email protected]>
* mesa: Move _mesa_base_tex_format() from teximage to glformats filesEduardo Lima Mitev2015-09-284-378/+507
| | | | | | | | | | | | | | | | | This function will be needed as part of validating the combination of format, type and internal format of texture pixel operations, which happens in glformats files. Specifically, we want to be able to obtain the base format of a resolved effective internal format, to compare it with the original internal format passed. Also, since this function deals solely with GL formats, it fits better in glformats where the rest of similar format functionality rests. The function is moved as-is, without any modification. Cc: "11.0" <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Tested-by: Mark Janes <[email protected]>
* mesa: Fix order of format+type and internal format checks for glTexImageXD opsEduardo Lima Mitev2015-09-281-16/+25
| | | | | | | | | | | | | | | | | | | The more specific GLES constrains should be checked after the general validation performed by _mesa_error_check_format_and_type(). This is also for consistency with the error checks order of glTexSubImage ops. v3: The change of order uncovered a bug that regresses a couple of piglit tests written against OpenGL-ES 1.1 spec, which expects an INVALID_VALUE instead of the INVALID_ENUM returned by _mesa_error_check_format_and_type() when an invalid format is passed to glTexImage2D. This version of the patch accounts for those cases. Fixes 1 dEQP test: * dEQP-GLES3.functional.negative_api.texture.teximage2d Cc: "11.0" <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Tested-by: Mark Janes <[email protected]>
* egl: Fix missing Haiku include pathAlexander von Gluck IV2015-09-281-0/+1
|
* state_trackers/hgl: Fix missing include pathAlexander von Gluck IV2015-09-281-0/+1
|
* i965/fs: Fix hang on IVB and VLV with image format mismatch.Francisco Jerez2015-09-281-4/+38
| | | | | | | | | | | | | | | | | | | IVB and VLV hang sporadically when an untyped surface read or write message is used to access a surface of format other than RAW, as may happen when there is a mismatch between the format qualifier of the image uniform and the format of the actual image bound to the pipeline. According to the spec this condition gives undefined results but may not lead to program termination (which is one of the possible outcomes of the hang). Fix it by checking at runtime whether the surface is of the right type. Fixes the "arb_shader_image_load_store.invalid/format mismatch" piglit subtest. Reported-by: Mark Janes <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91718 CC: [email protected] Reviewed-by: Ian Romanick <[email protected]>
* clover: Implement clCreateImage?D w/ clCreateImage.Serge Martin2015-09-281-52/+8
| | | | | | | Remplace clCreateImage2D and clCreateImage3D implementation with call to clCreateImage. Reviewed-by: Francisco Jerez <[email protected]>
* clover: Implement CL1.2 clCreateImage().Serge Martin2015-09-281-10/+91
| | | | Reviewed-by: Francisco Jerez <[email protected]>
* clover: Move down canonicalization of memory object flags into validate_flags().Francisco Jerez2015-09-281-39/+40
| | | | | | | | This will be used to share the same logic between buffer and image creation. v2: Make memory flag set constants local to validate_flags. (Serge Martin)
* glsl: revert "glsl: atomic counters can be declared as buffer-qualified ↵Iago Toral Quiroga2015-09-281-3/+3
| | | | | | | | | | | | | | | | | | | | | | variables" This reverts commit 586142658e2927a68c. The specs are not explicit about any restrictions related to the types allowed on buffer variables, however, the description of opaque types (like atomic counters) is in conclict with the purpose of buffer variables: "The opaque types declare variables that are effectively opaque handles to other objects. These objects are accessed through built-in functions, not through direct reading or writing of the declared variable. (...) Opaque variables cannot be treated as l-values;(...)" Also, Mesa is already disallowing opaque types in interface blocks anyway, so that commit was not really achieving anything. Reviewed-by: Francisco Jerez <[email protected]>
* gallium/util: avoid unreferencing random memory on buffer alloc failureIlia Mirkin2015-09-281-1/+1
| | | | | | | | Found by Coverity Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Albert Freeman <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* mesa: don't leak interface_nameIlia Mirkin2015-09-281-0/+1
| | | | | | | Found by Coverity Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* glsl: fix component size calculation for tessellation and geom shadersTimothy Arceri2015-09-281-1/+1
| | | | | | Broken in commit abdab88b30ab when adding arrays of arrays support Reviewed-by: Dave Airlie <[email protected]>
* i965/gs: Optimize away the EOT write on Gen8+ with static vertex count.Kenneth Graunke2015-09-261-0/+15
| | | | | | | | | | | | | | | | | | | With static vertex counts, the final EOT write doesn't actually write any data - it's just there to end the thread. Typically, the last thing before ending the thread will be an EmitVertex() call, resulting in a URB write. We can just set EOT on that. Note that this isn't always possible - there might be an intervening SSBO write/image store, or the URB write may have been in a loop. shader-db statistics for geometry shaders only: total instructions in shared programs: 3173 -> 3149 (-0.76%) instructions in affected programs: 176 -> 152 (-13.64%) helped: 8 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/gs: Allow src0 immediates in GS_OPCODE_SET_WRITE_OFFSET.Kenneth Graunke2015-09-262-2/+14
| | | | | | | | | | | | | | | | | | | GS_OPCODE_SET_WRITE_OFFSET is a MUL with a constant src[1] and special strides. We can easily make the generator handle constant src[0] arguments by instead generating a MOV with the product of both operands. This isn't necessarily a win in and of itself - instead of a MUL, we generate a MOV, which should be basically the same cost. However, we can probably avoid the earlier MOV to put src[0] into a register. shader-db statistics for geometry shaders only: total instructions in shared programs: 3207 -> 3173 (-1.06%) instructions in affected programs: 3207 -> 3173 (-1.06%) helped: 11 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Implement "Static Vertex Count" geometry shader optimization.Kenneth Graunke2015-09-265-4/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Broadwell's 3DSTATE_GS contains new "Static Output" and "Static Vertex Count" fields, which control a new optimization. Normally, geometry shaders can output arbitrary numbers of vertices, which means that resource allocation has to be done on the fly. However, if the number of vertices is statically known, the hardware can pre-allocate resources up front, which is more efficient. Thanks to the new NIR GS intrinsics, this is easy. We just call the function introduced in the previous commit to get the vertex count. If it obtains a count, we stop emitting the extra 32-bit "Vertex Count" field in the VUE, and instead fill out the 3DSTATE_GS fields. Improves performance of Gl32GSCloth by 5.16347% +/- 0.12611% (n=91) on my Lenovo X250 laptop (Broadwell GT2) at 1024x768. shader-db statistics for geometry shaders only: total instructions in shared programs: 3227 -> 3207 (-0.62%) instructions in affected programs: 242 -> 222 (-8.26%) helped: 10 v2: Don't break non-NIR paths (just skip this optimization). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Move GS_THREAD_END mlen calculations out of the generator.Kenneth Graunke2015-09-262-2/+2
| | | | | | | | | | The visitor was setting a mlen that was wrong for Broadwell, but the generator was ignoring it and doing the right thing regardless. We may as well move the logic fully into the visitor. This will be useful in the next commit as well. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* nir: Add a function to count the number of vertices a GS emits.Kenneth Graunke2015-09-263-0/+96
| | | | | | | | | | | Some hardware (such as Broadwell) can run geometry shaders more efficiently when the number of vertices emitted is statically known. This pass provides a way to obtain the constant vertex count, or -1 indicating that the vertex count is unknown/non-constant. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Simplify handling of VUE map changes.Kenneth Graunke2015-09-264-42/+17
| | | | | | | | | | | | | | | | | | | | The old code was disasterously complex - spread across multiple atoms which may not even run, inspecting the dirty bits to try and decide whether it was necessary to do checks...storing VS information in brw_context...extra flagging... This code tripped me and Carl up very badly when working on the shader cache code. It's very fragile and hard to maintain. Now that geometry shaders only depend on their inputs and don't have to worry about the VS VUE map, we can dramatically simplify this: just compute the VUE map coming out of the geometry shader stage in brw_upload_programs. If it changes, flag it. Done. v2: Also check vue_map.separable. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/gs: Remove the dependency on the VS VUE map.Kenneth Graunke2015-09-262-11/+14
| | | | | | | | | | | | | | | | Because we only support geometry shaders in core profile, we can safely ignore any driver-extending of VS outputs. Those are: - Legacy userclipping (doesn't exist in core profile) - Edgeflag copying (Gen4-5 only, no GS support) - Point coord replacement (Gen4-5 only, no GS support) - front/back color hacks (Gen4-5 only, no GS support) v2: Rebase; leave a comment about why SSO works. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Don't re-layout varyings for separate shader programs.Kenneth Graunke2015-09-265-18/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, our VUE map code always assigned slots to varyings sequentially, in one contiguous block. This was a bad fit for separate shaders - the GS input layout depended or the VS output layout, so if we swapped out vertex shaders, we might have to recompile the GS on the fly - which rather defeats the point of using separate shader objects. (Tessellation would suffer from this as well - we could have to recompile the HS, DS, and GS.) Instead, this patch makes the VUE map for separate shaders use a fixed layout, based on the input/output variable's location field. (This is either specified by layout(location = ...) or assigned by the linker.) Corresponding inputs/outputs will match up by location; if there's a mismatch, we're allowed to have undefined behavior. This may be less efficient - depending what locations were chosen, we may have empty padding slots in the VUE. But applications presumably use small consecutive integers for locations, so it hopefully won't be much worse in practice. 3% of Dota 2 Reborn shaders are hurt, but only by 2 instructions. This seems like a small price to pay for avoiding recompiles. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/vue: Make assign_vue_map() take an explicit slot.Kenneth Graunke2015-09-261-16/+19
| | | | | | | | | | | | Our plan of assigning consecutive slots doesn't work properly for separate shader objects - at least, if we want to avoid recompiling them whenever the interface changes. As a first step, make assign_vue_map take an explicit slot parameter, rather than implicitly incrementing it. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Initialize unused VUE map slots to BRW_VARYING_SLOT_PAD.Kenneth Graunke2015-09-261-1/+1
| | | | | | | | | | Nothing actually relies on unused slots being initialized to BRW_VARYING_SLOT_COUNT. Soon, we're going to have VUE maps with holes in them, at which point pre-filling with BRW_VARYING_SLOT_PAD make a lot more sense. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Fix BRW_VARYING_SLOT_PAD handling in the scalar VS backend.Kenneth Graunke2015-09-261-4/+2
| | | | | | | | | | | | | We can't just break for padding slots. Instead, treat them like unwritten output variables, so we handle flushing and incrementing urb_offset correctly. Paul introduced the concept of padding slots back in 2011, but we've never actually used them for anything. So it's unsurprising that the scalar VS backend didn't handle them quite right. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* main/tests: Enable glShaderStorageBlockBinding() check in dispatch_sanity testSamuel Iglesias Gonsalvez2015-09-261-1/+1
| | | | | Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: calculate component size for arrays of arrays when varying packing ↵Timothy Arceri2015-09-261-3/+10
| | | | | | | | disabled Signed-off-by: Timothy Arceri <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* glsl: validate binding qualifier for AoATimothy Arceri2015-09-261-1/+1
| | | | | Reviewed-by: Tapani Pälli <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* glsl: add helper for calculating size of AoATimothy Arceri2015-09-261-0/+19
| | | | | | | V2: return 0 if not array rather than -1 Reviewed-by: Tapani Pälli <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* glsl: clean-up link uniform codeTimothy Arceri2015-09-261-11/+6
| | | | | | | These changes are also needed to allow linking of struct and interface arrays of arrays. Reviewed-by: Thomas Helland <[email protected]>
* radeonsi: add scratch buffer to the buffer list when it's re-allocatedMarek Olšák2015-09-261-0/+1
| | | | | Reviewed-by: Michel Dänzer <[email protected]> Cc: [email protected]
* radeon/vce: fix vui time_scale zero errorLeo Liu2015-09-251-0/+3
| | | | | | | | | if app pass 0 as frame_rate_num, it should not be encoded to the VUI. Signed-off-by: Leo Liu <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Christian König <[email protected]> Cc: "10.6 11.0" <[email protected]>
* mesa: Add locking to programs.Matt Turner2015-09-252-8/+12
| | | | | Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* mesa: Add locking to sampler objects.Matt Turner2015-09-252-4/+7
| | | | | Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* mesa: Remove debugging code from _mesa_reference_*.Matt Turner2015-09-256-61/+0
| | | | | Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* glsl: fix packed varyings interface type and add default caseTapani Pälli2015-09-251-0/+4
| | | | | | | | fixes Piglit test: arb_program_interface_query/linker/query-varyings.shader_test Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* glsl: Mark as active all elements of shared/std140 block arraysAntia Puentes2015-09-251-0/+24
| | | | | | | | | | | | | | | | | | | | | | | Commit 1ca25ab (glsl: Do not eliminate 'shared' or 'std140' blocks or block members) considered as active 'shared' and 'std140' uniform blocks and uniform block arrays, but did not include the block array elements. Because of that, it was possible to have an active uniform block array without any elements marked as used, making the assertion ((b->num_array_elements > 0) == b->type->is_array()) in link_uniform_blocks() fail. Fixes the following 5 dEQP tests: * dEQP-GLES3.functional.ubo.random.nested_structs_instance_arrays.18 * dEQP-GLES3.functional.ubo.random.nested_structs_instance_arrays.24 * dEQP-GLES3.functional.ubo.random.nested_structs_arrays_instance_arrays.19 * dEQP-GLES3.functional.ubo.random.all_per_block_buffers.49 * dEQP-GLES3.functional.ubo.random.all_shared_buffer.36 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83508 Tested-by: Tapani Pälli <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>