path: root/src
Commit message | Author | Age | Files | Lines
* i965: Fix missing BRW_NEW_*_PROG_DATA flagging caused by cache reuse. | Kenneth Graunke | 2015-10-28 | 2 | -4/+5
|
| Consider the case of two nearly identical GLSL fragment shaders:
|
|    out vec4 color;
|    void main() { color = vec4(1); }
|
| and
|
|    layout(early_fragment_tests) in;
|    out vec4 color;
|    void main() { color = vec4(1); }
|
| These shaders compile to the exact same assembly, but have distinct
| values for brw_wm_prog_data::early_fragment_tests.
|
| Since these are two independent GLSL shaders, they have different
| program keys - notably, brw_wm_prog_key::program_string_id differs.
| When uploading the second, brw_upload_cache will find an existing copy
| of the assembly in the cache BO, which means matching_data will be
| non-NULL. Although we create a second cache item (with the new key and
| prog_data), we set item->offset to the existing copy and avoid
| re-uploading duplicate assembly.
|
| However, brw_search_cache() would only flag BRW_NEW_*_PROG_DATA if
| item->offset differed from the supplied offset. With reuse, both
| programs have the same offset, but prog_data changed. We have to flag
| it, but failed to.
|
| To fix this, we simply need to check if the aux (prog_data) pointer
| changed. If either the assembly or the prog_data differs, flag it.
|
| This fixes a regression since 1bba29ed403e735ba0bf04ed8aa2e571884f,
| where Topi fixed brw_upload_cache() to actually reuse identical
| assembly. Prior to that, reuse basically never happened due to bugs.
| Unfortunately, this code apparently wasn't prepared to handle reuse!
|
| Fixes GPU hangs in Dolphin on Broadwell. Huge thanks to Pierre Bourdon
| and Ilia Mirkin for debugging this and helping track down the real
| issue.
|
| Cc: "11.0" <[email protected]>
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92623
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Topi Pohjolainen <[email protected]>
| Reviewed-by: Kristian Høgsberg <[email protected]>
| Tested-by: Pierre Bourdon <[email protected]>
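The "flag if either the offset or the aux pointer changed" rule from this commit can be illustrated with a small sketch. This is not the i965 code: `cache_item` and `lookup_needs_flag` are hypothetical names, and the structure is reduced to the two fields the message talks about.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a program cache entry. */
struct cache_item {
    uint32_t offset;  /* where the assembly lives in the cache buffer */
    const void *aux;  /* per-program metadata (the "prog_data") */
};

/* Returns true when the caller's cached view changed and the dirty flag
 * must be raised. Two items may share an offset (reused assembly) and
 * still differ in aux, so both fields are compared. */
static bool lookup_needs_flag(const struct cache_item *item,
                              uint32_t *inout_offset,
                              const void **inout_aux)
{
    bool changed = item->offset != *inout_offset || item->aux != *inout_aux;

    if (changed) {
        *inout_offset = item->offset;
        *inout_aux = item->aux;
    }
    return changed;
}
```

Comparing only `offset` would miss the second shader in the example, because its assembly is shared with the first.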
* clover: fix building with clang-3.8 | Laurent Carlier | 2015-10-29 | 1 | -1/+5
|
| https://bugs.freedesktop.org/show_bug.cgi?id=92705
|
| v2.1: use Linker::Flags::None instead of 0 and emplace_back()
|
| Signed-off-by: Laurent Carlier <[email protected]>
| Reviewed-by: Francisco Jerez <[email protected]>
* nv50: add ARB_copy_image support | Ilia Mirkin | 2015-10-28 | 2 | -7/+11
|
| Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: add ARB_copy_image support | Ilia Mirkin | 2015-10-28 | 2 | -7/+11
|
| Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: fix crash when nv50_miptree_from_handle fails | Julien Isorce | 2015-10-28 | 1 | -1/+2
|
| Signed-off-by: Julien Isorce <[email protected]>
| Reviewed-by: Samuel Pitoiset <[email protected]>
* vbo: replace assertion with conditional in vbo_compute_max_verts() | Brian Paul | 2015-10-28 | 1 | -1/+2
|
| With just the right sequence of per-vertex commands and state changes,
| it's possible for this assertion to fail (such as with viewperf11's
| lightwave-06-1 test). Instead of asserting, return 0 so that the
| caller knows the VBO is full and needs to be flushed.
|
| Reviewed-by: Charmaine Lee <[email protected]>
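The idea of reporting "full" with a return value instead of asserting can be sketched as below. The function name, parameters, and the exact arithmetic are assumptions made for illustration; this is not the body of vbo_compute_max_verts().

```c
#include <stddef.h>

/* Hypothetical helper: how many more vertices fit in a vertex buffer.
 * Returns 0 when nothing fits, which tells the caller to flush, rather
 * than treating a full buffer as a programming error. */
static unsigned max_verts_remaining(size_t buffer_size, size_t buffer_used,
                                    size_t vertex_size)
{
    if (vertex_size == 0 || buffer_used >= buffer_size)
        return 0;

    return (unsigned)((buffer_size - buffer_used) / vertex_size);
}
```

A caller would check for 0 and flush before emitting the next vertex.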
* mesa: minor formatting fix in get_tex_rgba_compressed() | Brian Paul | 2015-10-28 | 1 | -2/+1
|
* st/mesa: implement ARB_copy_image | Marek Olšák | 2015-10-28 | 6 | -51/+616
|
| I wonder if the craziness was worth it.
|
| Reviewed-by: Brian Paul <[email protected]>
* gallium: add PIPE_CAP_COPY_BETWEEN_COMPRESSED_AND_PLAIN_FORMATS | Marek Olšák | 2015-10-28 | 15 | -1/+17
|
| For ARB_copy_image.
|
| Reviewed-by: Brian Paul <[email protected]>
* radeonsi: allow copying between compatible compressed and uncompressed formats | Marek Olšák | 2015-10-28 | 1 | -1/+1
|
| which is where a block in src maps to a pixel in dst and vice versa.
|
| e.g. DXT1 <-> R32G32_UINT
|      DXT5 <-> R32G32B32A32_UINT
|
| Reviewed-by: Michel Dänzer <[email protected]>
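The pairs listed in this commit line up because one compressed block occupies as many bits as one pixel of the plain format: a DXT1 block is 64 bits, like an R32G32_UINT pixel, and a DXT5 block is 128 bits, like an R32G32B32A32_UINT pixel. A minimal sketch of that rule follows. The `fmt_desc` structure and helper names are hypothetical, not gallium or radeonsi API.

```c
#include <stdbool.h>

/* Hypothetical, minimal format description (not a Mesa structure). */
struct fmt_desc {
    unsigned block_bits;  /* storage size of one block, in bits */
    unsigned block_w;     /* block width in pixels (1 for plain formats) */
    unsigned block_h;     /* block height in pixels (1 for plain formats) */
};

static const struct fmt_desc fmt_dxt1 = { 64, 4, 4 };
static const struct fmt_desc fmt_dxt5 = { 128, 4, 4 };
static const struct fmt_desc fmt_r32g32_uint = { 64, 1, 1 };
static const struct fmt_desc fmt_r32g32b32a32_uint = { 128, 1, 1 };

/* A block in one format can be copied as a unit of the other format
 * only if both occupy the same number of bits. */
static bool copy_compatible(const struct fmt_desc *a, const struct fmt_desc *b)
{
    return a->block_bits == b->block_bits;
}

/* Size of an image dimension measured in blocks, rounding up. */
static unsigned size_in_blocks(unsigned pixels, unsigned block_dim)
{
    return (pixels + block_dim - 1) / block_dim;
}
```

Under this model a 16x16 DXT1 region corresponds to a 4x4 region of R32G32_UINT pixels.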
* mesa: set TargetIndex in VDPAURegister*SurfaceNV (v2) | Marek Olšák | 2015-10-28 | 1 | -2/+3
|
| We initialized Target, but not TargetIndex. This is required since
| 7d7dd1871174905dfdd3ca874a09d9.
|
| v2: do it in the right place. Noticed by Brian Paul.
|
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92645
| Reviewed-by: Brian Paul <[email protected]>
* i965: remove unneeded src_reg copy in emit_shader_time_write | Emil Velikov | 2015-10-28 | 1 | -1/+1
|
| The variable is already of type src_reg. Creating a new instance only
| to destroy it seems unnecessary.
|
| Signed-off-by: Emil Velikov <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965: remove cache_aux_free_func array | Emil Velikov | 2015-10-28 | 2 | -12/+5
|
| There is only one function that can be called, which is well known at
| compilation time.
|
| The abstraction used here seems unnecessary, so let's use a direct
| call to brw_stage_prog_data_free() when appropriate, and cut down the
| size of struct brw_cache.
|
| Signed-off-by: Emil Velikov <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* main: fix GL_MAX_NUM_ACTIVE_VARIABLES value for shader storage blocks | Samuel Iglesias Gonsalvez | 2015-10-28 | 1 | -1/+20
|
| The maximum number of active variables for shader storage blocks
| should take into account the specific rules for shader storage blocks,
| i.e. for an active shader storage block member declared as an array,
| an entry will be generated only for the first array element,
| regardless of its type.
|
| Fixes 3 dEQP-GLES31.functional.* tests:
|
| dEQP-GLES31.functional.program_interface_query.shader_storage_block.active_variables.named_block
| dEQP-GLES31.functional.program_interface_query.shader_storage_block.active_variables.unnamed_block
| dEQP-GLES31.functional.program_interface_query.shader_storage_block.active_variables.block_array
|
| Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]>
| Reviewed-by: Tapani Pälli <[email protected]>
* st/vdpau: disable RefPicList for Vdpau HEVC | Boyuan Zhang | 2015-10-27 | 1 | -0/+1
|
| Signed-off-by: Boyuan Zhang <[email protected]>
| Reviewed-by: Christian König <[email protected]>
| Reviewed-by: Leo Liu <[email protected]>
* st/va: add VAAPI HEVC decode support | Boyuan Zhang | 2015-10-27 | 4 | -1/+208
|
| Signed-off-by: Boyuan Zhang <[email protected]>
| Reviewed-by: Christian König <[email protected]>
| Reviewed-by: Leo Liu <[email protected]>
* radeon/uvd: implement and add flag for VAAPI HEVC decode | Boyuan Zhang | 2015-10-27 | 2 | -0/+16
|
| Signed-off-by: Boyuan Zhang <[email protected]>
| Reviewed-by: Christian König <[email protected]>
| Reviewed-by: Leo Liu <[email protected]>
* vl: add RefPicList defines for VAAPI HEVC decode | Boyuan Zhang | 2015-10-27 | 1 | -0/+2
|
| Signed-off-by: Boyuan Zhang <[email protected]>
| Reviewed-by: Christian König <[email protected]>
| Reviewed-by: Leo Liu <[email protected]>
* mesa: Draw indirect is not allowed if the default VAO is bound. | Marta Lofstedt | 2015-10-27 | 1 | -0/+12
|
| From OpenGL ES 3.1 specification, section 10.5:
| "DrawArraysIndirect requires that all data sourced for the command,
| including the DrawArraysIndirectCommand structure, be in buffer
| objects, and may not be called when the default vertex array object
| is bound."
|
| Signed-off-by: Marta Lofstedt <[email protected]>
| Reviewed-by: Tapani Pälli <[email protected]>
* winsys/amdgpu: remove the dcc_enable surface flag | Marek Olšák | 2015-10-27 | 3 | -10/+7
|
| dcc_size is sufficient and doesn't need a further comment in my
| opinion.
|
| Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add debug flags that disable DCC and DCC fast clear | Marek Olšák | 2015-10-27 | 3 | -0/+10
|
| For debugging, bug reports, etc.
| This is not in the radeonsi directory, but it is about radeonsi.
|
| Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: properly check if DCC is enabled and allocated | Marek Olšák | 2015-10-27 | 5 | -8/+8
|
| Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: simplify DCC handling in si_initialize_color_surface | Marek Olšák | 2015-10-27 | 1 | -7/+3
|
| Reviewed-by: Nicolai Hähnle <[email protected]>
* mesa: Draw indirect is not allowed when xfb is active and unpaused | Marta Lofstedt | 2015-10-27 | 1 | -0/+9
|
| OpenGL ES 3.1 specification, section 10.5:
| "An INVALID_OPERATION error is generated if transform feedback is
| active and not paused."
|
| Signed-off-by: Marta Lofstedt <[email protected]>
| Reviewed-by: Ian Romanick <[email protected]>
| Reviewed-by: Tapani Pälli <[email protected]>
* mesa: Draw Indirect returns wrong error code on unaligned | Marta Lofstedt | 2015-10-27 | 1 | -4/+6
|
| From OpenGL 4.4 specification, section 10.4 and OpenGL ES 3.1
| section 10.5:
| "An INVALID_VALUE error is generated if indirect is not a multiple of
| the size, in basic machine units, of uint."
|
| However, the current code follows the ARB_draw_indirect:
| https://www.opengl.org/registry/specs/ARB/draw_indirect.txt
| "INVALID_OPERATION is generated by DrawArraysIndirect and
| DrawElementsIndirect if commands source data beyond the end of a
| buffer object or if <indirect> is not word aligned."
|
| V2: After discussions on the list, it was suggested to only keep the
| INVALID_VALUE error.
|
| Signed-off-by: Marta Lofstedt <[email protected]>
| Reviewed-by: Ian Romanick <[email protected]>
| Reviewed-by: Ilia Mirkin <[email protected]>
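The three draw-indirect rules quoted in this log (alignment, default VAO, transform feedback) can be sketched as one validation routine. Everything here is hypothetical: the `draw_state` structure, the function name, and the order of the checks are assumptions, and the default-VAO rule is applied unconditionally even though the quoted text is from the ES 3.1 specification. Only the numeric error codes are the standard GL values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard GL error code values, redefined locally so the sketch is
 * self-contained. */
enum {
    ERR_NONE = 0,
    ERR_INVALID_VALUE = 0x0501,
    ERR_INVALID_OPERATION = 0x0502
};

/* Hypothetical snapshot of the context state the checks depend on. */
struct draw_state {
    bool default_vao_bound;
    bool xfb_active;
    bool xfb_paused;
};

/* Returns the error a draw-indirect call would generate, or ERR_NONE. */
static unsigned validate_draw_indirect(const struct draw_state *s,
                                       uintptr_t indirect)
{
    /* "indirect" must be a multiple of the size of uint. */
    if (indirect % sizeof(uint32_t) != 0)
        return ERR_INVALID_VALUE;

    /* Not allowed while the default vertex array object is bound. */
    if (s->default_vao_bound)
        return ERR_INVALID_OPERATION;

    /* Not allowed while transform feedback is active and not paused. */
    if (s->xfb_active && !s->xfb_paused)
        return ERR_INVALID_OPERATION;

    return ERR_NONE;
}
```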
* main: Remove interface block array index for doing the name comparison | Samuel Iglesias Gonsalvez | 2015-10-27 | 1 | -1/+42
|
| From ARB_program_interface_query spec:
|
| "uint GetProgramResourceIndex(uint program, enum programInterface,
|                               const char *name);
| [...]
| If <name> exactly matches the name string of one of the active
| resources for <programInterface>, the index of the matched resource
| is returned. Additionally, if <name> would exactly match the name
| string of an active resource if "[0]" were appended to <name>, the
| index of the matched resource is returned. [...]"
|
| "A string provided to GetProgramResourceLocation or
| GetProgramResourceLocationIndex is considered to match an active
| variable if:
| [...]
| * if the string identifies the base name of an active array, where
|   the string would exactly match the name of the variable if the
|   suffix "[0]" were appended to the string;
| [...]"
|
| Fixes the following two dEQP-GLES31 tests:
|
| dEQP-GLES31.functional.program_interface_query.shader_storage_block.resource_list.block_array
| dEQP-GLES31.functional.program_interface_query.shader_storage_block.resource_list.block_array_single_element
|
| v2:
| - Add AoA support (Timothy)
| - Apply it too for GetUniformLocation(), GetUniformName() and others
|   because ARB_program_interface_query says that they are equivalent
|   to GetProgramResourceLocation() and GetProgramResourceName() (Tapani)
|
| Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]>
| Reviewed-by: Tapani Pälli <[email protected]>
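The matching rule quoted from the specification (exact match, or a match once "[0]" is appended to the query) can be sketched as a small string comparison. The function name is hypothetical and the sketch ignores arrays of arrays, which the v2 note says the real patch handles.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if "query" names "resource" either exactly, or would do
 * so if the suffix "[0]" were appended to "query". */
static bool resource_name_matches(const char *resource, const char *query)
{
    size_t qlen = strlen(query);

    if (strcmp(resource, query) == 0)
        return true;

    /* The resource must start with the query and continue with exactly
     * "[0]". strncmp stops at a NUL, so resource + qlen is only read
     * when the resource is at least qlen characters long. */
    return strncmp(resource, query, qlen) == 0 &&
           strcmp(resource + qlen, "[0]") == 0;
}
```

With this rule, a query for `block` finds the resource `block[0]`, but not `block[1]` or `blocks[0]`.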
* vc4: Add support for copy propagation with unpack flags present. | Eric Anholt | 2015-10-26 | 2 | -36/+109
|
| total instructions in shared programs: 89251 -> 87862 (-1.56%)
| instructions in affected programs: 52971 -> 51582 (-2.62%)
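The percentages in these shader-db lines are plain relative changes, (after - before) / before. A minimal sketch for re-deriving them is below; the helper name is hypothetical and not part of shader-db.

```c
/* Relative change from "before" to "after", in percent. */
static double percent_change(long before, long after)
{
    return 100.0 * (double)(after - before) / (double)before;
}
```

For the first line, 89251 -> 87862 is a drop of 1389 instructions, and 1389 / 89251 is about 0.0156, which matches the reported -1.56%.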
* vc4: Rewrite the pack instructions as a MOV with a dst pack flag | Eric Anholt | 2015-10-26 | 3 | -37/+18
|
| Another step in reducing the special-casing of instructions.
* vc4: Move dst pack setup out to a helper function with more asserts. | Eric Anholt | 2015-10-26 | 1 | -10/+22
|
* vc4: Switch the unpack ops to being unpack flags on a mov. | Eric Anholt | 2015-10-26 | 6 | -123/+42
|
| This paves the way for copy propagating our unpacks. We end up with a
| small change on shader-db:
|
| total instructions in shared programs: 89390 -> 89251 (-0.16%)
| instructions in affected programs: 19041 -> 18902 (-0.73%)
|
| which appears to be because we no longer convert MOVs for an FMAX dst,
| r4.unpack, r4.unpack (instead of the previous MOV dst, r4.unpack), and
| this ends up with a slightly better schedule.
* vc4: Drop some confused code about pack/unpack handling. | Eric Anholt | 2015-10-26 | 1 | -23/+4
|
| At one point I thought packs and unpacks were in the same field of the
| instruction. They aren't. These instructions therefore never cause a
| pack.
|
| total instructions in shared programs: 89472 -> 89390 (-0.09%)
| instructions in affected programs: 15261 -> 15179 (-0.54%)
* vc4: Reduce MOV special-casing in QIR-to-QPU. | Eric Anholt | 2015-10-26 | 1 | -8/+11
|
| I'm going to introduce some more types of MOV, which also want the
| elision of raw MOVs.
* vc4: Fix up the test for whether the unpack can be from r4. | Eric Anholt | 2015-10-26 | 3 | -8/+27
|
| We can do 16a/16b from float as well. No difference on shader-db.
* vc4: Don't try to follow MOVs across a pack. | Eric Anholt | 2015-10-26 | 1 | -1/+2
|
* vc4: Only copy propagate raw MOVs. | Eric Anholt | 2015-10-26 | 1 | -6/+1
|
| No problems being fixed, but needed for the new unpack changes.
* vc4: If a QIR source has an unpack set, print it. | Eric Anholt | 2015-10-26 | 3 | -3/+13
|
| Not used yet, but will be.
* glsl: Convert TES gl_PatchVerticesIn into a constant when using a TCS. | Kenneth Graunke | 2015-10-26 | 1 | -0/+16
|
| When a TCS is present, the TES input gl_PatchVerticesIn is actually a
| constant - it's simply the # of output vertices specified by the TCS
| layout qualifiers. So, we can replace the system value with a
| constant, which may allow further optimization, and will likely be
| more efficient.
|
| If the TCS is absent, we can't do this optimization.
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Marek Olšák <[email protected]>
* i965: Add missing close-parenthesis in error messages | Ian Romanick | 2015-10-26 | 1 | -2/+2
|
| Trivial.
|
| Signed-off-by: Ian Romanick <[email protected]>
* i965: Fix is-renderable check in intel_image_target_renderbuffer_storage | Ian Romanick | 2015-10-26 | 1 | -5/+1
|
| Previously we could create a renderbuffer with format
| MESA_FORMAT_R8G8B8A8_UNORM, convert that renderbuffer to an EGLImage,
| then FAIL to convert the EGLImage back to a renderbuffer because
| reasons. Just use the same check in
| intel_image_target_renderbuffer_storage that
| brw_render_target_supported uses.
|
| There are more checks in brw_render_target_supported, but I don't
| think they are necessary here. A different approach would be to
| refactor brw_render_target_supported to take rb->Format and
| rb->NumSamples as parameters (instead of a gl_renderbuffer) and use
| the new function here.
|
| Fixes:
| ES2-CTS.gtf.GL2ExtensionTests.egl_image.egl_image
|
| Signed-off-by: Ian Romanick <[email protected]>
| Reviewed-by: Anuj Phogat <[email protected]>
| Tested-by: Tapani Pälli <[email protected]>
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92476
| Cc: "10.3 10.4 10.5 10.6 11.0" <[email protected]>
* glsl: keep track of intra-stage indices for atomics | Timothy Arceri | 2015-10-27 | 11 | -40/+96
|
| This is more optimal as it means we no longer have to upload the same
| set of ABO surfaces to all stages in the program.
|
| This also fixes a bug where since commit c0cd5b var->data.binding was
| being used as a replacement for atomic buffer index, but they don't
| have to be the same value; they just happened to end up the same when
| binding is 0.
|
| Reviewed-by: Francisco Jerez <[email protected]>
| Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
| Cc: Ilia Mirkin <[email protected]>
| Cc: Alejandro Piñeiro <[email protected]>
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90175
* gallivm: disable f16c when not using AVX | Roland Scheidegger | 2015-10-26 | 1 | -0/+3
|
| f16c intrinsic can only be emitted when AVX is used. So when we
| disable AVX due to forcing 128bit vectors we must not use this
| intrinsic (depending on llvm version, this worked previously because
| llvm used AVX even when we didn't tell it to, however I've seen this
| fail with llvm 3.3 since 718249843b915decf8fccec92e466ac1a6219934
| which seems to have the side effect of disabling avx in llvm albeit it
| only touches sse flags really, but with
| ea421e919ae6e72e1319fb205c42a6fb53ca2f82 it's now really disabled).
|
| Albeit being able to use AVX with 128bit vectors also would have its
| uses, the code as is really was meant to emulate jit code creation for
| less capable cpus.
|
| v2: add some (ifdefed out) missing de-featuring options for simulating
| less capable cpus.
|
| Reviewed-by: Nicolai Hähnle <[email protected]>
| Reviewed-by: Jose Fonseca <[email protected]>
* st/va: pass picture desc to begin and decode | Julien Isorce | 2015-10-26 | 1 | -2/+2
|
| At least vl_mpeg12_decoder uses the picture desc in begin_frame and
| decode_bitstream.
|
| https://bugs.freedesktop.org/show_bug.cgi?id=92634
|
| Signed-off-by: Julien Isorce <[email protected]>
| Reviewed-by: Christian König <[email protected]>
* mesa: add additional checks for uniform location query | Tapani Pälli | 2015-10-26 | 1 | -0/+8
|
| Patch adds additional check to make sure we don't return locations for
| structures or arrays of structures.
|
| From page 79 of the OpenGL 4.2 spec:
|
| "A valid name cannot be a structure, an array of structures, or any
| portion of a single vector or a matrix."
|
| v2: use without_array() to simplify code (Timothy)
|
| No Piglit or CTS regressions observed.
|
| Signed-off-by: Tapani Pälli <[email protected]>
| Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
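The "structure or array of structures" rule can be sketched by stripping array levels and then inspecting what is left, which is the simplification the v2 note refers to. The type model below is hypothetical and far simpler than the compiler's own type class. The `without_array` helper here is a stand-in written for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal type model. */
enum base_type { TYPE_FLOAT, TYPE_STRUCT, TYPE_ARRAY };

struct type {
    enum base_type base;
    const struct type *element;  /* element type when base == TYPE_ARRAY */
};

/* Strip every array level and return the innermost element type. */
static const struct type *without_array(const struct type *t)
{
    while (t->base == TYPE_ARRAY)
        t = t->element;
    return t;
}

/* A uniform has a queryable location only if it is neither a structure
 * nor an array (of arrays) of structures. */
static bool has_queryable_location(const struct type *t)
{
    return without_array(t)->base != TYPE_STRUCT;
}
```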
* i965: Make brw_varying_to_offset take a const pointer to the VUE map. | Kenneth Graunke | 2015-10-24 | 1 | -2/+2
|
| It doesn't modify it.
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
* vc4: Fix names of the 16-bit unpacks | Eric Anholt | 2015-10-24 | 3 | -6/+6
|
| They're only f16-to-f32 on a float operation, otherwise they're
| i16-to-i32.
* vc4: Don't try to register coalesce into the VPM across non-raw MOVs. | Eric Anholt | 2015-10-24 | 1 | -1/+1
|
| No known bugs, just something I noticed while updating optimization
| code for other changes.
* vc4: Take advantage of the 8888 pack function in pack_unorm_4x8. | Eric Anholt | 2015-10-24 | 1 | -0/+14
|
| One instruction instead of four, and it turns out you do this a lot
| for the Over operator.
|
| total uniforms in shared programs: 32168 -> 32087 (-0.25%)
| uniforms in affected programs: 318 -> 237 (-25.47%)
| total instructions in shared programs: 89830 -> 89472 (-0.40%)
| instructions in affected programs: 6434 -> 6076 (-5.56%)
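For readers unfamiliar with the operation being optimized, here is a scalar reference for packing four normalized floats into one 32-bit word, following the GLSL packUnorm4x8 convention (clamp to [0, 1], scale by 255, round, first component in the lowest byte). This is only a reference model of the result, not the vc4 code, and it does not show the hardware 8888 pack mode.

```c
#include <stdint.h>

/* Scalar reference for packing four [0, 1] floats into 8 bits each. */
static uint32_t pack_unorm_4x8(const float v[4])
{
    uint32_t result = 0;

    for (int i = 0; i < 4; i++) {
        /* Clamp to the representable range. */
        float c = v[i] < 0.0f ? 0.0f : (v[i] > 1.0f ? 1.0f : v[i]);

        /* Scale to 0..255 and round to nearest. */
        uint32_t byte = (uint32_t)(c * 255.0f + 0.5f);

        /* Component 0 goes in the least significant byte. */
        result |= byte << (8 * i);
    }
    return result;
}
```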
* vc4: Fix the test for skipping raw MOVs. | Eric Anholt | 2015-10-24 | 3 | -1/+10
|
| I don't know what the previous test was trying to do, but it dates
| back to the first add of vc4_qpu_emit.c. No change to shader-db.
* i965: Remove unused devinfo revision | Ben Widawsky | 2015-10-24 | 3 | -5/+13
|
| I left the function to obtain the revision because it is, and will
| continue to be useful in the future. I'd rather not have to dig it up
| every time we need it. Comments left at the implementation to say as
| much.
|
| This was accidentally left here when I moved the early platform
| support:
|
|    commit 28ed1e08e8ba98ebd4ff0b56326372f0df9c73ad
|    Author: Ben Widawsky <[email protected]>
|    Date: Fri Aug 7 13:58:37 2015 -0700
|
|    i965/skl: Remove early platform support
|
| Signed-off-by: Ben Widawsky <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
* freedreno: remove unnecessary null checks | Rob Clark | 2015-10-24 | 4 | -13/+13
|
| According to piglit/xonotic/neverball/stc, blend/rasterize/zsa state
| will always be bound (never null). And the null checks were
| inconsistent anyways, so remove them.
|
| Signed-off-by: Rob Clark <[email protected]>