aboutsummaryrefslogtreecommitdiffstats
path: root/src/mesa
Commit message (Collapse)AuthorAgeFilesLines
* st/mesa: remove excessive shader state dirtyingMarek Olšák2016-07-307-57/+33
| | | | | | | | | This just needs to be done by st_validate_state. v2: add "shaders_may_be_dirty" flags for not skipping st_validate_state on _NEW_PROGRAM to detect real shader changes Reviewed-by: Nicolai Hähnle <[email protected]>
* st/mesa: unreference optional shaders when unbindingMarek Olšák2016-07-301-0/+4
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* st/mesa: skip updates of states that have no effectMarek Olšák2016-07-302-9/+28
| | | | | | v2: - also don't check edge flags for GLES Reviewed-by: Nicolai Hähnle <[email protected]>
* st/mesa: completely rewrite state atomsMarek Olšák2016-07-3033-516/+381
| | | | | | | | | | | | | | | | | | | | The goal is to do this in st_validate_state: while (dirty) atoms[u_bit_scan(&dirty)]->update(st); That implies that atoms can't specify which flags they consume. There is exactly one ST_NEW_* flag for each atom. (58 flags in total) There are macros that combine multiple flags into one for easier use. All _NEW_* flags are translated into ST_NEW_* flags in st_invalidate_state. st/mesa doesn't keep the _NEW_* flags after that. torcs is 2% faster between the previous patch and the end of this series. v2: - add st_atom_list.h to Makefile.sources Reviewed-by: Nicolai Hähnle <[email protected]>
* st/mesa: remove st_tracked_state::nameMarek Olšák2016-07-3020-58/+0
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* st/mesa: remove atom debugging codeMarek Olšák2016-07-301-67/+3
| | | | | | This won't be needed after the rewrite. Reviewed-by: Nicolai Hähnle <[email protected]>
* i965: Fix move_interpolation_to_top() pass.Kenneth Graunke2016-07-291-21/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | The pass I introduced in commit a2dc11a7818c04d8dc0324e8fcba98d60bae was entirely broken. A missing "break" made the load_interpolated_input case always fall through to "default" and hit a "continue", making it not actually move any load_interpolated_input intrinsics at all. It would only move the simple load_barycentric_* intrinsics, which don't emit any code anyway, making it basically useless. The initial version I sent of the pass worked, but I apparently failed to verify that the simplified version in v2 actually worked. With the obvious fix applied (so we actually tried to move load_interpolated_input intrinsics), I discovered a second bug: we weren't moving the offset SSA def to the top, breaking SSA validation. The new version of the pass actually moves load_interpolated_input intrinsics and all their dependencies, as intended. Papers over GPU hangs on Ivybridge and Baytrail caused by the recent NIR FS input rework by restoring the old behavior. (I'm not honestly sure why they hang with PLN not at the top.) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97083 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* st_glsl_to_tgsi: only skip over slots of an input array that are presentNicolai Hähnle2016-07-281-1/+5
| | | | | | | | | | When an application declares varying arrays but does not actually do any indirect indexing, some array indices may end up unused in the consuming shader, so the number of input slots that correspond to the array ends up less than the array_size. Cc: [email protected] Reviewed-by: Marek Olšák <[email protected]>
* i965: remove unnecessary null checkTimothy Arceri2016-07-281-4/+1
| | | | | | | | We would have hit a segfault already if this could be null. Fixes Coverity warning spotted by Matt. Reviewed-by: Matt Turner <[email protected]>
* vbo: Fix handling of POS/GENERIC0 attributes.Mathias Fröhlich2016-07-271-3/+16
| | | | | | | | | | | | | In case of split primitives we need to restore the original setting of the vtx.attrsz array to make immediate mode attribute array tracking work. v2: Use bool instead of boolean. Signed-off-by: Mathias Fröhlich <[email protected]> Reviewed-by: Brian Paul <[email protected]> Tested-by: Brian Paul <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96950
* mesa: standardize naming Mesa3D, MESA -> MesaVedran Miletić2016-07-261-1/+1
| | | | | Signed-off-by: Vedran Miletić <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* mesa: Make MESA_SHADER_CAPTURE_PATH skip shaders with Name == -1.Kenneth Graunke2016-07-261-1/+1
| | | | | | | | Shaders with shProg->Name == ~0 (aka 4294967295) are internal meta shaders that we don't really want to capture. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* mesa: Avoid aliasing violation in uniform_query.cpp.Matt Turner2016-07-261-14/+31
| | | | | Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* mesa: Avoid aliasing violation in FXT1.Matt Turner2016-07-261-2/+2
| | | | | Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* swrast: Avoid aliasing violation.Matt Turner2016-07-261-2/+2
| | | | | Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* glsl: Separate overlapping sentinel nodes in exec_list.Matt Turner2016-07-263-3/+3
| | | | | | | | | | | I do appreciate the cleverness, but unfortunately it prevents a lot more cleverness in the form of additional compiler optimizations brought on by -fstrict-aliasing. No difference in OglBatch7 (n=20). Co-authored-by: Davin McCall <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/miptree: Stop multiplying cube depth by 6 in HiZ calculationsJason Ekstrand2016-07-261-17/+2
| | | | | | | | | intel_mipmap_tree::logical_depth0 is now in number of 2D slices so we no longer need to be multiplying by 6. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Cc: "12.0" <[email protected]>
* i965/miptree/isl: Stop multiplying depth by 6 for cubesJason Ekstrand2016-07-261-5/+0
| | | | | | | | | | Now that the logical_depth0 field is in number of 2D slices, we don't need to be multiplying by 6 when creating the surface. It wasn't hurting anything primarily because we get the actual length from the view which was already handling it correctly. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965/blorp/gen8: Stop multiplying depth by 6 for cubesJason Ekstrand2016-07-261-4/+1
| | | | | | | | intel_mipmap_tree::logical_depth0 is now in 2-D slices so there is no need for us to multiply by 6 when we go to fill out a blorp surface state. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* main: memcpy larger chunks in _mesa_propagate_uniforms_to_driver_storageNils Wallménius2016-07-251-6/+23
| | | | | | | | | | | | | | | | When possible, do the memcpy on larger blocks. This reduces cycles spent in _mesa_propagate_uniforms_to_driver_storage from 1.51 % to 0.62% according to perf during the Unigine Heaven benchmark. It did not affect the framerate of the benchmark. The system used for testing was an i5 6600K with a Radeon R9 380. Piglit hangs randomly on this system both with and without the patch so i could not make a comparison. v2: fixed whitespace Signed-off-by: Nils Wallménius <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* glsl: reuse main extension table to appropriately restrict extensionsIlia Mirkin2016-07-235-26/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously we were only restricting based on ES/non-ES-ness and whether the overall enable bit had been flipped on. However we have been adding more fine-grained restrictions, such as based on compat profiles, as well as specific ES versions. Most of the time this doesn't matter, but it can create awkward situations and duplication of logic. Here we separate the main extension table into a separate object file, linked to the glsl compiler, which makes use of it with a custom function which takes the ES-ness of the shader into account (thus allowing desktop shaders to properly use ES extensions that would otherwise have been disallowed.) We can also now use this logic to generate #define's for all supported extensions automatically, removing the duplicate (and often inaccurate) list in glcpp. The effect of this change should be nil in most cases. However in some situations, extensions like GL_ARB_gpu_shader5 which were formerly available in compat contexts on the GLSL side of things will now become inaccessible. This regresses two ES CTS tests: ES3-CTS.shaders.shader_integer_mix.define ES31-CTS.shader_integer_mix.define however that is due to them using #version 100 instead of 300 es. As the extension is only defined for ES3, I believe this is the correct behavior. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> (v2) v2 -> v3: integrate glcpp defines into the same mechanism
* gallium: split transfer_inline_write into buffer and texture callbacksMarek Olšák2016-07-232-9/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | to reduce the call indirections with u_resource_vtbl. The worst call tree you could get was: - u_transfer_inline_write_vtbl - u_default_transfer_inline_write - u_transfer_map_vtbl - driver_transfer_map - u_transfer_unmap_vtbl - driver_transfer_unmap That's 6 indirect calls. Some drivers only had 5. The goal is to have 1 indirect call for drivers that care. The resource type can be determined statically at most call sites. The new interface is: pipe_context::buffer_subdata(ctx, resource, usage, offset, size, data) pipe_context::texture_subdata(ctx, resource, level, usage, box, data, stride, layer_stride) v2: fix whitespace, correct ilo's behavior Reviewed-by: Nicolai Hähnle <[email protected]> Acked-by: Roland Scheidegger <[email protected]>
* mesa: Don't call GenerateMipmap if Width or Height == 0.Kenneth Graunke2016-07-221-0/+5
| | | | | | | | | | | | | | | | | One of the WebGL 2.0 conformance tests is trying to call glGenerateMipmaps with a width and height of 0. With the meta implementation, this generates a "framebuffer attachment incomplete" status, and falls back to the CPU path, calling MapTextureImage. Except that there's no actual texture to map, and we assert fail. There's no work to do in this case. The test expects it to succeed, so just return early with no error and avoid hassling the driver. Cc: [email protected] Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96911 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965: Get rid of the do_lower_unnormalized_offsets passJason Ekstrand2016-07-224-109/+0
| | | | | | | | | We can do this in NIR now. No need to keep a GLSL pass lying around for it. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: "12.0" <[email protected]>
* i965/nir: Enable NIR lowering of txf and rect offsetsJason Ekstrand2016-07-221-0/+2
| | | | | | | | | | | | | | This fixes the following piglit tests on gen6+: tex-miplevel-selection textureProjGradOffset 2DRect tex-miplevel-selection textureGradOffset 2DRect tex-miplevel-selection textureGradOffset 2DRectShadow tex-miplevel-selection textureProjGradOffset 2DRect_ProjVec4 tex-miplevel-selection textureProjGradOffset 2DRectShadow Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Cc: "12.0" <[email protected]>
* gallium: add PIPE_FLUSH_DEFERREDMarek Olšák2016-07-221-1/+1
| | | | | | | | | | | | | There are 2 uses: - Asynchronous flushing for multithreaded drivers. - Return a fence without flushing (mid-command-buffer fence). The driver can defer flushing until fence_finish is called. This is required to make Bioshock Infinite faster, which creates 1000 fences (flushes) per frame. Reviewed-by: Edward O'Callaghan <[email protected]> Reviewed-by: Rob Clark <[email protected]>
* i965: fix varying output setupTimothy Arceri2016-07-231-1/+1
| | | | | | | | | | | | Since 7f53fead5c we treat every location as using all four components so we only need special handling for doubles when they cross multiple locations. This fixes a crash in GL45-CTS.enhanced_layouts.varying_locations where the outputs array would overflow when a dmat2 was stored at the max varying location i.e 30. Reviewed-by: Iago Toral Quiroga <[email protected]>
* mesa: Add GL_BGRA_EXT to the list of GenerateMipmap internal formats.Kenneth Graunke2016-07-211-0/+5
| | | | | | | | | | | | | | | | | | | | The GL_EXT_texture_format_BGRA8888 extension specification defines a GL_BGRA_EXT unsized internal format (which is a little odd - usually BGRA is a pixel transfer format). The extension is written against the ES 1.0 specification, so it's a little hard to map, but I believe it's effectively adding it to the table used here, so we should allow it here as well. Note that GL_EXT_texture_format_BGRA8888 is always enabled (dummy_true), so we don't need to check if it's enabled here. This fixes mipmap generation in Skia and ChromeOS. Signed-off-by: Kenneth Graunke <[email protected]> References: https://bugs.chromium.org/p/chromium/issues/detail?id=630371 Reviewed-by: Ian Romanick <[email protected]> Reported-by: Stéphane Marchesin <[email protected]> Cc: [email protected]
* i965: Fix "operation operation" in comment.Kenneth Graunke2016-07-211-1/+1
| | | | | | From the redundant redundant department. Reported-by: Michael Schellenberger Costa <[email protected]>
* i965: Fix shared atomic intrinsics to pay attention to base.Kenneth Graunke2016-07-211-1/+12
| | | | | | Cc: "12.0" <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* i965: Include VUE handles for GS with invocations > 1.Kenneth Graunke2016-07-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We always resort to the pull model for instanced GS inputs. So, we'd better include the VUE handles, or else we can't actually pull anything. Ian reports that on his branch with OES_geometry_shader enabled, this fixes a bunch of dEQP-GLES31.functional.geometry_shading tests:: - instanced.draw_2_instances_geometry_2_invocations - instanced.draw_2_instances_geometry_8_invocations - instanced.draw_4_instances_geometry_2_invocations - instanced.draw_4_instances_geometry_8_invocations - instanced.draw_8_instances_geometry_2_invocations - instanced.draw_8_instances_geometry_8_invocations - instanced.geometry_2_invocations - instanced.geometry_32_invocations - instanced.geometry_8_invocations - instanced.geometry_max_invocations - instanced.geometry_output_different_2_invocations - instanced.geometry_output_different_32_invocations - instanced.geometry_output_different_8_invocations - instanced.geometry_output_different_max_invocations - instanced.invocation_output_vary_by_attribute - instanced.invocation_output_vary_by_texture - instanced.invocation_output_vary_by_uniform - query.primitives_generated_instanced Cc: [email protected] Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> Tested-by: Ian Romanick <[email protected]>
* i965: print error messages if gs fails to compileTimothy Arceri2016-07-211-0/+6
| | | | | | We do this for all other stages. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: enable GL4.4 for Gen8+Timothy Arceri2016-07-212-2/+2
| | | | Acked-by: Kenneth Graunke <[email protected]>
* i965: enable ARB_enhanced_layouts for gen6+Timothy Arceri2016-07-211-1/+1
| | | | Reviewed-by: Edward O'Callaghan <[email protected]>
* i965/vec4: add packing support for tcs load outputsTimothy Arceri2016-07-213-7/+17
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* i965/vec4: add support for packing tes inputsTimothy Arceri2016-07-211-4/+10
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Alejandro Piñeiro <[email protected]>
* i965/vec4: add support for packing tcs outputsTimothy Arceri2016-07-211-0/+7
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* i965/vec4: support packing tcs inputsTimothy Arceri2016-07-212-2/+7
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* i965/vec4: add component packing for gsTimothy Arceri2016-07-211-0/+2
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* i965/vec4: add support for packing vs/gs/tes outputsTimothy Arceri2016-07-213-4/+45
| | | | | | | | | | | Here we create a new output_generic_reg array with the ability to store the dst_reg for each component of user defined varyings. This is needed as the previous code only stored the dst_reg based on the varying location which meant packed varyings would overwrite each other. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Alejandro Piñeiro <[email protected]>
* i965/vec4: add support for packing inputsTimothy Arceri2016-07-211-0/+2
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* i965: add helper for creating packing writemaskTimothy Arceri2016-07-211-0/+7
| | | | | | | | | | | For example where n=3 first_component=1 this will give us 0xE (WRITEMASK_YZW). V2: Add assert to check first component is <= 4 (Suggested by Ken) Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* i965: add helpers for creating component layout swizzleTimothy Arceri2016-07-211-0/+3
| | | | | | | | | This will be used to swizzle components to the beginning or end of the vector based on the component layout qualifier and whether we are doing a load or store. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* i965: enable ARB_enhanced_layouts for gen8+Timothy Arceri2016-07-211-0/+1
| | | | Acked-by: Edward O'Callaghan <[email protected]>
* i965: add component packing support for load_output intrinsicsTimothy Arceri2016-07-211-5/+33
| | | | | | | | Here we use the component qualifier (which is the first component) as an offset when loading output varyings. Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: enable component packing for vs and fsTimothy Arceri2016-07-214-25/+16
| | | | | | | | | | | Rather than trying to work out the total number of components used at a location we simply treat all outputs as vec4s. This removes the need for complex code looping over varyings to match packed locations and the need for storing the total number of components used at each location. Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: bring back type_size_vec4_times_4()Timothy Arceri2016-07-212-0/+14
| | | | | | | | We will use this for output varyings. To make component packing simpler we will just treat all varyings as vec4s. Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move VS load_input handling to nir_emit_vs_intrinsic().Kenneth Graunke2016-07-201-31/+30
| | | | | | | | | | TCS/TES/GS and now FS all handle these in stage-specific functions. CS don't have inputs, so VS was the only one left using this code. Move it to the VS-specific function for clarity. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Delete the FS_OPCODE_INTERPOLATE_AT_CENTROID virtual opcode.Kenneth Graunke2016-07-204-10/+0
| | | | | | | | | We no longer use this message. As far as I can tell, it's fairly useless - the equivalent information is provided in the payload. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Rewrite FS input handling to use the new NIR intrinsics.Kenneth Graunke2016-07-205-341/+270
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This eliminates the need to walk the list of input variables, recurse into their types (via logic largely redundant with nir_lower_io), and interpolate all possible inputs up front. The backend no longer has to care about variables at all, which eliminates complications from trying to pack multiple variables into the same location. Instead, each intrinsic specifies exactly what's needed. This should unblock Timothy's work on GL_ARB_enhanced_layouts. Each load_interpolated_input intrinsic corresponds to PLN instructions, while load_barycentric_at_* intrinsics correspond to pixel interpolator messages. The pixel/centroid/sample barycentric intrinsics simply refer to payload fields (delta_xy[]), and don't actually generate any code. Because we use a single intrinsic for both centroid-qualified variables and interpolateAtCentroid(), they become indistinguishable. We stop sending pixel interpolator messages for those, and instead use the payload provided data, which should be considerably faster. On Broadwell: total instructions in shared programs: 9067751 -> 9067570 (-0.00%) instructions in affected programs: 145902 -> 145721 (-0.12%) helped: 422 HURT: 209 total spills in shared programs: 2849 -> 2899 (1.76%) spills in affected programs: 760 -> 810 (6.58%) helped: 0 HURT: 10 total fills in shared programs: 3910 -> 3950 (1.02%) fills in affected programs: 617 -> 657 (6.48%) helped: 0 HURT: 10 LOST: 3 GAINED: 3 The differences mostly appear to be slight changes in MOVs. v2: Use nir_shader_compiler_options::use_interpolated_input_intrinsics flag rather than passing it directly to nir_lower_io. Use the unreachable() macro rather than assert in one place. (Review feedback from Chris Forbes.) Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Acked-by: Jason Ekstrand <[email protected]>