path: root/src/mesa
Commit message (Author, Age, Files, Lines)
* glsl: fully split apart buffer block arrays (Timothy Arceri, 2016-04-06, 5 files, -58/+19)
| With this change we create the UBO and SSBO arrays separately from the beginning rather than putting them into a combined array and splitting it apart later.
|
| A bug with UBO and SSBO stage reference querying is also fixed, as we now use the block index to look up the references in the separate arrays, not the combined buffer block array.
|
| Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* i965/fs: Move the code for load/store_shared to emit_cs_intrinsic (Jason Ekstrand, 2016-04-04, 1 file, -76/+76)
| They are compute-shader only, and that's where the code for doing atomics on shared variables lives, so it seems to make sense.
|
| Reviewed-by: Jordan Justen <[email protected]>
* i965/nir: Provide a default LOD for buffer textures (Jason Ekstrand, 2016-04-04, 2 files, -0/+8)
| | | | | | | | | Our hardware requires an LOD for all texelFetch commands even if they are on buffer textures. GLSL IR gives us an LOD of 0 in that case, but the LOD is really rather meaningless. This commit allows other NIR producers to be more lazy and not provide one at all. Reviewed-by: Jordan Justen <[email protected]>
* i965: Fix invalid pointer read in dead_control_flow_eliminate(). (Kenneth Graunke, 2016-04-04, 1 file, -0/+4)
| | | | | | | | | | | There may not be a previous block. In this case, there's no real work to do, so just continue on to the next one. v2: Update for bblock->prev() API change. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Make bblock_t::next and friends return NULL at sentinels. (Kenneth Graunke, 2016-04-04, 2 files, -1/+13)
| | | | | | | | | The bblock_t::prev/prev_const/next/next_const API returns bblock_t pointers, rather than exec_nodes. So it's a bit surprising. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
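The idea of returning NULL at a sentinel can be sketched in plain C. This is only an illustration under assumptions: `struct sketch_node`, `struct sketch_list` and the helper names are hypothetical and are not the bblock_t or exec_node definitions. The sketch assumes a list whose head sentinel has `prev == NULL` and whose tail sentinel has `next == NULL`.

```c
#include <stddef.h>

/* Hypothetical, simplified list node; not Mesa's exec_node or bblock_t. */
struct sketch_node {
   struct sketch_node *prev;
   struct sketch_node *next;
};

/* The list owns two sentinels: head (prev == NULL) and tail (next == NULL). */
struct sketch_list {
   struct sketch_node head;
   struct sketch_node tail;
};

static void sketch_list_init(struct sketch_list *l)
{
   l->head.prev = NULL;
   l->head.next = &l->tail;
   l->tail.prev = &l->head;
   l->tail.next = NULL;
}

static void sketch_list_push_tail(struct sketch_list *l, struct sketch_node *n)
{
   n->prev = l->tail.prev;
   n->next = &l->tail;
   l->tail.prev->next = n;
   l->tail.prev = n;
}

/* Return the previous real node, or NULL if n is the first element. */
static struct sketch_node *sketch_prev(struct sketch_node *n)
{
   struct sketch_node *p = n->prev;
   return (p == NULL || p->prev == NULL) ? NULL : p;
}

/* Return the next real node, or NULL if n is the last element. */
static struct sketch_node *sketch_next(struct sketch_node *n)
{
   struct sketch_node *x = n->next;
   return (x == NULL || x->next == NULL) ? NULL : x;
}
```

With helpers of this shape, a caller that walks blocks can test the result against NULL and skip the iteration when there is no previous block, instead of dereferencing a sentinel by accident.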
* i965/peephole_ffma: Only match a mul+add if none of the ops are exact (Jason Ekstrand, 2016-04-04, 1 file, -0/+11)
| | | | Reviewed-by: Ian Romanick <[email protected]>
* i965: Add an INTEL_PRECISE_TRIG=1 option to fix SIN/COS output range. (Kenneth Graunke, 2016-04-04, 4 files, -4/+36)
| The SIN and COS instructions on Intel hardware can produce values slightly outside of the [-1.0, 1.0] range for a small set of values. Obviously, this can break everyone's expectations about trig functions.
|
| According to an internal presentation, the COS instruction can produce a value up to 1.000027 for inputs in the range (0.08296, 0.09888). One suggested workaround is to multiply by 0.99997, scaling down the amplitude slightly. Apparently this also minimizes the error function, reducing the maximum error from 0.00006 to about 0.00003.
|
| When enabled, fixes 16 dEQP precision tests
|
|    dEQP-GLES31.functional.shaders.builtin_functions.precision.{cos,sin}.{highp,mediump}_compute.{scalar,vec2,vec3,vec4}
|
| at the cost of making every sin and cos call more expensive (about twice the number of cycles on recent hardware). Enabling this option has been shown to reduce GPUTest Volplosion performance by about 10%.
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
| Reviewed-by: Jason Ekstrand <[email protected]>
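The arithmetic behind the workaround can be checked with a small C sketch. The function name and the idea of passing in a raw result are assumptions made for illustration; only the 0.99997 factor and the 1.000027 worst case come from the commit message.

```c
/* Scale factor quoted in the commit message above. */
#define SKETCH_PRECISE_TRIG_SCALE 0.99997f

/* Hypothetical helper: scale a raw SIN/COS result so that the quoted
 * worst-case overshoot (1.000027) lands back inside [-1.0, 1.0]. */
static float sketch_precise_trig(float raw_result)
{
   return raw_result * SKETCH_PRECISE_TRIG_SCALE;
}
```

Since 1.000027 * 0.99997 is roughly 0.999997, the scaled worst case stays just under 1.0. The price is one extra multiply per sin or cos call, which matches the cost described in the message.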
* i965: Allow 8x MSAA on >= 64bpp formats on Gen8+. (Kenneth Graunke, 2016-04-04, 1 file, -1/+2)
| | | | | | | | | | | See commit 3b0279a69 - this restriction is documented in the "Surface Format" field of RENDER_SURFACE_STATE. Looking at newer documentation, this restriction appears to exist on Haswell, but no longer applies on Gen8+. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ben Widawsky <[email protected]>
* mesa/get: fix MAX_GEOMETRY_SHADER_STORAGE_BLOCKS (Dave Airlie, 2016-04-04, 1 file, -1/+1)
| | | | | | | this was returning the fragment shader value. Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* mesa: expose EXT_base_instance in ES3 contexts (Ilia Mirkin, 2016-04-03, 3 files, -1/+7)
| | | | | | | | This extension is identical to ARB_base_instance. Reuse the same entrypoints. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* mesa: expose EXT_polygon_offset_clamp in ES contexts (Ilia Mirkin, 2016-04-03, 3 files, -4/+10)
| | | | | | | | The extension spec was extended to also support ES. This functionality is provided all the way back to ES 1.0. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* mesa: add always-false-for-now enables for GL 4.3, 4.4, 4.5. (Ilia Mirkin, 2016-04-03, 1 file, -2/+49)
| | | | | | | | | As the relevant extensions get implemented, the lines should be uncommented. I believe this is (almost) everything needed for those GL versions though. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* mesa: add ES3_1_compatibility extension enable (Ilia Mirkin, 2016-04-03, 2 files, -0/+2)
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* mesa: remove unrequired else (Timothy Arceri, 2016-04-03, 1 file, -42/+39)
| | | | | | The if always returns so no need for an else. Reviewed-by: Brian Paul <[email protected]>
* glsl: store stage reference in gl_uniform_block (Timothy Arceri, 2016-04-02, 3 files, -15/+4)
| | | | | | | | | This allows us to simplify the code and drop InterfaceBlockStageIndex which is a per stage array of integers the size of all blocks in the program combined including duplicates across stages. Adding a stage ref per block will use less memory. Reviewed-by: Kenneth Graunke <[email protected]>
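One way to picture a per-block stage reference is a small bitmask stored in the block itself. The sketch below is an assumption-laden illustration: `struct sketch_block`, its `stageref` field and the helper names are hypothetical and are not claimed to match the layout of gl_uniform_block.

```c
#include <stdint.h>

/* Hypothetical block record; field names are illustrative only. */
struct sketch_block {
   const char *name;
   uint8_t stageref;   /* bit N set means shader stage N references the block */
};

/* Record that the given stage index references this block. */
static void sketch_block_add_stage(struct sketch_block *b, unsigned stage)
{
   b->stageref |= (uint8_t)(1u << stage);
}

/* Query whether the given stage index references this block. */
static int sketch_block_referenced_by(const struct sketch_block *b,
                                      unsigned stage)
{
   return (int)((b->stageref >> stage) & 1u);
}
```

Compared with one integer array per stage sized for every block in the program, a scheme like this needs a single small field per block, which is the memory saving the message describes.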
* glsl: Fix program interface query locations biasing for SSO. (Kenneth Graunke, 2016-04-01, 1 file, -8/+3)
| With SSO, the GL_PROGRAM_INPUT and GL_PROGRAM_OUTPUT interfaces refer to the first and last shader stage linked into a program. This may not be the vertex and fragment shader stages. So, subtracting VERT_ATTRIB_GENERIC0 and FRAG_RESULT_DATA0 is bogus.
|
| We need to subtract VERT_ATTRIB_GENERIC0 for VS inputs, FRAG_RESULT_DATA0 for FS outputs, and VARYING_SLOT_VAR0 for other cases. Note that built-in variables get a location of -1.
|
| Fixes 4 dEQP-GLES31.functional.program_interface_query tests:
| - program_input.location.separable_fragment.var_explicit_location
| - program_input.location.separable_fragment.var_array_explicit_location
| - program_output.location.separable_vertex.var_array_explicit_location
| - program_output.location.separable_vertex.var_array_explicit_location
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Timothy Arceri <[email protected]>
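The biasing rule described in this message can be written out as a short C sketch. Everything here is hypothetical: the `SKETCH_` constants use placeholder values rather than Mesa's real definitions, and the function signature is invented for illustration.

```c
/* Placeholder base values for this sketch only; the real constants are
 * defined in Mesa's headers and are not reproduced here. */
enum {
   SKETCH_VERT_ATTRIB_GENERIC0 = 16,
   SKETCH_FRAG_RESULT_DATA0 = 4,
   SKETCH_VARYING_SLOT_VAR0 = 32
};

enum sketch_stage { SKETCH_VERTEX, SKETCH_GEOMETRY, SKETCH_FRAGMENT };

/* Map an internal slot to the location reported by the query:
 * VS inputs and FS outputs use their own bases, everything else is a
 * varying, and built-ins always report -1. */
static int
sketch_query_location(enum sketch_stage stage, int is_input,
                      int is_builtin, int slot)
{
   if (is_builtin)
      return -1;
   if (is_input && stage == SKETCH_VERTEX)
      return slot - SKETCH_VERT_ATTRIB_GENERIC0;
   if (!is_input && stage == SKETCH_FRAGMENT)
      return slot - SKETCH_FRAG_RESULT_DATA0;
   return slot - SKETCH_VARYING_SLOT_VAR0;
}
```

The separable cases from the test list fall into the last branch: an input of a fragment-only program or an output of a vertex-only program is a varying, so it is biased by the varying base rather than by the attribute or color-output base.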
* glsl: Return -1 for program interface query locations in many cases. (Kenneth Graunke, 2016-04-01, 1 file, -53/+9)
| We were recording locations for all variables, even ones without an explicit location set. Implement the rules from the spec, and record -1 in the resource list accordingly.
|
| Make program_resource_location stop doing math on negative values. Remove hacks that are no longer necessary now that we've stopped doing that.
|
| Fixes 4 dEQP-GLES31.functional.program_interface_query tests:
| - program_input.location.separable_fragment.var
| - program_input.location.separable_fragment.var_array
| - program_output.location.separable_vertex.var_array
| - program_output.location.separable_vertex.var_array
|
| v2: Delete more code
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Timothy Arceri <[email protected]>
* glsl: Consolidate gl_VertexIDMESA -> gl_VertexID query hacks. (Kenneth Graunke, 2016-04-01, 1 file, -17/+0)
| | | | | | | | | A program will either have gl_VertexID or gl_VertexIDMESA (the lowered zero-based version), not both. Just spoof it in the resource list so the hacks are done in a single place. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* glsl: Add all system variables to the input resource list. (Kenneth Graunke, 2016-04-01, 1 file, -8/+1)
| System values are just built-in input variables that we've opted to special-case out of convenience. We need to consider all inputs, regardless of how we've classified them.
|
| Unfortunately, there's one exception: we shouldn't add gl_BaseVertex unless ARB_shader_draw_parameters is enabled, because it doesn't actually exist in the language, and shouldn't be counted in the GL_ACTIVE_RESOURCES query.
|
| Fixes dEQP-GLES31.functional.program_interface_query.program_input.resource_list.compute.empty, which expects gl_NumWorkGroups to appear in the resource list.
|
| v2: Delete more code
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Timothy Arceri <[email protected]>
* mesa: Make _mesa_choose_tex_format() handle stencil textures. (Kenneth Graunke, 2016-04-01, 1 file, -0/+5)
| | | | | | | | This is necessary for ARB_texture_stencil8 support on classic drivers. Presumably Gallium works because it implements its own ChooseTexFormat. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* gallium: distinguish between shader IR in get_compute_param (Bas Nieuwenhuizen, 2016-04-02, 1 file, -6/+7)
| | | | | | | | | | | | | For radeonsi, native and TGSI use different compilers and this results in different limits for different IR's. The set we strictly need for radeonsi is only the MAX_BLOCK_SIZE and MAX_THREADS_PER_BLOCK params, but I added a few others as shader related that seemed like they would also typically depend on the compiler. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* gallium: add threads per block TGSI property (Bas Nieuwenhuizen, 2016-04-02, 1 file, -0/+18)
| The value 0 for unknown has been chosen so that drivers using tgsi_scan_shader do not need to detect missing properties if they zero-initialize the struct.
|
| Signed-off-by: Bas Nieuwenhuizen <[email protected]>
| Reviewed-by: Ilia Mirkin <[email protected]>
| Reviewed-by: Dave Airlie <[email protected]>
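The reasoning for choosing 0 as the "unknown" value can be shown with a minimal C sketch. The struct and function names are hypothetical and do not reproduce the real scan-result layout; the only point taken from the message is that a zero-initialized struct already encodes "property not declared".

```c
#include <string.h>

/* Hypothetical scan result; field and type names are illustrative only. */
struct sketch_shader_info {
   unsigned threads_per_block[3];   /* 0 means "not declared by the shader" */
};

/* Zero-initialize the result before scanning, as a driver might. */
static void sketch_scan_begin(struct sketch_shader_info *info)
{
   memset(info, 0, sizeof(*info));
}

/* No separate "was the property seen" flag is needed: 0 already says no. */
static int sketch_has_fixed_block_size(const struct sketch_shader_info *info)
{
   return info->threads_per_block[0] != 0;
}
```

If a nonzero sentinel had been chosen instead, every user of the scan result would need an explicit initialization step or a presence flag for the property.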
* gallium: add compute shader IR type (Bas Nieuwenhuizen, 2016-04-02, 1 file, -0/+1)
| | | | | | Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* i965: Add an implementation of nir_op_fquantize2f16 (Jason Ekstrand, 2016-04-01, 2 files, -0/+53)
| | | | Reviewed-by: Matt Turner <[email protected]>
* Android: fix x86 gallium builds (Rob Herring, 2016-04-01, 5 files, -5/+55)
| Builds with gallium enabled fail on x86 with linker error:
|
| external/mesa3d/src/mesa/vbo/vbo_exec_array.c:127: error: undefined reference to '_mesa_uint_array_min_max'
|
| The problem is sse_minmax.c is not included in the libmesa_st_mesa library. Since the SSE4.1 files are needed for both libmesa_st_mesa and libmesa_dricore, move SSE4.1 files into a separate static library that can be used by both.
|
| Cc: "11.1 11.2" <[email protected]>
| Signed-off-by: Rob Herring <[email protected]>
| Reviewed-by: Emil Velikov <[email protected]>
* mesa: add GL_OES/EXT_draw_buffers_indexed support (Ilia Mirkin, 2016-03-31, 2 files, -0/+12)
| | | | | | | | This is the same ext as ARB_draw_buffers_blend (plus some core functionality that already exists). Add the alias entrypoints. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* i965: Use brw->urb.min_vs_urb_entries instead of 32 for BLORP. (Kenneth Graunke, 2016-03-31, 1 file, -4/+1)
| | | | | | | | | | | Haswell GT2 and GT3 have a minimum of 64 entries. Hardcoding 32 is not legal. v2: Delete stale comment (caught by Alejandro). Cc: [email protected] Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Fix textureSize() depth value for 1 layer surfaces on Gen4-6. (Kenneth Graunke, 2016-03-31, 2 files, -6/+18)
| According to the Sandybridge PRM's description of the resinfo message, the .z value returned will be Depth == 0 ? 0 : Depth + 1. The earlier PRMs have the same table. This means we return 0 for array textures with a single slice, when we ought to return 1. Just override it to max(depth, 1).
|
| Fixes 10 dEQP-GLES3.functional tests on Sandybridge:
|
| shaders.texture_functions.texturesize.sampler2darray_fixed_vertex
| shaders.texture_functions.texturesize.sampler2darray_fixed_fragment
| shaders.texture_functions.texturesize.sampler2darray_float_vertex
| shaders.texture_functions.texturesize.sampler2darray_float_fragment
| shaders.texture_functions.texturesize.isampler2darray_vertex
| shaders.texture_functions.texturesize.isampler2darray_fragment
| shaders.texture_functions.texturesize.usampler2darray_vertex
| shaders.texture_functions.texturesize.usampler2darray_fragment
| shaders.texture_functions.texturesize.sampler2darrayshadow_vertex
| shaders.texture_functions.texturesize.sampler2darrayshadow_fragment
|
| Cc: [email protected]
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Matt Turner <[email protected]>
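The rule quoted from the PRM and the max(depth, 1) override can be modeled in a few lines of C. This is a sketch under an assumption: it treats the hardware Depth field as holding the layer count minus one, which is what makes a single-slice array report 0. The function names are hypothetical.

```c
/* Model of the quoted resinfo rule: .z = Depth == 0 ? 0 : Depth + 1.
 * depth_field is assumed to hold (number of layers - 1). */
static unsigned sketch_resinfo_z(unsigned depth_field)
{
   return depth_field == 0 ? 0 : depth_field + 1;
}

/* The workaround described above: clamp the reported depth to at least 1. */
static unsigned sketch_texture_size_depth(unsigned depth_field)
{
   unsigned z = sketch_resinfo_z(depth_field);
   return z > 1 ? z : 1;
}
```

Only the single-layer case changes: every other depth already passes through the rule with the expected layer count, so the clamp is a no-op there.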
* ptn: Fix all users of ptn_swizzle (Ian Romanick, 2016-03-31, 1 file, -6/+6)
| None of the callers actually wanted what it did. In ptn_xpd, you only ever want a vec3 swizzle. In ptn_tex, you want a swizzle that matches the number of required texture coordinates.
|
| shader-db results:
|
| G45:
| total instructions in shared programs: 4011240 -> 4010911 (-0.01%)
| instructions in affected programs: 59232 -> 58903 (-0.56%)
| helped: 114
| HURT: 0
|
| total cycles in shared programs: 84314194 -> 84313220 (-0.00%)
| cycles in affected programs: 779150 -> 778176 (-0.13%)
| helped: 110
| HURT: 13
|
| Ironlake:
| total instructions in shared programs: 6397262 -> 6396605 (-0.01%)
| instructions in affected programs: 117402 -> 116745 (-0.56%)
| helped: 227
| HURT: 0
|
| total cycles in shared programs: 128889798 -> 128888524 (-0.00%)
| cycles in affected programs: 1214644 -> 1213370 (-0.10%)
| helped: 179
| HURT: 44
|
| Sandy Bridge:
| total instructions in shared programs: 8467391 -> 8467384 (-0.00%)
| instructions in affected programs: 3107 -> 3100 (-0.23%)
| helped: 10
| HURT: 6
|
| total cycles in shared programs: 117580120 -> 117573448 (-0.01%)
| cycles in affected programs: 103158 -> 96486 (-6.47%)
| helped: 84
| HURT: 11
|
| Ivy Bridge:
| total instructions in shared programs: 7774255 -> 7774258 (0.00%)
| instructions in affected programs: 1677 -> 1680 (0.18%)
| helped: 8
| HURT: 6
|
| total cycles in shared programs: 65743828 -> 65739190 (-0.01%)
| cycles in affected programs: 89312 -> 84674 (-5.19%)
| helped: 78
| HURT: 23
|
| Haswell:
| total instructions in shared programs: 7107172 -> 7107150 (-0.00%)
| instructions in affected programs: 2048 -> 2026 (-1.07%)
| helped: 16
| HURT: 0
|
| total cycles in shared programs: 64653636 -> 64647486 (-0.01%)
| cycles in affected programs: 86836 -> 80686 (-7.08%)
| helped: 85
| HURT: 17
|
| Broadwell and Skylake:
| total instructions in shared programs: 8447529 -> 8447507 (-0.00%)
| instructions in affected programs: 2038 -> 2016 (-1.08%)
| helped: 16
| HURT: 0
|
| total cycles in shared programs: 66418670 -> 66413416 (-0.01%)
| cycles in affected programs: 90110 -> 84856 (-5.83%)
| helped: 83
| HURT: 20
|
| Signed-off-by: Ian Romanick <[email protected]>
| Reviewed-by: Iago Toral Quiroga <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* ptn: Silence unused parameter warning (Ian Romanick, 2016-03-31, 1 file, -2/+2)
| The KIL instruction doesn't have a destination, so ptn_kil never uses dest.
|
| program/prog_to_nir.c: In function ‘ptn_kil’:
| program/prog_to_nir.c:547:38: warning: unused parameter ‘dest’ [-Wunused-parameter]
| ptn_kil(nir_builder *b, nir_alu_dest dest, nir_ssa_def **src)
|                                      ^
|
| Signed-off-by: Ian Romanick <[email protected]>
| Reviewed-by: Iago Toral Quiroga <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: add GL_EXT_copy_image support (Ilia Mirkin, 2016-03-30, 1 file, -0/+1)
| | | | | | | | The extension is identical to GL_OES_copy_image. But dEQP has tests that want the EXT variant. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* mesa: add GL_OES_copy_image support (Ilia Mirkin, 2016-03-30, 6 files, -1/+128)
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* mesa: remove duplicate MAX_GEOMETRY_SHADER_INVOCATIONS entry (Ilia Mirkin, 2016-03-30, 1 file, -3/+0)
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* st/mesa: add ES sample-shading support (Ilia Mirkin, 2016-03-30, 1 file, -0/+6)
| | | | | | | | | We require the full ARB_gpu_shader5 for now, but in the future some other CAP could get exposed to indicate that only the multisample-related behavior of ARB_gpu_shader5 is available. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* mesa: add GL_OES_shader_multisample_interpolation support (Ilia Mirkin, 2016-03-30, 3 files, -3/+14)
| | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* mesa: add GL_OES_sample_shading support (Ilia Mirkin, 2016-03-30, 4 files, -3/+8)
| | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* mesa: add OES_sample_variables to extension table, add enable bit (Ilia Mirkin, 2016-03-30, 2 files, -0/+2)
| | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Dave Airlie <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965: Don't add barrier deps for FB write messages. (Matt Turner, 2016-03-30, 1 file, -1/+2)
| | | | | | | Ken did this earlier, and this is just me reimplementing his patch a little differently. Reviewed-by: Francisco Jerez <[email protected]>
* i965: Add and use is_scheduling_barrier() function. (Matt Turner, 2016-03-30, 1 file, -4/+17)
|
* i965: Remove NOP insertion kludge in scheduler. (Matt Turner, 2016-03-30, 1 file, -20/+5)
| | | | | | | | | | | Instead of removing every instruction in add_insts_from_block(), just move the instruction to its scheduled location. This is a step towards doing both bottom-up and top-down scheduling without conflicts. Note that this patch changes cycle counts for programs because it begins including control flow instructions in the estimates. Reviewed-by: Francisco Jerez <[email protected]>
* i965: Assert that an instruction is not inserted around itself. (Matt Turner, 2016-03-30, 1 file, -0/+4)
| | | | Reviewed-by: Francisco Jerez <[email protected]>
* i965: Relax restriction on scheduling last instruction. (Matt Turner, 2016-03-30, 1 file, -20/+3)
| I think when this code was written, basic blocks were always ended by a control flow instruction or an end-of-thread message. That's no longer the case, and removing this restriction actually helps things:
|
| instructions in affected programs: 7267 -> 7244 (-0.32%)
| helped: 4
|
| total cycles in shared programs: 66559580 -> 66431900 (-0.19%)
| cycles in affected programs: 28310152 -> 28182472 (-0.45%)
| helped: 9577
| HURT: 879
| GAINED: 2
|
| The addition of the is_control_flow() checks is not a functional change, since the add_insts_from_block() does not put them in the list of instructions to schedule. I plan to change this in a later patch.
|
| Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4/tcs: Set conditional mod on TCS_OPCODE_SRC0_010_IS_ZERO. (Matt Turner, 2016-03-30, 2 files, -2/+3)
| Missing this causes an assertion failure in the scheduler with the next patch. Additionally, this gives cmod propagation enough information to optimize code better.
|
| total instructions in shared programs: 7112991 -> 7112852 (-0.00%)
| instructions in affected programs: 25704 -> 25565 (-0.54%)
| helped: 139
|
| total cycles in shared programs: 64812898 -> 64810674 (-0.00%)
| cycles in affected programs: 127224 -> 125000 (-1.75%)
| helped: 139
|
| Acked-by: Francisco Jerez <[email protected]>
* Revert "i965: Don't add barrier deps for FB write messages." (Matt Turner, 2016-03-30, 1 file, -4/+3)
| | | | | | | | | | | | | | This reverts commit d0e1d6b7e27bf5f05436e47080d326d7daa63af2. The change in the vec4 code is a mistake -- there's never an FS_OPCODE_FB_WRITE in vec4 code. The change in the fs code had the (harmless) effect of not recognizing an FB_WRITE as a scheduling barrier even if it was marked EOT -- harmless because the scheduler marked the last instruction of a block as a barrier, something I'm changing in the following patches. This will be reimplemented later in the series.
* i965: Simplify full scheduling-barrier conditions. (Matt Turner, 2016-03-30, 1 file, -27/+8)
| | | | | | | All of these were simply code for "architecture register file" (and in the case of destinations, "not the null register"). Reviewed-by: Francisco Jerez <[email protected]>
* i965: Remove incorrect cycle estimates. (Matt Turner, 2016-03-30, 1 file, -10/+0)
| These printed the cycle count of the last basic block (sched.time is set per basic block!). We have accurate, full-program data printed elsewhere.
|
| Reviewed-by: Francisco Jerez <[email protected]>
* st/mesa: fix fallout from xfb changes. (Dave Airlie, 2016-03-31, 1 file, -2/+2)
| | | | | | | Failed to update state tracker with new buffer interface. Reviewed-by: Timothy Arceri <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* mesa: add query support for GL_TRANSFORM_FEEDBACK_BUFFER interface (Timothy Arceri, 2016-03-31, 3 files, -2/+51)
| | | | Reviewed-by: Dave Airlie <[email protected]>
* glsl: add transform feedback buffers to resource list (Timothy Arceri, 2016-03-31, 3 files, -3/+3)
| | | | Reviewed-by: Dave Airlie <[email protected]>
* mesa: add support to query GL_TRANSFORM_FEEDBACK_BUFFER_INDEX (Timothy Arceri, 2016-03-31, 2 files, -0/+7)
| | | | Reviewed-by: Dave Airlie <[email protected]>