path: root/src/mesa
Commit message  Author  Age  Files  Lines
...
* glsl: move xfb BufferStride into gl_transform_feedback_infoTimothy Arceri2016-09-241-2/+3
| | | | | | | | It makes more sense to have this here where we store the other values from xfb qualifiers. The struct it was previously part of is now only used to store values that come from the api. Reviewed-by: Alejandro Piñeiro <[email protected]>
* i965: Enable EGL_KHR_gl_texture_3D_imageAdam Jackson2016-09-231-0/+3
| | | | | Reviewed-by: Eric Anholt <[email protected]> Signed-off-by: Adam Jackson <[email protected]>
* i915: Enable EGL_KHR_gl_texture_3D_imageAdam Jackson2016-09-231-0/+3
| | | | | Reviewed-by: Eric Anholt <[email protected]> Signed-off-by: Adam Jackson <[email protected]>
* i965: get rid of duplicated values from gen_device_infoLionel Landwerlin2016-09-2326-79/+71
| | | | | | | | Now that we have gen_device_info mutable, we can update its values and drop all copies we had in brw_context. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/i965: make gen_device_info mutableLionel Landwerlin2016-09-2318-53/+52
| | | | | | | | | | | | Make gen_device_info a mutable structure so we can update the fields that can be refined by querying the kernel (like subslices and EU numbers). This patch does not make any functional change; it just makes gen_get_device_info() fill a structure rather than returning a const pointer. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* st/mesa: turn on OES_viewport_array when dependencies are metIlia Mirkin2016-09-221-0/+5
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* mesa: add implementations for new float depth functionsIlia Mirkin2016-09-221-1/+18
| | | | | | | | This just up-converts them to doubles. Not great, but this is what all the other variants also do. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
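The up-conversion described above can be sketched roughly as follows. This is a minimal illustration, not Mesa's code: `struct depth_range`, `set_depth_range()` and `set_depth_range_f()` are hypothetical names, and the real entry points also perform validation that is omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for per-viewport depth state; not Mesa's struct. */
struct depth_range {
    double near_val;
    double far_val;
};

/* Double-precision setter: the single place where the state is written. */
static void set_depth_range(struct depth_range *dr, double n, double f)
{
    assert(dr != NULL);
    dr->near_val = n;
    dr->far_val = f;
}

/* Float variant: widen both arguments to double and forward them. */
static void set_depth_range_f(struct depth_range *dr, float n, float f)
{
    set_depth_range(dr, (double)n, (double)f);
}
```

Widening float to double is exact, so the float variant stores the same values a caller would get by passing doubles directly.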
* mesa: move ARB_viewport_array params to a GLES 3.1-accessible sectionIlia Mirkin2016-09-221-6/+6
| | | | | | | This is needed for GL_OES_viewport_array. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* mesa: add GL_OES_viewport_array to the extension stringIlia Mirkin2016-09-222-0/+2
| | | | | | | | | The expectation is that drivers will set this based on OES_geometry_shader and ARB_viewport_array support. This is a separate enable on the same reasoning as for OES_texture_cube_map_array. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* mesa: add new entrypoints for GL_OES_viewport_arrayIlia Mirkin2016-09-223-0/+29
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* nir: Allow opt_peephole_sel to be more aggressive in flattening IFs.Eric Anholt2016-09-221-1/+1
| VC4 was running into a major performance regression from enabling control flow in the glmark2 conditionals test, because of short if statements containing an ffract.
|
| This pass seems like it was trying to ensure that we only flattened IFs that should be entirely a win by guaranteeing that there would be fewer bcsels than there were MOVs otherwise. However, if the number of ALU ops is small, we can avoid the overhead of branching (which itself costs cycles) and still get a win, even if it means moving real instructions out of the THEN/ELSE blocks.
|
| For now, just turn on aggressive flattening on vc4. i965 will need some tuning to avoid regressions. It does look like this may be useful to replace freedreno code.
|
| Improves glmark2 -b conditionals:fragment-steps=5:vertex-steps=0 from 47 fps to 95 fps on vc4.
|
| vc4 shader-db:
| total instructions in shared programs: 101282 -> 99543 (-1.72%)
| instructions in affected programs: 17365 -> 15626 (-10.01%)
| total uniforms in shared programs: 31295 -> 31172 (-0.39%)
| uniforms in affected programs: 3580 -> 3457 (-3.44%)
| total estimated cycles in shared programs: 225182 -> 223746 (-0.64%)
| estimated cycles in affected programs: 26085 -> 24649 (-5.51%)
|
| v2: Update shader-db output.
|
| Reviewed-by: Ian Romanick <[email protected]> (v1)
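As a rough illustration of the transformation this commit message describes, the sketch below shows a short if containing a single fract-like ALU op and its flattened, select-based equivalent. It is plain C rather than NIR, the function names are hypothetical, and the fract helper assumes small non-negative inputs so that it can avoid libm.

```c
#include <assert.h>

/* fract() for small non-negative inputs only (an assumption of this sketch). */
static float fract_nonneg(float x)
{
    assert(x >= 0.0f);
    return x - (float)(int)x;
}

/* Branchy form: the ALU op runs only when the condition holds. */
static float shade_branch(float x, int cond)
{
    float r = x;
    if (cond)
        r = fract_nonneg(x);
    return r;
}

/* Flattened form: the ALU op is hoisted out of the THEN block and the
 * result is chosen with a select, analogous to a bcsel. */
static float shade_flat(float x, int cond)
{
    float then_val = fract_nonneg(x);
    return cond ? then_val : x;
}
```

Both forms return the same value for every input; the flattened one trades an unconditional ALU op for the removal of the branch, which is the trade-off the message argues is a win when the block is small.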
* i965: Enable ES 3.2 on Skylake.Kenneth Graunke2016-09-211-1/+2
| | | | | | | | | It's already advertised because the version.c extension checks are fulfilled, but we didn't actually claim support, so trying to create an ES 3.2 context would fail. It's all done, and the CTS results look good, so let's turn it on. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: implement querying __DRI_IMAGE_ATTRIB_OFFSET.Chuanbo Weng2016-09-211-2/+7
| | | | | | | | | Implement querying this attribute in intelImageExtension and bump version of intelImageExtension. Signed-off-by: Chuanbo Weng <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/ir: Test thread dispatch packing assumptions.Francisco Jerez2016-09-211-0/+30
| | | | | | | | | | | | | | | | Not [originally] intended for upstream. Should cause a GPU hang if some thread is executed with a non-contiguous dispatch mask breaking assumptions of brw_stage_has_packed_dispatch(). Doesn't cause any CTS, DEQP or Piglit regressions, while replacing brw_stage_has_packed_dispatch() with a dummy implementation that unconditionally returns true on top of this patch causes multiple GPU hangs. v2: Refactor into a separate function instead of emitting the test code directly from emit_nir_code(), drop VEC4 test and clean up slightly for upstream. (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
* i965/ir: Pass identity mask to brw_find_live_channel() in the packed ↵Francisco Jerez2016-09-212-3/+11
| | | | | | | | | dispatch case. This avoids emitting a few extra instructions required to take the dispatch mask into account when it's known to be tightly packed. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/ir: Skip eliminate_find_live_channel() for stages with sparse thread ↵Francisco Jerez2016-09-213-0/+65
| | | | | | | | | | | | | | | | | | dispatch. The eliminate_find_live_channel optimization eliminates FIND_LIVE_CHANNEL instructions in cases where control flow is known to be uniform, and replaces them with 'MOV 0', which in turn unblocks subsequent elimination of the BROADCAST instruction frequently used on the result of FIND_LIVE_CHANNEL. This is however not correct in per-sample fragment shader dispatch because the PSD can dispatch a fully unlit sample under certain conditions. Disable the optimization in that case. Reviewed-by: Jason Ekstrand <[email protected]> v2: Add devinfo argument to brw_stage_has_packed_dispatch() to implement hardware generation check.
* i965/fs: Take Dispatch/Vector mask into account in FIND_LIVE_CHANNELJason Ekstrand2016-09-215-13/+50
| | | | | | | | | | | | | | | | | On at least Sky Lake, ce0 does not contain the full story as far as enabled channels go. It is possible to have completely disabled channels where the corresponding bits in ce0 are 1. In order to get the correct execution mask, you have to mask off those channels which were disabled from the beginning by taking the AND of ce0 with either sr0.2 or sr0.3 depending on the shader stage. Failure to do so can result in FIND_LIVE_CHANNEL returning a completely dead channel. Signed-off-by: Jason Ekstrand <[email protected]> Cc: Francisco Jerez <[email protected]> [ Francisco Jerez: Fix a couple of typos, add mask register type assertion, clarify reason why ce0 can have bits set for disabled channels, clarify that this may only be a problem when thread dispatch doesn't pack channels tightly in the SIMD thread. Apply same treatment to Align16 path. ] Reviewed-by: Francisco Jerez <[email protected]>
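The masking step described above can be modelled on the CPU as a sketch. The function name and the idea of passing the two masks as plain integers are assumptions made for illustration; the driver emits EU instructions for this rather than running C.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the index of the first live channel, or -1 if none is live.
 * ce0:           current execution mask (may have bits set for channels
 *                that were never enabled)
 * dispatch_mask: channels enabled when the thread was dispatched, standing
 *                in for the value read from sr0.2 or sr0.3 */
static int find_live_channel(uint32_t ce0, uint32_t dispatch_mask)
{
    uint32_t live = ce0 & dispatch_mask;

    for (int i = 0; i < 32; i++) {
        if (live & (UINT32_C(1) << i))
            return i;
    }
    return -1;
}
```

With `ce0 = 0xFF` and a dispatch mask of `0xF0`, scanning `ce0` alone would pick channel 0, which was never enabled; the AND moves the answer to channel 4.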
* i965/reg: Make brw_sr0_reg take a subnr and return a vec1 regJason Ekstrand2016-09-212-13/+9
| | | | | | | | | | | The state register sr0 is really a collection of dwords not a SIMD8 anything. It's much more convenient for brw_sr0_reg to return the particular dword you're looking for rather than a giant blob you have to massage into what you want. Signed-off-by: Jason Ekstrand <[email protected]> [ Francisco Jerez: Trivial simplification of brw_ud1_reg(). ] Reviewed-by: Francisco Jerez <[email protected]>
* i965: Rename intelScreen to screen.Kenneth Graunke2016-09-2028-170/+170
| | | | | | | | "intelScreen" is wordy and also doesn't fit our style guidelines. "screen" is shorter, which is nice, because we use it fairly often. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965: Rename __DRIScreen pointers to "dri_screen".Kenneth Graunke2016-09-206-83/+85
| | | | | | | | | | | | | | | | I want to use "screen" as the variable name for a struct intel_screen pointer. This means that we can't use it for __DRIscreen pointers. Sometimes we called it "screen", sometimes "sPriv", sometimes "driScrnPriv", and sometimes "psp" (Pointer to Screen Private?). The last one is particularly confusing because we use "psp" to refer to the Gen4 PIPELINED_STATE_POINTERS packet as well. Let's be consistent. "dri_screen" is clear, and it's not used often enough that I'm worried about the verbosity. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* mesa: Implement ARB_shader_viewport_layer_array for i965Dylan Baker2016-09-203-0/+3
| | | | | | | | | | | | | This extension is a combination of AMD_vertex_shader_viewport_index and AMD_vertex_shader_layer, making it rather trivial to implement. For gallium I *think* this needs a new cap because of the addition of support in tessellation evaluation shaders, and since I don't have any hardware to test it on, I've left that for someone else to wire up. Signed-off-by: Dylan Baker <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Drop assertion about buffer offset at draw time.Eric Anholt2016-09-171-11/+0
| | | | | | | Given robust access, we should just be returning zeroes if the user gives us a base pointer that's too big, which is what happens on a release build. This was caught by a webgl conformance test for out-of-bounds draws on servo. Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: Move buffers-unmapped earlier in check_valid_to_render().Kenneth Graunke2016-09-161-6/+6
| | | | | | | | | | | This needs to be above the switch on API, as that can return true (valid to render) before this error check even had a chance to run. Fixes ESEXT-CTS.draw_elements_base_vertex_tests.invalid_mapped_bos, which worked before commit 72f1566f90c434c7752d8405193eec68d6743246. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Mathias Fröhlich <[email protected]>
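The ordering problem this fix addresses can be sketched as below. The struct, enum and function names are hypothetical and much simpler than check_valid_to_render(); the point is only that an early return in the per-API switch can skip a check placed after it.

```c
#include <assert.h>
#include <stdbool.h>

enum api { API_DESKTOP_GL, API_GLES2 };

/* Hypothetical, heavily simplified context. */
struct ctx {
    enum api api;
    bool has_mapped_buffers;   /* a buffer needed for drawing is still mapped */
    bool has_vertex_shader;
};

/* Old ordering: the ES case returns before the mapped-buffer check runs. */
static bool valid_to_render_old(const struct ctx *c)
{
    switch (c->api) {
    case API_GLES2:
        return c->has_vertex_shader;   /* early return skips the check below */
    case API_DESKTOP_GL:
        break;
    }
    if (c->has_mapped_buffers)
        return false;
    return true;
}

/* New ordering: the mapped-buffer check runs first for every API. */
static bool valid_to_render_new(const struct ctx *c)
{
    if (c->has_mapped_buffers)
        return false;

    switch (c->api) {
    case API_GLES2:
        return c->has_vertex_shader;
    case API_DESKTOP_GL:
        break;
    }
    return true;
}
```

For an ES context with a mapped buffer and a valid shader, the old ordering reports the draw as valid while the new one rejects it.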
* mesa: Expose GL_CONTEXT_FLAGS in ES 3.2.Kenneth Graunke2016-09-161-3/+5
| | | | | | | Fixes four ES32-CTS.context_flags.* tests. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* mesa: fix glGetFramebufferAttachmentParameteriv w/ on-demand FRONT_BACK allocMarek Olšák2016-09-161-2/+14
| | | | | | | This fixes 66 CTS tests on st/mesa. Cc: 12.0 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* st/mesa: update comment in st_atom_msaa.cBrian Paul2016-09-161-2/+2
| | | | | | The old comment was a copy and paste mistake. Indent another comment. Reviewed-by: Roland Scheidegger <[email protected]>
* st/mesa: only enable MSAA coverage options when we have a MSAA bufferBrian Paul2016-09-162-4/+7
| | | | | | | | | | | | | | | | | Regardless of whether GL_MULTISAMPLE is enabled (it's enabled by default) we should not set the alpha_to_coverage or alpha_to_one flags if the current drawing buffer does not do MSAA. This fixes the new piglit gl-1.3-alpha_to_coverage_nop test. ETQW is a game that enables GL_SAMPLE_ALPHA_TO_COVERAGE without MSAA. Shrubs along the side of roads were invisible because fragments with alpha < 0.5 were being discarded (zero coverage). v2: remove ctx->DrawBuffer != NULL check. Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
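A minimal sketch of the gating rule in this commit message, assuming a draw buffer counts as multisampled when it has more than one sample. The struct and function names are hypothetical, and whether GL_MULTISAMPLE itself also gates the flags is an assumption of this sketch rather than something the message states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs taken from GL state and the current draw buffer. */
struct msaa_inputs {
    bool multisample_enabled;        /* GL_MULTISAMPLE, on by default */
    bool sample_alpha_to_coverage;   /* GL_SAMPLE_ALPHA_TO_COVERAGE */
    bool sample_alpha_to_one;        /* GL_SAMPLE_ALPHA_TO_ONE */
    unsigned draw_buffer_samples;    /* 0 or 1 means no MSAA */
};

struct blend_flags {
    bool alpha_to_coverage;
    bool alpha_to_one;
};

static struct blend_flags compute_blend_flags(const struct msaa_inputs *in)
{
    struct blend_flags out = { false, false };

    /* Only honor the coverage options when the buffer really is MSAA. */
    if (in->multisample_enabled && in->draw_buffer_samples > 1) {
        out.alpha_to_coverage = in->sample_alpha_to_coverage;
        out.alpha_to_one = in->sample_alpha_to_one;
    }
    return out;
}
```

In the ETQW case described above, alpha-to-coverage is requested on a single-sampled buffer, so both flags stay false and low-alpha fragments are no longer discarded.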
* glsl: add subpass image type (v2)Dave Airlie2016-09-161-0/+2
| | | | | | | | | | | | | | | | | | SPIR-V/Vulkan have a special image type for input attachments called the subpass type. It has different characteristics than other image types, the main one being that it can only be an input image to fragment shaders and loads from it are relative to the frag coord. This adds support for it to the GLSL types. Unfortunately we've run out of space in the sampler dim in types, so we need to use another bit. v2: Fixup subpass input name (Jason) Reviewed-by: Jason Ekstrand <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* i965: enable ARB_ES3_2_compatibility on gen8+Ilia Mirkin2016-09-151-0/+1
| | | | | | | | Note that ASTC support is not actually mandated for this extension to be exposed. Signed-off-by: Ilia Mirkin <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* i965/nir: Roll set_default_interpolation into lower_fs_inputsJason Ekstrand2016-09-153-39/+26
| | | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Use NIR for handling forced per-sample interpolationJason Ekstrand2016-09-153-40/+12
| | | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Add a flag to lower_io to force "sample" interpolationJason Ekstrand2016-09-153-12/+13
| | | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Use sample interpolation for interpolateAtCentroid in persample modeJason Ekstrand2016-09-151-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | | From the ARB_gpu_shader5 spec: The built-in functions interpolateAtCentroid() and interpolateAtSample() will sample variables as though they were declared with the "centroid" or "sample" qualifiers, respectively. When running with persample dispatch forced by the API, we interpolate anything that isn't flat as if it's qualified by "sample". In order to keep interpolateAtCentroid() consistent with the "centroid" qualifier, we need to make interpolateAtCentroid() do sample interpolation instead. Nothing in the GLSL spec guarantees that the result of interpolateAtCentroid is uniform across samples in any way, so this is a perfectly fine thing to do. Fixes 8 of the new dEQP-VK.pipeline.multisample_interpolation.* Vulkan CTS tests that specifically validate consistency between the "sample" qualifier and interpolateAtSample() Signed-off-by: Jason Ekstrand <[email protected]> Cc: "12.0" <[email protected]> Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: check for no matrix change in _mesa_LoadMatrixf()Brian Paul2016-09-151-3/+5
| | | | | | | | | | | | | Some apps issue redundant glLoadMatrixf() calls with the same matrix. Try to avoid setting dirty state in that situation. This reduces the number of constant buffer updates by about half in ET Quake Wars. Tested with Piglit, ETQW, Sauerbraten, Google Earth, etc. Reviewed-by: Charmaine Lee <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
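The redundant-load check described above might look roughly like this. It is a sketch with hypothetical names (`struct matrix_stack_top`, `load_matrix()`), not the code in _mesa_LoadMatrixf(); it uses a bytewise comparison to decide whether anything changed.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical top-of-stack matrix plus a dirty flag. */
struct matrix_stack_top {
    float m[16];
    bool dirty;
};

static void load_matrix(struct matrix_stack_top *top, const float m[16])
{
    assert(top != NULL && m != NULL);

    /* Same bytes as the current matrix: leave the state clean. */
    if (memcmp(top->m, m, sizeof(top->m)) == 0)
        return;

    memcpy(top->m, m, sizeof(top->m));
    top->dirty = true;   /* triggers downstream state updates */
}
```

Because the comparison is bytewise, matrices that differ only in the sign of a zero are treated as different, which costs a spurious update at worst and never skips a real change.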
* mesa: Expose RESET_NOTIFICATION_STRATEGY with KHR_robustness.Kenneth Graunke2016-09-152-3/+10
| | | | | | | | | | | | | | | This is supposed to be exposed with the GL_KHR_robustness extension, which we support on ES 2.0 and later. On desktop GL, it's also exposed by GL_ARB_robustness, which is supported by all drivers ("dummy_true"), so we also allow desktop GL. Fixes: - ES32-CTS.robust.robustness.noResetNotification - ES32-CTS.robust.robustness.loseContextOnReset Cc: [email protected] Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* intel/blorp: Stop setting 3DSTATE_DRAWING_RECTANGLEJason Ekstrand2016-09-141-0/+5
| | | | | | | | | | | | The Vulkan driver sets 3DSTATE_DRAWING_RECTANGLE once to MAX_INT x MAX_INT at GPU initialization time and never sets it again. The GL driver sets it every time the framebuffer changes. Originally, blorp set it to the size of the drawing area, but that meant we had to set it back in the Vulkan driver. Instead, we can easily just do that in the GL driver's blorp_exec implementation and not set it in blorp core. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* intel/blorp: Emit 3DSTATE_MULTISAMPLE directlyJason Ekstrand2016-09-141-13/+0
| | | | | | | | | Previously, we relied on a driver hook for 3DSTATE_MULTISAMPLE. However, now that Vulkan and GL use the same sample positions, we can set up 3DSTATE_MULTISAMPLE directly in blorp and delete the driver hook. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* i965/vec4: Assert that pull constant load offsets are 16B-aligned.Francisco Jerez2016-09-141-0/+1
| | | | | | | | Non-16B-aligned pull constant loads are unlikely to be particularly useful given that you can get roughly the same effect by using swizzles on the result. Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/vec4: Assert that ATTR regions are register-aligned.Francisco Jerez2016-09-141-0/+1
| | | | | | | It might be useful to actually handle this once copy propagation becomes smarter about register-misaligned offsets. Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/vec4: Don't spill non-GRF-aligned register regions.Francisco Jerez2016-09-142-2/+5
| | | | | | | | | | | A better fix would be to do something along the lines of the FS back-end spilling code and emit a scratch read before any instruction that overwrites the register to spill partially due to a non-zero sub-register offset. In the meantime mark registers used with a non-zero sub-register offset as no-spill to prevent the spilling code from miscompiling the program. Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/vec4: Fix copy propagation for non-register-aligned regions.Francisco Jerez2016-09-141-3/+6
| | | | | | | | | | | | | | This prevents it from trying to propagate a copy through a register-misaligned region. MOV instructions with a misaligned destination shouldn't be treated as a direct GRF copy, because they only define the destination GRFs partially. Also fix the interference check implemented with is_channel_updated() to consider overlapping regions with different register offset to interfere, since the writemask check implemented in the function is only valid under the assumption that the source and destination regions are aligned component by component. Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/vec4: Compare full register offsets in cmod propagation.Francisco Jerez2016-09-141-1/+1
| | | | | | | | | Cmod propagation would misoptimize the program if the destination offset of the generating instruction wasn't exactly the same as the source region offset of the copy instruction. In preparation for adding support for sub-GRF offsets to the VEC4 IR. Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/vec4: Assign correct destination offset to rewritten instruction in ↵Francisco Jerez2016-09-141-2/+1
| | | | | | | | | | | | register coalesce. Because the pass already checks that the destination offset of each 'scan_inst' that needs to be rewritten matches 'inst->src[0].offset' exactly, the final offset of the rewritten instruction is just the original destination offset of the copy. This is in preparation for adding support for sub-GRF offsets to the VEC4 IR. Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/vec4: Don't coalesce registers with overlapping writes not matching the ↵Francisco Jerez2016-09-141-4/+6
| | | | | | | | MOV source. In preparation for adding support for sub-GRF offsets to the VEC4 IR. Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/vec4: Compare full register offsets in opt_register_coalesce nop move ↵Francisco Jerez2016-09-141-1/+1
| | | | | | | | check. In preparation for adding support for sub-GRF offsets to the VEC4 IR. Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/vec4: Check that the write offsets match when setting dependency controls.Francisco Jerez2016-09-141-0/+2
| | | | | | | | | For simplicity just assume that two writes to the same GRF with different sub-GRF offsets will potentially interfere and break the dependency control chain. This is in preparation for adding sub-GRF offset support to the VEC4 IR. Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/vec4: Change opt_vector_float to keep track of the last offset seen in ↵Francisco Jerez2016-09-141-3/+3
| | | | | | | | | bytes. This simplifies things slightly and makes the pass more correct in presence of sub-GRF offsets. Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/vec4: Simplify src/dst_reg to brw_reg conversion by using byte_offset().Francisco Jerez2016-09-141-7/+8
| | | | | | | This should also have the side effect of fixing convert_to_hw_regs() to handle sub-GRF register offsets. Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/ir: Update several stale comments.Francisco Jerez2016-09-145-26/+22
| | | | Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/ir: Don't print ARF subnr values twice.Francisco Jerez2016-09-142-8/+0
| | | | Reviewed-by: Iago Toral Quiroga <[email protected]>