summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers
Commit message (Collapse)AuthorAgeFilesLines
* i965: Eliminate brw->tes.prog_data pointer.Kenneth Graunke2016-10-0510-25/+29
| | | | | | | | | | | | Just say no to: - brw->tes.base.prog_data = &brw->tes.prog_data->base.base; We'll just use the brw_stage_prog_data pointer in brw_stage_state and downcast it to brw_tes_prog_data as needed. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* i965: Eliminate brw->tcs.prog_data pointer.Kenneth Graunke2016-10-058-25/+28
| | | | | | | | | | | | Just say no to: - brw->tcs.base.prog_data = &brw->tcs.prog_data->base.base; We'll just use the brw_stage_prog_data pointer in brw_stage_state and downcast it to brw_tcs_prog_data as needed. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* i965: Eliminate brw->vs.prog_data pointer.Kenneth Graunke2016-10-0518-96/+121
| | | | | | | | | | | | Just say no to: - brw->vs.base.prog_data = &brw->vs.prog_data->base.base; We'll just use the brw_stage_prog_data pointer in brw_stage_state and downcast it to brw_vs_prog_data as needed. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* i965: Introduce downcast helpers for prog_data structures.Kenneth Graunke2016-10-057-62/+66
| | | | | | | Similar to brw_context(...), intel_texture_object(...), and so on. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* i965/sync: Rename awkward variableChad Versace2016-10-051-6/+6
| | | | | | | | | | What is the difference between a 'driver_fence' and a 'fence'? Do the characters 'driver_' add anything helpful? Nope. They do, though, add an extra 7 chars and pull your eyeballs away to ask "huh? what's that?" one microsecond too many. Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/sync: Rename intel_syncobj.c -> brw_sync.cChad Versace2016-10-053-2/+2
| | | | | Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/sync: Replace 'intel' prefix with 'brw'Chad Versace2016-10-053-37/+37
| | | | | | | This is yet another patch for the great renaming begun long ago. Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/sync: Fix uninitalized usage and leak of mutexChad Versace2016-10-051-2/+12
| | | | | | | | | | | | | | | | | We locked an unitialized mutex in the callstack glClientWaitSync intel_gl_client_wait_sync brw_fence_client_wait_sync because we forgot to initialize it in intel_gl_fence_sync. (The EGLSync codepath didn't have this bug. It initialized the mutex in intel_dri_create_sync). We also forgot to tear down (mtx_destroy) the mutex when destroying the sync object. Cc: [email protected] Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: use L3 data cache for SSBOsLionel Landwerlin2016-10-051-1/+2
| | | | | | | | | | | | | Anv programs the hardware to use L3 data cache if we use either SSBOs or images in the shaders, we can program i965 the same way. gl_shader_program has a bit of a confusing named field with 'NumAtomicBuffers'. It doesn't tell how many buffers are accessed by the shader in an atomic way but instead the number of atomic counters manipulated by the shader. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965: Drop _NEW_TRANSFORM from 3DSTATE_VS atom on Gen7.Kenneth Graunke2016-10-041-1/+1
| | | | | | | | | | | | | | The atom that uploads push constants listens to _NEW_TRANSFORM for legacy clip plane handling. On Sandybridge, the gen6_vs_state atom emits 3DSTATE_CONSTANT_VS as well as 3DSTATE_VS, so it needs to listen to the same set of conditions. However, it looks like Gen7 doesn't need this. The push constant atom emits 3DSTATE_CONSTANT_VS directly, and the gen7_vs_state atom that emits 3DSTATE_VS doesn't have a dependency on ctx->Transform. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Fix brw_clear_cache to clean up TCS/TES shaders.Kenneth Graunke2016-10-041-0/+2
| | | | | | | We need to free prog_data for TCS/TES too. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* i965: Add missing BRW_CS_PROG_DATA to CS work group surface atom.Kenneth Graunke2016-10-041-2/+5
| | | | | | Cc: [email protected] Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Add missing BRW_NEW_CS_PROG_DATA to compute constant atom.Kenneth Graunke2016-10-041-1/+2
| | | | | | | | | CACHE_NEW_CS_PROG hasn't existed in quite a long time...the old comment was there, but not the actual bit. Cc: [email protected] Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Add missing BRW_NEW_FS_PROG_DATA to render target reads.Kenneth Graunke2016-10-041-1/+3
| | | | | | Cc: [email protected] Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Move BRW_NEW_FRAGMENT_PROGRAM from 3DSTATE_PS to PS_EXTRA.Kenneth Graunke2016-10-041-1/+1
| | | | | | | | | 3DSTATE_PS doesn't need this. 3DSTATE_PS_EXTRA however does, for brw_color_buffer_write_enabled(). Cc: [email protected] Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Add missing BRW_NEW_VS_PROG_DATA to 3DSTATE_CLIP.Kenneth Graunke2016-10-041-0/+2
| | | | | | Cc: [email protected] Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Fix missing _NEW_TRANSFORM in Gen8+ 3DSTATE_DS atom.Kenneth Graunke2016-10-041-1/+2
| | | | | | | | | Needed for user clip plane enables. Broken since this code was introduced. Cc: [email protected] Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Enable ARB_shader_atomic_counter_opsIan Romanick2016-10-044-6/+50
| | | | | Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965: Refactor emission of atomic counter operationsIan Romanick2016-10-044-30/+27
| | | | | | | This will make it easier to add more operations. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965: fix unused variable warning in brw_emit_gpgpu_walker()Timothy Arceri2016-10-051-2/+1
| | | | Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965: add MAYBE_UNUSED to assert paramTimothy Arceri2016-10-051-1/+1
| | | | | | Fixes unused variable warning in release build. Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965: wrap unused function in #ifndef NDEBUGTimothy Arceri2016-10-051-0/+2
| | | | | | | This function is only ever used by an assert() this fixes an unused function warning in release builds. Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965: fix unused variable warning in gen7_block_read_scratch()Timothy Arceri2016-10-051-2/+1
| | | | Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965: add MAYBE_UNUSED to assert paramTimothy Arceri2016-10-051-1/+1
| | | | | | This fixes an unused variable warning on release builds. Reviewed-by: Iago Toral Quiroga <[email protected]>
* anv/gen7: Make use of local variable prog_dataAnuj Phogat2016-10-041-2/+2
| | | | | Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/gen8+: Enable GL_OES_viewport_arrayAnuj Phogat2016-10-042-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch causes 2 regressions in khronos' gles cts tests on various intel platforms. Failing tests: ES3-CTS.functional.state_query.integers.viewport_getinteger ES3-CTS.functional.state_query.integers.viewport_getfloat Here is an explanation of what's causing the failures: CTS tests are not clamping the x, y location of the viewport's bottom-left corner as recommended by ARB_viewport_array and OES_viewport_array: "The location of the viewport's bottom-left corner, given by (x,y), are clamped to be within the implementation-dependent viewport bounds range. The viewport bounds range [min, max] tuple may be determined by calling GetFloatv with the symbolic constant VIEWPORT_BOUNDS_RANGE_OES" Khronos CTS merge request to fix the test case: https://gitlab.khronos.org/opengl/cts/merge_requests/399 V2: Initialize the relevant variables for GL_OES_viewport_array on gen8+ Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* i965: Only emit 1 viewport when possible.Kenneth Graunke2016-10-0310-30/+75
| | | | | | | | | | | | | | | | | | | | | | | | | | In core profile, we support up to 16 viewports. However, in the majority of cases, only 1 of them is actually used - we only need the others if the last shader stage prior to the rasterizer writes gl_ViewportIndex. Processing all 16 viewports adds additional CPU overhead, which hurts CPU-intensive workloads such as Glamor. This meant that switching to core profile actually penalized Glamor to an extent, which is unfortunate. This patch tracks the number of relevant viewports, switching between 1 and ctx->Const.MaxViewports if gl_ViewportIndex is written. A new BRW_NEW_VIEWPORT_COUNT flag tracks this. This could mean re-emitting viewport state when switching, but hopefully this is offset by doing 1/16th of the work in the common case. The new flag is also lighter weight than BRW_NEW_VUE_MAP_GEOM_OUT, which we were using in one case. According to Eric Anholt, x11perf -copypixwin10 performance improves by 11.5094% +/- 3.10841% (n=10) on his Skylake. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Acked-by: Anuj Phogat <[email protected]>
* i965: rename max_ds_* variable to max_tes_*Timothy Arceri2016-10-034-5/+5
| | | | | | Using consistent naming allows us to create macros more easily. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: rename max_hs_* variables to max_tcs_*Timothy Arceri2016-10-034-5/+5
| | | | | | Using consistent naming allows us to create macros more easily. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Drop pointless stage == MESA_SHADER_FRAGMENT checks.Kenneth Graunke2016-10-021-5/+1
| | | | | | | There's an assert right above this. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i915/i965: remove commented out warningTimothy Arceri2016-10-012-6/+2
| | | | | | | The warning was also the wrong location, it should have been in the else. Reviewed-by: Ian Romanick <[email protected]>
* i965: Remove useless (harmful) assertionBen Widawsky2016-09-281-1/+1
| | | | | | | | | The code already skips doing the depth stall on gen >= 8, and as we enable new platforms this assertion will fail needlessly. Instead of changing the caller, make this simple change. Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* glsl: don't crash when dumping shaders if some come from cacheTimothy Arceri2016-09-281-4/+10
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: create populate key functions for tcs and tesTimothy Arceri2016-09-274-68/+88
| | | | | | These will be used by the on disk shader cache. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: make gs key generation helper available to shader cacheTimothy Arceri2016-09-272-1/+5
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: make vs and fs key generation helpers available to shader cacheCarl Worth2016-09-274-2/+10
| | | | | Signed-off-by: Timothy Arceri <[email protected]> Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
* i965: stop passing stage as a function parameterTimothy Arceri2016-09-261-5/+3
| | | | | | We already pass the shader so we can just get the stage from this. Reviewed-by: Jason Ekstrand <[email protected]>
* osmesa: Unbind the current context when given a null context and buffer.Emilio Cobos Álvarez2016-09-231-0/+7
| | | | | | | This is needed to be consistent with other drivers. Signed-off-by: Emilio Cobos Álvarez <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* i965: Enable EGL_KHR_gl_texture_3D_imageAdam Jackson2016-09-231-0/+3
| | | | | Reviewed-by: Eric Anholt <[email protected]> Signed-off-by: Adam Jackson <[email protected]>
* i915: Enable EGL_KHR_gl_texture_3D_imageAdam Jackson2016-09-231-0/+3
| | | | | Reviewed-by: Eric Anholt <[email protected]> Signed-off-by: Adam Jackson <[email protected]>
* i965: get rid of duplicated values from gen_device_infoLionel Landwerlin2016-09-2326-79/+71
| | | | | | | | Now that we have gen_device_info mutable, we can update its values and drop all copies we had in brw_context. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/i965: make gen_device_info mutableLionel Landwerlin2016-09-2318-53/+52
| | | | | | | | | | | | Make gen_device_info a mutable structure so we can update the fields that can be refined by querying the kernel (like subslices and EU numbers). This patch does not make any functional change, it just makes gen_get_device_info() fill a structure rather than returning a const pointer. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Allow opt_peephole_sel to be more aggressive in flattening IFs.Eric Anholt2016-09-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VC4 was running into a major performance regression from enabling control flow in the glmark2 conditionals test, because of short if statements containing an ffract. This pass seems like it was was trying to ensure that we only flattened IFs that should be entirely a win by guaranteeing that there would be fewer bcsels than there were MOVs otherwise. However, if the number of ALU ops is small, we can avoid the overhead of branching (which itself costs cycles) and still get a win, even if it means moving real instructions out of the THEN/ELSE blocks. For now, just turn on aggressive flattening on vc4. i965 will need some tuning to avoid regressions. It does looks like this may be useful to replace freedreno code. Improves glmark2 -b conditionals:fragment-steps=5:vertex-steps=0 from 47 fps to 95 fps on vc4. vc4 shader-db: total instructions in shared programs: 101282 -> 99543 (-1.72%) instructions in affected programs: 17365 -> 15626 (-10.01%) total uniforms in shared programs: 31295 -> 31172 (-0.39%) uniforms in affected programs: 3580 -> 3457 (-3.44%) total estimated cycles in shared programs: 225182 -> 223746 (-0.64%) estimated cycles in affected programs: 26085 -> 24649 (-5.51%) v2: Update shader-db output. Reviewed-by: Ian Romanick <[email protected]> (v1)
* i965: Enable ES 3.2 on Skylake.Kenneth Graunke2016-09-211-1/+2
| | | | | | | | | | | It's already advertised because the version.c extension checks are fulfilled, but we didn't actually claim support, so trying to create a ES 3.2 context would fail. It's all done, and the CTS results look good, so let's turn it on. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: implement querying __DRI_IMAGE_ATTRIB_OFFSET.Chuanbo Weng2016-09-211-2/+7
| | | | | | | | | Implement querying this attribute in intelImageExtension and bump version of intelImageExtension. Signed-off-by: Chuanbo Weng <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965/ir: Test thread dispatch packing assumptions.Francisco Jerez2016-09-211-0/+30
| | | | | | | | | | | | | | | | Not [originally] intended for upstream. Should cause a GPU hang if some thread is executed with a non-contiguous dispatch mask breaking assumptions of brw_stage_has_packed_dispatch(). Doesn't cause any CTS, DEQP or Piglit regressions, while replacing brw_stage_has_packed_dispatch() with a dummy implementation that unconditionally returns true on top of this patch causes multiple GPU hangs. v2: Refactor into a separate function instead of emitting the test code directly from emit_nir_code(), drop VEC4 test and clean up slightly for upstream. (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
* i965/ir: Pass identity mask to brw_find_live_channel() in the packed ↵Francisco Jerez2016-09-212-3/+11
| | | | | | | | | dispatch case. This avoids emitting a few extra instructions required to take the dispatch mask into account when it's known to be tightly packed. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/ir: Skip eliminate_find_live_channel() for stages with sparse thread ↵Francisco Jerez2016-09-213-0/+65
| | | | | | | | | | | | | | | | | | dispatch. The eliminate_find_live_channel optimization eliminates FIND_LIVE_CHANNEL instructions in cases where control flow is known to be uniform, and replaces them with 'MOV 0', which in turn unblocks subsequent elimination of the BROADCAST instruction frequently used on the result of FIND_LIVE_CHANNEL. This is however not correct in per-sample fragment shader dispatch because the PSD can dispatch a fully unlit sample under certain conditions. Disable the optimization in that case. Reviewed-by: Jason Ekstrand <[email protected]> v2: Add devinfo argument to brw_stage_has_packed_dispatch() to implement hardware generation check.
* i965/fs: Take Dispatch/Vector mask into account in FIND_LIVE_CHANNELJason Ekstrand2016-09-215-13/+50
| | | | | | | | | | | | | | | | | | | On at least Sky Lake, ce0 does not contain the full story as far as enabled channels goes. It is possible to have completely disabled channels where the corresponding bits in ce0 are 1. In order to get the correct execution mask, you have to mask off those channels which were disabled from the beginning by taking the AND of ce0 with either sr0.2 or sr0.3 depending on the shader stage. Failure to do so can result in FIND_LIVE_CHANNEL returning a completely dead channel. Signed-off-by: Jason Ekstrand <[email protected]> Cc: Francisco Jerez <[email protected]> [ Francisco Jerez: Fix a couple of typos, add mask register type assertion, clarify reason why ce0 can have bits set for disabled channels, clarify that this may only be a problem when thread dispatch doesn't pack channels tightly in the SIMD thread. Apply same treatment to Align16 path. ] Reviewed-by: Francisco Jerez <[email protected]>
* i965/reg: Make brw_sr0_reg take a subnr and return a vec1 regJason Ekstrand2016-09-212-13/+9
| | | | | | | | | | | The state register sr0 is really a collection of dwords not a SIMD8 anything. It's much more convenient for brw_sr0_reg to return the particular dword you're looking for rather than a giant blob you have to massage into what you want. Signed-off-by: Jason Ekstrand <[email protected]> [ Francisco Jerez: Trivial simplification of brw_ud1_reg(). ] Reviewed-by: Francisco Jerez <[email protected]>