path: root/src/mesa/drivers
Commit message | Author | Age | Files | Lines
* i965: Add Atom graphics names to parse_devid_override()Matt Turner2017-10-041-0/+3
* mesa: Remove force_s3tc_enable driconf variableMatt Turner2017-10-023-5/+0
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* mesa: Drop Mesa_DXTn from gl_contextMatt Turner2017-10-026-31/+10
| | | | | Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* i965: Implement ARB_indirect_parameters.Plamena Manolova2017-10-024-1/+124
  We can implement ARB_indirect_parameters for i965 by taking advantage of the conditional rendering mechanism. This works by issuing maxdrawcount draw calls and using conditional rendering to predicate each of them with "drawcount > gl_DrawID".

  Signed-off-by: Plamena Manolova <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
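  The predication scheme described in this commit message can be sketched on the CPU as follows. This is only an illustration, not the i965 code: `struct draw_log`, `predicated_draw` and `multi_draw_indirect_count` are hypothetical names, and the `if` stands in for what the hardware's conditional rendering would do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the GPU: records which draws really executed. */
struct draw_log {
    uint32_t executed;      /* number of draws whose predicate passed */
    uint32_t last_draw_id;  /* gl_DrawID of the last executed draw */
};

/* Emulates one predicated draw: it executes only if drawcount > draw_id. */
static void predicated_draw(struct draw_log *log, uint32_t drawcount,
                            uint32_t draw_id)
{
    if (drawcount > draw_id) {
        log->executed++;
        log->last_draw_id = draw_id;
    }
}

/* Always issues maxdrawcount draws; the predicate discards the extra ones,
 * so the CPU never needs to know the real drawcount up front. */
static void multi_draw_indirect_count(struct draw_log *log, uint32_t drawcount,
                                      uint32_t maxdrawcount)
{
    assert(log != NULL);
    for (uint32_t draw_id = 0; draw_id < maxdrawcount; draw_id++)
        predicated_draw(log, drawcount, draw_id);
}
```

  With a drawcount of 3 and a maxdrawcount of 8, eight draws are issued but only those with gl_DrawID 0, 1 and 2 take effect.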
* i965: Refactor brw_try_draw_prims.Plamena Manolova2017-10-021-117/+119
| | | | | | | | | | | In order to add our ARB_indirect_parameters implementation we need to refactor brw_try_draw_prims so that it operates on a per primitive basis and move the loop into brw_draw_prims. This commit refactors the brw_try_draw_prims function and renames it to brw_draw_single_prim. Signed-off-by: Plamena Manolova <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Indroduce brw_finish_drawing.Plamena Manolova2017-10-021-7/+14
| | | | | | | | | | | In order to add our ARB_indirect_parameters implementation we need to refactor brw_try_draw_prims so that it operates on a per primitive basis and move the loop into brw_draw_prims. This commit introduces the brw_finish_drawing function where we move the code that executes once after the loop. Signed-off-by: Plamena Manolova <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Introduce brw_prepare_drawing.Plamena Manolova2017-10-021-19/+27
| | | | | | | | | | | In order to add our ARB_indirect_parameters implementation we need to refactor brw_try_draw_prims so that it operates on a per primitive basis and move the loop into brw_draw_prims. This commit introduces the brw_prepare_drawing function where we move the code that executes once before the loop. Signed-off-by: Plamena Manolova <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: skip reading unused slots at the begining of the URB for the FSIago Toral Quiroga2017-10-021-10/+6
  We can start reading the URB at the first offset that contains varyings that are actually read. We still need to make sure that we read at least one varying to honor hardware requirements.

  This helps alleviate a problem introduced with 99df02ca26f61 for separate shader objects (SSO). Without SSO we assign locations sequentially. Since that commit, however, the VUE slot assigned for SSO depends on the number of builtin slots plus the location assigned to the varying. This fixed layout is intended to help SSO programs by avoiding on-the-fly recompiles when swapping out shaders, but it also means that if a varying uses a large location number close to the maximum allowed by the SF/FS units (31), the offset introduced by the number of builtin slots can push the location outside the range and trigger an assertion.

  This problem affects at least the following CTS tests for enhanced layouts:

  KHR-GL45.enhanced_layouts.varying_array_components
  KHR-GL45.enhanced_layouts.varying_array_locations
  KHR-GL45.enhanced_layouts.varying_components
  KHR-GL45.enhanced_layouts.varying_locations

  which use SSO and the location layout qualifier to select such location numbers explicitly.

  This change helps these tests because for SSO we always have to include things such as VARYING_SLOT_CLIP_DIST{0,1} even if the fragment shader is very unlikely to read them, so by doing this we free builtin slots from the fixed VUE layout and keep the tests from crashing in this scenario.

  Of course, this is not a proper fix. We would still run into problems if someone tried to use an explicit max location and read gl_ViewportIndex, gl_LayerID or gl_CullDistance in the FS, but that would be a much less common bug, and we can probably wait to see whether anyone actually runs into that situation in a real-world scenario before deciding that more aggressive changes are required to support this without reverting 99df02ca26f61.

  v2:
  - Add a debug message when we skip clip distances (Ilia)
  - We also need to account for this when we compute the urb setup for the fragment shader stage, so add a compiler util to compute the first slot that we need to read from the URB instead of replicating the logic in both places.

  v3:
  - Make the util more generic so it can account for all unused slots at the beginning of the URB, which will make it more useful (Ken).
  - Drop the debug message, it was not what Ilia was asking for.

  Suggested-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
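  A rough sketch of the "first slot that we need to read" computation mentioned in this commit message is given below. It is an assumption-laden illustration rather than the compiler util itself: `sketch_first_read_slot`, the `slot_to_varying` array and the 64-bit `inputs_read` mask are hypothetical, and any hardware alignment of the read offset is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* slot_to_varying[i] is the varying index stored in VUE slot i, or -1 if the
 * slot is empty. inputs_read has bit v set when the fragment shader reads
 * varying v. Returns the first slot holding a varying that is read; if
 * nothing is read, returns 0 so that at least one slot is still fetched. */
static int sketch_first_read_slot(const int *slot_to_varying, int num_slots,
                                  uint64_t inputs_read)
{
    assert(slot_to_varying != NULL && num_slots >= 0);

    for (int slot = 0; slot < num_slots; slot++) {
        int varying = slot_to_varying[slot];
        if (varying >= 0 && varying < 64 &&
            (inputs_read & (UINT64_C(1) << varying)))
            return slot;
    }
    return 0;
}
```

  For a layout with four builtin slots followed by two user varyings, a shader that reads only the user varyings would start reading at slot 4, skipping the unused builtin slots.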
* i965/link: Use prog->nir instead of creating a temporaryJason Ekstrand2017-09-281-4/+3
| | | | | | | | | This way, when NIR_PASS_V makes a clone of the shader (for testing nir_clone), the new and lowered version gets re-assigned to prog->nir. [[email protected]: Tested NIR_TEST_CLONE=1 with valgrind] Tested-by: Jordan Justen <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/link: Make more use of NIR_PASSJason Ekstrand2017-09-281-6/+6
| | | | | | [[email protected]: Tested NIR_TEST_CLONE=1 with valgrind] Tested-by: Jordan Justen <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/link: Make better use of temporary variablesJason Ekstrand2017-09-281-4/+5
| | | | | | | | | | | The way NIR_PASS works (and, by extension, nir_optimize) is that they may clone the shader and throw the old one away. (We use this for testing nir_clone.) It's better if we just make a temporary variable, use it for everything, and re-assign to the gl_program at the end. [[email protected]: Tested NIR_TEST_CLONE=1 with valgrind] Tested-by: Jordan Justen <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: enable up to 32 inputs for geometry shaders in gen8+Iago Toral Quiroga2017-09-281-1/+2
| | | | | | | | | | | | | | | | | | | | | We have been exposing only 16 since 1e3e72e3054de with arguments based on register pressure and the number of available GRFs, however, our scalar backend will always limit the number of push registers for GS threads to 24 and fallback to pull model for anything else, so there is really no reason to lower the number under those arguments. By bumping this up to 32 we make it the same as all the other stages, which is a nice feature to have that can help applications in some cases (I recently fixed a bug in CTS that assumed that the number of input locations in a stage matches the number of output locations in the previous stage for example). Pre-gen8, we use the vector backend and push model, so in that case the arguments in 1e3e72e3054de are still valid. v2: check if we have scalar GS instead of the hw gen to enable this (Ken). Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Convert brw->*_program into a brw->programs[i] array.Kenneth Graunke2017-09-2622-126/+147
| | | | | | This makes it easier to loop over programs. Reviewed-by: Alejandro Piñeiro <[email protected]>
* i965: make use of nir linkingTimothy Arceri2017-09-261-0/+56
  For now linking is just removing unused varyings between stages.

  shader-db results BDW:

  total instructions in shared programs: 13198288 -> 13191693 (-0.05%)
  instructions in affected programs: 48325 -> 41730 (-13.65%)
  helped: 473
  HURT: 0

  total cycles in shared programs: 541184926 -> 541159260 (-0.00%)
  cycles in affected programs: 213238 -> 187572 (-12.04%)
  helped: 435
  HURT: 8

  V2:
  - lower indirects on demoted inputs as well as outputs.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: call brw_shader_gather_info() from the callers of brw_create_nir()Timothy Arceri2017-09-262-7/+18
| | | | | | | This will allow us to insert a nir linking step in brw_link_shader(). Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]>
* i965: create a brw_shader_gather_info() helperTimothy Arceri2017-09-262-7/+16
| | | | | | | | This will help us call gather info at a later point and allow us to do some linking in nir. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Eduardo Lima Mitev <[email protected]>
* i965: Rename do_flush_locked to submit_batch().Kenneth Graunke2017-09-251-3/+4
| | | | | | | do_flush_locked isn't a great name - especially given that there's no locking going on in our code relating to execbuf. Reviewed-by: Chris Wilson <[email protected]>
* i965: Use atomic ops in get_new_program_id().Kenneth Graunke2017-09-252-6/+1
| | | | | | | | | | | | | We have a nice utility function for this, which eliminates the need for locking stuff. This isn't really performance critical, but it's less code to use the atomic. p_atomic_inc_return does pre-increment rather than post-increment, so we change screen->program_id to be initialized to 0 instead of 1. At which point, we can just delete the initialization because intel_screen is rzalloc'd. Reviewed-by: Chris Wilson <[email protected]>
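  The pre-increment point in this commit message can be illustrated with standard C11 atomics. This sketch does not use Mesa's p_atomic_inc_return; `struct sketch_screen` and `sketch_get_new_program_id` are hypothetical names, and `atomic_fetch_add(...) + 1` is used here to get the same "return the incremented value" behaviour.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical screen object; a zero-initialized counter is enough because
 * the first ID handed out is the value after the increment. */
struct sketch_screen {
    atomic_uint program_id;
};

/* Returns a fresh ID without taking a lock. atomic_fetch_add yields the old
 * value, so adding 1 gives pre-increment semantics: the first call returns 1. */
static unsigned sketch_get_new_program_id(struct sketch_screen *screen)
{
    assert(screen != NULL);
    return atomic_fetch_add(&screen->program_id, 1u) + 1u;
}
```

  Starting from a zeroed structure, successive calls return 1, 2, 3 and so on, which is why an explicit initialization to 1 is no longer needed.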
* i965: Convert brw_bufmgr to use C11 mutexes instead of pthreads.Kenneth Graunke2017-09-251-18/+17
| | | | | | | There's no real advantage or disadvantage here, it's just for stylistic consistency with the rest of the codebase. Reviewed-by: Chris Wilson <[email protected]>
* i965: Delete dead meta stencil blit program fields from brw_context.Kenneth Graunke2017-09-251-3/+0
| | | | These have been unused for a while now.
* i965: Force outputs_written to contain varyings needed by stream-out.Kenneth Graunke2017-09-211-3/+6
| | | | | | | | | | | | | If transform feedback is recording a varying, it needs a slot in the VUE map, regardless of whether or not the shader writes it. Together with the previous patch, this fixes: - KHR-GL45.enhanced_layouts.xfb_capture_struct The test captures a structure where the vertex shader writes the first and third members - but the second still needs a slot. Reviewed-by: Juan A. Suarez Romero <[email protected]>
* i965: Compute VS/GS output VUE map from the NIR info.Kenneth Graunke2017-09-212-2/+2
| | | | | | | | | | unify_interfaces() only updates the NIR program info, not the copy in the gl_program itself. So, by using the old copy, we were missing out on these updates. The TCS/TES ones already did this correctly. Reviewed-by: Juan A. Suarez Romero <[email protected]>
* i965: Fix brw_finish_batch to grow the batchbuffer.Kenneth Graunke2017-09-211-7/+10
  brw_finish_batch emits commands needed at the end of every batch buffer, including any workarounds. In the past, we freed up some "reserved" batch space before calling it, so we would never have to flush during it. This was error prone and easy to screw up, so I deleted it a while back in favor of growing the batch.

  There were two problems:

  1. We're in the middle of flushing, so brw->no_batch_wrap is guaranteed not to be set. Using BEGIN_BATCH() to emit commands would cause a recursive flush rather than growing the buffer as intended.

  2. We already recorded the throttling batch before growing, which replaces brw->batch.bo with a different (larger) buffer. So growing would break throttling.

  These are easily remedied by shuffling some code around and whacking brw->no_batch_wrap in brw_finish_batch(). This also now includes the final workarounds in the batch usage statistics.

  Found by inspection.

  Fixes: 2c46a67b4138631217141f (i965: Delete BATCH_RESERVED handling.)
  Reviewed-by: Chris Wilson <[email protected]>
* i965: Move MI_BATCHBUFFER_END handling into brw_finish_batch().Kenneth Graunke2017-09-211-7/+7
| | | | | | This is, by definition, finishing the batch. Reviewed-by: Chris Wilson <[email protected]>
* nv20: Enable ARB_texture_border_clampIlia Mirkin2017-09-211-1/+28
| | | | | | | | | | Fixes quite a few 'texwrap [12]d border color only' tests on NV20 (10de:0201). All told, 40 more tests pass. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Ian RomanicK <[email protected]> Reviewed-by: Francisco Jerez <[email protected]> Tested-by: Ian RomanicK <[email protected]>
* nv20: Fix GL_CLAMPIan Romanick2017-09-212-3/+32
| | | | | | | | | | | | | | | | | | | | v2: Force T and R wrap modes to GL_CLAMP_TO_EDGE for 1D textures. This fixes a regression in tex1d-2dborder. The test uses a 1D texture but it provides S and T texture coordinates. Since the T wrap mode would (correctly) be set to GL_CLAMP, the texture would gradually blend (incorrectly) with the border color. I also tried setting NV20_3D_TEX_FORMAT_DIMS_1D instead of NV20_3D_TEX_FORMAT_DIMS_2D for 1D textures, but that did not help. It is possible that the same problem exists for 2D textures with the R-wrap mode, but I don't think there are any piglit tests for that. No test changes on NV20 (10de:0201). Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965/gen8: Remove unused gen8_emit_3dstate_multisample()Topi Pohjolainen2017-09-212-17/+0
| | | | | | Reviewed-by: Chad Versace <[email protected]> Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Topi Pohjolainen <[email protected]>
* i965: Fix duplication of DRI imagesLouis-Francis Ratté-Boulianne2017-09-201-0/+3
  Some DRI image properties weren't properly duplicated in the new image. Some properties are still missing, but I'm not certain if there was a good reason to leave them out in the first place.

  Signed-off-by: Louis-Francis Ratté-Boulianne <[email protected]>
  Reviewed-by: Daniel Stone <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* Revert "i965: Reset miptree aux state on update_image_buffer"Jason Ekstrand2017-09-193-25/+1
| | | | This reverts commit e97f4b748094466567c7f3bad1a02ecee13db9c8.
* i965: Fix batch map failure check in INTEL_DEBUG=bat handling.Kenneth Graunke2017-09-181-1/+1
| | | | | | | | | | I originally wrote the code to call the maps 'batch' and 'state', until I remembered that 'batch' is the intel_batchbuffer struct pointer. The NULL check was still using the wrong variable. Caught by Coverity. CID: 1418109
* i965: Use prepare_external instead of make_shareable in setTexBuffer2Jason Ekstrand2017-09-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | The setTexBuffer2 hook from GLX is used to implement glxBindTexImageEXT which has tighter restrictions than just "it's shared". In particular, it says that any rendering to the image while it is bound causes the contents to become undefined. This means that we can do whatever aux tracking we want between glxBindTexImageEXT and glxReleaseTexImageEXT so long as we always transition from external in Bind and to external in Release. The fact that we were using make_shareable before was a problem because it would resolve away 100% of the aux data and then throw away our reference to the aux buffer. If the aux data was shared with some other application (i.e. if we're using I915_FORMAT_MOD_Y_TILED_CCS) then we would forget that the aux data even existed for the rest of eternity. This is fine for the first frame but any subsequent calls to glxBindTexImageEXT would bind the texture as if it has no aux whatsoever and no resolves would happen and texturing would happen as if there is no aux. This was causing rendering corruption in mutter when running on top of X11 with modifiers. Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Daniel Stone <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965/tex_image: Reference the renderbuffer miptree in setTexBuffer2Jason Ekstrand2017-09-181-19/+4
  The old code made a new miptree that referenced the same BO as the renderbuffer and just trusted in the memory aliasing to work. There are only two ways in which the new miptree is liable to differ from the one in the renderbuffer and neither of them matter:

  1) It may have a different target. The only targets that we can ever see in intelSetTexBuffer2 are GL_TEXTURE_2D and GL_TEXTURE_RECTANGLE and the difference between the two doesn't matter as far as the miptree is concerned; genX(update_sampler_state) only looks at the gl_texture_object and not the miptree when determining whether or not to use normalized coordinates.

  2) It may have a very slightly different format. Again, this doesn't matter because we've supported texture views for quite some time so we always look at the gl_texture_object format instead of the miptree format for hardware setup anyway.

  On the other hand, because we were recreating the miptree, we were using intel_miptree_create_for_bo which doesn't understand modifiers. We really want this function to work without doing a resolve so long as you have modifiers so we need to fix that.

  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Daniel Stone <[email protected]>
  Reviewed-by: Chad Versace <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* i965: Reset miptree aux state on update_image_bufferJason Ekstrand2017-09-183-1/+25
  When we get a miptree in through glxBindImageEXT, we don't know the current aux state so we have to assume the worst case. If the image gets recreated, everything is fine because miptree_create_for_dri_image sets it to the default. However, if our miptree is recycled, then we may have stale aux_usage and we need to reset it to the default, otherwise our aux_state tracking will get messed up.

  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Daniel Stone <[email protected]>
  Reviewed-by: Chad Versace <[email protected]>
* intel/isl: Add a drm_modifier_get_default_aux_state helperJason Ekstrand2017-09-181-2/+1
| | | | | | Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Daniel Stone <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965: Warn for GTT fallbacks when mapping the batch/state buffers.Kenneth Graunke2017-09-181-2/+2
| | | | | | | | This shouldn't really happen in practice, but I hit it a couple of times when running a driver with a bad memory leak. We may as well hook up the warning, because if it ever triggers, we'll know something is wrong. Reviewed-by: Chris Wilson <[email protected]>
* i965: Plumb brw through to intel_batchbuffer_reset().Kenneth Graunke2017-09-183-11/+11
| | | | | | We'll want to pass this to brw_bo_map in a moment. Reviewed-by: Chris Wilson <[email protected]>
* i965: emit BRW_NEW_AUX_STATE on aux state changesIago Toral Quiroga2017-09-181-2/+6
  Fixes a regression introduced with b96313c0e1289b296d7, which removed BRW_NEW_BLORP for a bunch of SURFACE_STATE setup code, including render targets, on the basis that blorp invalidates binding tables but not surface states. At least on Broadwell, this caused a regression in a CTS test, which Ken and Jason tracked down to the fact that we are not uploading new render target surface states after allocating new CCS_D surfaces for fast clears (that allocation is deferred until an actual clear occurs).

  The reason this only fails on BDW is that on SKL+ we use CCS_E, which is allocated up front, so it exists in the initial surface state. The problem can be reproduced on these platforms too if we use INTEL_DEBUG=norbc to force the CCS_D path.

  This patch, together with the ones preceding it, fixes the regression by ensuring that we track and flag as dirty all aux state changes.

  Credit goes to Jason and Ken for figuring out the reason for the regression.

  Fixes: KHR-GL45.transform_feedback.draw_xfb_test

  Reviewed-by: Jason Ekstrand <[email protected]>
* i965: emit BRW_NEW_AUX_STATE when we change the fast clear valueIago Toral Quiroga2017-09-183-11/+31
| | | | | | | v2: rename intel_miptree_set_clear_value to intel_miptree_set_clear_color (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
* i965: emit BRW_NEW_AUX_STATE if we drop the aux surfaceIago Toral Quiroga2017-09-181-0/+2
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* i965: rename BRW_NEW_FAST_CLEAR_COLOR to BRW_NEW_AUX_STATEIago Toral Quiroga2017-09-188-14/+14
| | | | | | | We want to use this flag to signal changes to the aux surfaces, so let's not make it about fast clearing only. Suggested by Jason. Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Add an INTEL_DEBUG=reemit option.Kenneth Graunke2017-09-151-1/+1
| | | | | | | | | Jason and I use this for debugging all the time. Recompiling the driver to enable it is kind of annoying. It's a great thing to try along with always_flush_batch=true and always_flush_cache=true to detect a class of problems - namely, atoms listening to an insufficient set of dirty bits. Reviewed-by: Matt Turner <[email protected]>
* i965: drop unused variablesEric Engestrom2017-09-152-2/+0
| | | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* i965/tex: Unify the TexImage and TexSubImage codeJason Ekstrand2017-09-151-58/+45
| | | | | | | | | | It's nearly the same so there's no good reason why it can't be in a common function. The one difference is that _mesa_store_teximage calls AllocTextureImageBuffer for us, while _mesa_store_texsubimage doesn't, but we don't need that anyway - intelTexImage already does it. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/tex: Remove the for_glTexImage parameter from texsubimage_tiled_memcpyJason Ekstrand2017-09-151-14/+5
| | | | | | | | | It is set to false in both callers. It isn't needed for glTexImage because intelTexImage calls AllocTextureImageBuffer before calling texsubimage_tiled_memcpy. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/tex: Make a couple of helpers staticJason Ekstrand2017-09-152-22/+2
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965: Move TexSubImage functions to intel_tex_image.cJason Ekstrand2017-09-155-260/+210
| | | | | | | | These two paths are basically the same. There's no good reason to have them in different files. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/blorp: Set r8stencil_needs_update when writing stencilJason Ekstrand2017-09-151-0/+6
| | | | | | | | | | This fixes a crash on Haswell when we try to upload a stencil texture with blorp. It would also be a problem if someone tried to texture from stencil after glBlitFramebuffers. Cc: "17.2 17.1" <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965: fix build warning on clangTapani Pälli2017-09-151-1/+2
  Fixes the following warning:

  warning: format specifies type 'long' but the argument has type 'uint64_t' (aka 'unsigned long long')

  The cast is needed to keep this change from turning into another warning:

  warning: format specifies type 'unsigned long long' but the argument has type 'uint64_t' (aka 'unsigned long')

  Signed-off-by: Tapani Pälli <[email protected]>
  Reviewed-by: Jordan Justen <[email protected]>
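  The pair of warnings above arises because uint64_t is 'unsigned long' on some platforms and 'unsigned long long' on others, so neither %lu nor %llu matches everywhere. The commit resolves this with a cast; the sketch below shows a different, standard alternative, the PRIu64 macro from <inttypes.h>. The helper name `sketch_format_offset` and the "offset=" message are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Formats a 64-bit value portably. PRIu64 expands to the correct length
 * modifier for uint64_t on the current platform, so no cast is required. */
static int sketch_format_offset(char *buf, size_t size, uint64_t offset)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "offset=%" PRIu64, offset);
}
```

  A cast to 'unsigned long long' paired with %llu, as the commit message implies, is equally portable; PRIu64 simply avoids the cast.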
* i965: Print size of validation and relocation lists in INTEL_DEBUG=flushKenneth Graunke2017-09-141-3/+8
| | | | | | | | It's nice to have this information. While we're at it, tweak the formatting to try and vertically align numbers in the common case. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Wilson <[email protected]>
* i965: Disentangle batch and state buffer flushing.Kenneth Graunke2017-09-145-40/+30
  We now flush the batch when either the batchbuffer or statebuffer reaches the original intended batch size, instead of when the sum of the two reaches a certain size (which makes no sense now that they're separate buffers).

  With this change, we also need to update our "are we near the end?" estimate to require separate batch and state buffer space. I obtained these estimates by looking at the size of draw calls in the Unreal 4 Elemental Demo (using INTEL_DEBUG=flush and always_flush_batch=true).

  This will significantly impact the size of our batches. I've adjusted both down to try and be roughly similar to what we had been doing. On various benchmarks, a 20kB batch and 16kB statebuffer seemed to be about right, but we may need to adjust this further. I tried a 16kB batch, but that regressed Synmark OglMultithread performance by a fair bit. 32kB for both would have significantly increased our batch sizes.

  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Chris Wilson <[email protected]>
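  The difference between the old combined check and the new per-buffer check can be sketched as follows. The 20kB and 16kB figures come from the commit message; everything else (`struct sketch_batch`, the function names, and the combined limit passed as a parameter) is hypothetical and not the driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BATCH_SZ (20 * 1024)  /* batch buffer target from the commit */
#define SKETCH_STATE_SZ (16 * 1024)  /* state buffer target from the commit */

/* Hypothetical bookkeeping: bytes used so far in each buffer. */
struct sketch_batch {
    size_t batch_used;
    size_t state_used;
};

/* New behaviour: flush as soon as either buffer reaches its own limit. */
static bool sketch_needs_flush(const struct sketch_batch *b)
{
    assert(b != NULL);
    return b->batch_used >= SKETCH_BATCH_SZ ||
           b->state_used >= SKETCH_STATE_SZ;
}

/* Old behaviour, for contrast: flush when the sum reaches a single limit. */
static bool sketch_needs_flush_combined(const struct sketch_batch *b,
                                        size_t limit)
{
    assert(b != NULL);
    return b->batch_used + b->state_used >= limit;
}
```

  A workload that fills the state buffer while emitting few commands triggers a flush under the per-buffer check even though the combined total is still small.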