path: root/src/gallium
Commit message | Author | Age | Files | Lines
* iris: Zero the compute predicate when changing the render condition | Kenneth Graunke | 2019-02-21 | 1 | -0/+3
  1. Set a render condition. We emit it immediately on the render engine, and stash q->bo as ice->state.compute_predicate in case the compute engine needs it.
  2. Clear the render condition. We were incorrectly leaving a stale compute_predicate kicking around...
  3. Dispatch compute. We would then read the stale compute predicate, and try to load it into MI_PREDICATE_DATA. But q->bo may have been freed altogether, causing us to try and use garbage memory as a BO, adding it to the validation list, failing asserts, and tripping EINVALs in execbuf.

  Huge thanks to Mark Janes for narrowing this sporadic GL CTS failure down to a list of 48 tests I could easily run to reproduce it. Huge thanks to the Valgrind authors for the memcheck tool that immediately pinpointed the problem.
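The three-step failure above can be sketched in C. This is a minimal illustration, not the iris code: the `sketch_*` structs and functions are hypothetical stand-ins, and the only point shown is that the stashed predicate is reset every time the render condition changes, so a later compute dispatch never sees a pointer to a freed buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the real iris structures. */
struct sketch_bo { int handle; };
struct sketch_query { struct sketch_bo *bo; };
struct sketch_context {
   struct sketch_bo *compute_predicate; /* stashed for the compute engine */
};

/* Called whenever the render condition changes; q == NULL clears it. */
static void
sketch_set_render_condition(struct sketch_context *ctx,
                            struct sketch_query *q)
{
   /* Forget the old predicate first, so clearing the condition can
    * never leave a stale pointer behind. */
   ctx->compute_predicate = NULL;

   if (q == NULL)
      return;

   ctx->compute_predicate = q->bo;
}

/* Returns 1 if a compute dispatch would have to load a predicate. */
static int
sketch_dispatch_needs_predicate(const struct sketch_context *ctx)
{
   return ctx->compute_predicate != NULL;
}
```

With this shape, step 2 (clearing the condition) leaves nothing for step 3 (dispatching compute) to dereference.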
* iris: always include an extra constbuf0 if using UBOs | Caio Marcelo de Oliveira Filho | 2019-02-21 | 4 | -50/+56
  In st_nir_lower_uniforms_to_ubo(), every UBO access in the shader has its index incremented to make room for uniforms in constbuf0. So if we use UBOs, we always need to include the extra binding entry in the table.

  To avoid doing these checks both when compiling the shader and when assigning binding tables, store the num_cbufs in iris_compiled_shader.

  Fixes a bunch of tests from Piglit and CTS that use UBOs but don't use uniforms or system values. Note that some tests fitting these criteria were passing because the UBOs were moved to be push constants (avoiding the problem).

  Reviewed-by: Kenneth Graunke <[email protected]>
* iris: Do binder address allocations per-context, not globally. | Kenneth Graunke | 2019-02-21 | 2 | -9/+25
  iris_bufmgr allocates addresses across the entire screen, since buffers may be shared between multiple contexts. There used to be a single special address, IRIS_BINDER_ADDRESS, that was per-context - and all contexts used the same address. When I moved to the multi-binder system, I made a separate memory zone for them. I wanted there to be 2-3 binders per context, so we could cycle them to avoid the stalls inherent in pinning two buffers to the same address in back-to-back batches. But I figured I'd allow 100 binders just to be wildly excessive/cautious.

  What I didn't realize was that we need 2-3 binders per *context*, and what I did was allocate 100 binders per *screen*. Web browsers, for example, might have 1-2 contexts per tab, leading to hundreds of contexts, and thus binders.

  To fix this, we stop allocating VMA for binders in bufmgr, and let the binder handle it itself. Binders are per-context, and they can assign context-local addresses for the buffers by simply doing a ringbuffer style approach. We only hold on to one binder BO at a time, so we won't ever have a conflicting address.

  This fixes dEQP-EGL.functional.multicontext.non_shared_clear. Huge thanks to Tapani Pälli for debugging this whole mess and figuring out what was going wrong.

  Reviewed-by: Tapani Pälli <[email protected]>
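The "ringbuffer style approach" to context-local binder addresses might look roughly like the following. It is a sketch under stated assumptions: the zone start, zone size, and binder size are made-up values, and `sketch_binder` is a hypothetical struct, not the driver's. Each context owns its own cursor, so the number of contexts no longer affects how many addresses are consumed.

```c
#include <assert.h>
#include <stdint.h>

/* All constants are invented for illustration. */
#define SKETCH_BINDER_ZONE_START 0x10000u
#define SKETCH_BINDER_ZONE_SIZE  0x40000u
#define SKETCH_BINDER_SIZE       0x10000u

/* One of these per context (hypothetical). */
struct sketch_binder {
   uint64_t next_offset; /* offset within the zone for the next binder BO */
};

/* Hand out the next context-local address, wrapping at the zone's end. */
static uint64_t
sketch_binder_next_address(struct sketch_binder *b)
{
   assert(SKETCH_BINDER_ZONE_SIZE % SKETCH_BINDER_SIZE == 0);

   uint64_t addr = SKETCH_BINDER_ZONE_START + b->next_offset;

   b->next_offset =
      (b->next_offset + SKETCH_BINDER_SIZE) % SKETCH_BINDER_ZONE_SIZE;
   return addr;
}
```

Two contexts can return the same address here without conflict, because the addresses are only meaningful inside the context that assigned them.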
* iris: Fix memzone_for_address for the surface and binder zones | Kenneth Graunke | 2019-02-21 | 1 | -2/+2
  We use > for IRIS_MEMZONE_DYNAMIC because IRIS_BORDER_COLOR_POOL_ADDRESS lives at the very start of that zone. However, IRIS_MEMZONE_SURFACE and IRIS_MEMZONE_BINDER are normal zones. They used to be a single zone (surface) with a single binder BO at the beginning, similar to the border color pool. But when I moved us to multiple binders, I made them have a real zone (if a small one). So both zones should use >=.

  Reviewed-by: Tapani Pälli <[email protected]>
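The difference between `>` and `>=` is easiest to see with concrete boundaries. The sketch below uses invented zone start addresses and hypothetical `SKETCH_*` names, and it assumes the border color pool is reported as its own pseudo-zone; only the comparison operators mirror what the message describes.

```c
#include <stdint.h>

/* Hypothetical zones and start addresses, chosen only for illustration. */
enum sketch_memzone {
   SKETCH_ZONE_SHADER,
   SKETCH_ZONE_BINDER,
   SKETCH_ZONE_SURFACE,
   SKETCH_ZONE_BORDER_COLOR_POOL,
   SKETCH_ZONE_DYNAMIC,
   SKETCH_ZONE_OTHER,
};

#define SKETCH_BINDER_START  0x1000u
#define SKETCH_SURFACE_START 0x2000u
#define SKETCH_DYNAMIC_START 0x3000u /* the border color pool sits exactly here */
#define SKETCH_OTHER_START   0x4000u

static enum sketch_memzone
sketch_memzone_for_address(uint64_t address)
{
   if (address >= SKETCH_OTHER_START)
      return SKETCH_ZONE_OTHER;

   if (address == SKETCH_DYNAMIC_START)
      return SKETCH_ZONE_BORDER_COLOR_POOL;

   /* Strictly greater: the first address belongs to the pool above. */
   if (address > SKETCH_DYNAMIC_START)
      return SKETCH_ZONE_DYNAMIC;

   /* Ordinary zones own their first address, so use >=. */
   if (address >= SKETCH_SURFACE_START)
      return SKETCH_ZONE_SURFACE;

   if (address >= SKETCH_BINDER_START)
      return SKETCH_ZONE_BINDER;

   return SKETCH_ZONE_SHADER;
}
```

With `>` on the surface or binder checks, a buffer placed at the very first address of either zone would be attributed to the zone below it.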
* iris: Don't whack SO dirty bits when finishing a BLORP op | Kenneth Graunke | 2019-02-21 | 1 | -0/+2
  Re-emitting 3DSTATE_SO_BUFFERS can be hazardous, as it could zero offsets. Plus, it's just not necessary - BLORP doesn't change these.
* iris: Fix SO issue with INTEL_DEBUG=reemit, set fewer bits | Kenneth Graunke | 2019-02-21 | 1 | -2/+5
  INTEL_DEBUG=reemit was breaking streamout tests, by re-emitting 3DSTATE_SO_BUFFER commands that tell the HW to zero the SO write offsets. We would need to alter them to use 0xFFFFFFFF for the offset.

  Also, have each upload function only flag bits relevant to its own pipeline.
* iris: CS stall on VF cache invalidate workarounds | Kenneth Graunke | 2019-02-21 | 2 | -3/+6
  See commit 31e4c9ce400341df9b0136419b3b3c73b8c9eb7e in i965.
* iris: Pay attention to blit masks | Kenneth Graunke | 2019-02-21 | 1 | -11/+22
  For combined depth/stencil formats, we may want to only blit one half. If PIPE_BLIT_Z is set, blit depth; if PIPE_BLIT_S is set, blit stencil.
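A mask-driven depth/stencil blit could be sketched as below. The `SKETCH_BLIT_*` bit values are invented stand-ins for the Gallium mask bits named above, and `sketch_blit_plan` is a hypothetical helper type; the sketch only shows each half being selected independently.

```c
#include <assert.h>

/* Invented bit values standing in for the depth and stencil mask bits. */
#define SKETCH_BLIT_Z 0x1u
#define SKETCH_BLIT_S 0x2u

/* Which halves of a combined depth/stencil resource to copy. */
struct sketch_blit_plan {
   int blit_depth;
   int blit_stencil;
};

static struct sketch_blit_plan
sketch_plan_depth_stencil_blit(unsigned mask)
{
   struct sketch_blit_plan plan;

   /* This sketch only understands the two depth/stencil bits. */
   assert((mask & ~(SKETCH_BLIT_Z | SKETCH_BLIT_S)) == 0);

   plan.blit_depth = (mask & SKETCH_BLIT_Z) != 0;
   plan.blit_stencil = (mask & SKETCH_BLIT_S) != 0;
   return plan;
}
```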
* iris: Assert about blits with color masking | Kenneth Graunke | 2019-02-21 | 1 | -0/+4
  st/mesa never asks for this today, but in theory someone might, and we don't support it.
* iris: Don't enable smooth points when point sprites are enabled | Kenneth Graunke | 2019-02-21 | 1 | -4/+3
  dEQP-GLES3.functional.rasterization.fbo.rbo_multisample_*.primitives.points
* iris: Allow sample mask of 0 | Kenneth Graunke | 2019-02-21 | 1 | -1/+1
  I think this was an attempt to work around various sample mask bugs I had early on. It's not correct. A sample mask of 0 is legal and means to disable all samples.

  Fixes dEQP-GLES31.functional.texture.multisample.*.*sample_mask*
* iris: fail to create screen for older unsupported HW | Kenneth Graunke | 2019-02-21 | 1 | -0/+3
  loader shouldn't try, but let's be paranoid
* iris: Switch to the new PIPELINE_STATISTICS_QUERY_SINGLE capability | Kenneth Graunke | 2019-02-21 | 2 | -44/+6
  I had a hack in place earlier to pass the query type as q->index for the regular statistics query, but we ended up adjusting the interface and adding a new query type. Use that instead, fixing pipeline statistics queries since the rebase.
* iris: Use new PIPE_STAT_QUERY enums rather than hardcoded numbers. | Kenneth Graunke | 2019-02-21 | 1 | -2/+5
* iris: Fix Broadwell WaDividePSInvocationCountBy4 | Kenneth Graunke | 2019-02-21 | 1 | -7/+7
  We were dividing by 4 in calculate_result_on_gpu(), and also in iris_get_query_result(). We should stop doing the latter, and instead divide by 4 in calculate_result_on_cpu() as well. Otherwise, if snapshots were available, and you hit the calculate_result_on_cpu() path, but requested it be written to a QBO, you'd fail to get a divide.
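The placement of the divide can be sketched as follows. The struct and function names are hypothetical (`sketch_stat_query` and friends), and the GPU path is omitted; the sketch only illustrates why the division belongs where the result is computed rather than in one of its consumers.

```c
#include <stdint.h>

/* Hypothetical query object holding begin/end counter snapshots. */
struct sketch_stat_query {
   uint64_t begin;
   uint64_t end;
   uint64_t result;
};

/* The workaround is applied once, where the result is computed. */
static void
sketch_calculate_result_on_cpu(struct sketch_stat_query *q, int needs_div4)
{
   q->result = q->end - q->begin;
   if (needs_div4)
      q->result /= 4;
}

/* Both consumers read the already-corrected value. */
static uint64_t
sketch_get_query_result(const struct sketch_stat_query *q)
{
   return q->result;
}

static void
sketch_write_result_to_buffer(const struct sketch_stat_query *q,
                              uint64_t *dst)
{
   *dst = q->result;
}
```

If the divide lived only in `sketch_get_query_result()`, the buffer-write path would store the raw, four-times-too-large count.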
* iris: Delete genx->bound_vertex_buffers | Kenneth Graunke | 2019-02-21 | 1 | -3/+0
  This is actually stored in ice->state, as it isn't gen-specific
* iris: Drop a dead comment | Kenneth Graunke | 2019-02-21 | 1 | -2/+0
* iris: Don't check other batches for our batch BO | Kenneth Graunke | 2019-02-21 | 1 | -25/+27
  This is an awkward corner case. We create batches in order, each of which creates and pins a BO. The other batches may not be set up yet, so it may not be safe to ask whether they reference a BO. Just avoid this for now. We could avoid it for other context-local BOs too, but we currently don't have a flag for that (and I'm not certain whether it's worth it).
* iris: Handle PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE somewhat | Kenneth Graunke | 2019-02-21 | 1 | -3/+6
  Various places in the transfer code need to know whether they must read the existing resource's values. Rather than checking both flags everywhere, just make PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE also flag PIPE_TRANSFER_DISCARD_RANGE - if we can discard everything, we can discard a subrange, too.

  Obviously, we can do better for PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE, but eventually u_threaded_context should handle swapping out buffers for new idle buffers, anyway. In the meantime, this is at least better.
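Folding one flag into the other can be sketched like this. The `SKETCH_DISCARD_*` bit values are invented and are not the Gallium definitions, and both helper names are hypothetical; the point is that after normalization, later code only has to test a single bit.

```c
/* Invented bit values; not the real Gallium flag definitions. */
#define SKETCH_DISCARD_RANGE          (1u << 0)
#define SKETCH_DISCARD_WHOLE_RESOURCE (1u << 1)

/* Discarding everything implies discarding any subrange. */
static unsigned
sketch_normalize_transfer_usage(unsigned usage)
{
   if (usage & SKETCH_DISCARD_WHOLE_RESOURCE)
      usage |= SKETCH_DISCARD_RANGE;
   return usage;
}

/* Callers check one flag instead of two. */
static int
sketch_must_read_existing(unsigned usage)
{
   return (usage & SKETCH_DISCARD_RANGE) == 0;
}
```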
* iris: Flush the render cache in flush_and_dirty_for_history | Kenneth Graunke | 2019-02-21 | 1 | -0/+7
  BLORP uses the render engine to write to buffers, and we need to flush that data out to the actual surface (finishing the write). Then, the rest of this function invalidates any caches that might have stale data which needs to be refetched.
* iris: Implement multi-slice copy_region | Kenneth Graunke | 2019-02-21 | 1 | -11/+9
  I don't know if this is required - surprisingly, I haven't seen it matter - but I'd like to use it for multi-slice transfer maps. We may as well do the right thing.
* iris: Leave a comment about why Broadwell images are broken | Kenneth Graunke | 2019-02-21 | 1 | -0/+4
  There are a variety of ways to fix this, many of which are simple, but I could use some advice on which ones other people prefer, and so we'll punt until after the holidays.
* iris: Fix surface states for Gen8 lowered-to-untyped images | Kenneth Graunke | 2019-02-21 | 1 | -7/+26
  We have to use SURFTYPE_BUFFER and ISL_FORMAT_RAW for these.
* iris: Fill out brw_image_params for storage images on Broadwell | Kenneth Graunke | 2019-02-21 | 3 | -9/+138
* iris: Don't make duplicate system values | Kenneth Graunke | 2019-02-21 | 2 | -7/+23
  We were relying on CSE/GVN/etc to coalesce all intrinsics that load the same value, but that's a bad idea. We might have a couple intrinsics that reload the same value. If so, we only want to set up the uniform on the first one we see.
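One way to set up a uniform only on the first intrinsic seen is a small lookup-before-append table. The sketch below is an assumption about the general technique, not the driver's data structure: `sketch_sysval_table`, its fixed capacity, and the integer system value IDs are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_SYSVALS 16

/* Hypothetical table of system values already assigned a uniform slot. */
struct sketch_sysval_table {
   uint32_t values[SKETCH_MAX_SYSVALS];
   unsigned count;
};

/* Return the slot for sysval, creating one only on first sight. */
static unsigned
sketch_get_sysval_slot(struct sketch_sysval_table *t, uint32_t sysval)
{
   for (unsigned i = 0; i < t->count; i++) {
      if (t->values[i] == sysval)
         return i; /* a repeated load reuses the existing uniform */
   }

   assert(t->count < SKETCH_MAX_SYSVALS);
   t->values[t->count] = sysval;
   return t->count++;
}
```

Every intrinsic that loads the same value is then rewritten to read the same slot, whether or not an optimization pass merged the loads beforehand.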
* iris: Don't enable push constants just because there are system values | Kenneth Graunke | 2019-02-21 | 1 | -2/+1
  System values are built-in uniforms. We set them up as UBO values, and might pull or push them. UBO push analysis will take care of that. We only want to enable push constants if there's an actual range being pushed. Otherwise, we might get into a scenario where 3DSTATE_PS enables push constants but 3DSTATE_CONSTANT_PS isn't pushing anything.

  This fixes GPU hangs in Broadwell image load store tests which have unused image param system values but no other uniforms. (We shouldn't be making those anyway, but that's a separate fix...)
* iris: Fix framebuffer layer count | Kenneth Graunke | 2019-02-21 | 1 | -1/+3
  cso_fb->layers is only valid for no-attachment framebuffers. Use the helper function to get the real value, then stash it so we don't have to call the helper function on the old value for comparison, or at draw time for Force Zero RTA Index setting.

  This fixes Force Zero RTA Index being set even when attempting layered rendering.
* iris: handle qbo fragment shader invocation workaround | Dave Airlie | 2019-02-21 | 1 | -0/+52
* iris: add fs invocations query workaround for broadwell | Dave Airlie | 2019-02-21 | 1 | -0/+6
* iris: setup gen8 caps | Dave Airlie | 2019-02-21 | 1 | -4/+4
* iris: limit gen8 to 8 samples | Dave Airlie | 2019-02-21 | 1 | -1/+2
* iris/WIP: add broadwell support | Dave Airlie | 2019-02-21 | 5 | -11/+58
  This adds all the state changes, MOCS changes,
* iris: Delete bogus comment about cube array counting. | Kenneth Graunke | 2019-02-21 | 1 | -5/+1
  Both 'z' and 'depth' are counted in slices, according to the Gallium docs (context.rst). In our temporary memory, we allocate `box.depth` slices, so we need to rebase the starting slice (box.z) down to 0, and back again when writing on unmap. There's nothing strange about cubes here.
* iris: Fix compute scratch pinning | Kenneth Graunke | 2019-02-21 | 1 | -2/+1
  Thanks to Eero Tamminen for helping catch this.
* iris: Add a more long term TODO about timebase scaling | Kenneth Graunke | 2019-02-21 | 1 | -0/+6
* iris: Only resolve inputs for actual shader stages | Kenneth Graunke | 2019-02-21 | 3 | -12/+11
  We don't need to consider compute at render time, and don't need to consider disabled stages. 4% on drawoverhead.
* iris: Fix assertion in iris_resource_from_handle() tiling usage | Rhys Kidd | 2019-02-21 | 1 | -2/+1
  Assertion error:
  iris_resource_from_handle: Assertion `res->bo->tiling_mode == isl_tiling_to_i915_tiling(res->surf.tiling)' failed.

  This patch fixes 16 piglit tests on KBL:
  glx/glx-multithread-texture
  glx/glx-query-drawable-glx_fbconfig_id-glxpbuffer
  glx/glx-query-drawable-glx_fbconfig_id-glxpixmap
  glx/glx-query-drawable-glx_preserved_contents
  glx/glx-query-drawable-glxpbuffer-glx_height
  glx/glx-query-drawable-glxpbuffer-glx_width
  glx/glx-query-drawable-glxpixmap-glx_height
  glx/glx-query-drawable-glxpixmap-glx_width
  glx/glx-swap-pixmap
  glx/glx-swap-pixmap-bad
  glx/glx-tfp
  glx/glx-visuals-depth -pixmap
  glx/glx-visuals-stencil -pixmap
  spec/egl 1.4/eglcreatepbuffersurface and then glclear
  spec/egl 1.4/largest possible eglcreatepbuffersurface and then glclear
  spec/egl_nok_texture_from_pixmap/basic

  Cc: Kenneth Graunke <[email protected]>
  Cc: Jason Ekstrand <[email protected]>
  Signed-off-by: Rhys Kidd <[email protected]>
* iris: Fix scratch space allocation on Icelake. | Kenneth Graunke | 2019-02-21 | 1 | -4/+8
  Gen9-10 have fewer than 4 subslices per slice, so they need this to be rounded up. Gen11 isn't documented as needing this hack, and it can also have more than 4 subslices, so the hack actually can break things.

  Fixes tests/spec/arb_enhanced_layouts/execution/component-layout/sso-vs-gs-fs-array-interleave
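The generation-dependent rounding can be sketched as below. This is only a reading of the message, not the driver's computation: the function name and parameters are hypothetical, and the assumption is that the subslice count feeding the scratch size is raised to 4 per slice on Gen9-10 and left at the real count elsewhere.

```c
/* Hypothetical helper: subslice count to use when sizing scratch space. */
static unsigned
sketch_scratch_subslice_total(int gen, unsigned num_slices,
                              unsigned subslices_per_slice)
{
   unsigned per_slice = subslices_per_slice;

   /* Assumed rule: only Gen9-10 round up to 4 subslices per slice. */
   if (gen >= 9 && gen <= 10 && per_slice < 4)
      per_slice = 4;

   /* Other generations, including Gen11, use the real count, which may
    * be larger than 4. */
   return num_slices * per_slice;
}
```

Forcing the value to 4 unconditionally would undersize the allocation on a part with more than 4 subslices per slice, which matches the breakage the message describes.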
* iris: better MOCS | Kenneth Graunke | 2019-02-21 | 2 | -26/+30
* iris: fix gpu calcs for timestamp queries | Dave Airlie | 2019-02-21 | 1 | -1/+31
* iris: only mark depth/stencil as writable if writes are actually enabled | Kenneth Graunke | 2019-02-21 | 1 | -10/+17
* iris: more dead comments | Kenneth Graunke | 2019-02-21 | 2 | -12/+0
* iris: pin and re-pin the scratch BO | Kenneth Graunke | 2019-02-21 | 3 | -14/+29
* iris: delete finished comments | Kenneth Graunke | 2019-02-21 | 1 | -2/+0
* iris: always pin the binder...in the compute context, too. | Kenneth Graunke | 2019-02-21 | 1 | -0/+7
  not sure why this hasn't tripped things up
* iris: Track blend enables, save outbound for resolve code | Kenneth Graunke | 2019-02-21 | 2 | -1/+17
* iris: whitespace fixes | Kenneth Graunke | 2019-02-21 | 2 | -7/+7
* iris: Make an alloc_surface_state helper | Kenneth Graunke | 2019-02-21 | 1 | -18/+22
  This does the gtt_offset addition for us
* iris: Use a surface state fill helper | Kenneth Graunke | 2019-02-21 | 1 | -18/+19
  This will check aux_usage eventually
* iris: don't print the pointer in INTEL_DEBUG=submit | Kenneth Graunke | 2019-02-21 | 1 | -4/+3
  Lots of noise in the diff. The hope was that it would be useful for gdb, but the GEM handle is good enough.