| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, ATTR was indexed by VERT_ATTRIB_* slots; at the end of
compilation, assign_vs_urb_setup() translated those into GRF units,
and converted ATTR to HW_REGs.
This patch moves the transslation earlier, making ATTR work in terms of
GRF units from the beginning. assign_vs_urb_setup() simply has to add
the number of payload registers and push constants to obtain the final
hardware GRF number. (We can't do this earlier as those values aren't
known.)
ATTR still supports reg_offset; however, it's simply added to reg.
It's not clear whether this is valuable or not.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For the VS and FS stages that use ARB_vertex_program or
ARB_fragment_program we don't have a shader program, however,
when debuging is enabled, we call brw_dump_ir like this:
brw_dump_ir("vertex", prog, &vs->base, &vp->program.Base);
where vs will be NULL (since prog is NULL).
As pointed out by Chris, this &vs->base is not really a dereference,
it simply computes a new address that just happens to be 0x0 because
the offset of base in brw_shader is 0. Then brw_dump_ir will see a
NULL pointer and not do anything. This is why this does not crash at
the moment. However, this does not look very safe (it would crash
for any location of base that is not the first in brw_shader), so
patch it to prevent a potential (even if unlikely) problem in the
future.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Curro added this in commit 3ee2daf23d (before the vec4/NIR backend was
added) but it was missed in the new NIR backend. Add it there as well.
instructions in affected programs: 1857 -> 1810 (-2.53%)
helped: 15
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
| |
For scalar VS, I'll need this in brw_fs.cpp as well. It seems silly to
redeclare it in three places.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we used nir_lower_io with the scalar type_size function,
which mapped VERT_ATTRIB_* locations to...some numbers. Then, in
fs_visitor::nir_setup_inputs(), we created temporaries indexed by
those numbers, and emitted MOVs from the actual ATTR registers to
those temporaries. Virtually all of these were copy propagated away,
but it's still ugly.
This patch reworks our input lowering to produce NIR lower_input
intrinsics that properly index into the ATTR file, so we can access
it directly.
No changes in shader-db.
v2: Fix unreachable() message (Ken), update commit message (Matt).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
nr_attributes is used to compute first_non_payload_grf, which is the
first register we're allowed to use for ordinary register allocation.
The hardware requires us to read at least one pair of values, but we're
completely free to overwrite that garbage register with whatever we like.
Instead of altering nr_attributes, we should alter urb_read_length, which
only affects the amount we ask the VF to read. This should save us a
register in trivial cases (which admittedly isn't very useful).
While we're at it, improve the explanation in the comments.
v2: Actually do what I said (caught by Ilia).
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Both the vec4 and scalar VS backends had virtually identical URB entry
size and read length calculations. We can move those up a level to
backend-agnostic code and reuse it for both.
Unfortunately, the backends need to know nr_attributes to compute
first_non_payload_grf, so I had to store that in prog_data. We could
use urb_read_length, but that's nr_attributes rounded up to a multiple
of two, so doing so would waste a register in some cases.
There's more code to be removed in the vec4 backend, but that will
come in a follow-on patch.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Switch statements introduce a bogus loop with an unconditional break at
the end of the loop, just before the while...so the while is unreachable
and has no immediate dominator.
v2: With less exuberance
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
Some assertions in gen8_surface_state.c checked for gen < 8.
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The (gen < 9) check in brw_clear() was too broad. It disabled all types
of fast color clears:
a. singlesample rep clears
b. singlesample MCS fast clears
c. multisample MCS fast clears
The MCS clears are still buggy, but the rep clear works well. So let's
enable it.
Reviewed-by: Neil Roberts <[email protected]>
|
|
|
|
|
|
|
|
| |
Fast color clears are disabled for gen9 (see the checks in
brw_meta_fast_clear), so there is no reason to allocate the MCS and
track its clear/resolve state.
Reviewed-by: Neil Roberts <[email protected]>
|
|
|
|
|
|
|
| |
They didn't do anything useful.
Reviewed-by: Emil Velikov <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
| |
gl_image_unit::_Valid will be removed in a future commit.
Tested-by: Ye Tian <[email protected]>
CC: "11.0" <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The hardware documentation relating to the UAV HW-assisted coherency
mechanism and UAV access enable bits is scarce and sometimes
contradictory, and there's quite some guesswork behind this commit, so
let me summarize the background first: HSW and later hardware have
infrastructure to support a stricter form of data coherency between
shader invocations from separate primitives. The mechanism is
controlled by the "Accesses UAV" bits on 3DSTATE_VS, _HS, _DS, _GS and
_PS (or _PS_EXTRA on BDW+), and the "UAV Coherency Required" bit on
the 3DPRIMITIVE command.
Regardless of whether "UAV Coherency Required" is set, the hardware
fixed-function units will increment a per-stage semaphore for each
request received if "Accesses UAV" is set for the same or any lower
stage. An implicit DC flush is emitted by the lowermost stage with
"Accesses UAV" set once it's done processing the request, this also
happens regardless of the value of "UAV Coherency Required". The
completion of the DC flush will cause the same stage and all previous
ones to decrement the semaphore, marking the UAV accesses for the
primitive as coherent with L3.
The "UAV Coherency Required" 3DPRIMITIVE bit will cause a pipeline
stall before any threads are dispatched for the first FF stage with
"Accesses UAV" set until the semaphore is cleared for the same stage.
Effectively this guarantees that UAV memory accesses performed by
previous primitives from any stage will be strictly ordered (and
thanks to the implicit DC flush visible in memory) with UAV accesses
from the following primitives.
None of this is required by the usual image, atomic counter and SSBO
GL APIs which have very relaxed cross-primitive coherency and ordering
requirements, so we don't actually ever set the "UAV Coherency
Required" bit -- Ordering with respect to shader invocations from
previous stages on the same primitive where there is a data dependency
is of course already guaranteed as the spec requires, regardless of
this mechanism being enabled. We do set the "Accesses UAV" bits
though since my commit ac7664e493655e290783c23a0412b9c70936da50 (which
this patch partially reverts), mainly because of comments like the
following from the BDW PRM:
> 3DSTATE_GS
>[...]
> 12 Accesses UAV
> Format: Enable
> This field must be set when GS has a UAV access.
There are similar comments in the documentation for the other
3DSTATE_*S commands. The "must" part is misleading and unjustified
AFAIK. Most of the "Accesses UAV" bits don't seem to have any side
effects other than the implicit DC flushes and the related
book-keeping in anticipation for a subsequent primitive with "UAV
Coherency Required" set, so in most cases they are unnecessary and may
incur a performance penalty. There is an exception though. On Gen8+
the PS_EXTRA UAV access bit influences the calculation of the PS
UAV-only and ThreadDispatchEnable signals which on previous
generations were set explicitly by the driver, so we cannot always
avoid enabling it on the PS stage.
The primary motivation for this change is that in fact the hardware
coherency mechanism is buggy and will cause a rather non-deterministic
hang on Gen8 when VS is the only stage with "Accesses UAV" set and the
processing of a request terminates immediately after the implicit DC
flush is sent for a previous primitive with no additional vertices
being emitted for the second primitive, what will cause the hardware
to skip sending a second DC flush and cause the VS to stall
indefinitely waiting for a response from the DC (BDWGFX HSD 1912017).
This hardware bug can be reproduced on current master with the
spec@arb_shader_image_load_store@host-mem-barrier@Indirect/RaW piglit
subtest (if you have the patience to run it a few dozen times).
The proposed workaround is to insert CS STALLs speculatively between
3DPRIMITIVE commands when "Accesses UAV" is enabled for the VS stage
only. Because this would affect one of the hottest paths in the
driver and likely decrease performance even further due to the
unnecessary serialization, and because we don't actually need the
implicit DC flushes, it seems better to just disable them.
Cc: 11.0 <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If a non-const sample number is given to interpolateAtSample it will
now generate an indirect send message with the sample ID similar to
how non-const sampler array indexing works. Previously non-const
values were ignored and instead it ended up using a constant 0 value.
The generator will try to determine if the sample ID is dynamically
uniform via nir_src_is_dynamically_uniform. If not it will query the
pixel interpolator in a loop, once for each different live sample
number. The next live sample number is found using emit_uniformize. If
multiple live channels have the same sample number then they will be
handled in a single iteration of the loop. The loop is necessary
because the indirect send message doesn't seem to have a way to
specify a different value for each fragment.
This fixes the following two Piglit tests:
arb_gpu_shader5-interpolateAtSample-nonconst
arb_gpu_shader5-interpolateAtSample-dynamically-nonuniform
v2: Handle dynamically non-uniform sample ids.
v3: Remove the BREAK instruction and predicate the WHILE directly.
Make the tokens arrays const. (Matt Turner)
v4: Iterate over the live channels instead of each possible sample
number.
v5: Don't special case immediate values in
brw_pixel_interpolator_query. Make a better wrapper for the
function to set up the PI send instruction. Ensure that the SHL
instructions are scalar. (Francisco Jerez).
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
| |
It is possible to directly predicate the WHILE instruction. In this
case there will be a second successor block because the execution can
resume from the instruction after the loop. This will be used in a
subsequent patch.
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before, we were unconditionally assigning the TargetIndex field in
_mesa_BindTexture(), even if it was already set properly. Now we
initialize TargetIndex wherever we initialize the Target field, in
_mesa_initialize_texture_object(), finish_texture_init(), etc.
v2: also update the meta_copy_image code. In make_view() the
view_tex_obj->Target field was set, but not the TargetIndex field.
Also, remove a second, redundant assignment to view_tex_obj->Target.
Add sanity check assertions too.
Reviewed-by: Anuj Phogat <[email protected]>
Tested-by: Mark Janes <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can now link the unit tests against just libi965_compiler.la. This
lets us drop a lot of DRI driver dependencies, but we still pull in all
of libmesa and more.
This also provides a few standalone users of libi965_compiler.la, which
will help us accidentally using i965_dri.so functions from the compiler.
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This introduces a new libtool helper library, libi965_compiler.la. This
library is moderately self-contained, but still needs to link to all of
libmesa.la among other things.
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
brw_get_shader_time_index() is all tangled up in brw_context state and
we can't call it from the compiler. Thanks the Jasons recent
refactoring, we can just get the index and pass to the emit functions
instead.
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
We call this from the compiler so move it to brw_shader.cpp.
Reviewed-by: Topi Pohjolainen <[email protected]>
Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This function computes the next power of two, but at least 1024. We can
do that by bitwise or'ing in 1023 and calling util_next_power_of_two().
We use brw_get_scratch_size() from the compiler so we need it out of
brw_program.c. We could move it to brw_shader.cpp, but let's make it a
small inline function instead.
Reviewed-by: Topi Pohjolainen <[email protected]>
Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
|
|
|
|
|
|
|
|
| |
brw_program.c won't be part of the compiler library, but we need
brw_mark_surface_used() in the compiler. Move to brw_shader.cpp.
Reviewed-by: Topi Pohjolainen <[email protected]>
Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The initial motivation for this patch was to avoid calling
brw_cs_prog_local_id_payload_dwords() in gen7_cs_state.c from the
compiler. This commit ends up refactoring things a bit more so as to
split out the logic to build the local id payload to brw_fs.cpp. This
moves the payload building closer to the compiler code that uses the
payload layout and makes it available to other users of the compiler.
Reviewed-by: Topi Pohjolainen <[email protected]>
Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
|
|
|
|
|
|
|
|
| |
We want to use the rest of brw_shader.cpp with the rest of the compiler
without pulling in the GLSL linking code.
Reviewed-by: Topi Pohjolainen <[email protected]>
Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need the debug flag parsing and INTEL_DEBUG in the compiler, but we
don't want the dependency on bufmgr (libdrm_intel) in there. Move to
intel_screen.c.
There are now only two lines left in brw_process_intel_debug_variable(),
but we keep it in intel_debug.h to avoid having to expose
'debug_control' as a global variable.
Reviewed-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
We want to use intel_debug.c in code that doesn't link to dri common.
v2: Remove unnecessary stddef.h include (Topi), use util/debug.h
in all DRI driver and remove driParseDebugString() (Iago).
Reviewed-by: Topi Pohjolainen <[email protected]>
Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
We move these calls one level up into the codegen functions.
Reviewed-by: Topi Pohjolainen <[email protected]>
Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
|
|
|
|
|
|
|
| |
Comit d48ac9306619 addressed this for VS, but we forgot to do the same for
URB writes generated by the gen6 GS.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
| |
That should make tracking where we do spills and pull loads a bit easier.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
So they do not conflict with our (un)spills (MRF 21..23) or our
URB writes (MRF 1..15)
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Bring the following commit over to i915:
commit ec542d74578bbef6b55125dd6aba1dc7f5079e65
Author: Eric Anholt <[email protected]>
Date: Mon Mar 3 10:43:10 2014 -0800
i965: Drop broken front_buffer_reading/drawing optimization.
Not sure if it might fix anything, but since the i965 and i915 used to
share a bunch of that code, it would seem reasonable the same problems
could be present in the i915 code still, and the i965 approach is well
tested by now so bringing it over seems fairly safe.
No piglit regressions on 855.
v2: Rebase on _mesa_is_front_buffer_* refactor.
Signed-off-by: Ville Syrjälä <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
There are multiple similar implementations of these functions, and a
later patch was going to add another.
v2: Move removing intel_framebuffer to a different patch.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Flip the cull bits when rendering to a user fbo on gen2. This
was already done on gen3 (since before git history starts)
but was missing from the gen2 code.
Fixes rendering of the driver+kart model in supertuxkart kart
selection screen.
Signed-off-by: Ville Syrjälä <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The hardware can draw lines 0.5 to 7.5 pixels wide. Adjust the limits
to 1.0-7.0. The old limits seems to be from the era when i915 and i965
were sharing this code.
Not really sure if 1.0-7.0 is correct. Maybe it could be 0.5.7.5 as
those are the hw limits, or maybe some combination of the two?
Signed-off-by: Ville Syrjälä <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The sub-pixel adjustment for points was killed off in
commit 60d762aa625095a8c1f9597d8530bb5a6fa61b4c
Author: Xiang, Haihao <[email protected]>
Date: Wed Jan 2 11:38:51 2008 +0800
i915: Needn't adjust pixel centers. fix #12944
so if we don't need it in intel_tris.c we don't need it in
intel_render.c either, which means we can allow intel_render.c to render
points.
No apparent regressions on PNV in ES1 or ES2 conformance.
Signed-off-by: Ville Syrjälä <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The sub-pixel adjustment for points was killed off in
commit 60d762aa625095a8c1f9597d8530bb5a6fa61b4c
Author: Xiang, Haihao <[email protected]>
Date: Wed Jan 2 11:38:51 2008 +0800
i915: Needn't adjust pixel centers. fix #12944
so we can just as well use COPY_DWORDS().
Signed-off-by: Ville Syrjälä <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
| |
_tnl_RenderClippedPolygon and _tnl_RenderClippedLine already do most of
what we want so use them.
Signed-off-by: Ville Syrjälä <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
intelFastRenderClippedPoly() renders the polygon using triangles. For
polygons the provoking vertex is always the first one, and currently
this function assumes that the provoking vertex for triangles is the
last one. In case the user changed the provoking vertex convention,
the hardware may be configured to treat the first vertex of triangles
as the provoking vertex. So check the convention and emit the triangles
in the appropriate order to avoid having to change the hardware
provoking vertex convention for rendering polygons.
Signed-off-by: Ville Syrjälä <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Using C99 initializers for the primitive arrays makes things more
readable.
Signed-off-by: Ian Romanick <[email protected]>
Suggested-by: Matt Turner <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Using C99 initializers for the primitive arrays makes things more
readable.
Signed-off-by: Ian Romanick <[email protected]>
Suggested-by: Matt Turner <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Using C99 initializers for the primitive arrays makes things more
readable.
Signed-off-by: Ville Syrjälä <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|