|
|
|
|
| |
Setting a couple of bits unconditionally costs the same as, or less
than, setting them conditionally.
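As a rough illustration of the claim above, the sketch below compares the two forms. The flag names and helper functions are hypothetical and are not taken from the driver; the condition shown (testing whether the bit is already set) is only one possible reading of "conditionally".

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits, named only for this illustration. */
#define FLAG_A (1u << 0)
#define FLAG_B (1u << 1)

/* Conditional form: a test (and possibly a branch) before each store. */
static uint32_t set_flags_conditional(uint32_t flags)
{
    if (!(flags & FLAG_A))
        flags |= FLAG_A;
    if (!(flags & FLAG_B))
        flags |= FLAG_B;
    return flags;
}

/* Unconditional form: a single OR with no test at all. */
static uint32_t set_flags_unconditional(uint32_t flags)
{
    return flags | FLAG_A | FLAG_B;
}

/* Both forms must agree for any starting value. */
static void check_same_result(uint32_t flags)
{
    assert(set_flags_conditional(flags) == set_flags_unconditional(flags));
}
```

Because the result is identical for every input, the conditional test buys nothing here and only adds work.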
|
|
|
|
|
|
|
| |
We've often created the CFG immediately before, so use it when
available.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
| |
cfg, for instance, is a pointer to a local variable in
calculate_live_intervals, so it is certainly not valid after that
function has returned.
Reviewed-by: Ian Romanick <[email protected]>
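The hazard described above can be sketched as follows. The struct and function names are hypothetical stand-ins, not the driver's real types, and clearing the pointer is just one way to avoid leaving a dangling reference; the actual commit may have solved it differently.

```c
#include <assert.h>
#include <stddef.h>

struct cfg { int num_blocks; };        /* hypothetical stand-in */
struct visitor { struct cfg *cfg; };   /* hypothetical stand-in */

static int calculate(struct visitor *v)
{
    assert(v != NULL);

    struct cfg local = { .num_blocks = 3 };

    v->cfg = &local;                   /* valid only while this function runs */
    int result = v->cfg->num_blocks * 2;

    /* Do not let the stored pointer outlive `local`. */
    v->cfg = NULL;
    return result;
}
```

Any caller that reads `v->cfg` after `calculate()` returns now sees NULL instead of a pointer into a dead stack frame.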
|
|
|
|
| |
Acked-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Acked-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
Makes it clearer what we're doing and requires less knowledge of
exec_list.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Acked-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
| |
UniformBufferSize is in bytes so we need to divide by 16 to get the
number of constant buffer slots. Also, the ureg_DECL_constant2D()
function takes first..last parameters so we need to subtract one
for the last value.
Reviewed-by: Roland Scheidegger <[email protected]>
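The arithmetic described above can be sketched as below. The helper name is hypothetical; the only facts taken from the message are the division by 16 and the inclusive first..last range. The sketch assumes the size is a non-zero multiple of 16 bytes.

```c
#include <assert.h>

/* One constant buffer slot holds a vec4 of 32-bit values: 16 bytes. */
#define BYTES_PER_SLOT 16u

/*
 * Returns the index of the last slot for an inclusive first..last
 * declaration starting at slot 0. size_bytes is assumed to be a
 * non-zero multiple of BYTES_PER_SLOT.
 */
static unsigned last_const_slot(unsigned size_bytes)
{
    unsigned num_slots = size_bytes / BYTES_PER_SLOT;

    assert(num_slots > 0);
    return num_slots - 1;
}
```

For example, a 64-byte buffer has four slots, so the declared range would be 0..3 rather than 0..64 or 0..4.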
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before, we were always using the address register and indirect addressing
to index into a UBO constant buffer. With this change we only do that
when necessary.
Using the piglit bin/arb_uniform_buffer_object-rendering test as an
example:

Shader code:
    uniform ub_rot {float rotation; };
    ...
    m[1][1] = cos(rotation);

Before:
    IMM[1] INT32 {0, 1, 0, 0}
      1: UARL ADDR[0].x, IMM[1].xxxx
      2: MOV TEMP[0].x, CONST[3][ADDR[0].x].xxxx
      3: COS TEMP[1].x, TEMP[0].xxxx

After:
      0: COS TEMP[0].x, CONST[3][0].xxxx
Reviewed-by: Roland Scheidegger <[email protected]>
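The decision described above might look roughly like the sketch below. The struct and function are hypothetical and only format an operand string; they are not the state tracker's real code. The operand spellings follow the TGSI listing in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical description of an index into a UBO. */
struct ubo_index {
    bool is_constant;   /* known at compile time? */
    int  value;         /* meaningful only when is_constant is true */
};

/*
 * Formats a TGSI-style source operand into buf.
 * Returns true when the address register must be loaded first.
 */
static bool format_ubo_src(char *buf, size_t len, int buffer,
                           struct ubo_index idx)
{
    assert(buf != NULL && len > 0);

    if (idx.is_constant) {
        /* Direct addressing: no address register needed. */
        snprintf(buf, len, "CONST[%d][%d]", buffer, idx.value);
        return false;
    }

    /* Indirect addressing: only when the index is not a constant. */
    snprintf(buf, len, "CONST[%d][ADDR[0].x]", buffer);
    return true;
}
```

With a constant index of 0 into buffer 3 this yields `CONST[3][0]`, matching the "After" listing; a dynamic index still goes through `ADDR[0].x`.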
|
|
|
|
| |
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
| |
Otherwise, if we were creating a const buffer src register for a UBO
the index into the UBO was always zero.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To implement the unlit_centroid_workaround, previously we emitted

    (+f0) pln(8) g20<1>F g16.4<0,1,0>F g4<8,8,1>F { align1 1Q };
    (-f0) pln(8) g20<1>F g16.4<0,1,0>F g2<8,8,1>F { align1 1Q };

where the flag register contains the channel enable bits from g0.

Since the predicates are complementary, the pair of pln instructions
write to non-overlapping components of the destination, which is the
case that the dependency control hints are designed for.

Typically setting dependency control hints on predicated instructions
isn't safe (if an instruction doesn't execute due to the predicate, it
won't update the scoreboard, leaving it in a bad state) but since we
must have at least one channel executing (i.e., +f0 is true for some
channel) by virtue of the fact that the thread is running, we can put
the +f0 pln instruction last and set the hints:

    (-f0) pln(8) g20<1>F g16.4<0,1,0>F g2<8,8,1>F { align1 NoDDClr 1Q };
    (+f0) pln(8) g20<1>F g16.4<0,1,0>F g4<8,8,1>F { align1 NoDDChk 1Q };
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
| |
Maybe lets us skip some PLN instructions if whole subspans are disabled?
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
| |
And plumb them through. Also make the assert in the generator look like
the vec4 one.
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This sequence (where both x and w are used afterwards) wasn't handled.

    mul.sat x, y, z
    ...
    mov.sat w, x

We assumed that if x was used after the mov.sat, that we couldn't
propagate the saturate modifier, but in fact x was already saturated.

So ignore the live range check if the producing instruction already
saturates its result. Cuts one instruction from hundreds of TF2 shaders.

total instructions in shared programs: 1995631 -> 1994951 (-0.03%)
instructions in affected programs:     155248 -> 154568 (-0.44%)
Reviewed-by: Kenneth Graunke <[email protected]>
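The rule described above can be sketched as a small predicate. The instruction record and the function name are hypothetical simplifications; the real pass works on the compiler's own IR and live-interval data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified instruction record. */
struct inst {
    bool saturate;   /* instruction clamps its result */
};

/*
 * Decides whether the saturate modifier on `mov.sat w, x` can be
 * folded into the instruction that produced x.
 */
static bool can_propagate_saturate(const struct inst *producer,
                                   bool src_live_after_mov)
{
    assert(producer != NULL);

    /* x is already clamped, so later readers of x see no change. */
    if (producer->saturate)
        return true;

    /* Otherwise saturating the producer would alter x for later uses. */
    return !src_live_after_mov;
}
```

In the `mul.sat` / `mov.sat` sequence from the message, the first branch applies, so the live range of x no longer blocks the optimization.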
|
|
|
|
| |
Cuts 10k of .text and saves a bunch of useless struct copies.
|
|
|
|
|
|
|
|
|
|
| |
   text    data     bss     dec     hex filename
4231165  123200   39648 4394013  430c1d i965_dri.so
4186277  123200   39648 4349125  425cc5 i965_dri.so
Cuts 43k of .text and saves a bunch of useless struct copies.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
   text    data     bss     dec     hex filename
4244821  123200   39648 4407669  434175 i965_dri.so
4231165  123200   39648 4394013  430c1d i965_dri.so
Cuts 13k of .text and saves a bunch of useless struct copies.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
   text    data     bss     dec     hex filename
4270747  123200   39648 4433595  43a6bb i965_dri.so
4244821  123200   39648 4407669  434175 i965_dri.so
Cuts 25k of .text and saves a bunch of useless struct copies.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
DRI_PRIME is not very handy, because you have to launch the executable
with it set, which is not always easy to do.
By using drirc, the user specifies the target executable
and the device to use. After that the program will be launched every
time on the target device.
For example if .drirc contains:
<driconf>
    <device driver="loader">
        <application name="Glmark2" executable="glmark2">
            <option name="device_id" value="pci-0000_01_00_0" />
        </application>
    </device>
</driconf>
Then glmark2 will, if possible, use the render node with
ID_PATH_TAG pci-0000_01_00_0.
v2: Fix compilation issue
v3: Add "-lm" and rebase.
Signed-off-by: Axel Davy <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Dave Airlie <[email protected]>
Signed-off-by: Axel Davy <[email protected]>
|
|
|
|
|
|
|
|
| |
In theory, I think the boolean -> bool change alters the ABI,
but nothing in the tree uses configQueryb AFAICS.
Reviewed-by: Axel Davy <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This just drops all the GL types from xmlconfig and uses
standard C types from stdint.h and stdbool.h instead.
v2: drop further double and header include.
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Apparently INTEL_DEBUG=fs has crashed on Broadwell for anything using
ARB_fragment_program since commit 9cee3ff5. We need to NULL-check the
right field.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
Cc: "10.2" <[email protected]>
|
|
|
|
|
|
|
|
| |
The functionality has been merged into brw_disasm.c; use that instead.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
|
| |
At this point, brw_disassemble can do everything gen8_disassemble can
do - and, thanks to the new brw_inst API, it supports all generations.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we decoded render target write messages as:

    render ( RT write, 0, 16, 12, 0) mlen 8 rlen 0

which made you remember (or look up) what the numbers meant:

1. The binding table index
2. The raw message control, undecoded:
   - Last Render Target Select
   - Slot Group Select
   - Message Type (SIMD8, normal SIMD16, SIMD16 replicate data, ...)
3. The dataport message type, again (already decoded as "RT write")
4. The write commit bit (0 or 1)

Needless to say, having to decipher that yourself is annoying. Now, we
do:

    render RT write SIMD16 LastRT Surface = 0 mlen 8 rlen 0

with optional "Hi" and "WriteCommit" for slot group/write commit.
Thanks to the new brw_inst API, we can also stop duplicating code on a
per-generation basis.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
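A rough sketch of the kind of decoding described above follows. The bit positions, the type encodings and the function names are assumptions made for illustration and should be checked against the PRM; this is not the disassembler's actual code. The order of the optional "Hi" and "LastRT" words is also a guess.

```c
#include <assert.h>
#include <stdio.h>

/*
 * ASSUMED layout of the raw message control value (illustration only):
 *   bits 2:0  message type
 *   bit  3    slot group select ("Hi")
 *   bit  4    last render target select ("LastRT")
 */
static const char *rt_type_name(unsigned type)
{
    switch (type) {
    case 0:  return "SIMD16";          /* assumed encoding */
    case 1:  return "SIMD16 RepData";  /* assumed encoding */
    case 4:  return "SIMD8";           /* assumed encoding */
    default: return "unknown";
    }
}

/* Writes a readable description into buf; returns snprintf's result. */
static int describe_rt_write(char *buf, size_t len, unsigned surface,
                             unsigned msg_control, unsigned write_commit)
{
    assert(buf != NULL && len > 0);

    return snprintf(buf, len, "RT write %s%s%s%s Surface = %u",
                    rt_type_name(msg_control & 0x7),
                    ((msg_control >> 3) & 1) ? " Hi" : "",
                    ((msg_control >> 4) & 1) ? " LastRT" : "",
                    write_commit ? " WriteCommit" : "",
                    surface);
}
```

Under these assumptions, the old output's values (surface 0, message control 16, write commit 0) decode to "RT write SIMD16 LastRT Surface = 0", which lines up with the new format shown in the message.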
|
|
|
|
|
|
|
|
|
|
| |
We haven't used the name "message target" in a while - there are a lot
of things called "target", and it gets confusing. SFID ("Shared
Function ID") is the term commonly used in the modern documentation.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
|
| |
The name of this message is "Render Target UNORM Write" (Sandybridge
PRM, Volume 4 Part 1, Page 210). Drop the bogus 'c'.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Most developers will recognize the Gen6+ SFID names more quickly than
the Gen4-5 ones. Given that they're the same values, just use the new
names.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
We should print something properly, but I'm not sure how to properly
print an HF, and we don't have any DFs today to test with.
This is at least better than the current Gen8 disassembler, which would
simply assert fail.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
Making a helper function saves us from cut and pasting this four times.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This is a port of Abdiel's 6f9f916b9b042a294813ab0542390846a38739da
to brw_disasm.c.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This backports the atomic message disassembly support from
gen8_disasm.c, which additionally offers support for decoding atomic
surface read/write messages, and showing SIMD modes and other details.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
I never bothered implementing the disassembler for Gen7+ URB opcodes, so
we were just disassembling them as Ironlake/Sandybridge ones. This
looked pretty bad when running Paul's GS EndPrimitive tests, as the
"write OWord" message was decoded as ff_sync, which doesn't exist.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
| |
We don't use these yet, but we may as well disassemble them.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
|
| |
While we're adding things, use symbolic constants rather than magic
numbers.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
These have existed since Ivybridge. We don't use them today, but the
Gen8+ disassembler supports them, and I'd like to use symbolic names
rather than magic numbers.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This makes brw_disasm.c able to disassemble ELSE instructions correctly
on Broadwell. (gen8_disasm.c already handles this correctly.)
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kristian Høgsberg <[email protected]>
|