| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Setting a couple of bits is the same cost or less as conditionally
setting a couple of bits.
|
|
|
|
|
|
|
| |
We've often created the CFG immediately before, so use it when
available.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
| |
cfg, for instance, is a pointer to a local variable in
calculate_live_intervals, certainly not valid after that function has
returned.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
| |
foreach_list_typed_const was never used as far as I can tell.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Acked-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Acked-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
Makes it more clear what we're doing and requires less knowledge of
exec_list.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Acked-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
| |
We want to have the dri common files compiled to define USE_DRICONF.
We need to check both NEED_OPENGL_COMMON and HAVE_DRICOMMON
Signed-off-by: Axel Davy <[email protected]>
Tested-by: Brian Paul <[email protected]>
|
|
|
|
| |
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
| |
UniformBufferSize is in bytes so we need to divide by 16 to get the
number of constant buffer slots. Also, the ureg_DECL_constant2D()
function takes first..last parameters so we need to subtract one
for the last value.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before, we were always using the address register and indirect addressing
to index into a UBO constant buffer. With this change we only do that
when necessary.
Using the piglit bin/arb_uniform_buffer_object-rendering test as an
example:
Shader code:
uniform ub_rot {float rotation; };
...
m[1][1] = cos(rotation);
Before:
IMM[1] INT32 {0, 1, 0, 0}
1: UARL ADDR[0].x, IMM[1].xxxx
2: MOV TEMP[0].x, CONST[3][ADDR[0].x].xxxx
3: COS TEMP[1].x, TEMP[0].xxxx
After:
0: COS TEMP[0].x, CONST[3][0].xxxx
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
| |
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
| |
Otherwise, if we were creating a const buffer src register for a UBO
the index into the UBO was always zero.
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To implement the unlit_centroid_workaround, previously we emitted
(+f0) pln(8) g20<1>F g16.4<0,1,0>F g4<8,8,1>F { align1 1Q };
(-f0) pln(8) g20<1>F g16.4<0,1,0>F g2<8,8,1>F { align1 1Q };
where the flag register contains the channel enable bits from g0.
Since the predicates are complementary, the pair of pln instructions
write to non-overlapping components of the destination, which is the
case that the dependency control hints are designed for.
Typically setting dependency control hints on predicated instructions
isn't safe (if an instruction doesn't execute due to the predicate, it
won't update the scoreboard, leaving it in a bad state) but since we
must have at least one channel executing (i.e., +f0 is true for some
channel) by virtue of the fact that the thread is running, we can put
the +f0 pln instruction last and set the hints:
(-f0) pln(8) g20<1>F g16.4<0,1,0>F g2<8,8,1>F { align1 NoDDClr 1Q };
(+f0) pln(8) g20<1>F g16.4<0,1,0>F g4<8,8,1>F { align1 NoDDChk 1Q };
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
| |
Maybe lets us skip some PLN instructions if whole subspans are disabled?
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
| |
And plumb them through. Also make the assert in the generator look like
the vec4 one.
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This sequence (where both x and w are used afterwards) wasn't handled.
mul.sat x, y, z
...
mov.sat w, x
We assumed that if x was used after the mov.sat, that we couldn't
propagate the saturate modifier, but in fact x was already saturated.
So ignore the live range check if the producing instruction already
saturates its result. Cuts one instruction from hundreds of TF2 shaders.
total instructions in shared programs: 1995631 -> 1994951 (-0.03%)
instructions in affected programs: 155248 -> 154568 (-0.44%)
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Cuts 10k of .text and saves a bunch of useless struct copies.
|
|
|
|
|
|
|
|
|
|
| |
text data bss dec hex filename
4231165 123200 39648 4394013 430c1d i965_dri.so
4186277 123200 39648 4349125 425cc5 i965_dri.so
Cuts 43k of .text and saves a bunch of useless struct copies.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
text data bss dec hex filename
4244821 123200 39648 4407669 434175 i965_dri.so
4231165 123200 39648 4394013 430c1d i965_dri.so
Cuts 13k of .text and saves a bunch of useless struct copies.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
text data bss dec hex filename
4270747 123200 39648 4433595 43a6bb i965_dri.so
4244821 123200 39648 4407669 434175 i965_dri.so
Cuts 25k of .text and saves a bunch of useless struct copies.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This improves GLX DRI3 GPU offloading significantly on CPU
bound benchmarks particularly.
No performance impact for DRI2 GPU offloading.
v2: Add missing tests
Signed-off-by: Axel Davy <[email protected]>
Reviewed-by: Marek Olšák<[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The differences with DRI2 GPU offloading are:
a) There's no logic for GPU offloading needed in the Xserver
b) for DRI2, the card would render to a back buffer, and
the content would be copied to the front buffer (the same buffers
everytime). Here we can potentially use several back buffers and copy
to buffers with no tiling to share with X. We send them with the
Present extension.
That means than the DRI2 solution is forced to have tearings with GPU
offloading. In the ideal scenario, this DRI3 solution doesn't have this
problem.
However without dma-buf fences, a race can appear (if the card is slow
and the rendering hasn't finished before the server card reads the buffer),
and then old content is displayed. If a user hits this, he should probably
revert to the DRI2 solution (LIBGL_DRI3_DISABLE). Users with cards fast
enough seem to not hit this in practice (I have an Amd hd 7730m, and I
don't hit this, except if I force a low dpm mode)
c) for non-fullscreen apps, the DRI2 GPU offloading solution requires
compositing. This DRI3 solution doesn't have this requirement. Rendering
to a pixmap also works.
d) There is no need to have a DDX loaded for the secondary card.
V4: Fixes some piglit tests
Signed-off-by: Axel Davy <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
DRI_PRIME is not very handy, because you have to launch the executable
with it set, which is not always easy to do.
By using drirc, the user specifies the target executable
and the device to use. After that the program will be launched everytime
on the target device.
For example if .drirc contains:
<driconf>
<device driver="loader">
<application name="Glmark2" executable="glmark2">
<option name="device_id" value="pci-0000_01_00_0" />
</application>
</device>
</driconf>
Then glmark2 will use if possible the render-node of
ID_PATH_TAG pci-0000_01_00_0.
v2: Fix compilation issue
v3: Add "-lm" and rebase.
Signed-off-by: Axel Davy <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2: Fix the leak of device_name
v3: Rebased
It enables to use the DRI_PRIME env var to specify
which gpu to use.
Two syntax are supported:
If DRI_PRIME is 1 it means: take any other gpu than the default one.
If DRI_PRIME is the ID_PATH_TAG of a device: choose this device if
possible.
The ID_PATH_TAG is a tag filled by udev.
You can check it with 'udevadm info' on the device node.
For example it can be "pci-0000_01_00_0".
Render-nodes need to be enabled to choose another gpu,
and they need to have the ID_PATH_TAG advertised.
It is possible for not very recent udev that the tag
is not advertised for render-nodes, then
ones need to add a file containing:
SUBSYSTEM=="drm", IMPORT{builtin}="path_id"
in /etc/udev/rules.d/
Signed-off-by: Axel Davy <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Dave Airlie <[email protected]>
Signed-off-by: Axel Davy <[email protected]>
|
|
|
|
|
|
|
|
| |
This in theory changes ABI for the boolean->bool I think,
but nothing in the tree uses configQueryb AFAICS.
Reviewed-by: Axel Davy <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This just drops all the GL types from the xmlconfig and use
std C types from stdint and stdbool.
v2: drop further double and header include.
Reviewed-by: Eric Anholt <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
| |
This is just prep work for the dri3 prime patches.
Signed-off-by: Dave Airlie <[email protected]>
|