| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows the load-propagation pass to swap the sources in presence
of immediate values.
Maxwell (GM107):
total instructions in shared programs :1928187 -> 1927634 (-0.03%)
total gprs used in shared programs :330741 -> 330154 (-0.18%)
total local used in shared programs :28032 -> 28032 (0.00%)
local gpr inst bytes
helped 0 271 425 425
hurt 0 0 194 194
Fermi (GF114):
total instructions in shared programs :2334474 -> 2333829 (-0.03%)
total gprs used in shared programs :380934 -> 380215 (-0.19%)
total local used in shared programs :33304 -> 33264 (-0.12%)
local gpr inst bytes
helped 5 314 521 521
hurt 0 4 195 195
No regressions on GM107 and GF114 with full piglit.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Reviewed-by: Edward O'Callaghan <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are 2 uses:
- Asynchronous flushing for multithreaded drivers.
- Return a fence without flushing (mid-command-buffer fence). The driver
can defer flushing until fence_finish is called.
This is required to make Bioshock Infinite faster, which creates
1000 fences (flushes) per frame.
Reviewed-by: Edward O'Callaghan <[email protected]>
Reviewed-by: Rob Clark <[email protected]>
|
|
|
|
|
|
| |
v2: handle EINTR, remove backslashes
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This fixes a regression introduced in
1da704a94c57aa0b0cf8faaa3236fe47dfb8f88c because the offset has moved
from 0x180 to 0x1a0, and the macros have to be re-compiled.
Fixes: 1da704a ("nvc0: increase the tex handles area size in the driver")
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes a regression introduced in
1da704a94c57aa0b0cf8faaa3236fe47dfb8f88c because the offset has moved
from 0x600 to 0x620, and the kernels used for reading MP perf counters
have to be re-assembled.
This also fixes amd_performance_monitor_measure piglit.
Fixes: 1da704a ("nvc0: increase the tex handles area size in the driver")
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
This is as close as we get to a name for the 3D blocks.
|
|
|
|
|
|
| |
We don't want to bring up an old userspace driver on a kernel for
newer hardware. We'll also want to look at the other ident fields in
the future.
|
| |
|
|
|
|
|
| |
The required version is set to .69 for the getparam ioctl that will be
used in the next commit.
|
|
|
|
|
|
|
|
|
|
|
|
| |
The build was failing because the official CL headers have a few defines,
like: # define cl_khr_gl_sharing 1
Which have the same name as some class members of clang's OpenCLOptions class.
If we include the cl headers first, this breaks the build because the member
names of this class are replaced by the literal 1.
Reviewed-by: Francisco Jerez <[email protected]>
Reviewed-by: Vedran Miletić <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
clang commit r275822 removed unnecessary includes from header files,
so we now need to explicitly include clang/Lex/PreprocessorOptions.h
v2:
- Use <> instead of "" for the include path.
Reviewed-by: Francisco Jerez <[email protected]>
Reviewed-by: Vedran Miletić <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Refactoring to leave existing simd_* intrinsics in "simdintrin.h" unchanged,
adding corresponding simd16_* intrinsics in "simd16intrin.h" on the side,
with emulation, that we can use piecemeal, rather than the all-or-nothing
approach to bring up avx512.
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
| |
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
| |
Makes these names semantically correct.
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
| |
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
| |
Fixes Linux warnings.
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
| |
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Add support for enhanced attribute swizzling. Currently supports constant
source overrides to handle PrimitiveID support. No support yet for input
select swizzling or wrap shortest. Removes obsoleted linkageMask and
associated code.
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
| |
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
|
|
| |
Moved the setting into the existing component control code. Fixes bad
interaction between attribute/component setting for vertex/instance ID
and component packing.
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
| |
Enabling KNOB_SIMD_WIDTH = 16 for AVX512 pre-work and low level simd utils
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
|
| |
Adjust viewport rounding when scissor rect is disabled during macro
tile scissor setup.
Signed-off-by: Tim Rowley <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
An earlier patch fixed the problem for classic drivers, however Gallium
was still left broken. This patch applies the same workaround to
Gallium, when compiled for Android. Following is a quote from the
original patch:
0cbc90c57cfc mesa: dri: Add shared glapi to LIBADD on Android
/system/vendor/lib/dri/*_dri.so actually depend on libglapi: without
this, loading the so file fails with:
cannot locate symbol "__emutls_v._glapi_tls_Context"
On non-Android (non-bionic) platform, EGL uses the following
workflow, which works fine:
dlopen("libglapi.so", RTLD_LAZY | RTLD_GLOBAL);
dlopen("dri/<driver>_dri.so", RTLD_NOW | RTLD_GLOBAL);
However, bionic does not respect the RTLD_GLOBAL flag, and the dri
library cannot find symbols in libglapi.so, so we need to link
to libglapi.so explicitly. Android.mk already does this.
Cc: "12.0" <[email protected]>
Signed-off-by: Tomasz Figa <[email protected]>
Signed-off-by: Nicolas Boichat <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Józef Kucia <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Józef Kucia <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows Gallium drivers to advertise the subpixel precision
for floating point viewports bounds.
v2:
- Set ViewportSubpixelBits in st_init_limits.
Signed-off-by: Józef Kucia <[email protected]>
Signed-off-by: Marek Olšák <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
| |
MS images have to be handled explicitly and I don't plan to implement
them for now.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
On Maxwell, images binding is slightly different (and much better)
regarding Fermi and Kepler because a texture view needs to be uploaded
for each image and this is going to simplify the thing a lot.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, we can store 32 tex handles of 32-bits integer each and
that fits perfectly with the underlying hardware except on GM107+
which requires to upload a texture view for each images.
This patch increases the number of storable texture handles in the
driver constant buffer from 32 to 40 because we expose 8 images.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
This makes Bioshock Infinite with deferred flushing 2.2% faster.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
This makes Bioshock Infinite with deferred flushing 2% faster.
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This effectively removes s_waitcnt instructions after FP16 exports.
Before:
v_cvt_pkrtz_f16_f32_e32 v0, v0, v1 ; 5E000300
v_cvt_pkrtz_f16_f32_e32 v1, v2, v3 ; 5E020702
exp 15, 0, 1, 0, 0, v0, v1, v0, v0 ; F800040F 00000100
s_waitcnt expcnt(0) ; BF8C0F0F
v_cvt_pkrtz_f16_f32_e32 v0, v4, v5 ; 5E000B04
v_cvt_pkrtz_f16_f32_e32 v1, v6, v7 ; 5E020F06
exp 15, 1, 1, 0, 0, v0, v1, v0, v0 ; F800041F 00000100
s_waitcnt expcnt(0) ; BF8C0F0F
v_cvt_pkrtz_f16_f32_e32 v0, v8, v9 ; 5E001308
v_cvt_pkrtz_f16_f32_e32 v1, v10, v11 ; 5E02170A
exp 15, 2, 1, 0, 0, v0, v1, v0, v0 ; F800042F 00000100
s_waitcnt expcnt(0) ; BF8C0F0F
v_cvt_pkrtz_f16_f32_e32 v0, v12, v13 ; 5E001B0C
v_cvt_pkrtz_f16_f32_e32 v1, v14, v15 ; 5E021F0E
exp 15, 3, 1, 1, 1, v0, v1, v0, v0 ; F8001C3F 00000100
s_endpgm ; BF810000
After:
v_cvt_pkrtz_f16_f32_e32 v0, v0, v1 ; 5E000300
v_cvt_pkrtz_f16_f32_e32 v1, v2, v3 ; 5E020702
v_cvt_pkrtz_f16_f32_e32 v2, v4, v5 ; 5E040B04
v_cvt_pkrtz_f16_f32_e32 v3, v6, v7 ; 5E060F06
exp 15, 0, 1, 0, 0, v0, v1, v0, v0 ; F800040F 00000100
v_cvt_pkrtz_f16_f32_e32 v4, v8, v9 ; 5E081308
v_cvt_pkrtz_f16_f32_e32 v5, v10, v11 ; 5E0A170A
exp 15, 1, 1, 0, 0, v2, v3, v0, v0 ; F800041F 00000302
v_cvt_pkrtz_f16_f32_e32 v6, v12, v13 ; 5E0C1B0C
v_cvt_pkrtz_f16_f32_e32 v7, v14, v15 ; 5E0E1F0E
exp 15, 2, 1, 0, 0, v4, v5, v0, v0 ; F800042F 00000504
exp 15, 3, 1, 1, 1, v6, v7, v0, v0 ; F8001C3F 00000706
s_endpgm ; BF810000
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
ported from Vulkan
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
| |
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
| |
always set
Reviewed-by: Nicolai Hähnle <[email protected]>
|
|
|
|
|
|
|
|
| |
ADD only allows to emit 19-bits immediates.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Cc: <[email protected]>
|
|
|
|
|
|
|
|
| |
Like FADD32I, the NEG modifier of src0 is at position 56.
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Cc: [email protected]
|
|
|
|
| |
Signed-off-by: Andreas Boll <[email protected]>
|
|
|
|
|
|
|
| |
CovID: 1363008
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Nayan Deshmukh <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add entrypoint to distinguish H.264 decode and encode. For example, in patch
5/11 when is calling "VaCreateContext", "pps" and "sps" shouldn't be allocated
for H.264 encoding. So we need to use the entry_point to determine this is
H.264 decode or H.264 encode. We can use config to determine the entrypoint
since config_id is passed to us for VaCreateContext call. However, for
VaDestoyContext call, only context_id is passed to us. So we need to know the
entrypoint in order to not free the pps/sps for encoding case.
Signed-off-by: Boyuan Zhang <[email protected]>
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Mark both L8_SRGB and L8A8_SRGB as non-renderable (the latter already
didn't have the bind flags). This makes the state tracker pick a
different format when rendering is required, or mark the fb as
incomplete. This fixes:
bin/getteximage-formats init-by-clear-and-render -auto -fbo
bin/getteximage-formats init-by-rendering -auto -fbo
which previously ran into srgb-encoding differences.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Samuel Pitoiset <[email protected]>
Cc: [email protected]
|