| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
Already implicitly handled by the build system.
Signed-off-by: Emil Velikov <[email protected]>
Tested-by: Aaron Watry <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Analogous to previous commit.
Cc: "12.0 13.0" <[email protected]>
Cc: Aaron Watry <[email protected]>
Cc: Francisco Jerez <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
| |
Analogous to previous commit.
Cc: "12.0 13.0" <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
| |
Analogous to previous commit.
Cc: "12.0 13.0" <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The latter can contain stale generated file, which, as-is, we'll end up
using.
Fixes: bfd17c76c12 "i965: Port INTEL_PRECISE_TRIG=1 to NIR."
Cc: "12.0 13.0" <[email protected]>
Cc: Kenneth Graunke <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Analogous to previous commit.
Fixes: 4610e5ef28e "freedreno/ir3: fix sin/cos"
Cc: "12.0 13.0" <[email protected]>
Cc: Rob Clark <[email protected]>
Cc: Nicolas Dechesne <[email protected]>
Reported-by: Nicolas Dechesne <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Tested-by: Nicolas Dechesne <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise we might end up w/o the respective folder (depending on
autotools version) and fail at build time.
Fixes: bfd17c76c12 "i965: Port INTEL_PRECISE_TRIG=1 to NIR."
Cc: "12.0 13.0" <[email protected]>
Cc: Kenneth Graunke <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
Fixes:
dEQP-VK.api.descriptor_pool.out_of_pool_memory
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Eric Engestrom <[email protected]>
|
|
|
|
|
|
|
| |
v2(Bas): Remove the extra VK_ERROR_FRAGMENTED_POOL cases.
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Eric Engestrom <[email protected]>
Reviewed-by: Bas Nieuwenhuizen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
All of these have had support for the TGSI opcodes since before most of
the glsl compiler work landed.
Also update the docs accordingly, including the missing note about i965.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Dave Airlie <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2: add conversion opcodes.
v3 (idr): Rebase on replacemtn of TGSI_OPCODE_I2U64 with
TGSI_OPCODE_I2I64.
v4 (idr): "cut them down later" => Remove ir_unop_b2u64 and
ir_unop_u642b. Handle these with extra i2u or u2i casts just like
uint(bool) and bool(uint) conversion is done.
v5 (nha): add clarifying comment about a subtle assumption
Signed-off-by: Dave Airlie <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v1.1: move to using a normal CAP. (Marek)
v2: fill in the cap everywhere
Signed-off-by: Dave Airlie <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
| |
Blorp clears already have an equivalent.
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
which is not applicable for "all slices at each lod". Current
logic makes one to believe it has some purpose. When miptree
layout is calculated brw_miptree_layout_texture_array() sets
the qpitch unconditionally but later on ignores it altogether
for ALL_SLICES_AT_EACH_LOD.
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Such as comment states for intel_miptree_hiz_buffer::mt, hiz_mt
only exists for gen6. In addition, intel_hiz_miptree_buf_create()
uses MIPTREE_LAYOUT_FORCE_ALL_SLICE_AT_LOD unconditionally.
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
In intel_hiz_miptree_buf_create() intel_miptree_aux_buffer::bo
is unconditionally initialised to point to the same buffer
object as hiz_mt does. The same goes for
intel_miptree_aux_buffer::pitch/qpitch.
This will make following patches simpler to read.
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
In intel_hiz_miptree_buf_create() intel_miptree_aux_buffer::bo
is unconditionally initialised to point to the same buffer
object as hiz_mt does. Also intel_miptree_aux_buffer::offset
is initialised to zero (calloc()).
This will make following patches significantly simpler to read.
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
| |
There are is no alternative.
Reviewed-by: Samuel Iglesias Gons\341lvez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Only caller, brw_workaround_depthstencil_alignment(), returns
early for gen6+.
While at it, reduce scope for brw_get_depthstencil_tile_masks() as
well.
Reviewed-by: Samuel Iglesias Gons\341lvez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
There exact same check earlier in brw_miptree_layout() which
intel_miptree_create_layout() in turn calls unconditionally.
Reviewed-by: Samuel Iglesias Gons\341lvez <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
| |
In addition, let intel_miptree_create_layout() release the
miptree - it is the allocator.
Reviewed-by: Jason Ekstrand <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
| |
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Samuel Iglesias Gons<C3><A1>lvez <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
| |
Improves 1024x1024 TexSubImage2D by 41.2371% +/- 3.52799% (n=10).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We had a lot of memcpy call overhead because gpu_stride wasn't being
inlined. But if you split out the stride==8 and stride==16 cases like
this code does while still using memcpy, you'd no longer have glibc's
NEON memcpy applied at which point we'd be doing 16 uncached reads
instead of 64/(NEON memcpy granularity), for about a 30% performance
hit. By hand writing the assembly, we can get a whole cacheline
loaded at a time.
Unfortunately, NEON intrinsics turned out to be unusable -- they
didn't have the vldm instruction available.
Note that, for now, the NEON code is only enabled when building for ARMv7
(Pi 2+). We may want to do runtime detection for the Raspbian case, in
the future.
Improves 1024x1024 GetTexImage by 208.256% +/- 7.07029% (n=10).
|
|
|
|
| |
This paves the way for building it twice, with NEON assembly or not.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This new query returns the current visible usage of VRAM accessed
by the CPU. It will return 0 on radeon because it's unimplemented.
Signed-off-by: Samuel Pitoiset <[email protected]>
Tested-by: Edmondo Tommasina <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
R600_DEBUG="info" can be used to display that size, as well as
the total amount of VRAM/GTT.
Signed-off-by: Samuel Pitoiset <[email protected]>
Tested-by: Edmondo Tommasina <[email protected]>
Reviewed-by: Nicolai Hähnle <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Saves a measly 20 bytes on IA32 and nothing on x64. Depending on
exactly when this is applied, a lot of variation is possible due to
function alignment.
text data bss dec hex filename
6670131 228340 22552 6921023 699b3f lib/i965_dri.so before
6670111 228340 22552 6921003 699b2b lib/i965_dri.so after
6342932 293872 29880 6666684 65b9bc lib64/i965_dri.so before
6342932 293872 29880 6666684 65b9bc lib64/i965_dri.so after
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
By putting the parameters first that match the parameters to the call
site, 4 (of 14) instructions are saved at _mesa_Uniform4fv on x64. On
IA32, the details of the instructions change, but it is the same count
and mix of instructions.
Before:
0000000000000830 <_mesa_Uniform4fv>:
830: 48 83 ec 10 sub $0x10,%rsp
834: 49 89 d0 mov %rdx,%r8
837: 48 8b 15 00 00 00 00 mov 0x0(%rip),%rdx # 83e <_mesa_Uniform4fv+0xe>
83e: 89 f8 mov %edi,%eax
840: 89 f1 mov %esi,%ecx
842: 41 b9 02 00 00 00 mov $0x2,%r9d
848: 64 48 8b 3a mov %fs:(%rdx),%rdi
84c: 48 8b 97 c8 01 02 00 mov 0x201c8(%rdi),%rdx
853: 48 8b 72 70 mov 0x70(%rdx),%rsi
857: 6a 04 pushq $0x4
859: 89 c2 mov %eax,%edx
85b: e8 00 00 00 00 callq 860 <_mesa_Uniform4fv+0x30>
860: 48 83 c4 18 add $0x18,%rsp
864: c3 retq
After:
00000000000007f0 <_mesa_Uniform4fv>:
7f0: 48 83 ec 10 sub $0x10,%rsp
7f4: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax # 7fb <_mesa_Uniform4fv+0xb>
7fb: 41 b9 02 00 00 00 mov $0x2,%r9d
801: 64 48 8b 08 mov %fs:(%rax),%rcx
805: 48 8b 81 c8 01 02 00 mov 0x201c8(%rcx),%rax
80c: 6a 04 pushq $0x4
80e: 4c 8b 40 70 mov 0x70(%rax),%r8
812: e8 00 00 00 00 callq 817 <_mesa_Uniform4fv+0x27>
817: 48 83 c4 18 add $0x18,%rsp
81b: c3 retq
Saves a measly 416 bytes of text on x64. Depending on exactly when this
is applied, a lot of variation is possible due to function alignment.
text data bss dec hex filename
6670131 228340 22552 6921023 699b3f lib/i965_dri.so before
6670131 228340 22552 6921023 699b3f lib/i965_dri.so after
6343348 293872 29880 6667100 65bb5c lib64/i965_dri.so before
6342932 293872 29880 6666684 65b9bc lib64/i965_dri.so after
There is likely to be no performance change with just this patch.
_mesa_uniform immediately calls validate_uniform_parameters with
parameters in the "wrong" (different from the call site) order.
v2: Rebase on GL_ARB_gpu_shader_fp64.
v3: Rebase on GL_ARB_gpu_shader_int64.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
By putting the parameters first that match the parameters to the call
site, 4 (of 16) instructions are saved at _mesa_UniformMatrix4fv on
x64. On IA32, the details of the instructions change, but it is the
same count and mix of instructions.
Before:
0000000000001380 <_mesa_UniformMatrix4fv>:
1380: 48 83 ec 10 sub $0x10,%rsp
1384: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax # 138b <_mesa_UniformMatrix4fv+0xb>
138b: 41 89 f8 mov %edi,%r8d
138e: 41 89 f1 mov %esi,%r9d
1391: 0f b6 d2 movzbl %dl,%edx
1394: 64 48 8b 38 mov %fs:(%rax),%rdi
1398: 48 8b b7 c8 01 02 00 mov 0x201c8(%rdi),%rsi
139f: 48 8b 76 70 mov 0x70(%rsi),%rsi
13a3: 68 06 14 00 00 pushq $0x1406
13a8: 51 push %rcx
13a9: 52 push %rdx
13aa: b9 04 00 00 00 mov $0x4,%ecx
13af: ba 04 00 00 00 mov $0x4,%edx
13b4: e8 00 00 00 00 callq 13b9 <_mesa_UniformMatrix4fv+0x39>
13b9: 48 83 c4 28 add $0x28,%rsp
13bd: c3 retq
After:
0000000000001360 <_mesa_UniformMatrix4fv>:
1360: 48 83 ec 10 sub $0x10,%rsp
1364: 48 8b 05 00 00 00 00 mov 0x0(%rip),%rax # 136b <_mesa_UniformMatrix4fv+0xb>
136b: 0f b6 d2 movzbl %dl,%edx
136e: 64 4c 8b 00 mov %fs:(%rax),%r8
1372: 49 8b 80 c8 01 02 00 mov 0x201c8(%r8),%rax
1379: 68 06 14 00 00 pushq $0x1406
137e: 6a 04 pushq $0x4
1380: 6a 04 pushq $0x4
1382: 4c 8b 48 70 mov 0x70(%rax),%r9
1386: e8 00 00 00 00 callq 138b <_mesa_UniformMatrix4fv+0x2b>
138b: 48 83 c4 28 add $0x28,%rsp
138f: c3 retq
Saves a measly 576 bytes of text on x64.
text data bss dec hex filename
6670131 228340 22552 6921023 699b3f lib/i965_dri.so before
6670131 228340 22552 6921023 699b3f lib/i965_dri.so after
6343924 293872 29880 6667676 65bd9c lib64/i965_dri.so before
6343348 293872 29880 6667100 65bb5c lib64/i965_dri.so after
v2: Rebase on GL_ARB_gpu_shader_fp64.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This is C++, so we can mix code and declarations. Doing so allows
constification.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Plamena Manolova <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Fixes:
dEQP-VK.spirv_assembly.instruction.compute.opspecconstantop.vector_related
dEQP-VK.spirv_assembly.instruction.graphics.opspecconstantop.vector_related*
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Cc: "17.0 13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Looking at the following bit of SPIRV shader :
...
%zero = OpConstant %i32 0
%ivec3_0 = OpConstantComposite %ivec3 %zero %zero %zero
%vec3_undef = OpUndef %ivec3
%sc_0 = OpSpecConstant %i32 0
%sc_1 = OpSpecConstant %i32 0
%sc_2 = OpSpecConstant %i32 0
...
Our compiler currently stops parsing variables & types on the OpUndef
and switches to instructions, leaving the following sc_[0-2] variables
untreated.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Cc: "17.0 13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The size of the pool is slightly smaller than the size of the
structure containing the whole pool. We need to take that into account
on when setting up the internals.
Fixes a crash due to out of bound memory access in:
dEQP-VK.api.descriptor_pool.out_of_pool_memory
v2: Drop debug traces (Lionel)
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Cc: "17.0 13.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When trying to blit larger tiled surfaces, the pitch can be larger than
32768 bytes, which means it won't fit in a GLshort. Passing it in will
truncate the stride to 0, which has...surprising results.
The pitch can be up to 32,768 DWords, or 128kB. We measure it in bytes,
but divide by 4 when programming it. So we need to handle values up to
131,072. Switch from GLshort to int32_t to avoid the truncation.
Fixes GL45-CTS.gtf30.GL3Tests.depth_texture.depth_texture_copyteximage
at widths greater than 8192.
v2: Use int32_t as negative values can be used (Jason).
Cc: "17.0" <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
SIMD16 compute shaders use a send(16) with mlen 1 for the EOT message,
using a source of g127 for the single register. With a UD type, this
supposedly could read g128, which doesn't exist, causing the simulator
to get cranky. Use a UW type to avoid this.
Cc: "17.0" <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to GL_KHR_vulkan_glsl, the signature of subpassLoad() is:
gvec4 subpassLoad(gsubpassInput subpass);
gvec4 subpassLoad(gsubpassInputMS subpass, int sample);
So the multisampled case always receives an explicit sample index that we
should use. The current implementation was ignoring this parameter
and using gl_SampleID value instead.
Fixes:
dEQP-VK.pipeline.multisample_shader_builtin.sample_id.*
Reviewed-by: Jason Ekstrand <[email protected]>
Cc: "17.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I hadn't bothered to set this bit because I figured it would just
paper over us getting the rectangle wrong. But it turns out that
there is a legitimate reason to use it, so let's do so.
The alternative would be to chop up 16k clears to multiple 8k clears,
which is pointlessly painful.
Cc: "17.0" <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Nanley Chery <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Jason Ekstranad <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Add a helper function, anv_get_queue_family_properties(), which fills the
struct. This patch reduces churn in the following patch that implements
vkGetPhysicalDeviceQueueFamilyProperties2KHR.
Reviewed-by: Jason Ekstranad <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Add a helper function, anv_get_image_format_properties(), which does all
the work and has a VkPhysicalDeviceImageFormatInfo2KHR parameter. This
patch reduces churn in the following patch that implements
vkGetPhysicalDeviceImageFormatProperties2KHR.
Reviewed-by: Jason Ekstranad <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The struct was deleted by:
commit efe9d1cde3340d3a9d17e5560b609a4fb839d43d
Author: Edward O'Callaghan <[email protected]>
Subject: anv: Clean up some unused variables
Unlike the original anv_common, the new one has a non-const pNext
pointer because we will use it for the output structs of
VK_KHR_get_physical_device_properties2.
v2:
- Retype pNext from void* to struct anv_common*.
Reviewed-by: Jason Ekstranad <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|
|
|
|
|
|
|
|
| |
This is a printf-like macro that prints a debug message to stderr when
built with DEBUG. If no DEBUG, then do nothing.
Reviewed-by: Jason Ekstranad <[email protected]>
Reviewed-by: Lionel Landwerlin <[email protected]>
|