| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch causes 2 regressions in khronos' gles cts tests
on various intel platforms.
Failing tests:
ES3-CTS.functional.state_query.integers.viewport_getinteger
ES3-CTS.functional.state_query.integers.viewport_getfloat
Here is an explanation of what's causing the failures:
CTS tests are not clamping the x, y location of the viewport's
bottom-left corner as recommended by ARB_viewport_array and
OES_viewport_array:
"The location of the viewport's bottom-left corner, given by (x,y), are
clamped to be within the implementation-dependent viewport bounds range.
The viewport bounds range [min, max] tuple may be determined by
calling GetFloatv with the symbolic constant VIEWPORT_BOUNDS_RANGE_OES"
Khronos CTS merge request to fix the test case:
https://gitlab.khronos.org/opengl/cts/merge_requests/399
V2: Initialize the relevant variables for GL_OES_viewport_array on gen8+
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In core profile, we support up to 16 viewports. However, in the
majority of cases, only 1 of them is actually used - we only need
the others if the last shader stage prior to the rasterizer writes
gl_ViewportIndex.
Processing all 16 viewports adds additional CPU overhead, which hurts
CPU-intensive workloads such as Glamor. This meant that switching to
core profile actually penalized Glamor to an extent, which is
unfortunate.
This patch tracks the number of relevant viewports, switching between
1 and ctx->Const.MaxViewports if gl_ViewportIndex is written. A new
BRW_NEW_VIEWPORT_COUNT flag tracks this. This could mean re-emitting
viewport state when switching, but hopefully this is offset by doing
1/16th of the work in the common case. The new flag is also lighter
weight than BRW_NEW_VUE_MAP_GEOM_OUT, which we were using in one case.
According to Eric Anholt, x11perf -copypixwin10 performance improves by
11.5094% +/- 3.10841% (n=10) on his Skylake.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Acked-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
| |
Now that we have gen_device_info mutable, we can update its values and drop
all copies we had in brw_context.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make gen_device_info a mutable structure so we can update the fields that
can be refined by querying the kernel (like subslices and EU numbers).
This patch does not make any functional change, it just makes
gen_get_device_info() fill a structure rather than returning a const
pointer.
Signed-off-by: Lionel Landwerlin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
"intelScreen" is wordy and also doesn't fit our style guidelines.
"screen" is shorter, which is nice, because we use it fairly often.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I want to use "screen" as the variable name for a struct intel_screen
pointer. This means that we can't use it for __DRIscreen pointers.
Sometimes we called it "screen", sometimes "sPriv", sometimes
"driScrnPriv", and sometimes "psp" (Pointer to Screen Private?).
The last one is particularly confusing because we use "psp" to refer to
the Gen4 PIPELINED_STATE_POINTERS packet as well.
Let's be consistent. "dri_screen" is clear, and it's not used often
enough that I'm worried about the verbosity.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 6ba88bce64b343761aabe3a6c7ee285c6020a959. The commit
was erroneous because GL has a separate limit, GL_MAX_FRAMEBUFFER_LAYERS
which guards the number of layers you are allowed to render into.
The GL 4.5 spec says:
"The framebuffer attachment point attachment is said to be framebuffer
attachment complete if [...] all of the following conditions are true:
[...]
If image is a three-dimensional, one- or two-dimensional array, or
cube map array texture and the attachment is layered, the depth or
layer count of the texture is less than or equal to the value of the
implementation-dependent limit MAX_FRAMEBUFFER_LAYERS."
and goes on to say that "framebuffer complete" requires all attachments to
be "framebuffer attachment complete".
On Sandy Bridge, we set GL_MAX_FRAMEBUFFER_LAYERS to 512 so creating a 3D
texture bigger than 512 is fine; you just can't render into all of the
slices at once.
Fixes ES3-CTS.gtf.GL3Tests.npot_textures.npot_tex_image on Sandy Bridge
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Original commit added documentation explaining lossless compression
case:
commit 56f29911ec9da25c78fbd3d4945d499e65ca4b5a
Author: Topi Pohjolainen <[email protected]>
Date: Tue Feb 2 10:00:41 2016 +0200
i965: Add a flag telling color resolve pass to ignore CCS_E
It, however, easily gives the impression that the sole purpose
of the intel_miptree_resolve_color() is to address lossless
compression. Original intention is to document the lack of
INTEL_MIPTREE_IGNORE_CCS_E flag given for the resolve call.
This patch fixes this along with a typo found spotted further
down.
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v3:
- Actually set the flags when needed instead of falsely
overwriting them (Jason).
- Use more generic name for flag (dropped RENDERBUFFER)
- Consult also shader images
v4:
- Consult only lossless compressd shader images
v5:
- Check the existence of renderbuffer before considering
if it matches the given miptree
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Generated by:
sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.c
sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.h
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.c
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.cpp
sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.h
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
thread_width_max in the GPGPU walker command limits us to a maximum of
64 threads.
This fixes a crash on Haswell in the OpenGLES 3.1 conformance test
suite which tests the advertised limits of the max invocation counts.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From about kernel 4.9, GTT mmaps are virtually unlimited. A new
parameter, I915_PARAM_MMAP_GTT_VERSION, is added to advertise the
feature so query it and use it to avoid limiting tiled allocations to
only fit within the mappable aperture.
A couple of caveats:
- fence support is still limited by stride to 262144 and the stride
needs to be a multiple of tile_width (as before, and same limitation as
the current 3D pipeline in hardware)
- the max_gtt_map_object_size forcing untiled may be hiding a few bugs
in handling of large objects, though none were spotted in piglits.
See kernel commit 4cc6907501ed ("drm/i915: Add I915_PARAM_MMAP_GTT_VERSION
to advertise unlimited mmaps").
v2: Include some commentary on mmap virtual space vs CPU addressable
space.
Signed-off-by: Chris Wilson <[email protected]>
Cc: Kenneth Graunke <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Daniel Vetter <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is required because the sampler unit used to fetch from the
framebuffer is unable to interpret non-color-compressed fast-cleared
single-sample texture data. Roughly the same limitation applies for
surfaces bound to texture or image units, but unlike texture sampling,
non-coherent framebuffer fetch is by definition non-coherent with
previous rendering, so the brw_render_cache_set_check_flush() call can
be omitted except after resolve.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some games are sloppy.. perhaps because it is defined behavior for DX or
perhaps because nv blob driver defaults things to zero.
So add driconf param to force uninitialized variables to default to zero.
This issue was observed with rust, from steam store. But has surfaced
elsewhere in the past.
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are two distinctly different uses of this struct. The first
is to store GL shader objects. The second is to store information
about a shader stage thats been linked.
The two uses actually share few fields and there is clearly confusion
about their use. For example the linked shaders map one to one with
a program so can simply be destroyed along with the program. However
previously we were calling reference counting on the linked shaders.
We were also creating linked shaders with a name even though it
is always 0 and called the driver version of the _mesa_new_shader()
function unnecessarily for GL shader objects.
Acked-by: Iago Toral Quiroga <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The RenderTargetViewExtent field of RENDER_SURFACE_STATE is supposed to be
set to the depth of a 3-D texture when rendering. Unfortunatley, that
field is only 9 bits on Sandy Bridge and prior so we can't actually bind
a 3-D texturing for rendering if it has depth > 512. On Ivy Bridge, this
field was bumpped to 11 bits so we can go all the way up to 2048. On Iron
Lake and prior, we don't support layered rendering and we use OffsetX/Y
hacks to render to particular layers so 2048 is ok there too.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Cc: "11.1 11.2 12.0" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We still need to recompile the passthrough shader when this value
changes, as it also affects the output vertex count. But otherwise,
we can eliminate recompiles on Gen8+.
We probably want to do this for Gen7 as well, but that requires
rewriting the input release code to use a loop, which is a trade-off
I'd need to consider in more detail.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Alejandro Piñeiro <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes three GL44-CTS.tessellation_shader subtests:
- max_patch_vertices
- single.max_patch_vertices
- tessellation_control_to_tessellation_evaluation.gl_PatchVerticesIn
These use gl_PatchVerticesIn in the TES, but don't link against a
TCS (which would allow the linker to lower it to a constant). We had
no handling for the system value in the backend, so it would just
assert fail.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Alejandro Piñeiro <[email protected]>
Cc: [email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The images struct is an uninitialized local variable on the stack. If the
callback returns 0, the struct might not have been updated and so should
be considered uninitialized. Currently the code ignores the return value,
which (depending on stack contents) might end up in reading a non-zero
value from images.image_mask and dereferencing further fields.
Another solution would be to initialize image_mask with 0, but checking
the return value seems more sensible and it is what Gallium is doing.
v2: fix typos in commit message,
fix indentation,
remove unnecessary parentheses and pointer dereference to keep line
length reasonable.
Cc: 11.2 12.0 <[email protected]>
Signed-off-by: Tomasz Figa <[email protected]>
Reviewed-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This was removed in d9546b0c5d and replced with the precise_trig driconf
option. However, we still need precise trig in the Vulkan driver so this
commit brings back the environment variable and compiler->precise_trig is
effectively the logical OR of the two.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=96484
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "12.0" <[email protected]>
|
|
|
|
|
|
|
| |
These need to be freed too.
Cc: "12.0" <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this change, to enable precise SIN and COS instructions
on Intel hardware, one can put
<option name="precise_trig" value="true"/>
in the proper drirc file.
V2: Make option name more generic
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Stephane Marchesin <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The cross thread constant support appears on Haswell. It allows us to
upload a set of uniform data for all threads without duplicating it
per thread.
One complication is that cross-thread constants are loaded into
registers before per-thread constants. Previously, our local IDs were
loaded before the uniform data and treated as 'payload' data, even
though they were actually pushed into the registers like the other
uniform data.
Therefore, in this patch we simultaneously enable a newer layout where
each thread now uses a single uniform slot for a unique local ID for
the thread. This uniform is handled specially to make sure it is added
last into the uniform push constant registers. This minimizes our
usage of push constant registers, and maximizes our ability to use
cross-thread constants for registers.
To swap from the old to the new layout, we also need to flip some
lowering pass switches to let our driver handle the lowering instead.
We also no longer force thread_local_id_index to -1.
v4:
* Minimize size of patch that switches from the old local ID layout
to the new layout (Jason)
Cc: "12.0" <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
v2:
* Move lower flag to context constants. (Ken)
Cc: "12.0" <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]> (v1)
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes rendering in Shadow of Mordor with rbc. Application writes
RGBA_UNORM texture filling it with values the application wants to
later on treat as SRGB_ALPHA.
Intel driver enables lossless compression for the buffer by the time
of writing. However, the driver fails to make sure the buffer can be
sampled as something else later on and unfortunately there is
restriction in the hardware for using lossless compression for srgb
formats which looks to extend itself to the sampling engine also.
Requesting srgb to linear conversion on top of compressed buffer
results the color values to be pretty much garbage.
Fortunately none of tracked benchmarks showed a regression with
this.
v2 (Matt): Add missing space
Cc: "12.0" <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This should have the side effect of enabling the ARB_compute_shader
extension on Gen8+ hardware and all Gen7 platforms that didn't
previously expose it (VLV and IVB GT1) due to the number of hardware
threads per subslice being insufficient in SIMD16 mode.
v2: Bump workgroup size limit for GLES too (Jordan).
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Technically, this was introduced with GL 4.4. However, I believe it
was intended to be retroactive. As far as I know, AMD has never
supported primitive restart with patches, while NVidia and Intel do.
This necessitated the need for a query which would allow applications
to figure out whether this was usable or not.
I decided to expose it everywhere ARB_tessellation_shader is exposed.
(It's also in both OES and EXT_tessellation_shader.)
Enable this for i965 and Gallium drivers which expose the capability.
v2: Fix a bug in the state_tracker code (caught by Ilia Mirkin).
Bugzilla: https://cvs.khronos.org/bugzilla/show_bug.cgi?id=10364
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
We used to use a meta path on gen8 but we haven't since c7cf17ae758. We
might as well delete the meta path since blorp works on all gens.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
My old implementation accumulated <start, end> pairs in a buffer,
and eventually processed that data on the CPU. This meant flushing
the batchbuffer and waiting for it to completely execute before we
could map it, resulting in really long stalls. We could also run out
of space in the buffer, and have to do this early.
Instead, we can use Haswell's MI_MATH command to do the (end - start)
subtraction, as well as the multiplication by 2 or 3 to convert from
the number of primitives written to the number of vertices written.
We still need to CS stall to read the counters, but otherwise everything
is completely pipelined - there's no CPU<->GPU synchronization required.
It also uses only 80 bytes in the buffer, no matter what.
Improves performance in Manhattan on Skylake GT3e at 800x600 by
6.1086% +/- 0.954166% (n=9). At 1920x1080, improves performance
by 2.82103% +/- 0.148596% (n=84).
v2: Fix number of primitives -> number of vertices calculation for
GL_TRIANGLES (I was multiplying by 4 instead of 3.) Caught by
Jordan Justen.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
v2:
* Declare loop index variable at loop site (idr)
* Make arrays of MI_MATH instructions 'static const' (idr)
* Remove commented debug code (idr)
* Updated comment in set_query_availability (Ken)
* Replace switch with if/else in hsw_result_to_gpr0 (Ken)
* Only divide GL_FRAGMENT_SHADER_INVOCATIONS_ARB by 4 on
hsw and gen8 (Ken)
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
It is incorrect to assume BGRA byte order for the GLES3 sRGB workaround.
v2: use _mesa_get_srgb_format_linear to handle all formats
Signed-off-by: Haixia Shi <[email protected]>
Reviewed-by: Stéphane Marchesin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
"Braswell" is a Cherryview based *thing*. It unfortunately requires extra
information to determine its marketing name. Unlike all previous products, and
hopefully all future ones, there is no unique 1:1 mapping of PCI device ID to
brand string.
I put up a fight about adding any complexity to our GL renderer string code for
a very long time. However, a wise man made a comment to me that I couldn't argue
with: if a user installs Windows on their hardware, the brand string should be
the same as what we display in Linux. The Windows driver apparently does this
check, so we should too.
Note that I did manage to find a good use for this info anyway in the compute
shader thread counts.
v2: memcpy instead of strncpy, and some minor changes (Matt)
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Jordan Justen <[email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The way we are organizing this code, the statically configured max_cs_threads
should always be the minimum value we actually support (ie. are aware of). As a
result, we can fall back to that if we get invalid numbers from the kernel (ie.
when the query succeeds, but the result is lower than expected).
I was originally planning to use an assert, but there is no reason to be so
mean.
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Jordan Justen <[email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With the previous patches, the code can find out the actual number of available
compute threads. It is enabled only for Cherryview since that is the only
platform I know for a fact has shipped devices which can benefit from this. It
seems like other platforms /might/ benefit from this because of fused
configurations which /might/ have shipped. Fallback code is still there.
v2: Some minor adjustments from Matt
Signed-off-by: Ben Widawsky <[email protected]>
Reviewed-by: Jordan Justen <[email protected]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
dEQP-GLES31.functional.fbo.no_attachments.maximums.{all,height,size,width}
started hitting assertion failures when emitting SURFACE_STATE, after
commit e8fd60e7891c7 where Samuel increased the maximum viewport size to
32768, from 16384.
MaxFramebufferWidth/Height were being set to the maximum viewport size,
but are actually limited by the SURFACE_STATE width/height field range,
which is 16384 on Gen7+ (where ARB_framebuffer_no_attachments is
exposed). So, reduce these to 16384 explicitly.
Fixes assert fails in the above mentioned dEQP tests. (Those tests
still fail, however.)
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This effectively disables old QuerySamplesForFormat driver hook, since it is
never called by Mesa anymore.
v2: Call brw_query_samples_for_format() with a dummy buffer to calculate num
samples, to avoid modifying the original buffer.
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
Now that there is a dedicated source file for internal format queries, this
function belongs there.
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
| |
By default, we call back the driver's hook fallback function that has generic
implementations for the all the queries.
Reviewed-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
From ARB_viewport_array spec:
" * On GL3-capable hardware the VIEWPORT_BOUNDS_RANGE should be at least
[-16384, 16383].
* On GL4-capable hardware the VIEWPORT_BOUNDS_RANGE should be at least
[-32768, 32767]."
This range is set using ctx->Const.MaxViewportWidth value, so just bump
those constants to 32k for gen7+ which can support OpenGL 4.0.
Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
See Ivy Bridge PRM, Volume 2, Part 2, 1.8.4 INTERFACE_DESCRIPTOR_DATA:
DWORD 5, bits 20:16: "This field indicates how much shared local
memory the thread group requires. The amount is specified in 4k
blocks, but only powers of 2 are allowed: 0, 4k, 8k, 16k, 32k and 64k
per half-slice."
For Haswell, see Volume 2d, INTERFACE_DESCRIPTOR_DATA:
DWORD 5, bits 20:16: With text identical to the Ivy Bridge PRM.
For Broadwell, see Volume 2d, INTERFACE_DESCRIPTOR_DATA:
DWORD 6, bits 20:16: With text identical to the Ivy Bridge PRM.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2 (Ben): Use combination of msaa_layout and number of samples
instead of introducing explicit type for lossless
compression (intel_miptree_is_lossless_compressed()).
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
|
|
|
|
|
|
|
|
|
| |
v2 (Ben): Use combination of msaa_layout and number of samples
instead of introducing explicit type for lossless
compression (intel_miptree_is_lossless_compressed()).
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
MAX_COMPUTE_SHARED_SIZE should be set to 32768. This fixes a regression
introduced in be27f77 (mesa: do not use a constant for
MAX_COMPUTE_SHARED_SIZE).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94139
Signed-off-by: Samuel Pitoiset <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Until now there has been only one type of color buffer that needs
to resolved - namely single sampled fast clear. As even the
sampler engine in GPU doesn't understand the associated meta data,
the color values need to be always resolved prior to reading them.
From SKL onwards there is new scheme supported called the lossless
compression of single sampled color buffers. This is something that
is understood by the sampling engine and therefore resolving of
these types of buffers is not necessary before sampling.
This patch adds means to make the distinction when considering if
resolve is needed.
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Ben Widawsky <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the framebuffer default sample count was taken directly
from the value given by the application. On the i965 driver on HSW if
the value wasn't one that is supported by the hardware it would hit an
assert when it tried to program the state for it. This patch fixes it
by adding a derived sample count to the state for the default
framebuffer. The driver can then quantize this to one of the valid
values in its UpdateState handler when the _NEW_BUFFERS state changes.
_mesa_geometric_samples is changed to use the new derived value.
Fixes the piglit test arb_framebuffer_no_attachments-query
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93957
Cc: Ilia Mirkin <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|