| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we would consider any optimistically colored nodes for
spilling. However, spilling any optimistically colored nodes below the
node that we failed to color on the stack wouldn't help us make
progress, since it wouldn't help with allowing us to find a color for
the node currently failing to get colored. Only consider nodes
which were above the failing node on the stack for spilling, which
simplifies the logic, and comment the code better so people know what's
going on here.
No shader-db changes with BRW_MAX_GRF reduced to 90 (or with the normal
number of GRF's).
Signed-off-by: Connor Abbott <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can store the q total that pq_test() would've calculated in the node
itself, updating it when we add a node to the stack. This way, we only
have to walk the adjacency list when we push a node on the stack (i.e.
when the p, q test succeeds) instead of every time we do the p, q test.
No difference in shader-db run times, but I'm keeping this in because
the q total that it calculates will also be used in the next few commits.
Signed-off-by: Connor Abbott <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, there were 3 entrypoints into parts of the actual allocator,
and an API called ra_allocate_no_spills() that called all 3. Nobody
would ever want to call any of the 3 entrypoints by themselves, so
everybody just used ra_allocate_no_spills(). So just make them static
functions, and while we're at it rename ra_allocate_no_spills() to
ra_allocate() since there's no equivalent "with spills," because the
backend is supposed to handle spilling.
Signed-off-by: Connor Abbott <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
This would try to allocate 0-sized bo's when the max level was below the
base level.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Francisco Jerez <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The ARB_gpu_shader5 extension is made up of a lot of small sub-parts.
Instead of adding PIPE_CAP's for each of these, just rely on the GLSL
version reported by the pipe driver. The remaining extensions lend
themselves naturally to being checked through a single CAP.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Will make it easier on us as CleanSpec.mk comes along and improves
consistency across the Android build.
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
| |
No longer needed as of last commit.
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
| |
Will allow us to nuke an include or two from the drivers.
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
| |
Saves us a few lines and brings us closer to the automake build.
Drop DRM_TOP as it's not longer used.
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
The variable i915_C_FILES changed to i915_FILES with commit
34d4216e641 back in mesa 9.1/9.2. Yet we've missed to update the
the android build, essentially creating an dummy/empty driver that
can never work.
Cc: "10.1 10.2" <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the compiler is not capable/does not accept -msse4.1 while the target
has the instruction set we'll blow up as _mesa_streaming_load_memcpy is
going to be undefined.
To make sure that never happens, wrap the runtime cpu check+caller in an
ifdef thus do not compile that hunk of the code.
Fix the android build by enabling the optimisation and adding the define
where applicable.
v2: autoconf conditionals end with "fi" rather than endif.
v3: Wrap the definition and call to intel_miptree_{un,}map_movntdqa in
if defined(USE_SSE41). Spotted by Matt.
Cc: Matt Turner <[email protected]>
Cc: Adrian Negreanu <[email protected]>
Cc: "10.1 10.2" <[email protected]>
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Reviewed-by: Brian Paul <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
| |
We now use the brw_eu_emit.c code instead.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Everything should be in place to unify code generation between Gen4-7
and Gen8+. We should be able to drop the Gen8 generators at this point.
However, leave them hooked up for a brief moment, for testing and
comparison purposes. Set GEN8=1 to use the old Gen8+ code generator
paths.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Kind of a moot point since we're deleting Gen8 code generation, but
this at least helps make it match the Gen4-7 code. It's probably more
reasonable than using float.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This improves performance in Trine 2 at 1280x720 (windowed) on "Very
High" settings by 30% (in the interactive menu) to 45% (in the forest
by the giant frog) on Haswell GT3e.
It also now generates the same assembly on Gen7 as it does on Gen8,
which always used the sampler for both types.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
I don't see any reason for this to exist.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to the documentation, we need to set the source 0 register
type to IMM for flow control instructinos that have both JIP and UIP.
Out of paranoia, just make all flow control instructions use IMM;
there's no benefit to using ARF anyway, and it could trouble that's
difficult to diagnose.
See commit 9584959123b0453cf5313722357e3abb9f736aa7, which did the
analogous change in the gen8_generator code.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
| |
We're going to add a Gen8+ case shortly, which would need to duplicate
this code again. Instead, share it.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we combine the Gen4-7 and Gen8+ generators, we'll need to handle
half float packing/unpacking functions somehow. The Gen8+ generator
code today just emulates the behavior of the Gen7 F32TO16/F16TO32
instructions, including the align16 mode bugs.
Rather than messing with fs_generator/vec4_generator, I decided to just
emulate the instructions at the brw_eu_emit.c layer.
v2: Change gen >= 7 asserts to gen == 7 (suggested by Chris Forbes).
Fix regressions on Haswell in VS tests due to type assertions.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Broadwell requires the number of vertices written by the geometry shader
to be specified in a separate register, as part of the terminating
message's payload.
This also means GS_OPCODE_THREAD_END needs to increment mlen.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
Either should work.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
g0.5 has nothing of value to contribute to m0.5. In both the VS and GS
payload, g0.5 contains the scratch space pointer - which is definitely
not of any use. The GS payload also contains FFTID, but the URB write
message header doesn't want FFTID.
The only reason I used OR was because Eric originally requested it.
On Broadwell, I used MOV, and that's worked out fine.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On Haswell, we implement "discard" via predicated SEND messages, using
f0.1 instead of f0.0. To accomplish this, we set inst->flag_subreg to 1
on the FS_OPCODE_FB_WRITE.
Most instructions using fs_inst::flag_subreg expand to a single assembly
instruction. However, FS_OPCODE_FB_WRITE can generate several MOVs for
setting up header information. We don't want to set flag_subreg on
those, so override the default state back to 0.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Cc: "10.2" <[email protected]>
|
|
|
|
|
|
|
| |
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Cc: "10.2" <[email protected]>
|
|
|
|
|
|
|
|
| |
Comparing ~0u with a packed enum (i.e., 1 byte) always evaluates to
false. Shouldn't gcc warn about this?
Reported-by: Connor Abbott <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously the Meta implementation of glGetTexImage would fall back to
_mesa_get_teximage if the texturing is not using an unsigned normalised
format. However in order to support the half-float formats of BPTC textures we
can make it render to a floating-point renderbuffer instead. This patch makes
decompression_state have two FBOs, one for the GL_RGBA format and one for
GL_RGBA32F. If a floating-point texture is encountered it will try setting up
a floating-point FBO. It will now also check the status of the FBO and fall
back to _mesa_get_teximage if the FBO is not complete.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
| |
Enables BPTC texture compression on the software rasterizer.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
Enables the BPTC extension on Gen>=7 and adds the necessary format mappings to
get the right surface type value.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Once we add BPTC texture support we will need to generate mipmaps for
compressed floating point textures too. Most of the code seems to already be
there but it just needs a few extra lines to get it to use GL_FLOAT instead of
GL_UNSIGNED_BYTE as the type for the temporary buffers.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds compressors for all four of the BPTC compressed-texture formats. The
compressor is written from scratch and takes a very simple approach. It always
uses a single mode of the BPTC format (4 for unorm and 3 for half-floats) and
picks the two endpoints by dividing the texels into those which have more or
less than the average luminance of the block and then calculating an average
color of the texels within each division.
It's probably not really sensible to try to use BPTC compression at runtime
because for example with the Nvidia offline compression tool it can take in
the order of an hour to compress a full-screen image. With that in mind I
don't think it's worth having a proper compressor in Mesa and this approach
gives reasonable results for a usage that is basically a corner case.
v2: Always use the custom compressor, even for the unorm formats. Fix the
quantization step for the half-float format compressor. Fixed a typo which
was breaking the right-hand edge of half-float textures with a width that
isn't a multiple of four.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Adds functions to fetch from any of the four BPTC-compressed formats.
v2: Set the alpha component to 1.0 when fetching from the half-float formats
instead of leaving it uninitialised. Don't linearize the alpha component
when fetching from sRGB.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds the following four Mesa image format enums which correspond to the
four BPTC compressed texture formats:
MESA_FORMAT_BPTC_RGBA_UNORM
MESA_FORMAT_BPTC_SRGB_ALPHA_UNORM
MESA_FORMAT_BPTC_RGB_SIGNED_FLOAT
MESA_FORMAT_BPTC_RGB_UNSIGNED_FLOAT
It also updates the format information functions to handle these and the
corresponding GL enums.
v2: Also modify _mesa_get_format_color_encoding, _mesa_get_srgb_format_linear
and _mesa_get_uncompressed_format
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds the ‘bptc’ layout to get_channel_bits. The channel bits for BPTC depend
on the mode but as it only has to be an approximation this sets it to 8 for
the two UNORM formats and 16 for the two half-float formats. These represent
the minimum number of bits of variation that can be generated by the
interpolation of the two formats.
This doesn't quite match what we do for S3TC which only returns 4 even though
it can similarly generate 8 bits from the interpolation. However it does match
what we return for ETC2. For reference, NVidia seems to return 8 bits for the
UNORM formats and 32 bits for the half-float formats.
v2: Change the number of bits to 8/8/8/8 for the UNORM formats and 16/16/16
for the half-float formats.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
If the name of a compressed texture format has ‘FLOAT’ in it it will now set
the data type of the format to GL_FLOAT. This will be needed for the BPTC
half-float formats.
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
| |
The signed and unsigned half-float BPTC-compressed formats were being reported
as having a base format of GL_RGBA but they don't store an alpha channel so it
should be GL_RGB.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
This adds a boolean in the gl_extensions struct for
GL_ARB_texture_compression_bptc as well as an entry in extension_table.
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
| |
To fix MSVC build failure.
Trivial.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Replace the plain sampler index with a register reference to a sampler.
We also need to keep track of the sampler array size when there is a
relative reference so that we can mark the whole array used.
To facilitate implementation, we add a separate ADDR register that
exclusively handles the sampler relative address. Other approaches would
be more invasive.
Signed-off-by: Ilia Mirkin <[email protected]>
Acked-by: Marek Olšák <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the array index is not a constant expression, the existing support
will assume a zero offset (giving us the sampler index of the base of
the array).
For dynamically uniform indexing of sampler arrays, we need both that
and the indexing expression.
Signed-off-by: Chris Forbes <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
| |
Fixes non-termination in various Piglit tests.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This limit is fixed in Mesa core and cannot be changed.
It only affects ARB_vertex_program and ARB_fragment_program.
The minimum value for ARB_vertex_program is 1 according to the spec.
The maximum value for ARB_vertex_program is limited to 1 by Mesa core.
The value should be zero for ARB_fragment_program, because it doesn't
support ARL.
Finally, drivers shouldn't mess with these values arbitrarily.
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
This computes all GL versions before any context is created.
It's a requirement for GLX_MESA_query_renderer.
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
|
|
| |
So move it from dri_context to dri_screen.
This will be needed for version computations.
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
| |
Reviewed-by: Ilia Mirkin <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Ilia Mirkin <[email protected]>
|