| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
| |
Piglit quick tests including sqrt pass, no other regressions,
tested on radeon 6670.
Reviewed-by: Alex Deucher <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that we know that the pool is defragmented, we positively know
that allocated + unallocated will be the total size of the
current pool plus all the items that will be promoted. So we only
need to grow the pool once.
This will allow us to just add the new items to the end of the
item_list without the need of looking for a place to the new item.
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
|
|
| |
This way we can avoid defragmenting the pool, even if it is needed
to defragment it, and looping again through the list of unallocated
items.
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a new member to the pool to track its status.
For now it is used only for the 'fragmented' status, but if
needed it could be used for more statuses.
The pool will be considered fragmented if: An item that isn't
the last is freed or demoted.
This 'strategy' has a problem, although it shouldn't cause any bug.
If for example we have two items, A and B. We choose to free A first,
now the pool will have the 'fragmented' status. If we now free B,
the pool will retain its 'fragmented' status even if it isn't
fragmented.
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
This new function will move items forward in the pool, so that
there's no gap between them, effectively defragmenting the pool.
For now this function is a bit dumb as it just moves items
forward without trying to see if other items in the pool could
fit in the gaps.
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This function will be used in the future by compute_memory_defrag
to move items forward in the pool.
It does so by first checking for overlaping ranges, if the ranges
don't overlap it will copy the contents directly. If they overlap
it will try first to make a temporary buffer, if this buffer fails
to allocate, it will finally fall back to a mapping.
Note that it will only be needed to move items forward, it only
checks for overlapping ranges in that case. If needed, it can
easily be added by changing the first if.
Reviewed-by: Tom Stellard <[email protected]>
|
|
|
|
|
|
|
|
| |
Actually what we currently handle is just the SCALED versions, and not
the int versions. The difference probably matters more when we actually
support integer in the compiler.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Teach new compiler scheduling and register assignment how to deal with
relative addressing. This gets us what we need to avoid falling back to
old compiler for CONST[ADDR[0].x+n]. It is also a prerequisite for temp
file relative addressing, although that is going to also need some
cleverness in register assignment to keep arrays grouped together.
NOTE: doing address calculation in full precision and then narrowing to
s16 in the mov to addr reg seems to sometimes cause lockups (and
sometimes work?!). It seems more reliable to do the address calculation
in s16, like the blob does. Which means teaching RA how to deal with
mixed half and full precision allocation. Fortunately that didn't turn
out to be too hard, so that is a nice bonus which we could probably take
better advantage of elsewhere.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Technically we should not need these. CP_LOAD_STATE can be pipelined.
But removing them broke a few piglit tests, like fbo-depth-
GL_DEPTH_COMPONENT24-readpixels. I expect these are just masking a
problem elsewhere, or perhaps they are only needed under some more
specific circumstances. But until that is understood properly, give
back a bit of the perf boost we got from c63450e8.
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
|
|
|
| |
The kernel driver name is either "kgsl" (downstream/android) or "msm"
(upstream).
Signed-off-by: Rob Clark <[email protected]>
|
|
|
|
| |
Signed-off-by: Rob Clark <[email protected]>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds an implementation of the ClearTexSubImage driver entry point that tries
to set up an FBO to render to the texture and then calls glClearBuffer with a
scissor to perform the actual clear. If an FBO can't be created for the
texture then it will fall back to using _mesa_store_ClearTexSubImage.
When used in combination with _mesa_store_ClearTexSubImage this should provide
an implementation that works for all DRI-based drivers. However as this has
only been tested with the i965 driver it is currently only enabled there.
v2: Only enable the extension for the i965 driver instead of all DRI drivers.
Remove an unnecessary goto. Don't require GL_ARB_framebuffer_object. Add
some more comments.
v3: Use glClearBuffer* to avoid having to modify glClearColor and friends.
Handle sRGB textures. Explicitly disable dithering.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen at intel.com>
|
|
|
|
|
|
|
|
|
| |
The Meta implementation of glClearTexSubImage is going to want to ensure that
dithering is disabled so that it can get a consistent color across the whole
texture when clearing. This adds a state flag to easily save it and set it to
the default value when performing meta operations.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
| |
Adds an implmentation of the ClearTexSubImage driver entry point that just
maps the texture and writes the values in. The extension is not yet enabled by
default because it doesn't work with multisample textures as they don't have a
simple linear layout.
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This adds the driver entry point for glClearTexSubImage and fills in the
_mesa_ClearTexImage and _mesa_ClearTexSubImage functions that call it.
v2: Don't clear some of the images if only one of them makes an error
Reviewed-by: Jason Ekstrand <[email protected]>
|
|
|
|
|
|
|
| |
In texture_error_check() there was a snippet of code to check whether the
given format and internal format are basically compatible. This has been split
out into its own static helper function so that it can be used by an
implementation of glClearTexImage too.
|
|
|
|
|
| |
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Neil Roberts <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
Should reduce overhead because the caching buffer manager doesn't need to
consider buffers of the wrong type.
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
since some bits are done in tree, but nobody is working on it anymore.
Reviewed-by: Chris Forbes <[email protected]>
Signed-off-by: Dave Airlie <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
because float depth texture data needs clamping to [0.0, 1.0]. Let the
_mesa_texstore() fallback to slower path.
Fixes Khronos GLES3 CTS tests:
shadow_execution_vert
shadow_execution_frag
V2: Move the check to _mesa_texstore_can_use_memcpy() function.
Add check for floating point data types.
Cc: <[email protected]>
Signed-off-by: Anuj Phogat <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
We actually want to use mov(16), not mov(8).
Fixes 7 Piglit tests: ARB_sample_shading/builtin-gl-sample-mask [2468]
and ARB_sample_shading/builtin-gl-sample-mask-simple [468].
Signed-off-by: Kenneth Graunke <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80991
Reviewed-by: Matt Turner <[email protected]>
Cc: "10.2" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We might be able to do this without an extra program key field, but this
is non-invasive and fixes the bug, for now.
This fixes the following Piglit tests on Broadwell:
- ARB_sample_shading/builtin-gl-sample-id 2
- ARB_sample_shading/builtin-gl-sample-position 2
- EXT_framebuffer_multisample/multisample-blit 2 color
- EXT_framebuffer_multisample/multisample-blit 2 color linear
- EXT_framebuffer_multisample/multisample-blit 2 depth
- EXT_framebuffer_multisample/no-color 2 depth combined
- EXT_framebuffer_multisample/no-color 2 depth separate
- EXT_framebuffer_multisample/no-color 2 depth single
- EXT_framebuffer_multisample/no-color 2 depth-computed combined
- EXT_framebuffer_multisample/no-color 2 depth-computed separate
- EXT_framebuffer_multisample/no-color 2 depth-computed single
- EXT_framebuffer_multisample/unaligned-blit 2 color msaa
- EXT_framebuffer_multisample/unaligned-blit 2 depth msaa
Signed-off-by: Kenneth Graunke <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80991
Reviewed-by: Matt Turner <[email protected]>
Cc: "10.2" <[email protected]>
|
|
|
|
|
|
|
|
| |
Otherwise, the performance warning for shader recompiles will just say
"something else".
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jordan Justen <[email protected]>
|
|
|
|
|
|
|
|
| |
It doesn't exist, so attempting to read it will trigger generation
assertions in the brw_inst API.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Printing the hex offsets makes it basically impossible to diff assembly:
if you add even a single instruction, the entire shader shows up as a
difference. So, every time I want to compare assembly, I have to strip
this out.
The hex offsets might be useful when debugging compaction, or when
inspecting the program cache buffer. Since it's occasionally useful,
but uncommon, this patch disables it by default, but makes it easy to
re-enable it temporarily when the need arises.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Avoids regenerating it unnecessarily.
Every program in shader-db improved, none by an amount less than a 1/3
reduction. One Dota2 shader decreased from 62 -> 24.
cfg calculations: 429492 -> 193197 (-55.02%)
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
| |
Will let us abstract how the instructions are stored.
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
| |
Reviewed-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
| |
The scratch buffer will be used for private memory and also register
spilling.
v2:
- Code cleanups
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
| |
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
| |
The is used for programs that have arrays of constants that
are accessed using dynamic indices. The shader will compute
the base address of the constants and then access them using
SMRD instructions.
|
|
|
|
|
| |
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
| |
brw_fs_visitor.cpp:2400:1: warning: unused parameter 'ir' [-Wunused-parameter]
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The parameter is an int16_t, and we're check that it's value will fit in
16-bits. Yes, the value that is stored in 16-bits will surely fit in
16-bits.
brw_inst.h: In function 'brw_inst_set_gen6_jump_count':
brw_inst.h:321:66: warning: comparison is always true due to limited range of data type [-Wtype-limits]
brw_inst.h:321:66: warning: comparison is always true due to limited range of data type [-Wtype-limits]
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
| |
brw_inst.h: In function 'brw_inst_set_src1_vstride':
brw_inst.h:118:76: warning: unused parameter 'brw' [-Wunused-parameter]
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Vinson Lee <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
|
|
|
|
|
|
|
|
| |
Before it was only storing one of the color components due to truncation.
With this patch it now properly stores all of them.
Reviewed-by: Brian Paul <[email protected]>
Cc: "10.2" <[email protected]>
|
|
|
|
| |
And add a news item.
|
|
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
|
|
|
|
| |
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
| |
Unused.
Also inline util_set_vertex_buffers_count and simplify it.
Reviewed-by: Michel Dänzer <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This removes the intermediate storage (pm4 state) and generates descriptors
directly in a staging buffer.
It also reduces the number of flushes, because the descriptors no longer
take CS space.
Reviewed-by: Michel Dänzer <[email protected]>
|