| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
move_uniform_array_access_to_pull_constants() and setup_pull_constants()
both have two parts:
1. Decide which UNIFORM registers to demote to pull constants, and
assign locations.
2. Mechanically rewrite the instruction stream to pull the uniform
value into a temporary VGRF and use that, eliminating the UNIFORM
file access.
In order to support pull constants in SIMD16 mode, we will need to make
decisions exactly once, but rewrite both instruction streams.
Separating these two tasks will make this easier.
This patch introduces a new helper, demote_pull_constants(), which
takes care of rewriting the instruction stream, in both cases.
For the moment, a single invocation of demote_pull_constants can't
safely handle both reladdr and non-reladdr tasks, since the two callers
still use different names for uniforms due to remove_dead_constants()
remapping of things. So, we get an ugly boolean parameter saying
which to do. This will go away.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
When demoting a variably indexed uniform array to pull constants, we
only recorded the location for the base of the array (element 0).
Recording locations for all array elements is a trivial amount of code
and will make subsequent refactoring easier.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, both move_uniform_array_access_to_pull_constants() and
setup_pull_constants() maintained stack-local arrays with this
information. Storing this information will allow it to be used from
multiple functions, allowing us to split and move code around.
We'll also eventually want to pass pull constant location information
to the SIMD16 compile. Saving this information will help us do that.
Unfortunately, the two functions *cannot* share the contents of the
array just yet. remove_dead_constants() renumbers all the UNIFORM
registers to be contiguous starting at zero, so the two functions
talk about uniforms using different names. We can't even remap them,
since move_uniform_array_access_to_pull_constants() deletes UNIFORM
registers that are only accessed with reladdr, so remove_dead_constants
can't even see them.
This situation will improve in the next few patches.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The SIMD8 compile will determine whether pull parameters are necessary.
If so, it will set prog_data->nr_pull_params to a value greater than 0.
brw_wm_fs_emit checks if nr_pull_params > 0 and skips the SIMD16 compile
altogether. So, this code should never occur.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Normally, nothing uses live intervals at this point, so this isn't
necessary. However, dump_instructions() calculates them and uses them
to show register pressure. So, calling dump_instructions() in this area
of the code would segfault due to the arrays being the wrong size.
This is not a candidate for stable branches because it only serves to
fix internal debugging code that you manually have to invoke by altering
the source code or using gdb.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, dump_instruction() would print output such as:
{ 2} 3: mov vgrf1:F, u0:F
{ 3} 4: mov vgrf7:F, u0:F
{ 4} 5: mov vgrf8:F, u0:F
which looked like either a scalar access or perhaps a constant-indexed
access of element 0, when it was really a variable index.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
In commit e57d77280efcbfd6579a88f071426653287ef833, I fixed this for
destinations in the Vec4 backend, and sources in the scalar backend.
But not both types in both backends.
To prevent this mess from continuing, make the reg_encoding table
static, so only the disassembler can use it.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
opt_saturate_propagation_local compares scan_inst->dst.reg/reg_offset
with inst->src[0].reg/reg_offset, and ensures that scan_inst->dst.file
is GRF. But nothing ensured that inst->src[0].file was GRF.
In the following program, this resulted in u1:F matching vgrf1:UW,
and a saturate being incorrectly propagated from instruction 8 to
instruction 1.
{ 1} 0: add vgrf0:UW, hw_reg1+8:UW, hw_reg0:V
{ 1} 1: add vgrf1:UW, hw_reg1+10:UW, hw_reg0:V
{ 1} 2: linterp vgrf6:F, hw_reg2:F, hw_reg3:F, hw_reg0:F
{ 2} 3: linterp vgrf27:F, hw_reg2:F, hw_reg3:F, hw_reg0+16:F
{ 4} 4: mov vgrf10+0.0:F, vgrf6:F
{ 3} 5: mov vgrf10+1.0:F, vgrf27:F
{ 6} 6: tex vgrf8+0.0:F, vgrf10+0.0:F
{ 5} 7: mov vgrf32:F, u1:F
{ 5} 8: mov.sat vgrf12:F, u1:F
From shader-db:
total instructions in shared programs: 1841932 -> 1841957 (0.00%)
instructions in affected programs: 5823 -> 5848 (0.43%)
I inspected two of the 25 hurt shaders, and concluded that they were
both hitting this bug, and not legitimately optimized.
This fixes bugs in Left 4 Dead 2 and Team Fortress 2, possibly among
others. The optimization pass didn't exist in 10.0, so this is only
a candidate for 10.1.
Cc: "10.1" <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
It turns out we can allow COHERENT storage/mappings all the time,
regardless of LLC vs non-LLC. It just means never using temporary
mappings to avoid GPU stalls, and on non-LLC we have to use the GTT intead
of CPU mappings. If we were to use CPU maps on non-LLC (which might be
useful if apps end up using buffer_storage on PBO reads, to avoid WC read
slowness), those would be PERSISTENT but not COHERENT, but doing that
would require us driving the clflushes from userspace somehow.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
| |
It looks like there's no big difference for write-only workloads, but
using a CPU map means that if they happen to read without having set the
MAP_READ_BIT, they get 100x the performance for those reads.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
While in expected usage patterns nobody will ever hit this path, doubling
our bandwidth used seems like a waste, and it cost us extra code too.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
| |
On LLC, it should always be better to use a cached mapping than the GTT.
On non-LLC, it seems pretty silly to try to optimize read performance for
the INVALIDATE_RANGE_BIT case. This will make the buffer_storage logic
easier.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
| |
Signed-off-by: Anuj Phogat <[email protected]>
Tested-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This extension provides a way for an application to render to multiple
surfaces with different buffer formats without having to use multiple
contexts. An EGLContext can be created without an EGLConfig by passing
EGL_NO_CONFIG_MESA. In that case there are no restrictions on the surfaces
that can be used with the context apart from that they must be using the same
EGLDisplay.
_mesa_initialze_context can now take a NULL gl_config which will mark the
context as ‘configless’. It will memset the visual to zero in that case.
Previously the i965 and i915 drivers were explicitly creating a zeroed visual
whenever 0 is passed for the EGLConfig. Mesa needs to be aware that the
context is configless because it affects the initial value to use for
glDrawBuffer. The first time the context is bound it will set the initial
value for configless contexts depending on whether the framebuffer used is
double-buffered.
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
| |
The few paths that were playing with framebuffers and renderbuffer were
saving and restoring them.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This was being applied in a subset of the places that
intel_prepare_render() was called, to set the same flag that
intel_prepare_render() was setting.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
The flag wasn't getting updated correctly when the ctx->DrawBuffer or
ctx->ReadBuffer changed. It usually ended up working out because most
apps only have one window system framebuffer, or if they have more than
one and they have any front read/drawing, they will have called
glReadBuffer()/glDrawBuffer() on it when they get started on the new
buffer.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It's the ctx->ReadBuffer that gets read from, not the ctx->DrawBuffer.
So, if you happened to have a ctx->ReadBuffer that was the winsys buffer,
and it had previously been intel_prepare_render()ed but not invalidated
since then, and you called glReadBuffer() to switch to front buffer
instead of back buffer reading on the winsys fbo while your drawbuffer was
a user FBO, you'd never get the front buffer's miptree fetched, and
segfault.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Static and shared builds were possible in the good old days
of static makefiles. Currently the build system does not
distinguish nor does anything special when one requests a
static build.
Print a warning message for the packager that static builds
are not supported and continue building shared libs.
Currently only Debian and derivatives use static build, and
they use it for building a Xlib powered libGL. This patch
will only change the warning message they are seeing but
the binaries produced will be identical.
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Jon TURNEY <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The primary users of these are linux developers, although
it can be extended for *BSD and others if needed.
Fixes make install for Cygwin and OpenBSD at least.
v2:
- Wrap vdpau targets as well.
v3:
- Fold HAVE_COMPAT_SYMLINKS conditional within install*links.mk
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=63269
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Jon TURNEY <[email protected]> (v1)
Reviewed-by: Christian König <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some platforms different library extension - dll, dylib, a.
Honor that when we are creating the required links.
Rename LIB_EXTENSION to LIB_EXT while we're here.
With libglapi linking aside, building classic drivers on
non-linux platforms should be possible now.
v2: Resolve conflicts.
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Jon TURNEY <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
In the cases where one links against the static glapi.la there
is no need to create temporary variables only to explicitly
link agaist it.
Instead use SHARED_GLAPI_LIB to explicitly indicate when one
is building and linking with the shared glapi provider.
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Jon TURNEY <[email protected]>
|
|
|
|
|
|
|
| |
Use the handy script and minimise the boilerplate in the makefiles.
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Jon TURNEY <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
There is little gain in printing whenever a folder is created.
v2:
- Use $(AM_V_at) over @ to have control in verbose builds.
Suggested by Erik Faye-Lund.
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Jon TURNEY <[email protected]>
|
|
|
|
|
|
|
| |
Use the automake predefined macro over hardcoding mkdir -p everywhere.
Signed-off-by: Emil Velikov <[email protected]>
Reviewed-by: Jon TURNEY <[email protected]>
|
|
|
|
|
|
|
| |
The non-ARB versions take GLuint ids, not GLhandleARB.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
| |
To follow the example of MESA_FORMAT_Z24_UNORM_X8_UINT.
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Removes unnecessary MOV instructions in L4D2, TF2, Dota2, and many other
Steam games.
total instructions in shared programs: 1668126 -> 1657509 (-0.64%)
instructions in affected programs: 242235 -> 231618 (-4.38%)
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
| |
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
| |
<4,1,1> isn't a real thing. We meant <4,4,1>, i.e., each component of
the whole register.
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This keeps us from needing to reemit all the other stage state just
because a surface changed.
Improves unoptimized glamor x11perf -f8text by 1.10201% +/- 0.489869%
(n=296). [v1]
v2:
- Drop binding table packets from Gen8 unit state as well.
- Pass _3DSTATE_BINDING_TABLE_POINTERS_XS to brw_upload_binding_table,
cutting even more code.
v3: Don't forget to drop them from 3DSTATE_GS (botched refactor in v2).
Signed-off-by: Eric Anholt <[email protected]> [v1]
Reviewed-by: Kenneth Graunke <[email protected]> [v1]
Signed-off-by: Kenneth Graunke <[email protected]> [v2, v3]
Reviewed-by: Eric Anholt <[email protected]> [v3]
|
|
|
|
|
|
|
|
| |
This makes both the empty and non-empty binding table paths exit through
the bottom of the function, which gives us a place to share code.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I don't know how many people care about this case, but it's easy enough
to do, so we may as well. The tricky part is that for some reason Mesa
stores the number of array slices in Height, not Depth.
I thought the easiest way to handle that here was to make Height = 1
(the actual height), and srcDepth = srcImage->Height. This requires
some munging when calling _mesa_prepare_mipmap_level, so I created a
wrapper that sorts it out for us.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This is equivalent for now, and will differ once we add 1DArray support.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is largely a matter of looping over the number of slices/layers,
and not minifying depth (presumably that code exists for the unfinished
3D texture support).
Normally, I would have made the loop over array slices the outermost
loop. I suspect that would make it trickier to support 3D textures
someday, though, so I didn't. The advantage is that we would only have
one BufferData call per slice, rather than one per miplevel and slice.
However, a GenerateMipmaps microbenchmark indicates that either way is
basically just as fast. So I'm not sure it's worth bothering.
Improves performance in a GenerateMipmaps microbenchmark by nearly 5x.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
For array textures and 3D textures, this represents the layer to use.
Just pass 0 for now.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
| |
Almost the exact same code appeared twice, and it needs to expand to
handle additional texture targets. Refactor it to tidy up the code and
avoid duplicating more work in the future.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This is what the macro is for.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
fallback_required() already creates the FBO in order to check whether we
can render to the format. So it's guaranteed to exist.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This doesn't interact with the GL API, so we shouldn't use GL types.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
This was only ever used in one place; there's no reason for it to be
non-static.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
| |
Putting the implementation of each GL function in its own file makes it
much easier not to get lost.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
| |
This will be used in multiple files soon.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Anuj Phogat <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We've had several problems now with FinishRenderTexture not getting called
enough, and we're ready to just give up on it ever doing what we need. In
particular, an upcoming Steam title had rendering bugs that could be fixed
by always_flush_cache=true.
Instead of hoping Mesa core can figure out when we need to flush our
caches, just track what BOs we've rendered to in a set, and when we render
from a BO in that set, emit a flush and clear the set.
There's some overhead to keeping this set, but most of that is just
hashing the pointer -- it turns out our set never even gets very large,
because cache flushes are so common (even on cairo-gl).
No statistically significant performance difference in cairo-gl (n=100),
despite spending ~.5% CPU in these set operations.
v1: (Original patch by Eric Anholt.)
v2: (Changes by Ken Graunke.)
- Rebase forward from May 7th 2013 -> March 4th 2014.
- Drop the FinishRenderTexture hook entirely; after rebasing the
patch, the hook was just an empty function.
- Move the brw_render_cache_set_clear() call from
intel_batchbuffer_emit_flush() to brw_emit_pipe_control_flush().
In theory, this could catch more cases where we've flushed.
- Consider stencil as a possible texturing source.
v3: (changes by anholt):
- Move set_clear() back to emit_mi_flush() -- it means we can drop
more forced flushes from the code. In the previous location, it
wouldn't have been called when we wanted pre-gen6.
- Move the set clear from batch init to reset -- it should be empty at
the start of every batch, since the kernel handled any inter-batch
flush for us.
v4: Drop the debug code in set.c that I accidentally committed.
Signed-off-by: Eric Anholt <[email protected]>
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Tested-by: Dylan Baker <[email protected]> [v2]
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We need the header setup to not be predicated on which pixels are
undiscarded. I'm not sure originally if I had thought that the mask
disable implied predicate disable, or if I had just misread the mask
disable as predicate disable. Either way, I know I had spent more time
thinking about this in the gen8 generator than the gen7 generator.
Plus, it turns out that I had mis-implemented the "the GPU will use the
predicate unless this header is present" comment, by skipping setting up
the pixel mask when the header was present.
Fixes GPU hangs in piglit glsl-fs-discard-mrt, Trine, Trine 2 and
preusmably MLL.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=75207
Tested-by: Tapani Pälli <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It was really only used in the radeon driver for a debug printf.
And evidently, libGL.so referenced it just to work around some sort
of linker issue.
This patch removes the two calls to the function and the function
itself.
Fixes undefined _glthread_GetID symbol in libGL reported by 'nm'.
Though, the missing symbol doesn't cause any issues on my system but
it does cause glxinfo to fail on one of our test systems.
Reviewed-by: Jose Fonseca <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
| |
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Most of the logic refers to the local variable 'mt' directly but
a few cases use 'intelObj->mt' instead. These are the same for
now but will be different once stencil miptree gets used.
v2 (Ian): fixed also indentation in surrounding lines
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
|