Free up unneeded LLVM stuff immediately after generating vertex shader
code. Saves about 500K per shader.
v2: Don't bother calling gallivm_free_function (Jose)
Signed-off-by: José Fonseca <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
Split free_gallivm_state() into two steps. First step is
gallivm_free_ir() which cleans up the LLVM scaffolding used to generate
code while preserving the code itself. Second step is
gallivm_free_code() to free the memory occupied by the code.
v2: s/gallivm_teardown/gallivm_free_ir/ (Jose)
Signed-off-by: José Fonseca <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
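The two-step split can be pictured with a minimal C sketch. The `jit_state` struct and the `jit_free_*` functions are hypothetical stand-ins, not gallivm's real types or signatures. Buffers from `malloc` play the role of the LLVM IR and of the generated code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the JIT state: "ir" models the LLVM
 * scaffolding (module, builder, engine) and "code" the generated
 * machine code. This is not gallivm's real layout. */
struct jit_state {
   void *ir;    /* only needed while generating code */
   void *code;  /* needed for as long as the shader is in use */
};

/* Step 1 (cf. gallivm_free_ir): drop the scaffolding, keep the code. */
static void jit_free_ir(struct jit_state *s)
{
   free(s->ir);
   s->ir = NULL;
}

/* Step 2 (cf. gallivm_free_code): release the generated code. */
static void jit_free_code(struct jit_state *s)
{
   free(s->code);
   s->code = NULL;
}

/* The old one-shot teardown is simply both steps in sequence. */
static void jit_free_state(struct jit_state *s)
{
   jit_free_ir(s);
   jit_free_code(s);
}
```

A driver would call the first step right after compiling a shader and the second only when the shader is destroyed; the memory held by the scaffolding is reclaimed in between.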
Provide a JITMemoryManager derivative which puts all generated code into
one memory pool instead of creating a new one each time code is generated.
This saves significant memory per shader, as the pool size is 512K while
a small shader occupies only a few K.
This memory manager also defers freeing generated code until you tell
it to do so, making it possible to destroy the LLVM engine while keeping
the code, thus enabling future memory savings.
v2: Fix compilation errors with LLVM 3.4 (Jose)
Signed-off-by: José Fonseca <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
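A rough sketch of the pooling idea in plain C, assuming a simple bump allocator. `code_pool`, `pool_alloc` and the other names are hypothetical, not the LLVM `JITMemoryManager` interface. A real JIT pool must hand out executable memory (for example via `mmap`/`mprotect`); `malloc` stands in for that here to keep the sketch portable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical shared code pool: one large allocation that many small
 * shaders are carved out of, instead of one pool per shader. */
struct code_pool {
   unsigned char *base;
   size_t size;
   size_t used;
};

static int pool_init(struct code_pool *p, size_t size)
{
   p->base = malloc(size);
   p->size = p->base ? size : 0;
   p->used = 0;
   return p->base != NULL;
}

/* Bump-allocate n bytes at a 16-byte-aligned offset; NULL when full. */
static void *pool_alloc(struct code_pool *p, size_t n)
{
   size_t aligned = (p->used + 15) & ~(size_t)15;
   if (aligned > p->size || n > p->size - aligned)
      return NULL;
   p->used = aligned + n;
   return p->base + aligned;
}

/* Code is released only when the owner explicitly asks, so whatever
 * generated it (e.g. the LLVM engine) can be destroyed independently. */
static void pool_free_all(struct code_pool *p)
{
   free(p->base);
   p->base = NULL;
   p->size = p->used = 0;
}
```

Several shaders share one pool this way, and freeing is deferred to a single explicit call rather than tied to the lifetime of the code generator.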
This is how it is meant to be done nowadays.
Reviewed-by: Roland Scheidegger <[email protected]>
I saw that LLVM internally uses its global context for some things, even
when we use our own. Given ours is also global, might as well use
LLVM's.
However, separate contexts can still be enabled with a simple source
code modification, for when the need/benefit arises.
Reviewed-by: Roland Scheidegger <[email protected]>
Nowadays LLVMModuleProviderRef is just an alias for LLVMModuleRef, so
its use just causes unnecessary confusion.
Reviewed-by: Roland Scheidegger <[email protected]>
Older versions haven't been tested and probably don't work anyway. But more
importantly, code supporting them is hindering further work.
Reviewed-by: Roland Scheidegger <[email protected]>
Reviewed-by: Roland Scheidegger <[email protected]>
Support for prior versions will be removed in the following change.
Reviewed-by: Roland Scheidegger <[email protected]>
It should compare with its own size.
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]>
Many instructions implicitly update the accumulator on Gen < 6. The instruction
scheduling code just calls add_barrier_deps() for each accumulator access on
these platforms, but a large class of operations don't actually update the
accumulator -- mostly move and logical instructions. Teaching the scheduling
code about this would allow more flexibility to schedule instructions.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77740
Reviewed-by: Matt Turner <[email protected]>
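One way to express the suggested refinement, sketched with a hypothetical opcode subset. The real backend's opcode list, and exactly which opcodes leave the accumulator untouched, may differ from what is assumed here (moves and the basic logical ops).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opcode subset; the real backend has many more. */
enum opcode {
   OP_MOV, OP_NOT, OP_AND, OP_OR, OP_XOR,
   OP_ADD, OP_MUL, OP_MAD,
};

/* Returns true if the instruction may implicitly write the accumulator.
 * Only pre-Gen6 hardware does this implicitly. Moves and logical ops are
 * assumed not to; anything else is conservatively treated as a writer. */
static bool implicitly_writes_accumulator(enum opcode op, int gen)
{
   if (gen >= 6)
      return false;
   switch (op) {
   case OP_MOV:
   case OP_NOT:
   case OP_AND:
   case OP_OR:
   case OP_XOR:
      return false;
   default:
      return true;
   }
}
```

The scheduler would then call `add_barrier_deps()` only when this predicate returns true, rather than for every accumulator access.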
The M_PI*f macros used a preprocessor paste to append 'f'
to M_PI defines, which works if the values are only numbers
but breaks on OpenBSD where M_PI definitions have casts
and brackets to meet requirements of a future version of POSIX,
http://austingroupbugs.net/view.php?id=801
http://austingroupbugs.net/view.php?id=828
Simplify the M_PI*f macros by using casts directly in the defines
as suggested by Kenneth Graunke.
Cc: "10.2" <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78665
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Jonathan Gray <[email protected]>
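A small C illustration of why the paste fails and why the cast works. `CAST_STYLE_PI` is a hypothetical name that mimics the cast-and-parentheses form described for OpenBSD, and the `PASTE_F`/`FLOATIFY` macros are illustrative rather than Mesa's original definitions.

```c
#include <assert.h>
#include <math.h>

/* Fallback in case <math.h> does not expose M_PI (strict ISO C mode). */
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* A definition with a cast and parentheses, in the style the commit
 * message describes (hypothetical name, so the system M_PI is untouched). */
#define CAST_STYLE_PI ((double)3.14159265358979323846)

/* Broken approach, kept in a comment because it does not compile here:
 *    #define PASTE_F(x)  x##f
 *    #define FLOATIFY(x) PASTE_F(x)
 * FLOATIFY(3.14159) yields the literal 3.14159f, but FLOATIFY(CAST_STYLE_PI)
 * tries to paste ')' with 'f', which is not a valid preprocessing token. */

/* Fixed approach: cast whatever the define expands to. */
#define M_PI_f          ((float) M_PI)
#define CAST_STYLE_PI_f ((float) CAST_STYLE_PI)
```

The cast works for a bare literal and for a parenthesized, cast expression alike, so the float variants no longer depend on how the C library spells M_PI.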
Commit a96c3bccf6791359d1159ebe9475e0ed5cf790ed intended to add these, but I
forgot to add the file.
Signed-off-by: Rob Clark <[email protected]>
Real GPU queries need some infrastructure to track samples per tile and
accumulate the results. Fortunately, this can be shared across GPU
generations.
See:
https://github.com/freedreno/freedreno/wiki/Queries#hardware-queries
Signed-off-by: Rob Clark <[email protected]>
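A minimal sketch of per-tile accumulation, under the assumption (not taken from the driver) that a counter is sampled when the query starts and stops within each tile, and that the result is the sum of the per-tile deltas. The linked wiki page describes the actual scheme; `tile_sample` and `accumulate_tiles` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-tile sample pair: counter values captured when the
 * query starts and stops while one tile is being rendered. */
struct tile_sample {
   uint64_t start;
   uint64_t end;
};

/* Each tile contributes only the delta observed while it was rendered. */
static uint64_t accumulate_tiles(const struct tile_sample *s, unsigned ntiles)
{
   uint64_t total = 0;
   for (unsigned i = 0; i < ntiles; i++) {
      assert(s[i].end >= s[i].start);
      total += s[i].end - s[i].start;
   }
   return total;
}
```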
Split out fd_query into an abstract base class, to allow multiple
implementations. The current sw based queries are moved into
fd_sw_query.
Signed-off-by: Rob Clark <[email protected]>
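The usual C idiom for such an abstract base class is a struct of function pointers, with the base embedded as the first member of each implementation. The sketch reuses the names `fd_query` and `fd_sw_query` from the message, but their fields, the vtable layout and `sw_query_create()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct fd_query;

/* Hypothetical vtable; the driver's real function set may differ. */
struct fd_query_funcs {
   void (*begin)(struct fd_query *q);
   void (*end)(struct fd_query *q);
   bool (*get_result)(struct fd_query *q, uint64_t *result);
   void (*destroy)(struct fd_query *q);
};

/* Abstract base: every concrete query embeds it as its first member. */
struct fd_query {
   const struct fd_query_funcs *funcs;
};

/* A software query sampling a CPU-side counter (illustrative only). */
struct fd_sw_query {
   struct fd_query base;
   const uint64_t *counter;
   uint64_t begin_value, end_value;
};

/* Downcast; valid because 'base' is the first member. */
static struct fd_sw_query *to_sw(struct fd_query *q)
{
   return (struct fd_sw_query *)q;
}

static void sw_begin(struct fd_query *q) { to_sw(q)->begin_value = *to_sw(q)->counter; }
static void sw_end(struct fd_query *q)   { to_sw(q)->end_value = *to_sw(q)->counter; }

static bool sw_get_result(struct fd_query *q, uint64_t *result)
{
   *result = to_sw(q)->end_value - to_sw(q)->begin_value;
   return true;
}

static void sw_destroy(struct fd_query *q) { free(to_sw(q)); }

static const struct fd_query_funcs sw_query_funcs = {
   sw_begin, sw_end, sw_get_result, sw_destroy,
};

/* Hypothetical constructor: callers only ever see the base type. */
static struct fd_query *sw_query_create(const uint64_t *counter)
{
   struct fd_sw_query *sq = calloc(1, sizeof(*sq));
   if (!sq)
      return NULL;
   sq->base.funcs = &sw_query_funcs;
   sq->counter = counter;
   return &sq->base;
}
```

A hardware query type would supply its own vtable in the same way, and generic code keeps calling through `q->funcs` without knowing which implementation it holds.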
As far as I can tell, Mesa hasn't had a convenient way to dump ARB_vp/fp
source until now. Using MESA_GLSL=dump is convenient, since it means
you can use a single environment variable to dump a program's shaders,
no matter which language they're written in.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
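A sketch of a single switch consulted by both shader paths. It assumes `MESA_GLSL` holds a comma-separated flag list; `flag_list_has` and `should_dump_shaders` are hypothetical helpers, and Mesa's actual parsing may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* True if the comma-separated list contains 'flag' as a whole token. */
static bool flag_list_has(const char *list, const char *flag)
{
   size_t len = strlen(flag);
   const char *p = list;
   while (p && *p) {
      const char *comma = strchr(p, ',');
      size_t tok = comma ? (size_t)(comma - p) : strlen(p);
      if (tok == len && strncmp(p, flag, len) == 0)
         return true;
      p = comma ? comma + 1 : NULL;
   }
   return false;
}

/* One switch for every language: GLSL and ARB_vp/fp would both check
 * this before printing program source. */
static bool should_dump_shaders(void)
{
   return flag_list_has(getenv("MESA_GLSL"), "dump");
}
```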
The point of copytexsubimage_using_blit_framebuffer is to use a hardware
accelerated BlitFramebuffer path. If that fails, we shouldn't do a
swrast blit---we should try our CTSI fallback code.
This is especially important for i965 and GLES, where we don't even
create a swrast context.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77705
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Eric Anholt <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Cc: "10.2" <[email protected]>
The depth extent field is used to limit the allowed slice range that
can be rendered to.
With the previous setting, only slice 0 could be rendered.
This fixes piglit amd_vertex_shader_layer-layered-depth-texture-render.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
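A small sketch of the assumed semantics: the extent field holds the number of renderable slices minus one, so a value of 0 restricts rendering to slice 0. The function names and the exact encoding are assumptions for illustration, not taken from the hardware documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: extent = number of renderable slices - 1. */
static uint32_t depth_extent_for(uint32_t num_slices)
{
   assert(num_slices >= 1);
   return num_slices - 1;
}

/* A slice is renderable if it lies in [min_slice, min_slice + extent]. */
static bool slice_renderable(uint32_t min_slice, uint32_t extent, uint32_t slice)
{
   return slice >= min_slice && slice - min_slice <= extent;
}
```

Under this encoding, leaving the field at 0 for a layered depth texture makes every slice other than slice 0 unreachable, which matches the symptom described above.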
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Fixes piglit's
'gl-3.2-layered-rendering-clear-color-all-types 3d mipmapped'
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
If blorp is disabled for color clears, then piglit's
'gl-3.2-layered-rendering-clear-color-all-types 3d mipmapped'
will fail.
Currently, gen8 fails similarly on this test because gen8
does not use blorp.
Signed-off-by: Jordan Justen <[email protected]>
Reviewed-by: Chris Forbes <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
With the more advanced dead code elimination pass already being run,
eliminate_dead_code was making no difference in instruction count, and had
an undesirable O(n^2) runtime. So remove it and rename
eliminate_dead_code_advanced to eliminate_dead_code.
Reviewed-by: Marek Olšák <marek.olsak at amd.com>
That information misleads source code auditing tools to think that
ralloc itself is released under LGPL v3.
Instead, simply state talloc is not licensed under a permissive license.
v2: Use wording suggested by Kenneth.
Reviewed-by: Brian Paul <[email protected]>
Acked-by: Kenneth Graunke <[email protected]>
We always call brw_merge_inputs() right before looping over the primitives, but
it can also be called inside the loop for each primitive. When that happens for
the first primitive, the call is redundant and can be skipped.
Reviewed-by: Eric Anholt <[email protected]>
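The loop structure can be sketched as follows. `merge_inputs()` stands in for `brw_merge_inputs()`, and the per-primitive `needs_remerge` condition is hypothetical, since the message does not say what triggers the in-loop call.

```c
#include <assert.h>

static int merge_calls;  /* counts merges, for illustration only */

/* Hypothetical stand-in for brw_merge_inputs(). */
static void merge_inputs(void) { merge_calls++; }

static void draw_prims(unsigned nr_prims, const int *needs_remerge)
{
   merge_inputs();  /* always done once before the loop */
   for (unsigned i = 0; i < nr_prims; i++) {
      /* Inputs were just merged, so skip the redundant call for i == 0. */
      if (i > 0 && needs_remerge[i])
         merge_inputs();
      /* ... emit primitive i ... */
   }
}
```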
Instead take the result from the first call and use it where needed.
Reviewed-by: Kenneth Graunke <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
We're moving towards requiring interface additions to be appended to the
end of the interface block. No functional change, opcodes are assigned as
before, but version 2 additions are now grouped together, which prevents
a scanner warning.
Cc: "10.2" <[email protected]>
Signed-off-by: Kristian Høgsberg <[email protected]>
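Why appending matters is easiest to see in the kind of C enum a protocol scanner generates: request opcodes follow declaration order. The interface and request names below are hypothetical.

```c
#include <assert.h>

/* Hypothetical request opcodes for a versioned interface. Opcodes are
 * assigned in declaration order, so inserting a request in the middle
 * would renumber everything after it and break older clients. */
enum example_request {
   /* version 1 */
   EXAMPLE_REQUEST_CREATE = 0,
   EXAMPLE_REQUEST_DESTROY,
   EXAMPLE_REQUEST_ATTACH,
   /* version 2 additions, grouped at the end */
   EXAMPLE_REQUEST_SET_SCALE,
   EXAMPLE_REQUEST_DAMAGE_BUFFER,
};
```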
Signed-off-by: Timothy Arceri <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
Now that we aren't using pixel_[xy] in live variables, nothing is looking
at these regs after the visitor stage.
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
This is the only case where a fs_reg in brw_fs_visitor is used during
optimization/code generation, and it meant that optimizations had to be
careful to not move pixel_x/y's register number without updating it.
Additionally, it turns out we had a couple of other UW values that weren't
getting this treatment (like gl_SampleID), so this more general fix is
probably a good idea (though I wasn't able to replicate problems with
either pixel_[xy]'s values or gl_SampleID, even when telling the register
allocator to reuse registers immediately).
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
The value depends only on the level, so no need to store the bool per slice.
Shrinks intel_mipmap_slice from 24 bytes to 16, while slotting into an
existing hole in intel_mipmap_level.
Reviewed-by: Chad Versace <[email protected]>
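A hypothetical pair of layouts showing how a per-slice `bool` costs a full 8 bytes of padding, while the same `bool` fits for free into an existing hole in the per-level struct. The field names are made up, and the sizes in the comments assume a 64-bit ABI with 8-byte pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layouts; field names are made up for illustration. */
struct slice_before {
   uint32_t x_offset, y_offset;
   void *map;
   bool has_hiz;          /* 1 byte, padded out: 24 bytes total */
};

struct slice_after {
   uint32_t x_offset, y_offset;
   void *map;             /* 16 bytes total */
};

struct level_before {
   uint32_t level_x, level_y, depth;  /* 12 bytes, then a 4-byte hole */
   struct slice_after *slice;
};

struct level_after {
   uint32_t level_x, level_y, depth;
   bool has_hiz;          /* one bool per level, placed in the hole */
   struct slice_after *slice;
};
```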
Cc: "10.2" <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "10.2" <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "10.2" <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "10.2" <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Cc: "10.2" <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Need to adjust coordinates since the shader receives the array index as
depth in z, but the TEX instruction expects it to be the second
coordinate for a 1D array texture. This fixes fbo-generatemipmap-array.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Ben Skeggs <[email protected]>
Cc: "10.2" <[email protected]>
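A minimal sketch of the coordinate remapping, using a hypothetical `coord4` struct. This C function only illustrates the remapping, not where the driver performs it, and clearing z afterwards is an assumption.

```c
#include <assert.h>

/* Hypothetical 4-component texture coordinate. */
struct coord4 { float x, y, z, w; };

/* The shader receives the array layer in z (as "depth"), but a 1D array
 * texture lookup expects the layer in the second coordinate (y). */
static struct coord4 fixup_1d_array_coord(struct coord4 c)
{
   struct coord4 out = c;
   out.y = c.z;   /* layer index moves to the second coordinate */
   out.z = 0.0f;  /* assumed unused for 1D array lookups */
   return out;
}
```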
Fixes the new logic of the conditional rendering piglit test.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Ben Skeggs <[email protected]>
Cc: "10.2" <[email protected]>
Also make sure that pipe_blit_info gets zeroed out so that the query isn't
accidentally left enabled.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "10.2" <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
Previously the implication was that queries should be disabled during
blits. However glBlitFramebuffer() is supposed to obey the current
query, and this new bit will indicate that to the driver.
Signed-off-by: Ilia Mirkin <[email protected]>
Cc: "10.2" <[email protected]>
Reviewed-by: Marek Olšák <[email protected]>
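A sketch of both pieces (zeroing the blit description, and a driver honoring the new bit), using a hypothetical trimmed-down struct. The real `pipe_blit_info` has more fields, and the flag name `obey_render_condition` is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, trimmed-down blit description. */
struct blit_info {
   unsigned src_level, dst_level;
   unsigned mask;
   bool obey_render_condition;  /* blit must respect the current query */
};

/* Zero the whole struct first so any flag the caller does not set,
 * including the new one, defaults to off instead of stack garbage. */
static void blit_info_init(struct blit_info *info)
{
   memset(info, 0, sizeof(*info));
}

/* Driver side: skip the blit only when it must obey the render
 * condition and the condition says "don't draw". */
static bool blit_should_run(const struct blit_info *info, bool condition_passed)
{
   return !info->obey_render_condition || condition_passed;
}
```

glBlitFramebuffer() would set the flag; internal blits such as mipmap generation leave it cleared so they always run.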
Different textures may be bound to each slot for each stage. So we need
to be able to upload ms parameters for each one without stages
overwriting each other.
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Ben Skeggs <[email protected]>
Cc: "10.1 10.2" <[email protected]>
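A minimal sketch of per-stage storage. The stage and slot counts and the `ms_params` fields are hypothetical.

```c
#include <assert.h>

#define NUM_STAGES 3   /* hypothetical: e.g. vertex, geometry, fragment */
#define NUM_SLOTS 16   /* hypothetical texture slots per stage */

/* Hypothetical multisample parameters for a bound texture. */
struct ms_params { int samples; int log2_samples; };

/* Indexed by [stage][slot] so each stage keeps its own copy; a table
 * indexed only by slot would let one stage overwrite another's values. */
static struct ms_params ms_table[NUM_STAGES][NUM_SLOTS];

static void upload_ms_params(unsigned stage, unsigned slot, struct ms_params p)
{
   assert(stage < NUM_STAGES && slot < NUM_SLOTS);
   ms_table[stage][slot] = p;
}
```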
Signed-off-by: Ilia Mirkin <[email protected]>
Reviewed-by: Ben Skeggs <[email protected]>
Cc: "10.2 10.1" <[email protected]>
Signed-off-by: Rob Clark <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>
Reviewed-by: Michel Dänzer <[email protected]>