Commit message
Reviewed-by: Chad Versace <[email protected]>
This fixes a bug on BDW when our meta-based stencil blit path assert-fails
due to an invalid internal format even though we do support the
ARB_stencil_texturing extension.
Reviewed-by: Matt Turner <[email protected]>
No functional change.
Reviewed-by: Matt Turner <[email protected]>
The size of a Node is always four bytes, so there is no need for the old
code that was used when sizeof(Node)==8.
Reviewed-by: Matt Turner <[email protected]>
sizeof(Node) is always 4 bytes.
Reviewed-by: Matt Turner <[email protected]>
Just minor clean-up.
Reviewed-by: Matt Turner <[email protected]>
The _mesa_dlist_alloc() function is only guaranteed to return a pointer
with 4-byte alignment. On 64-bit systems which don't support unaligned
loads (e.g. SPARC or MIPS) this could lead to a bus error in the VBO code.
The solution is to add a new _mesa_dlist_alloc_aligned() function which
will return a pointer to an 8-byte aligned address on 64-bit systems.
This is accomplished by inserting a 4-byte NOP instruction in the display
list when needed.
The only place this actually matters is the VBO code where we need to
allocate a 'struct vbo_save_vertex_list' which needs to be 8-byte
aligned (just as if it were malloc'd).
The gears demo and others hit this bug.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88662
Cc: "10.4" <[email protected]>
Reviewed-by: José Fonseca <[email protected]>
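
As an illustration of the padding trick described above (a sketch, not the
actual Mesa code; OPCODE_NOP_SKETCH and the buffer layout are assumptions):
aligning a 4-byte-granular allocation up to 8 bytes never needs more than one
spare slot.

#include <stddef.h>
#include <stdint.h>

/* Sketch: 'nodes' is a display-list buffer made of 4-byte slots and
 * 'used' counts the slots already consumed.  If the next free slot is
 * not 8-byte aligned, burn one slot on a NOP so the following
 * allocation starts on an 8-byte boundary, just as if it were
 * malloc'd.  OPCODE_NOP_SKETCH stands in for a real do-nothing token. */
enum { OPCODE_NOP_SKETCH = 0 };

static uint32_t *
alloc_aligned8(uint32_t *nodes, size_t *used, size_t nslots)
{
   if ((uintptr_t) &nodes[*used] % 8 != 0)
      nodes[(*used)++] = OPCODE_NOP_SKETCH;  /* 4 bytes of padding */
   uint32_t *ptr = &nodes[*used];
   *used += nslots;
   return ptr;                               /* 8-byte aligned */
}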
According to the SKL bspec the 3DSTATE_CONSTANT_* commands only take
effect on the next corresponding 3DSTATE_BINDING_TABLE_POINTER_*
command. This patch just makes it set the BRW_NEW_SURFACES state when
uploading the push constants to ensure the binding tables will be
updated.
This fixes the fbo-blending-formats Piglit test and possibly others.
Reviewed-by: Kristian Høgsberg <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
When only color buffers are concerned, the depth buffer is not needed.
No regression on BDW where meta blit is used instead of blorp. I
also temporarily disabled blorp for fbo-blits on IVB and saw no
regressions there either.
I also compared several graphics benchmarks on BDW and saw neither
regressions nor improvements.
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Implementing an idea from Ken, on i965 the shader program for 2D
blits becomes significantly simpler.
Before:
pln(8) g6<1>F g4<0,1,0>F g2<8,8,1>F { align1 1Q compacted };
pln(8) g7<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 1Q compacted };
send(8) g2<1>UW g6<8,8,1>F
sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 1Q };
mov(8) g123<1>F g2<8,8,1>F { align1 1Q compacted };
mov(8) g124<1>F g3<8,8,1>F { align1 1Q compacted };
mov(8) g125<1>F g4<8,8,1>F { align1 1Q compacted };
mov(8) g126<1>F g5<8,8,1>F { align1 1Q compacted };
mov(8) g127<1>F g2<8,8,1>F { align1 1Q compacted };
nop ;
sendc(8) null g123<8,8,1>F
render RT write SIMD8 LastRT Surface = 0 mlen 5 rlen 0 { align1 1Q EOT };
After:
pln(8) g6<1>F g4<0,1,0>F g2<8,8,1>F { align1 1Q compacted };
pln(8) g7<1>F g4.4<0,1,0>F g2<8,8,1>F { align1 1Q compacted };
send(8) g124<1>UW g6<8,8,1>F
sampler (1, 0, 0, 1) mlen 2 rlen 4 { align1 1Q };
sendc(8) null g124<8,8,1>F
render RT write SIMD8 LastRT Surface = 0 mlen 4 rlen 0 { align1 1Q EOT };
v2 (Matt): Removed unintended white-space change
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Currently all blit programs are unconditionally compiled with
gl_FragDepth.
Signed-off-by: Topi Pohjolainen <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
This patch advertises support for GL_OES_texture_*float* extensions
when using i965 drivers.
Signed-off-by: Kevin Rogovin <[email protected]>
Signed-off-by: Kalyan Kondapally <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
This patch adds the needed support for accepting HALF_FLOAT_OES as a valid
type for TexImage*D and TexSubImage*D when the texture float extensions are
supported.
Signed-off-by: Kevin Rogovin <[email protected]>
Signed-off-by: Kalyan Kondapally <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
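
For reference, a GLES2 client relying on this uploads half-float texel data
along these lines (a sketch; the texture size and the pixel pointer are
placeholders):

#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>  /* GL_HALF_FLOAT_OES */

/* Sketch: upload a 256x256 RGBA texture whose texels are 16-bit
 * half-floats, as permitted by GL_OES_texture_half_float.  'pixels' is
 * assumed to point at 256*256*4 half-float values. */
static GLuint upload_half_float_texture(const void *pixels)
{
   GLuint tex;
   glGenTextures(1, &tex);
   glBindTexture(GL_TEXTURE_2D, tex);
   glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 256, 256, 0,
                GL_RGBA, GL_HALF_FLOAT_OES, pixels);
   /* Without GL_OES_texture_half_float_linear, only NEAREST filtering
    * is guaranteed to work on this texture. */
   glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
   glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
   return tex;
}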
This patch series adds support for the following GLES2 texture float extensions:
1) GL_OES_texture_float,
2) GL_OES_texture_half_float,
3) GL_OES_texture_float_linear,
4) GL_OES_texture_half_float_linear.
This patch adds the basic infrastructure and the boolean flags needed to
advertise support for these extensions; by default the support is disabled.
The next patch in the series introduces support for the HALF_FLOAT_OES token.
v4: take assert away and make valid_filter_for_float conditional (Tapani),
fix the alphabetical order (Emil)
Signed-off-by: Kevin Rogovin <[email protected]>
Signed-off-by: Kalyan Kondapally <[email protected]>
Reviewed-by: Tapani Pälli <[email protected]>
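
A client would typically gate its use of these features on the extension
string; a minimal sketch (a production check should match whole,
space-delimited tokens rather than substrings):

#include <string.h>
#include <GLES2/gl2.h>

/* Sketch: report whether an extension name appears in the GLES
 * extension string, e.g. has_extension("GL_OES_texture_float"). */
static int has_extension(const char *name)
{
   const char *ext = (const char *) glGetString(GL_EXTENSIONS);
   return ext != NULL && strstr(ext, name) != NULL;
}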
We have two copies of it in the tree; I'm going to delete one.
Reviewed-by: Marek Olšák <[email protected]>
This reverts commits d6eb572905e39c36168b8f5da240af961f9dde0a and
58e8468d113c7d3d4a59ea4a8d70fd45b78e85e6.
This is no longer necessary as we aren't using it in NIR anymore. Also, it
broke the build on some strange systems so let's put it back in querymatrix
where it came from.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88852
Acked-by: Matt Turner <[email protected]>
Signed-off-by: Sven Arvidsson <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87076
Signed-off-by: Marek Olšák <[email protected]>
Reviewed-by: Ian Romanick <[email protected]>
I haven't actually seen this bug in the wild, but it's possible that
someone could ask to do an S3TC PBO download or something. This protects us
from accidentally creating a render target with a compressed or otherwise
non-renderable format.
Reviewed-by: Kenneth Graunke <[email protected]>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88792
Reviewed-by: Kenneth Graunke <[email protected]>
This patch adds two error messages that point the user directly at a
misspelled or impossible swizzle field for a format.
Signed-off-by: Tapani Pälli <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Added the intel_readpixels_tiled_memcpy and intel_gettexsubimage_tiled_memcpy
functions. These are the fast paths for glReadPixels and glGetTexImage.
On Chrome, using the RoboHornet 2D Canvas toDataURL test, this patch cuts the
amount of time spent in glReadPixels by more than half and reduces the time
of the entire test by 10%.
v2: Jason Ekstrand <[email protected]>
- Refactor to make the functions look more like the old
intel_tex_subimage_tiled_memcpy
- Don't export the readpixels_tiled_memcpy function
- Fix some pointer arithmetic bugs in partial image downloads (using
ReadPixels with a non-zero x or y offset)
- Fix a bug when ReadPixels is performed on an FBO wrapping a texture
miplevel other than zero.
v3: Jason Ekstrand <[email protected]>
- Better documentation for the *_tiled_memcpy functions
- Add target restrictions for renderbuffers wrapping textures
v4: Jason Ekstrand <[email protected]>
- Only check the return value of brw_bo_map for error and not bo->virtual
v5: Jason Ekstrand <[email protected]>
- Don't unnecessarily repeat a comment
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
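
The kind of call these fast paths target is an ordinary framebuffer read-back,
for example (a sketch; the exact formats and tiling conditions that hit the
memcpy path are driver internals not restated here):

#include <stdlib.h>
#include <GLES2/gl2.h>

/* Sketch: read the bound framebuffer back into client memory.  Reads
 * like this (and glGetTexImage on desktop GL) are what the tiled
 * memcpy paths accelerate. */
static void *read_back(GLint width, GLint height)
{
   void *pixels = malloc((size_t) width * height * 4);
   if (pixels)
      glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, pixels);
   return pixels;
}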
This commit adds tiled copy functions for copying from tiled memory to
linear memory. These are very similar to the existing linear-to-tiled
paths.
v2: Jason Ekstrand <[email protected]>
- New commit message
- Various whitespace fixes
- Added ptrdiff_t casts as done in commit 225a09790
v3: Jason Ekstrand <[email protected]>
- Fixed a comment
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
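
A heavily simplified sketch of the tiled-to-linear direction, assuming Intel
X-tiling (512-byte-wide, 8-row tiles), a copy that starts on a tile boundary,
and whole-tile spans; the real paths also handle Y-tiling and arbitrary
sub-rectangles:

#include <stdint.h>
#include <string.h>

#define XTILE_WIDTH_B 512  /* bytes per row of an X tile */
#define XTILE_HEIGHT  8    /* rows per X tile            */

/* Sketch: copy one X tile back to linear memory.  Within an X tile the
 * rows are consecutive 512-byte spans, so the tiled->linear direction
 * is a row-by-row memcpy.  'dst_pitch' is the linear pitch in bytes. */
static void
xtile_to_linear(char *linear, const char *tiled, uint32_t dst_pitch)
{
   for (unsigned row = 0; row < XTILE_HEIGHT; row++)
      memcpy(linear + row * dst_pitch,
             tiled + row * XTILE_WIDTH_B,
             XTILE_WIDTH_B);
}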
This commit refactors the tiled_memcpy code in intel_tex_subimage.c and
moves it into its own intel_tiled_memcpy files. Also, xtile_copy and
ytile_copy are renamed to linear_to_xtiled and linear_to_ytiled
respectively. The *_faster functions are similarly renamed.
There was also a bit of logic to select between the libc-provided
memcpy function and our custom memcpy that does an RGBA -> BGRA swizzle.
This was moved into an intel_get_memcpy function so that rgba8_copy can
live (and be inlined) in intel_tiled_memcpy.c.
v2: Jason Ekstrand <[email protected]>
- Better commit message
- Fix up the copyright on the intel_tiled_memcpy files
- Various whitespace fixes
- Moved a bunch of stuff that did not need to be exposed from
intel_tiled_memcpy.h to intel_tiled_memcpy.c
- Added proper documentation for intel_get_memcpy
- Incorporated the ptrdiff_t tweaks from commit 225a09790
v3: Jason Ekstrand <[email protected]>
- Fixed a comment
- Move the tile size constants into the .c file
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
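
The RGBA -> BGRA copy that intel_get_memcpy chooses between is conceptually
just a per-pixel byte swap; a minimal sketch (not the driver's tuned
implementation):

#include <stddef.h>
#include <stdint.h>

/* Sketch: copy RGBA8 pixels while swapping the red and blue channels,
 * i.e. an RGBA -> BGRA conversion.  'bytes' is assumed to be a
 * multiple of four. */
static void *
rgba8_swizzle_copy(void *dst, const void *src, size_t bytes)
{
   uint8_t *d = dst;
   const uint8_t *s = src;
   for (size_t i = 0; i < bytes; i += 4) {
      d[i + 0] = s[i + 2];  /* B <- R */
      d[i + 1] = s[i + 1];  /* G      */
      d[i + 2] = s[i + 0];  /* R <- B */
      d[i + 3] = s[i + 3];  /* A      */
   }
   return dst;
}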
There's no reason why we should be doing this for 2D textures and not
rectangles. Just a matter of adding another hunk to the condition.
Signed-off-by: Jason Ekstrand <[email protected]>
Reviewed-by: Chad Versace <[email protected]>
Fixes compilation with musl libc.
Reviewed-by: Ian Romanick <[email protected]>
"MOV.nz null src" and "CMP.nz null src 0" are equivalent instructions.
Previously, we deleted MOV.nz instructions when the instruction
generating the MOV's source also wrote the flag register (as the flag
register already contains the desired value). However, we wouldn't
delete CMP.nz instructions that served the same purpose.
We also didn't attempt true cmod propagation on MOV.nz instructions,
while we would for the equivalent CMP.nz form.
This patch fixes both limitations, treating both forms equally.
CMP.nz instructions will now be deleted (helping the NIR backend),
and MOV.nz instructions will have their .nz propagated.
No changes in shader-db without NIR. With NIR,
total instructions in shared programs: 6006153 -> 5969364 (-0.61%)
instructions in affected programs: 2087139 -> 2050350 (-1.76%)
helped: 10704
HURT: 0
GAINED: 2
LOST: 2
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Before, we used a system where a file, nir_opcodes.h, defined some macros that
were included to generate the enum values and the nir_op_infos structure. This
worked pretty well, but for development the error messages were never very
useful, Python tools couldn't understand the opcode list, and it was difficult
to use nir_opcodes.h to do other things like autogenerate a builder API. Now, we
store opcode information in nir_opcodes.py, and we have nir_opcodes_c.py to
generate the old nir_opcodes.c and nir_opcodes_h.py to generate nir_opcodes.h,
which contains all the enum names and gets included into nir.h like before. In
addition to solving the above problems, using Python and Mako to generate
everything means that it's much easier to keep information centralized as we
add new things like constant propagation that require per-opcode information.
v2:
- make Opcode derive from object (Dylan)
- don't use assert like it's a function (Dylan)
- style fixes for fnoise, use xrange (Dylan)
- use iterkeys() in nir_opcodes_h.py (Dylan)
- use pydoc-style comments (Jason)
- don't make fmin/fmax commutative and associative yet (Jason)
Signed-off-by: Connor Abbott <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
v3 Jason Ekstrand <[email protected]>
- Alphabetize source file lists
- Generate nir_opcodes.h in the builddir instead of the source dir
- Include $(builddir)/src/glsl/nir in the i965 build
- Rework nir_opcodes.h generation so it generates a complete header file
instead of one that has to be embedded inside an enum declaration
total instructions in shared programs: 5952059 -> 5951603 (-0.01%)
instructions in affected programs: 138812 -> 138356 (-0.33%)
GAINED: 1
LOST: 0
Reviewed-by: Kenneth Graunke <[email protected]>
For some reason, we occasionally write the flag register with a MOV.NZ
instruction:
add(8) g25<1>F -g6<0,1,0>F g15<8,8,1>F
cmp.l.f0(8) g26<1>D g25<8,8,1>F 0F
mov.nz.f0(8) null g26<8,8,1>D
A MOV.NZ instruction on the result of a CMP is like comparing for
equality with true in C. It's useless. Removing it allows us to
generate:
add.l.f0(8) null -g6<0,1,0>F g15<8,8,1>F
total instructions in shared programs: 5955701 -> 5951657 (-0.07%)
instructions in affected programs: 302910 -> 298866 (-1.34%)
GAINED: 1
LOST: 0
Reviewed-by: Kenneth Graunke <[email protected]>
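
Spelling out the C analogy from the message: the extra comparison against
true adds nothing.

#include <stdbool.h>

/* Both functions compute exactly the same value; the '== true' is
 * redundant, just like the mov.nz on the result of a cmp. */
static bool with_redundant_test(float a, float b) { return (a < b) == true; }
static bool direct_test(float a, float b)         { return a < b; }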
This allows us to apply the optimization in cases where the CMP's
argument is negated, by flipping the conditional mod. For example, it
allows us to optimize this:
add(8) temp a b
cmp.l.f0(8) null -temp 0.0
into
add.g.f0(8) temp a b
total instructions in shared programs: 5958360 -> 5955701 (-0.04%)
instructions in affected programs: 466880 -> 464221 (-0.57%)
GAINED: 0
LOST: 1
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
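
In scalar terms the transformation rests on the identity that negating the
operand flips a comparison against zero; a sketch of the equivalence:

/* For any float t, -t < 0.0f is true exactly when t > 0.0f (both are
 * false for NaN and for +/-0.0), which is why a cmp.l on a negated
 * value can become a .g conditional mod on the instruction that
 * produced it. */
static int before_opt(float a, float b) { float t = a + b; return -t < 0.0f; }
static int after_opt(float a, float b)  { return (a + b) > 0.0f; }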
total instructions in shared programs: 5959463 -> 5958900 (-0.01%)
instructions in affected programs: 70031 -> 69468 (-0.80%)
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
total instructions in shared programs: 5974160 -> 5959463 (-0.25%)
instructions in affected programs: 1743737 -> 1729040 (-0.84%)
GAINED: 0
LOST: 12
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Otherwise we'll apply the conditional mod to only one of the SIMD8
instructions and trigger an assertion.
NoDDClr/NoDDChk have the same problem but we never apply those to these
instructions, so I'm leaving them for a later time.
Reviewed-by: Kenneth Graunke <[email protected]>
3-src instructions can only have GRF/MRF destinations. It's really
difficult to deal with that restriction in dead code elimination (that
wants to give instructions null destinations to show that their result
isn't used) while allowing 3-src instructions to have conditional mod,
so don't, and just give them a destination before register allocation.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Now that we properly track accumulator dependencies, the scheduler is
able to schedule instructions between the mach and mov in the common
integer multiplication pattern:
mul acc0, x, y
mach null, x, y
mov dest, acc0
Since a null destination implies no dependency on the destination, we
can also safely schedule instructions (that don't write the accumulator)
between the mul and mach.
GAINED: 103
LOST: 43
Causes one program to spill (643 -> 1076 instructions).
I committed this patch last year (commit 42a26cb5) but reverted it
(commit 0d3f83f4) after inexplicable artifacts in Kerbal Space Program
(bug 78648). Tapani reapplied this patch and could not reproduce the bug
with current Mesa.
Reviewed-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
If try_replace_with_sel is able to replace the flow control with a SEL
instruction, then there is no flow control... failing SIMD16 because
of nonexistent flow control is wrong.
No piglit regressions on any i965 platform in Jenkins.
total instructions in shared programs: 4382707 -> 4382707 (0.00%)
instructions in affected programs: 0 -> 0
helped: 0
HURT: 0
GAINED: 2089
LOST: 0
No other platforms affected in shader-db.
Signed-off-by: Ian Romanick <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Reviewed-by: Kenneth Graunke <[email protected]>
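
Conceptually, the pattern try_replace_with_sel flattens is the one a C
programmer writes with the ternary operator: once each branch only picks a
value, the if/else collapses to a select and no real flow control remains.
A sketch of the source-level shape:

/* Both functions compute the same result; the branchy form can be
 * emitted as a single SEL, so reporting it as flow control (and
 * failing SIMD16 on it) would be wrong. */
static float branchy(int cond, float a, float b)
{
   float r;
   if (cond)
      r = a;
   else
      r = b;
   return r;
}

static float selected(int cond, float a, float b)
{
   return cond ? a : b;
}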
This allows us to count NIR instructions via shader-db.
Use "run" as normal. The results file will contain both NIR and
assembly.
Then, to generate a NIR report:
./report.py <(grep NIR results/foo) <(grep NIR results/bar)
Or, to generate an i965 report:
./report.py <(grep -v NIR results/foo) <(grep -v NIR results/bar)
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
This is useful for debugging and looking for optimization opportunities.
It will need to be expanded when we add support for other scalar stages.
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
We want to run CSE and algebraic optimizations again after lowering IO.
Some of the passes in the optimization loop don't handle saturates and
other modifiers, so run it before lowering to source modifiers.
total instructions in shared programs: 6046190 -> 6045768 (-0.01%)
instructions in affected programs: 22406 -> 21984 (-1.88%)
helped: 47
HURT: 0
GAINED: 0
LOST: 0
Signed-off-by: Kenneth Graunke <[email protected]>
Reviewed-by: Jason Ekstrand <[email protected]>
Some glapi headers used to be generated from this Makefile.am, but no
longer.