summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* vc4: Only render tiles where the scissor ever intersected them.Eric Anholt2014-12-304-10/+52
| | | | | This gives a 2.7x improvement in x11perf -rect100, since we only end up load/storing the x11perf window, not the whole screen.
* vc4: Move draw call reset handling to a helper function.Eric Anholt2014-12-301-23/+31
| | | | | | This will be more important in the next commit, when there's more state to reset to nonzero values, and I want an early exit from the submit function.
* vc4: Drop the content of vc4_flush_resource().Eric Anholt2014-12-301-4/+4
| | | | | The callers all follow it with a flush of the context, and the flush of the context gives us more information about how things are being flushed.
* mesa: Remove __SSE4_1__ guards from sse_minmax.c.Matt Turner2014-12-291-3/+0
| | | | | | See commit e07c9a288. Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/vec4: Do separate copy followed by constant propagation after ↵Matt Turner2014-12-291-1/+2
| | | | | | | | | | | | | | | | | opt_vector_float(). total instructions in shared programs: 5877012 -> 5876617 (-0.01%) instructions in affected programs: 33140 -> 32745 (-1.19%) From before the commit that allows VF constant propagation (which hurt some programs) to here, the results are: total instructions in shared programs: 5877951 -> 5876617 (-0.02%) instructions in affected programs: 123444 -> 122110 (-1.08%) with no programs hurt. Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Allow constant propagation of VF immediates.Matt Turner2014-12-291-1/+27
| | | | | | | | | | total instructions in shared programs: 5877951 -> 5877012 (-0.02%) instructions in affected programs: 155923 -> 154984 (-0.60%) Helps 1233, hurts 156 shaders. The hurt shaders are addressed in the next commit. Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Add parameter to skip doing constant propagation.Matt Turner2014-12-292-3/+3
| | | | | | | | | | | | | | | | After CSEing some MOV ..., VF instructions we have code like mov tmp, [1F, 2F, 3F, 4F]VF mov r10, tmp mov r11, tmp ... use r10 use r11 We want to copy propagate tmp into the uses of r10 and r11, but *not* constant propagate the VF immediate into the uses of tmp. Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Do CSE, copy propagation, and DCE after opt_vector_float().Matt Turner2014-12-291-1/+5
| | | | | | | total instructions in shared programs: 5869005 -> 5868220 (-0.01%) instructions in affected programs: 70208 -> 69423 (-1.12%) Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Perform CSE on MOV ..., VF instructions.Matt Turner2014-12-291-5/+11
| | | | | | | | Port of commit a28ad9d4 from the fs backend. No shader-db changes since we don't emit MOV ..., VF instructions yet. Reviewed-by: Ian Romanick <[email protected]>
* i965/vec4: Add pass to gather constants into a vector-float MOV.Matt Turner2014-12-292-0/+62
| | | | | | | | | | Currently only handles consecutive instructions with the same destination that collectively write all channels. total instructions in shared programs: 5879798 -> 5869011 (-0.18%) instructions in affected programs: 465236 -> 454449 (-2.32%) Reviewed-by: Ian Romanick <[email protected]>
* i965: Add support for saturating immediates.Matt Turner2014-12-294-0/+80
| | | | | | | I don't feel great about assert(!"unimplemented: ...") but these cases do only seem possible under some currently impossible circumstances. Reviewed-by: Ian Romanick <[email protected]>
* i965: Add fs_reg/src_reg constructors that take vf[4].Matt Turner2014-12-293-0/+19
| | | | | | | | | | Sometimes it's easier to generate 4x values into an array, and the memcpy is 1 instruction, rather than 11 to piece 4 arguments together. I'd forgotten to remove the prototype from fs_reg from a previous patch, so it's already there for us here. Reviewed-by: Ian Romanick <[email protected]>
* gallium/target: Drop no longer needed Haiku viewport overrideAlexander von Gluck IV2014-12-271-30/+1
| | | | | * Drop no longer needed mesa headers * Haiku LLVM pipe working with LLVM 3.5.0 on x86_64
* gallium/st: Clean up Haiku depth mapping, fix colorspace errorsAlexander von Gluck IV2014-12-271-29/+19
|
* vc4: Handle unaligned accesses in CL emits.Eric Anholt2014-12-252-26/+78
| | | | | | | As of 229bf4475ff0a5dbeb9bc95250f7a40a983c2e28 we started getting SIBGUS from unaligned accesses on the hardware, for reasons I haven't figured out. However, we should be avoiding unaligned accesses anyway, and our CL setup certainly would have produced them.
* vc4: Don't bother zero-initializing the shader reloc indices.Eric Anholt2014-12-251-2/+2
| | | | | They should all be set to real values by the time they're read, and ideally if you used valgrind you'd see uninitialized value uses.
* vc4: Fix the argument type for cl_u16().Eric Anholt2014-12-251-1/+1
| | | | It doesn't matter, since it just got truncated to 16 inside, anyway.
* egl: Fix non-dri SCons builds re #87657Alexander von Gluck IV2014-12-252-11/+9
| | | | | * Revert change to egl main producing Shared Libraries * Check for dri before including dri code
* radeonsi: Don't modify PA_SC_RASTER_CONFIG register value if rb_mask == 0Michel Dänzer2014-12-251-2/+4
| | | | | | | | | | | E.g. this could happen on older kernels which don't support the RADEON_INFO_SI_BACKEND_ENABLED_MASK query yet. The code in si_write_harvested_raster_configs() doesn't deal with this correctly and would probably mangle the value badly. Cc: "10.4 10.3" <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Tom Stellard <[email protected]>
* vc4: Optimize CL emits by doing size checks up front.Eric Anholt2014-12-245-16/+66
| | | | | | | | The optimizer obviously doesn't have the ability to rewrite these to skip the size checks per call, so we have to do it manually. Improves a norast benchmark on simulation by 0.779706% +/- 0.405838% (n=6087).
* vc4: Avoid repeated hindex lookups in the loop over tiles.Eric Anholt2014-12-242-15/+24
| | | | | Improves norast performance of a microbenchmark by 11.1865% +/- 2.37673% (n=20).
* i965: Add missing BRW_NEW_*_PROG_DATA to texture/renderbuffer atoms.Kenneth Graunke2014-12-241-2/+10
| | | | | | | | | | | | | This was probably missed when moving from a fixed binding table layout to a dynamic one that changes based on the shader. Fixes newly proposed Piglit test fbo-mrt-new-bind. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=87619 Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Reviewed-by: Mike Stroyan <[email protected]> Cc: "10.4 10.3" <[email protected]>
* i965: Cache register write capability checks.Kenneth Graunke2014-12-241-0/+12
| | | | | | | | | | | | | | | | | | | Our ability to perform register writes depends on the hardware and kernel version. It shouldn't ever change on a per-context basis, so we only need to check once. Checking introduces a synchronization point between the CPU and GPU: even though we submit very few GPU commands, the GPU might be busy doing other work, which could cause us to stall for a while. On an idle i7 4750HQ, this improves performance in OglDrvCtx (a context creation microbenchmark) by 6.14748% +/- 1.6837% (n=20). With Unigine Valley running in the background (to keep the GPU busy), it improves performance in OglDrvCtx by 2290.92% +/- 29.5274% (n=5). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Ben Widawsky <[email protected]>
* freedreno/ir3: split out legalize passRob Clark2014-12-235-154/+214
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: ra debugRob Clark2014-12-233-17/+61
| | | | | | Some compile time RA debug Signed-off-by: Rob Clark <[email protected]>
* egl/haiku: Clean up SConscript whitespaceAlexander von Gluck IV2014-12-231-13/+12
|
* egl/dri2: Fix build of dri2 egl driver with SConsAlexander von Gluck IV2014-12-231-0/+40
| | | | | * egl/dri2 was missing a SConscript * Problem caught by Adrián Arroyo Calle
* egl: Clean up Haiku visual creationAlexander von Gluck IV2014-12-231-49/+47
| | | | | | * Only create one struct * 'final' also is a language conflict * Some style cleanup
* egl: Add Haiku code and supportAlexander von Gluck IV2014-12-239-4/+514
| | | | | | | * This is the cleaned up work of the Haiku GCI student Adrián Arroyo Calle [email protected] * Several patches were consolidated to prevent unnecessary touching of non-related code
* glsl: check if implicitly sized arrays match explicitly sized arrays across ↵Timothy Arceri2014-12-231-1/+20
| | | | | | | | | the same stage V2: Improve error message. Signed-off-by: Timothy Arceri <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Use safer pointer arithmetic in gather_oa_results()Chad Versace2014-12-221-1/+1
| | | | | | | | | | | | | | | | | | This patch reduces the likelihood of pointer arithmetic overflow bugs in gather_oa_results(), like the one fixed by b69c7c5dac. I haven't yet encountered any overflow bugs in the wild along this patch's codepath. But I get nervous when I see code patterns like this: (void*) + (int) * (int) I smell 32-bit overflow all over this code. This patch retypes 'snapshot_size' to 'ptrdiff_t', which should fix any potential overflow. Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Chad Versace <[email protected]>
* i965: Use safer pointer arithmetic in intel_texsubimage_tiled_memcpy()Chad Versace2014-12-221-3/+4
| | | | | | | | | | | | | | | | | | This patch reduces the likelihood of pointer arithmetic overflow bugs in intel_texsubimage_tiled_memcpy() , like the one fixed by b69c7c5dac. I haven't yet encountered any overflow bugs in the wild along this patch's codepath. But I recently solved, in commit b69c7c5dac, an overflow bug in a line of code that looks very similar to pointer arithmetic in this function. This patch conceptually applies the same fix as in b69c7c5dac. Instead of retyping the variables, though, this patch adds some casts. (I tried to retype the variables as ptrdiff_t, but it quickly got very messy. The casts are cleaner). Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Chad Versace <[email protected]>
* i965: Fix intel_miptree_map() signature to be more 64-bit safeChad Versace2014-12-225-10/+24
| | | | | | | | | | | | | | | | | This patch should diminish the likelihood of pointer arithmetic overflow bugs, like the one fixed by b69c7c5dac. Change the type of parameter 'out_stride' from int to ptrdiff_t. The logic is that if you call intel_miptree_map() and use the value of 'out_stride', then you must be doing pointer arithmetic on 'out_ptr'. Using ptrdiff_t instead of int should make a little bit harder to hit overflow bugs. As a side-effect, some function-scope variables needed to be retyped to avoid compilation errors. Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Chad Versace <[email protected]>
* i965: Remove spurious casts in copy_image_with_memcpy()Chad Versace2014-12-221-4/+4
| | | | | | | | If a pointer points to raw, untyped memory and is never dereferenced, then declare it as 'void*' instead of casting it to 'void*'. Signed-off-by: Chad Versace <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* radeonsi: force NaNs to 0Marek Olšák2014-12-211-4/+8
| | | | | | | | | This fixes incorrect rendering in Unreal Engine demos. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=83510 Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Michel Dänzer <[email protected]>
* st/nine: fix DBG typo (trivial)David Heidelberg2014-12-211-1/+1
| | | | | Signed-off-by: David Heidelberg <[email protected]> Reviewed-by: Alex Deucher <[email protected]>
* r300g: implement ARR opcodeDavid Heidelberg2014-12-214-4/+16
| | | | | | | | | | Same as ARL, just has extra rounding. Useful for st/nine. Tested-by: Pavel Ondračka <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: David Heidelberg <[email protected]> Reviewed-by: Alex Deucher <[email protected]>
* freedreno/a4xx: blend-colorRob Clark2014-12-201-0/+13
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a4xx: alpha-testRob Clark2014-12-201-0/+2
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: update generated headersRob Clark2014-12-206-61/+151
|
* freedreno/ir3: trans_kill cleanupRob Clark2014-12-201-12/+7
| | | | | | | trans_kill() only handles the single opcode. Drop the remnant of a time when both KILL and KILL_IF were handled by the same fxn. Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: hack for standalone compilerRob Clark2014-12-201-1/+5
| | | | | | | | | Standalone compiler doesn't have screen or context. We need to come up with a better way to control the target arch (ie. something that we can control from cmdline w/ standalone compiler) but for now this hack keeps it from segfault'ing. Signed-off-by: Rob Clark <[email protected]>
* i965/fs: Add missing const qualifier.Matt Turner2014-12-191-1/+1
|
* vc4: Coalesce MOVs into VPM with the instructions generating the values.Eric Anholt2014-12-184-15/+143
| | | | | total instructions in shared programs: 41168 -> 40976 (-0.47%) instructions in affected programs: 18156 -> 17964 (-1.06%)
* vc4: Redefine VPM writes as a (destination) QIR register file.Eric Anholt2014-12-173-7/+19
| | | | | This will let me coalesce the VPM writes into the instructions generating the values.
* gallium: remove support for GCC older than 4.2.0Timothy Arceri2014-12-181-1/+1
| | | | | | Signed-off-by: Timothy Arceri <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* vc4: Add support for turning constant uniforms into small immediates.Eric Anholt2014-12-1713-46/+283
| | | | | | | | | | | | | | | | | | | | | | Small immediates have the downside of taking over the raddr B field, so you might have less chance to pack instructions together thanks to raddr B conflicts. However, it also reduces some register pressure since it lets you load 2 "uniform" values in one instruction (avoiding a previous load of the constant value to a register), and increases some pairing for the same reason. total uniforms in shared programs: 16231 -> 13374 (-17.60%) uniforms in affected programs: 10280 -> 7423 (-27.79%) total instructions in shared programs: 40795 -> 41168 (0.91%) instructions in affected programs: 25551 -> 25924 (1.46%) In a previous version of this patch I had a reduction in instruction count by forcing the other args alongside a SMALL_IMM to be in the A file or accumulators, but that increases register pressure and had a bug in handling FRAG_Z. In this patch is I just use raddr conflict resolution, which is more expensive. I think I'd rather tweak allocation to have some way to slightly prefer good choices for files in general, rather than risk failing to register allocate by forcing things into register classes.
* vc4: Move follow_movs() to common QIR code.Eric Anholt2014-12-173-11/+12
| | | | I want this from other passes.
* vc4: Fix missing newline for load immediate instruction disasm.Eric Anholt2014-12-171-4/+4
|
* mesa: Remove unnecessary -f from $(RM).Matt Turner2014-12-174-8/+8
| | | | $(RM) includes -f.