summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* mesa: move math-related function into new c99_math.h fileBrian Paul2015-02-231-1/+1
| | | | | | | | | | | | The alternative would be to include math.h in c99_compat.h but that seems heavy-handed. This patch also replaces INLINE with inline in the c99 math function wrappers. Fixes MSVC build. Acked-by: Matt Turner <[email protected]>
* nir/gcm: Add some missing break statementsJason Ekstrand2015-02-231-0/+4
| | | | Reviewed-by: Connor Abbott <[email protected]>
* nir: Copy-propagate vecN operations that are actually movesJason Ekstrand2015-02-231-16/+29
| | | | | | | | | | | | | | | | | | | We were already do this for ALU operations but we haven't for non-ALU operations. This changes that. total NIR instructions in shared programs: 2039883 -> 2022338 (-0.86%) NIR instructions in affected programs: 1768850 -> 1751305 (-0.99%) helped: 14244 HURT: 124 total FS instructions in shared programs: 4083960 -> 4084036 (0.00%) FS instructions in affected programs: 7302 -> 7378 (1.04%) helped: 12 HURT: 51 Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Connor Abbott <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* ra: Disable round-robin strategy for optimistically colorable nodes.Francisco Jerez2015-02-231-1/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The round-robin allocation strategy is expected to decrease the amount of false dependencies created by the register allocator and give the post-RA scheduling pass more freedom to move instructions around. On the other hand it has the disadvantage of increasing fragmentation and decreasing the number of equally-colored nearby nodes, what increases the likelihood of failure in presence of optimistically colorable nodes. This patch disables the round-robin strategy for optimistically colorable nodes. These typically arise in situations of high register pressure or for registers with large live intervals, in both cases the task of the instruction scheduler shouldn't be constrained excessively by the dense packing of those nodes, and a spill (or on Intel hardware a fall-back to SIMD8 mode) is invariably worse than a slightly less optimal scheduling. Shader-db results on the i965 driver: total instructions in shared programs: 5488539 -> 5488489 (-0.00%) instructions in affected programs: 1121 -> 1071 (-4.46%) helped: 1 HURT: 0 GAINED: 49 LOST: 5 v2: Re-enable round-robin already for the lowest one of the nodes pushed optimistically onto the sack (Connor). v3: Use UINT_MAX instead of ~0, open-code MIN2 (Jason, Connor). Reviewed-by: Connor Abbott <[email protected]>
* i965/fs: Fix lower_load_payload() not to use an incorrect half for ↵Francisco Jerez2015-02-231-0/+8
| | | | | | immediates and uniforms. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Fix lower_load_payload() to take into account non-zero reg_offset.Francisco Jerez2015-02-231-2/+2
| | | | | | | | Fixes metadata guess when instructions in the program specify a destination register with non-zero reg_offset and when the payload of a LOAD_PAYLOAD spans several registers. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Remove logic to keep track of MRF metadata in lower_load_payload().Francisco Jerez2015-02-231-26/+13
| | | | | | | MRFs cannot be read from anyway so they cannot possibly be a valid source of LOAD_PAYLOAD. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Less broken handling of force_writemask_all in lower_load_payload().Francisco Jerez2015-02-231-7/+13
| | | | | | | | | | It's perfectly fine to read the second half of a register written with force_writemask_all from a first half MOV instruction or vice versa, and lower_load_payload shouldn't mark the whole MOV as belonging to the second half in that case. Replicate the same metadata to both halves of the destination when writemasking is disabled. Reviewed-by: Jason Ekstrand <[email protected]>
* mesa/vbo: Use unreachable to silence uninitialized var warning.Matt Turner2015-02-231-2/+1
| | | | Reviewed-by: Eric Anholt <[email protected]>
* mesa: Move START/END_FAST_MATH macros to their only use.Matt Turner2015-02-232-79/+78
| | | | Reviewed-by: Eric Anholt <[email protected]>
* mesa: Remove definition of NULL.Matt Turner2015-02-231-4/+0
| | | | | | If your stdlib.h doesn't define this you should fix your stdlib.h. Reviewed-by: Eric Anholt <[email protected]>
* mesa: Use assert() instead of ASSERT wrapper.Matt Turner2015-02-23131-746/+735
| | | | Acked-by: Eric Anholt <[email protected]>
* mesa: Remove CHECK macro.Matt Turner2015-02-233-58/+30
| | | | | | | There's some commentary about how it's defined by other "modules", and maybe that was true in 2000 when the code was added. Reviewed-by: Eric Anholt <[email protected]>
* mesa: Remove dead CAPI define.Matt Turner2015-02-231-5/+0
| | | | Reviewed-by: Eric Anholt <[email protected]>
* gallium: Use util_cpu_to_le{16,32} in many more places.Matt Turner2015-02-235-390/+88
| | | | | | | | | | | | | ... and util_le{16,32}_to_cpu. I think I've used the right ones for describing the actual operation performed (even though they're both just "byte-swap this if I'm on big-endian"). The Linux Kernel has typedefs __le32/__be32 and friends that static analysis tools can use to check that byte-orderings are correct. It might be interesting to apply that here as well. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* gallium/util: Use HAVE___BUILTIN_* macros.Matt Turner2015-02-231-6/+5
| | | | | Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* mesa: Move C99 MSVC compatibility code from u_math.h to c99_compat.h.Matt Turner2015-02-231-143/+0
| | | | Reviewed-by: Jose Fonseca <[email protected]>
* i965: Link test programs with gtest before pthreads.Matt Turner2015-02-231-10/+10
| | | | | Cc: "10.5" <[email protected]> Bugzilla: https://bugs.gentoo.org/show_bug.cgi?id=540962
* osmesa: add gallium include dirs to Makefile.amBrian Paul2015-02-231-0/+2
| | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89260 Reviewed-by: Jose Fonseca <[email protected]>
* util: move pipe_prim_names array into u_prim_name()Brian Paul2015-02-231-24/+21
| | | | | | Also, wrapping the array in #ifdef DEBUG / #endif doesn't seem necessary. Reviewed-by: Jose Fonseca <[email protected]>
* util: rewrite debug_print_transfer_flags() using debug_dump_flags()Brian Paul2015-02-231-28/+13
| | | | | | Add add missing PIPE_TRANSFER_PERSISTENT, PIPE_TRANSFER_COHERENT flags. Reviewed-by: Jose Fonseca <[email protected]>
* mesa: Adds missing error condition in _mesa_check_sample_count()Eduardo Lima Mitev2015-02-231-3/+4
| | | | | | | | | | | | This corrects a trivial error introduced in commit 19252fee46b835cb4f6b1cce18d7737d62b64a2e. That patch was merged recently and omits one condition (that 'samples' is greater than zero) in one of the error checks. That error will definitely cause regressions. Also corrects the reference to the specification above the error check, which was wrongly quoting OpenGL instead of OpenGL-ES. Reviewed-by: Martin Peres <[email protected]>
* radeonsi: fix a warning caused by previous commitMarek Olšák2015-02-231-1/+1
| | | | Cc: 10.5 10.4 <[email protected]>
* radeonsi: fix point spritesMarek Olšák2015-02-231-1/+1
| | | | | | | | Broken by a27b74819ad375e8c0bc88e13f42c951d2b5cd6a. This fix is critical and should be ported to stable ASAP. Cc: 10.5 10.4 <[email protected]>
* i965/skl: Use 1 register for uniform pull constant payloadBen Widawsky2015-02-221-1/+1
| | | | | | | | | | | | | | When under dispatch_width=16 the previous code would allocate 2 registers for the payload when only one is needed. This manifested itself through bugs on SKL which needs to mess with this instruction. Ken though this might impact shader-db, but apparently it doesn't Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89118 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88999 Signed-off-by: Ben Widawsky <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Tested-by: Timo Aaltonen <[email protected]>
* nir: Generalize the optimization of subs of subs from 0.Eric Anholt2015-02-211-2/+2
| | | | | | | | | | | | I initially wrote this based on the "(('fneg', ('fneg', a)), a)" above, but we can generalize it and make it more potentially useful. In the specific original case of a 0 for our new 'a' argument, it'll get further algebraic optimization once the 0 is an argument to the new add. No shader-db effects. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* nir: Collapse repeated bcsels on the same argument.Eric Anholt2015-02-211-0/+1
| | | | | | | | | vc4 results: total instructions in shared programs: 39881 -> 39794 (-0.22%) instructions in affected programs: 6302 -> 6215 (-1.38%) Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* nir: When faced with a csel on !condition, just flip the arguments.Eric Anholt2015-02-211-0/+1
| | | | | | | | total NIR instructions in shared programs: 39426 -> 39411 (-0.04%) NIR instructions in affected programs: 3748 -> 3733 (-0.40%) Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* nir: Allow nir_opt_algebraic to see booleanness through &&, ||, ^, !.Eric Anholt2015-02-211-1/+29
| | | | | | | | | | | | | | | | | We have some useful optimizations to drop things like 'ine a, 0' on a boolean argument, but if 'a' came from logical operations on bools, it couldn't tell. These kinds of constructs appear as a result of TGSI->NIR quite frequently (at least with if flattening), so being a little more aggressive in detecting booleans can pay off. v2: Add ixor as a booleanness-preserving op (Suggestion by Connor). vc4 results: total instructions in shared programs: 40207 -> 39881 (-0.81%) instructions in affected programs: 6677 -> 6351 (-4.88%) Reviewed-by: Matt Turner <[email protected]> (v1) Reviewed-by: Connor Abbott <[email protected]>
* nir: Add a couple of simplifications of csel operations.Eric Anholt2015-02-211-0/+3
| | | | | | | | vc4 was already cleaning these up, but it does shave 4 NIR instructions in shader-db. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* glsl: ensure that enter/leave record get a record typeIlia Mirkin2015-02-212-0/+5
| | | | | | | May make life easier for tools like Coverity. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* tgsi: avoid returning pointer to local var, make it staticIlia Mirkin2015-02-211-1/+1
| | | | | | | | Spotted by Coverity. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* freedreno/a4xx: set PC_PRIM_VTX_CNTL.VAROUT properlyRob Clark2015-02-211-1/+6
| | | | | | | Fixes xonotic, some webgl stuff, and really pretty much anything with more than 4 varyings. Signed-off-by: Rob Clark <[email protected]>
* freedreno: update generated headersRob Clark2015-02-217-16/+44
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/a4xx: bit of cleanupRob Clark2015-02-214-33/+27
| | | | Signed-off-by: Rob Clark <[email protected]>
* loader: not having a pci-id should not be a warnRob Clark2015-02-211-3/+6
| | | | | | | | | | If there is no pci-id, which is valid for vc4 and freedreno, just emit an info msg. Keep malformed but existing pci-id's as a warning. Mostly just to clean up a warning that confuses users for the non-pci devices. Signed-off-by: Rob Clark <[email protected]>
* freedreno: implement fenceRob Clark2015-02-214-74/+65
| | | | | | | | | | I never actually implemented the stubbed out fence stuff back in the early days. Fix that. We'll need a few libdrm_freedreno changes to handle timeout properly, so ignore that for now to avoid a libdrm_freedreno dependency bump. Signed-off-by: Rob Clark <[email protected]>
* freedreno/a2xx: fix increment in assertRob Clark2015-02-211-1/+2
| | | | | Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88883 Signed-off-by: Rob Clark <[email protected]>
* i965/fs: Use fs_reg for CS/VS atomics pixel mask immediate dataJordan Justen2015-02-211-2/+2
| | | | | | | | | | | The brw_imm_ud will yield a HW_REG which then will introduce a barrier for certain optimization opportunities. No piglit regressions seen with gen8 (simd8vs). Suggested-by: Matt Turner <[email protected]> Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Set pixel/sample mask for compute shaders atomic opsJordan Justen2015-02-211-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | For fragment programs, we pull this mask from the payload header. The same mask doesn't exist for compute shaders, so we set all bits to enabled. Previously we were setting 0xff to support SIMD8 VS, but with CS we support SIMD16, and therefore we change this to 0xffff. Related commits for SIMD8 VS: commit d9cd982d556be560af3bcbcdaf62b6b93eb934a5 Author: Ben Widawsky <[email protected]> Date: Sun Feb 15 20:06:59 2015 -0800 i965/simd8vs: Fix SIMD8 atomics commit 4a95be9772a255776309f23180519a4a8560f2dd Author: Jordan Justen <[email protected]> Date: Tue Feb 17 09:57:35 2015 -0800 i965/simd8vs: Fix SIMD8 atomics (read-only) Note: this mask is ANDed with the execution mask, so some channels may not end up issuing the atomic operation. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Ben Widawsky <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* ilo: R32G32B32_FLOAT need no special care on Gen8+Chia-I Wu2015-02-211-3/+6
| | | | | Gen8+ must use VALIGN_4. Unlike prior Gens, R32G32B32_FLOAT should supposedly support VALIGN_4.
* ilo: 128 BPP formats can use TiledY on Gen7.5+Chia-I Wu2015-02-211-1/+6
| | | | The restriction is lifted.
* nvc0: enable double supportIlia Mirkin2015-02-201-2/+2
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: remove merge/split pairs to allow normal propagation to occurIlia Mirkin2015-02-201-0/+30
| | | | | | | | | | Because the TGSI interface creates merges for each instruction source and then splits them back out, there are a lot of unnecessary merge/split pairs which do essentially nothing. The various modifier/etc propagation doesn't know how to walk though those, so just remove them when they're unnecessary. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: add support for new TGSI double opcodesIlia Mirkin2015-02-201-0/+236
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: handle zero and negative sqrt argumentsIlia Mirkin2015-02-201-2/+14
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: no instruction can load a double immediateIlia Mirkin2015-02-201-0/+2
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: fix lowering of RSQ/RCP/SQRT/MOD to work with F64Ilia Mirkin2015-02-205-16/+40
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* gm107/ir: fix F2F flipped stype/dtype flagsIlia Mirkin2015-02-201-2/+2
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* gm107/ir: fix DSET boolean float flagIlia Mirkin2015-02-201-0/+1
| | | | Signed-off-by: Ilia Mirkin <[email protected]>