path: root/src/gallium
Commit message | Author | Age | Files | Lines
* gallium/radeon: clean left-shift undefined behavior | Nicolai Hähnle | 2016-05-07 | 11 | -3989/+3989
    Shifting into the sign bit of a signed int is undefined behavior. Unfortunately, there are potentially many places where this happens using the register macros.
    This commit is the result of running
        sed -ie "s/(((\(\w\+\)) & 0x\(\w\+\)) << \(\w\+\))/(((unsigned)(\1) \& 0x\2) << \3)/g"
    on all header files in gallium/{r600,radeon,radeonsi}.
    Reviewed-by: Marek Olšák <[email protected]>
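
The effect of that sed rewrite on a single register macro can be sketched as follows. The macro names, mask, and shift are hypothetical and chosen only so that the field reaches bit 31; they are not taken from the Mesa headers. The sketch also assumes a 32-bit unsigned int.

```c
#include <stdint.h>

/* Hypothetical field macro in the pre-commit style (not a real Mesa
 * definition). The operand keeps type int, so a non-zero top bit of the
 * field is shifted into the sign bit, which is undefined behavior. */
#define S_EXAMPLE_FIELD_OLD(x) (((x) & 0xF) << 28)

/* Same macro after the rewrite: the operand is cast to unsigned before
 * masking and shifting, so the shift is well defined. */
#define S_EXAMPLE_FIELD(x) (((unsigned)(x) & 0xF) << 28)

/* Pack a value into the hypothetical 4-bit field at bits 28..31. */
static uint32_t pack_example_field(int value)
{
    return S_EXAMPLE_FIELD(value);
}
```

Only the new macro is evaluated here; the old one is kept purely for comparison.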
* gallium: fix various undefined left shifts into sign bit | Nicolai Hähnle | 2016-05-07 | 4 | -5/+5
    Funnily enough, some of these were turned into a compile-time error by gcc with -fsanitize=undefined ("initializer is not a constant").
    Reviewed-by: Brian Paul <[email protected]>
    Reviewed-by: Michel Dänzer <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: Compute correct LDS size for fragment shaders. | Bas Nieuwenhuizen | 2016-05-06 | 1 | -3/+6
    Not sure where the 36 came from, but we clearly need at least 48 bytes per attribute per primitive.
    Signed-off-by: Bas Nieuwenhuizen <[email protected]>
    Reviewed-by: Marek Olšák <[email protected]>
* vc4: Add support for loading immediate values in QIR. | Eric Anholt | 2016-05-06 | 4 | -0/+32
    This will be used for resetting the uniform stream in the presence of branching, but may also be useful as an optimization to reduce how many uniforms we have to copy out per draw call (in exchange for increasing icache pressure).
* vc4: Make vc4_qpu_validate() produce more verbose failures. | Eric Anholt | 2016-05-06 | 1 | -35/+71
    Seeing the expansion of a QPU_GET_FIELD in an assert isn't very informative, and it's hard to find what's going wrong without getting a dump of the instruction that failed.
* vc4: Add a small QIR validate pass. | Eric Anholt | 2016-05-06 | 4 | -0/+127
    This has caught a couple of bugs during loop development so far, and I should probably have written it long ago.
* vc4: Fix the src count on exp2/log2. | Eric Anholt | 2016-05-06 | 1 | -2/+2
    Found by the upcoming QIR validate pass.
* vc4: Reuse QPU disasm's cond flags in QIR. | Eric Anholt | 2016-05-06 | 3 | -27/+46
    In the process, this made me flatten out the "%s%s%s%s" fprintf arguments.
* vc4: When emitting an instruction to an existing temp, mark it non-SSA. | Eric Anholt | 2016-05-06 | 1 | -0/+2
    Prevents a bug in the later control-flow support series.
* vc4: Make sure that we don't overwrite the signal for PROG_END. | Eric Anholt | 2016-05-06 | 1 | -0/+8
    We should have already emitted a NOP due to the last instruction being a TLB or VPM write. However, if you disable dead code elimination then you might get dead code at the end, and that dead code might have the signal bits set to something non-default, at which point you die with an assertion failure.
* nvc0: unreference images when the context is destroyed | Samuel Pitoiset | 2016-05-06 | 1 | -0/+4
    Like other resources, we need to unreference all images.
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
* radeonsi: set DECOMPRESS_Z_ON_FLUSH if nr_samples >= 4 | Marek Olšák | 2016-05-06 | 1 | -1/+2
    Vulkan always sets this. It only affects in-place Z decompression.
    This is recommended for performance, but what app uses MSAA depth texturing?
    Reviewed-by: Nicolai Hähnle <[email protected]>
* r600g: use the hw MSAA resolving if formats are compatible | Marek Olšák | 2016-05-06 | 1 | -1/+2
    This allows resolving RGBA into RGBX. This should improve HL2 Lost Coast performance.
    Reviewed-by: Alex Deucher <[email protected]>
* st/omx/enc: fix incorrect reference picture order for B frames | Leo Liu | 2016-05-05 | 1 | -7/+12
    Stacking frames is for drivers that are capable of dual-instance encoding. That feature is not currently enabled for B frames.
    Signed-off-by: Leo Liu <[email protected]>
    Reviewed-by: Alex Deucher <[email protected]>
    Cc: "11.1 11.2" <[email protected]>
* vc4: fixup for new nir_foreach_block() | Connor Abbott | 2016-05-05 | 4 | -48/+20
    Reviewed-by: Eric Anholt <[email protected]>
* ir3: fixup for new nir_foreach_block() | Connor Abbott | 2016-05-05 | 1 | -30/+21
* swr: [rasterizer core] Faster modulo operator in ProcessVerts | Tim Rowley | 2016-05-05 | 1 | -1/+4
    Avoid % operator, since we know that curVertex is always incrementing.
    Reviewed-by: Bruce Cherniak <[email protected]>
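
The general idea of replacing % for an index that only ever advances by one can be sketched as below. This is a generic illustration, not the code from ProcessVerts; the function name and parameters are hypothetical.

```c
#include <stdint.h>

/* Advance an index by one and wrap it at `count` with a compare instead of
 * the % operator. This is only equivalent to (cur + 1) % count when
 * cur < count holds on entry, which is the case for an index that starts
 * at zero and is only ever stepped through this function. */
static uint32_t next_wrapped(uint32_t cur, uint32_t count)
{
    uint32_t next = cur + 1;
    if (next == count)
        next = 0;
    return next;
}
```

The compare and conditional reset avoid an integer division, which is what % compiles to when the divisor is not a compile-time constant.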
* swr: [rasterizer] Small warning cleanup | Tim Rowley | 2016-05-05 | 2 | -8/+4
    Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer] Add SWR_ASSUME / SWR_ASSUME_ASSERT macros | Tim Rowley | 2016-05-05 | 2 | -14/+52
    Fix static code analysis errors found by coverity on Linux
    Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer] Miscellaneous backend changes | Tim Rowley | 2016-05-05 | 3 | -22/+31
    Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer] Add support for X24_TYPELESS_G8_UINT format | Tim Rowley | 2016-05-05 | 3 | -7/+41
    Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer jitter] Fix printing bugs for tracing. | Tim Rowley | 2016-05-05 | 1 | -81/+24
    Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer memory] Add missing store tiles function | Tim Rowley | 2016-05-05 | 1 | -1/+4
    Storing color hot tile to 8bit w-major stencil format.
    Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer jitter] Add asserts for supported formats in fetch shader | Tim Rowley | 2016-05-05 | 1 | -0/+2
    Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] Fix thread allocation | Tim Rowley | 2016-05-05 | 1 | -17/+47
    Fix Windows in 32-bit mode when hyperthreading is disabled on Xeons.
    Add some support for asymmetric processor topologies.
    Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer core] Fix threadviz support in buckets | Tim Rowley | 2016-05-05 | 3 | -12/+14
    Need to evaluate the threadviz knob lazily, since the initialization order of globals is undefined.
    Reviewed-by: Bruce Cherniak <[email protected]>
* swr: [rasterizer] Whitespace cleanup and misc changes | Tim Rowley | 2016-05-05 | 5 | -5/+2
    Reviewed-by: Bruce Cherniak <[email protected]>
* radeonsi: mark descriptor loads as using dynamically uniform indices | Nicolai Hähnle | 2016-05-05 | 1 | -5/+17
    This tells LLVM to always use SMEM loads for descriptors. It fixes a regression in piglit's arb_shader_storage_buffer_object/execution/indirect.shader_test that was caused by LLVM r268259 (but the proper fix is really here in Mesa).
    Reviewed-by: Marek Olšák <[email protected]>
* swr: Remove stall waiting for core query counters. | Bruce Cherniak | 2016-05-05 | 4 | -124/+81
    When gathering query results, swr_gather_stats was unnecessarily stalling the entire pipeline. Results are now collected asynchronously, with a fence marking completion.
    Reviewed-By: George Kyriazis <[email protected]>
* freedreno: remove null check before free | Thomas Hindoe Paaboel Andersen | 2016-05-05 | 1 | -2/+1
    Reviewed-by: Eduardo Lima Mitev <[email protected]>
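
The cleanup relies on the C standard guarantee that free(NULL) does nothing. A minimal sketch of the pattern, using a hypothetical struct rather than the freedreno code the commit touched:

```c
#include <stdlib.h>

/* Hypothetical object that owns one heap allocation. */
struct example_obj {
    void *data;
};

static void example_obj_release(struct example_obj *obj)
{
    /* No "if (obj->data)" guard: free(NULL) is defined to be a no-op. */
    free(obj->data);
    obj->data = NULL;
}
```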
* r600,compute: create vtx buffer for text + rodata | Jan Vesely | 2016-05-04 | 1 | -2/+10
    Reserve buffer id 2
    Signed-off-by: Jan Vesely <[email protected]>
    Reviewed-by: Tom Stellard <[email protected]>
* freedreno: allow ctx->draw_vbo to fail | Rob Clark | 2016-05-04 | 5 | -30/+37
    Pretty much only happens if shader variant compile fails. But in this case, if we haven't emitted cmdstream, we don't want to set needs_flush.
    Signed-off-by: Rob Clark <[email protected]>
* freedreno: move shader-stage dirty bits to global dirty flag | Rob Clark | 2016-05-04 | 8 | -59/+41
    This was always a bit overly complicated, and had some issues (like ctx->prog.dirty not getting reset at the end of the batch). It also required some special hacks to avoid resetting dirty state on binning pass.
    So just move it all into ctx->dirty (leaving some free bits for future shader stages), and make FD_DIRTY_PROG just be the union of all FD_SHADER_DIRTY_*.
    Signed-off-by: Rob Clark <[email protected]>
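
The layout described above, with per-stage bits living in one global dirty word and a "program dirty" mask defined as their union, can be sketched as follows. All names and bit positions are hypothetical stand-ins; the real FD_DIRTY_* and FD_SHADER_DIRTY_* values are defined in the freedreno headers and are not reproduced here.

```c
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define EX_DIRTY_BLEND      (1u << 0)
#define EX_SHADER_DIRTY_VP  (1u << 8)
#define EX_SHADER_DIRTY_FP  (1u << 9)
/* bits 10..12 intentionally left free for future shader stages */

/* "Program dirty" is just the union of the per-stage bits. */
#define EX_DIRTY_PROG (EX_SHADER_DIRTY_VP | EX_SHADER_DIRTY_FP)

struct example_ctx {
    uint32_t dirty; /* single dirty word for all state, shaders included */
};

/* One reset clears everything, so no per-stage flag can be left stale. */
static void example_end_batch(struct example_ctx *ctx)
{
    ctx->dirty = 0;
}

static int example_prog_dirty(const struct example_ctx *ctx)
{
    return (ctx->dirty & EX_DIRTY_PROG) != 0;
}
```

With a single word, resetting state at the end of a batch cannot miss a separate shader flag, which is the kind of problem the commit message mentions for ctx->prog.dirty.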
* freedreno/a4xx: fix bogus offset for f32x24s8 stencil restore | Rob Clark | 2016-05-04 | 1 | -4/+5
    fixes: $piglit/bin/fbo-clear-formats GL_ARB_depth_buffer_float
    Signed-off-by: Rob Clark <[email protected]>
* freedreno: add some debug_asserts() to catch insane offsets | Rob Clark | 2016-05-04 | 1 | -0/+2
    Of course this won't catch *all* faults, but it is at least helpful for catching offsets which are completely bogus.
    Signed-off-by: Rob Clark <[email protected]>
* freedreno/a4xx: deal with VS which do not write position | Rob Clark | 2016-05-04 | 1 | -0/+7
    Fixes $piglit/bin/glsl-1.40-tf-no-position
    a3xx may need similar?
    Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: remove a couple redundant is_flow()s | Rob Clark | 2016-05-04 | 2 | -2/+2
    Now that the opc's encode the instruction category (making them unique) we no longer need to check the category in addition to the opc.
    Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: cp small negative integers too | Rob Clark | 2016-05-04 | 1 | -1/+2
    Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: fix # of registers | Rob Clark | 2016-05-04 | 1 | -1/+1
    The instruction encoding allows for more registers, but at least on a3xx/a4xx they don't actually exist.
    Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: lower immeds to const | Rob Clark | 2016-05-04 | 3 | -4/+80
    Helps reduce register pressure and instruction counts for immediates that would otherwise require a mov into gpr.

    total instructions in shared programs: 4455332 -> 4369297 (-1.93%)
    total dwords in shared programs: 8807872 -> 8614432 (-2.20%)
    total full registers used in shared programs: 263062 -> 250846 (-4.64%)
    total half registers used in shader programs: 9845 -> 9845 (0.00%)
    total const registers used in shared programs: 1029735 -> 1466993 (42.46%)

              half    full   const   instr  dwords
    helped       0   10415       0   17861    5912
    hurt         0    1157   21458     947      33

    Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: add ir3_cp_ctx | Rob Clark | 2016-05-04 | 3 | -12/+22
    Needed in the next commit; just split out to reduce noise.
    Signed-off-by: Rob Clark <[email protected]>
* nouveau/video: properly detect the decoder class for availability checks | Ilia Mirkin | 2016-05-04 | 1 | -8/+17
    The kernel is now more strict with the class ids it exposes, so we need to check the G98 and MCP89 classes as well as the GT215 class. This effectively caused us to decide there were no decoding capabilities on newer kernels for VP3 chips.
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95251
    Signed-off-by: Ilia Mirkin <[email protected]>
    Cc: "11.2" <[email protected]>
* gallium/util: change assertion to conditional in util_bitmask_destroy() | Brian Paul | 2016-05-03 | 1 | -4/+4
    If we fail to create a context in the VMware driver we call this function unconditionally to free a bunch of bit vectors. Instead of asserting on a null pointer, just no-op.
    Reviewed-by: Jose Fonseca <[email protected]>
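
A destroy function that tolerates a null pointer, as described above, might look like the sketch below. The struct and function names are hypothetical and do not reproduce the real util_bitmask implementation.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a bit-vector object. */
struct example_bitmask {
    unsigned *words;
    unsigned size; /* number of bits */
};

static struct example_bitmask *example_bitmask_create(void)
{
    struct example_bitmask *bm = calloc(1, sizeof(*bm));
    if (!bm)
        return NULL;
    bm->size = 32;
    bm->words = calloc(1, sizeof(unsigned));
    if (!bm->words) {
        free(bm);
        return NULL;
    }
    return bm;
}

/* Conditional instead of assert: destroying NULL is simply a no-op, so
 * error paths can call this unconditionally during teardown. */
static void example_bitmask_destroy(struct example_bitmask *bm)
{
    if (bm) {
        free(bm->words);
        free(bm);
    }
}
```

This mirrors the contract of free() itself, which keeps cleanup code for a partially constructed context simple.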
* cso: null-out previously bound sampler states | Brian Paul | 2016-05-03 | 1 | -1/+3
    If, for example, we previously had 2 sampler states bound and now we are binding one, we'd leave the second sampler state unchanged. This change nulls-out the second sampler state in this situation. We're already doing the same thing for sampler views.
    This silences an occasional warning issued by the VMware driver when the number of sampler views and sampler states disagreed.
    Reviewed-by: Charmaine Lee <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
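
The two-then-one binding case from the message can be sketched like this. The cache struct, the slot limit, and the function name are hypothetical; this is not the cso_context code.

```c
#include <stddef.h>

#define EX_MAX_SAMPLERS 16 /* hypothetical slot limit */

struct example_sampler_cache {
    void *samplers[EX_MAX_SAMPLERS];
    unsigned nr_samplers; /* number of slots bound by the previous call */
};

/* Bind `count` sampler states and clear any slots that the previous call
 * had bound beyond `count`, so no stale state is left behind. */
static void example_bind_samplers(struct example_sampler_cache *c,
                                  unsigned count, void *const *states)
{
    unsigned i;

    if (count > EX_MAX_SAMPLERS)
        count = EX_MAX_SAMPLERS;

    for (i = 0; i < count; i++)
        c->samplers[i] = states[i];

    /* Null-out the previously bound tail. */
    for (; i < c->nr_samplers; i++)
        c->samplers[i] = NULL;

    c->nr_samplers = count;
}
```

After binding two states and then one, slot 1 is NULL rather than still pointing at the old state, so the sampler count agrees with what is actually bound.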
* svga: try to flag surfaces for sampling, in addition to rendering | Brian Paul | 2016-05-03 | 1 | -0/+11
    This silences some warnings when we try to sample from surfaces that were created for drawing, such as when blitting from one of the framebuffer surfaces. We were already doing the opposite situation (adding a bind flag for rendering to surfaces declared as texture sources).
    Reviewed-by: Charmaine Lee <[email protected]>
* svga: fix copying non-zero layers of 1D array textures | Brian Paul | 2016-05-03 | 1 | -10/+12
    Like cube maps, we need to convert the z information to a layer index. Also rename the *_face vars to *_face_layer to make things a little more understandable.
    Reviewed-by: Charmaine Lee <[email protected]>
* svga: clean up svga_pipe_blit.c | Brian Paul | 2016-05-03 | 1 | -68/+13
    Remove dead code. Fix formatting.
    Reviewed-by: Charmaine Lee <[email protected]>
* rbug: s/Elements/ARRAY_SIZE/ | Brian Paul | 2016-05-03 | 1 | -1/+1
    Signed-off-by: Brian Paul <[email protected]>
* freedreno: s/Elements/ARRAY_SIZE/ | Brian Paul | 2016-05-03 | 1 | -1/+1
    Signed-off-by: Brian Paul <[email protected]>
    Reviewed-by: Rob Clark <[email protected]>
* trace: s/Elements/ARRAY_SIZE/ | Brian Paul | 2016-05-03 | 1 | -4/+4
    Signed-off-by: Brian Paul <[email protected]>