path: root/src/gallium
Commit message | Author | Age | Files | Lines
* swr/rast: Package events.proto with core output | George Kyriazis | 2018-04-27 | 2 | -2/+32
| events.proto is packaged only if it exists in DEBUG_OUTPUT_DIR. The expectation is that AR rasterizerLauncher will start placing it there when launching a workload (which comes in a subsequent check-in).
|
| Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Fix init in EventHandlerWorkerStats | George Kyriazis | 2018-04-27 | 1 | -1/+4
| Make sure we initialize variables.
|
| Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Fix return type of VCVTPS2PH. | George Kyriazis | 2018-04-27 | 1 | -1/+1
| Expecting <8xi16> return.
|
| Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: WIP Translation handling | George Kyriazis | 2018-04-27 | 2 | -18/+26
| Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Use different handling for stream masks | George Kyriazis | 2018-04-27 | 5 | -6/+11
| Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Silence warnings | George Kyriazis | 2018-04-27 | 3 | -4/+2
| Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Add support for TexelMask evaluation | George Kyriazis | 2018-04-27 | 2 | -0/+44
| Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Internal core change | George Kyriazis | 2018-04-27 | 1 | -0/+1
| Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Fix x86 lowering 64-bit float handling | George Kyriazis | 2018-04-27 | 2 | -6/+56
| - 64-bit cvt-to-float needs to be explicitly handled
| - gathers need the right parameter types to work with doubles
|
| Fixes draw-vertices piglit tests
|
| Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Add some SIMD_T utility functors | George Kyriazis | 2018-04-27 | 1 | -0/+66
| VecEqual and VecHash
|
| Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Fix wrong type allocation | George Kyriazis | 2018-04-27 | 1 | -1/+1
| ALLOCA pointer elements, not pointers.
|
| Reviewed-by: Bruce Cherniak <[email protected]>
* swr: touch generated files to update timestamp | George Kyriazis | 2018-04-27 | 1 | -0/+11
| A previous change in the generators necessitates this change.
|
| Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Fix byte offset for non-indexed draws | George Kyriazis | 2018-04-27 | 1 | -3/+3
| For the case when USE_SIMD16_SHADERS == FALSE.
|
| Reviewed-by: Bruce Cherniak <[email protected]>
* etnaviv: remove not needed includes | Christian Gmeiner | 2018-04-27 | 1 | -3/+0
| Signed-off-by: Christian Gmeiner <[email protected]>
| Reviewed-by: Philipp Zabel <[email protected]>
* etnaviv: remove redundant include | Christian Gmeiner | 2018-04-27 | 1 | -2/+0
| Signed-off-by: Christian Gmeiner <[email protected]>
| Reviewed-by: Philipp Zabel <[email protected]>
* broadcom/vc5: Add support for centroid varyings. | Eric Anholt | 2018-04-26 | 4 | -4/+53
| It would be nice to share the flags packet emit logic with flat shade flags, but I couldn't come up with a good way while still using our pack macros. We need to refactor this to shader record setup at compile time, anyway.
|
| Fixes ext_framebuffer_multisample-interpolation * centroid-*
* broadcom/vc5: Add an assert about GFXH-1559. | Eric Anholt | 2018-04-26 | 1 | -0/+9
| Our TF outputs always start at 6 or 7 currently, so we don't hit the broken 8 case. Let's make sure that doesn't change somehow.
* broadcom/vc5: Implement GFXH-1742 workaround (emit 2 dummy stores on 4.x). | Eric Anholt | 2018-04-26 | 1 | -8/+27
| This should help with intermittent GPU hangs in tests switching formats while rendering small frames. Unfortunately, it didn't help with the tests I'm having trouble with.
* st/va: Fix typos | Drew Davenport | 2018-04-26 | 1 | -24/+24
| s/attibute/attribute/
| s/suface/surface/
|
| v2: rebased (Leo)
|
| Reviewed-by: Leo Liu <[email protected]>
* st/va: Fix potential buffer overread | Drew Davenport | 2018-04-26 | 1 | -1/+1
| VASurfaceAttribExternalBuffers.pitches is indexed by plane. The current implementation only supports a single-plane layout.
|
| Reviewed-by: Kristian H. Kristensen <[email protected]>
| Reviewed-by: Leo Liu <[email protected]>
* radeon/vcn: fix mpeg4 msg buffer settings | Boyuan Zhang | 2018-04-26 | 1 | -9/+9
| The previous bit-field assignments were incorrect and caused certain mpeg4 decodes to fail due to wrong flag values. This patch fixes these assignments.
|
| Signed-off-by: Boyuan Zhang <[email protected]>
| Reviewed-by: Leo Liu <[email protected]>
* gallium/util: Fix incorrect refcounting of separate stencil. | Eric Anholt | 2018-04-25 | 1 | -2/+1
| The driver may have a reference on the separate stencil buffer for some reason (like an unflushed job using it), so we can't directly free the resource and should instead just decrement the refcount that we own.
|
| Fixes double-free in KHR-GLES3.packed_depth_stencil.blit.depth32f_stencil8 on vc5.
|
| Fixes: e94eb5e6000e ("gallium/util: add u_transfer_helper")
| Reviewed-by: Rob Clark <[email protected]>
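The ownership rule in this message can be illustrated with a minimal sketch. The `resource` struct and the `resource_ref`/`resource_unref` helpers below are hypothetical stand-ins, not gallium's actual types or the code touched by the patch; they only show why dropping the reference you own is safe where a direct free is not.

```c
#include <stdlib.h>

/* Hypothetical refcounted object; NOT gallium's pipe_resource. */
struct resource {
    int refcount;
};

/* Counts real destructions, for illustration only. */
static int destroyed_count = 0;

static struct resource *resource_create(void)
{
    struct resource *r = malloc(sizeof(*r));
    if (r)
        r->refcount = 1; /* the creator owns the first reference */
    return r;
}

/* Another owner (e.g. an unflushed job) takes its own reference. */
static void resource_ref(struct resource *r)
{
    r->refcount++;
}

/* Drop the caller's reference and clear its pointer.
 * The memory is freed only when the last owner lets go. */
static void resource_unref(struct resource **ptr)
{
    struct resource *r = *ptr;
    *ptr = NULL;
    if (r && --r->refcount == 0) {
        free(r);
        destroyed_count++;
    }
}
```

If a second owner still holds a reference, `resource_unref` leaves the object alive, whereas calling `free` directly would leave that owner with a dangling pointer and lead to a double free once it released its own reference.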
* broadcom/vc5: Fix reloads of separate stencil buffers. | Eric Anholt | 2018-04-25 | 1 | -4/+16
| Like for stores, we need to emit a separate load_general packet.
* broadcom/vc5: Fix cpp of MSAA surfaces on 4.x. | Eric Anholt | 2018-04-25 | 2 | -3/+5
| The internal-type-bpp path is for surfaces that get stored in the raw TLB format. For 4.x, we're storing MSAA as just 2x width/height at the original format.
* broadcom/vc5: Implement stencil blits using RGBA. | Eric Anholt | 2018-04-25 | 2 | -2/+83
| Fixes piglit fbo-depthstencil blit default_fb
* broadcom/vc5: Remove leftover vc4 MSAA lowering setup in the FS key. | Eric Anholt | 2018-04-25 | 1 | -4/+1
|
* broadcom/vc5: Fix tile load/store of MSAA surfaces on 4.x. | Eric Anholt | 2018-04-25 | 1 | -1/+11
| For single-sample we have to always program SAMPLE_0, but for multisample we want to store all the samples.
* draw: fix different sign logic when clipping | Roland Scheidegger | 2018-04-25 | 1 | -9/+8
| The logic was flawed, since mul(x,y) will be <= 0 (exactly 0) when the sign is the same but both numbers are sufficiently small (if the product is smaller than 2^-128).
|
| This could apparently lead to emitting enough additional bogus vertices to overflow the array allocated for them, hitting an assertion (still safe with release builds since we just aborted clipping after the assertion in this case - I'm however unsure if this is now really no longer possible, so that code stays). I'm not sure if the additional vertices could cause other grief; I didn't see anything wrong even when hitting the assertion.
|
| Essentially, both +-0 are treated as positive (the vertex is considered to be inside the clip volume for this plane), so integrate the logic determining different sign into the branch there.
|
| Reviewed-by: Jose Fonseca <[email protected]>
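The failure mode described in this message can be reproduced in isolation. The sketch below is not the draw module's code: both function names are hypothetical, and it only contrasts a product-based sign test with a comparison-based one in which +0 and -0 both count as non-negative.

```c
#include <stdbool.h>

/* Product-based test: assumes "a * b <= 0" means the signs differ.
 * This breaks when the product of two tiny same-sign values underflows to 0. */
static bool signs_differ_by_product(float a, float b)
{
    /* volatile forces the product to be rounded to float precision,
     * even on targets that evaluate expressions in a wider type. */
    volatile float product = a * b;
    return product <= 0.0f;
}

/* Comparison-based test: +0 and -0 both count as non-negative ("inside"),
 * so only a strictly negative value is treated as being on the other side. */
static bool signs_differ_by_compare(float a, float b)
{
    return (a < 0.0f) != (b < 0.0f);
}
```

With `a = b = 1e-30f` the exact product is about 1e-60, far below the smallest representable float, so the product-based test reports a sign change that does not exist, while the comparison-based test does not.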
* draw: simplify clip null tri logic | Roland Scheidegger | 2018-04-25 | 1 | -11/+9
| Simplifies the logic for when to emit null tris (although the reasons why we have to do this remain unclear). This is strictly a logic simplification; the behavior doesn't change at all.
|
| Reviewed-by: Jose Fonseca <[email protected]>
* nvc0/ir: all short immediates are sign-extended, adjust LIMM test | Ilia Mirkin | 2018-04-24 | 3 | -19/+24
| Some analysis suggests that all short immediates are sign-extended. The insnCanLoad logic already accounted for this, but we could still pick the wrong form when emitting actual instructions that support both short and long immediates (with the long form usually having additional restrictions that insnCanLoad should be aware of).
|
| This also reverses a bunch of commits that had previously "worked around" this issue in various emitters:
|
| 9c63224540ef: gm107/ir: make use of ADD32I for all immediates
| 83a4f28dc27b: gm107/ir: make use of LOP32I for all immediates
| b84c97587b4a: gm107/ir: make use of IMUL32I for all immediates
| d30768025a22: gk110/ir: make use of IMUL32I for all immediates
|
| as well as the original import for UMUL in the nvc0 emitter.
|
| Reported-by: Karol Herbst <[email protected]>
| Signed-off-by: Ilia Mirkin <[email protected]>
| Tested-by: Karol Herbst <[email protected]>
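The kind of check implied by "short immediates are sign-extended" can be sketched as follows. This is an illustration only: `fits_sign_extended` is a hypothetical helper, and the field width is left as a parameter because the actual short-immediate widths of the nvc0/gk110/gm107 encodings are not stated in this message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if `value` survives being stored in a `bits`-wide field
 * that the hardware sign-extends back to 32 bits.
 * `bits` is an assumed parameter and must be in the range 1..31. */
static bool fits_sign_extended(uint32_t value, unsigned bits)
{
    uint32_t sign = 1u << (bits - 1);           /* sign bit of the field */
    uint32_t low = value & ((sign << 1) - 1u);  /* keep the low `bits` bits */
    uint32_t extended = (low ^ sign) - sign;    /* sign-extend to 32 bits */
    return extended == value;
}
```

Under this rule a value such as 0xFFFFF does not fit a 20-bit field, because sign extension would turn it into 0xFFFFFFFF, so an emitter has to fall back to the long-immediate form for it.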
* meson: raise required version to 0.44.1 | Dylan Baker | 2018-04-24 | 1 | -6/+0
| We have already required 0.44 for building clover and swr, so it was already partially required. This just makes it required across the board instead of just for clover and swr.
|
| There is a bug in 0.44 which makes it impossible to build mesa in some configurations, so require 0.44.1 which fixes this.
|
| Signed-off-by: Dylan Baker <[email protected]>
| Reviewed-by: Eric Anholt <[email protected]>
* meson: fix graw-xlib after auxiliary consolidation | Dylan Baker | 2018-04-24 | 1 | -2/+1
| This one's completely my fault, I didn't do good enough testing after rebasing and this got missed.
|
| Fixes: d28c24650110c130008be3d3fe584520ff00ceb1 ("meson: build graw tests")
| Signed-off-by: Dylan Baker <[email protected]>
| Reviewed-by: Eric Anholt <[email protected]>
* gm107/ir/lib: fix sched in div u32 builtin | Karol Herbst | 2018-04-24 | 2 | -4/+4
| Imad needs to set a read barrier.
|
| With significantly big work groups I was getting wrong results for div u32. It turns out the issue was with the sched opcodes.
|
| Signed-off-by: Karol Herbst <[email protected]>
| Reviewed-by: Samuel Pitoiset <[email protected]>
* broadcom/vc5: Set up internal_format for imported resources. | Eric Anholt | 2018-04-24 | 1 | -0/+2
| Without this, we'd fail an assertion in u_transfer_helper when mapping an imported resource.
* broadcom/vc5: Assert that created BOs have offset != 0. | Eric Anholt | 2018-04-24 | 1 | -0/+1
| The kernel shouldn't return a bo at NULL, and the HW special-cases NULL address values for things like OQs.
* broadcom/vc5: Don't allocate simulator BOs at offset 0. | Eric Anholt | 2018-04-24 | 1 | -1/+5
| The kernel won't return us BOs at offset 0 (because things like OQs wouldn't work there), so we shouldn't in the simulator either.
* broadcom/vc5: Add sim support for the GET_BO_OFFSET ioctl. | Eric Anholt | 2018-04-24 | 2 | -6/+21
| Otherwise we'd crash immediately upon importing a BO through EGL interfaces.
* broadcom/vc5: Treat imports of DRM_FORMAT_MOD_INVALID BOs as linear. | Eric Anholt | 2018-04-24 | 1 | -0/+1
| We don't have any kernel metadata about BO tiling, so this probably is all we should do for the moment.
* Revert "st/dri: Fix dangling pointer to a destroyed dri_drawable" | Marek Olšák | 2018-04-24 | 1 | -4/+0
| This reverts commit dab02dea3411d325a5aee6cda5b581e61396ecc6.
|
| It causes crashes of qtcreator and firefox.
|
| Fixes: dab02de "st/dri: Fix dangling pointer to a destroyed dri_drawable"
| Cc: 18.0 18.1 <[email protected]>
* gallivm: dump bitcode before optimization | Roland Scheidegger | 2018-04-24 | 1 | -13/+20
| If we dump the bitcode for off-line debug purposes, we really want the pre-optimized bitcode; otherwise it's useless in identifying problems with IR optimization (if you have a shader which takes an hour to do IR optimization, it's also nice you don't have to wait that hour...).
|
| Also, print out the function passes for opt which correspond to what was used for jit compilation (and also the opt level for codegen). Using opt/llc this way should then pretty much mimic what was done for jit. (Specifying something like -time-passes -debug-pass=[Structure|Arguments] (for either opt or llc) also gives very useful information about which passes all the time was spent in, and which passes are really run along with the order - llvm will add passes due to dependencies on its own, and of course -O2 for llc comes with a ~100 pass list.)
|
| Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: (trivial) do division by 1000 with int64 | Roland Scheidegger | 2018-04-24 | 1 | -1/+1
| Conversion to int can otherwise overflow if compile times are over ~71min. (Yes this can happen...)
|
| Reviewed-by: Jose Fonseca <[email protected]>
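The order-of-operations issue behind this change can be shown with a small sketch. It assumes the timestamps are 64-bit microsecond counts, which is consistent with the ~71 minute figure (2^32 microseconds is about 71.6 minutes) but is not stated in the message; both function names are hypothetical.

```c
#include <stdint.h>

/* Divide in 64 bits first, then narrow: the millisecond count fits in an
 * int for any realistic compile time. */
static int elapsed_msec(int64_t begin_usec, int64_t end_usec)
{
    return (int)((end_usec - begin_usec) / 1000);
}

/* Narrow first, then divide: only the low 32 bits of the microsecond count
 * survive. uint32_t is used here so the wraparound is well defined in C. */
static uint32_t elapsed_msec_narrow_first(int64_t begin_usec, int64_t end_usec)
{
    return (uint32_t)(end_usec - begin_usec) / 1000u;
}
```

For a 72-minute interval (4,320,000,000 microseconds) the first function returns 4,320,000 ms, while the second wraps and returns 25,032 ms.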
* gallivm: remove LICM pass | Roland Scheidegger | 2018-04-24 | 1 | -1/+9
| LICM is simply too expensive, even though it presumably can help quite a bit in some cases. It was definitely cheaper in llvm 3.3, though as far as I can tell with llvm 3.3 it failed to do anything in most cases. early-cse also actually seems to cause licm to be able to move things when it previously couldn't, which causes noticeable compile time increases.
|
| There are more loop passes in llvm, but I'm not sure which ones are helpful, and I couldn't find anything which would roughly do what the old licm in llvm 3.3 did, so ditch it.
|
| Reviewed-by: Jose Fonseca <[email protected]>
* gallivm: add early cse pass | Roland Scheidegger | 2018-04-24 | 1 | -4/+5
| This pass is quite cheap, and can simplify the IR quite a bit for our generated IR. In particular on a variety of shaders I've found the time saved by other passes due to the simplified IR more than makes up for the cost of this pass, and on top of that the end result is actually better.
|
| The only downside I've found is this enables the LICM pass to move some things out of the main shader loop (in the case I've seen, instanced vertex fetch (which is constant within the jit shader) plus the derived instructions in the shader) which it couldn't do before for some reason. This would actually be desirable but can increase compile time considerably (licm seems to have considerable cost when it actually can move things out of loops, due to alias analysis). But blaming early cse for this seems inappropriate. (Note that the first two sroa / earlycse passes are similar to what a standard llvm opt -O1/-O2 pipeline would do, albeit this has some more passes even before but I don't think they'd do much for us.)
|
| It also in particular helps some crazy shader used for driver verification (don't ask...) a lot (about factor of 6 faster in compile time) (due to simplifying the IR before LICM is run).
|
| While here, also move licm behind simplifycfg. For some shaders there seem to be very significant compile time gains (we've seen a factor of 10000 albeit that was a really crazy shader you'd certainly never see in a real app), because LICM is quite expensive and there are cases where running simplifycfg (along with sroa and early-cse) before licm reduces IR complexity significantly. (I'm not entirely sure if it would make sense to also run it afterwards.)
|
| Reviewed-by: Jose Fonseca <[email protected]>
* ac/radv/radeonsi: refactor harvest config register getters. | Dave Airlie | 2018-04-24 | 1 | -105/+6
| This refactors the code out to share it between radv and radeonsi.
|
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
| Acked-by: Nicolai Hähnle <[email protected]>
| Reviewed-by: Samuel Pitoiset <[email protected]>
* ac/radv/radeonsi: refactor max simd waves into common code. | Dave Airlie | 2018-04-24 | 1 | -11/+1
| Reviewed-by: Nicolai Hähnle <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* ac/radv/radeonsi: refactor raster_config default values getters. | Dave Airlie | 2018-04-24 | 1 | -82/+3
| This just makes this common code between the two drivers.
|
| Reviewed-by: Nicolai Hähnle <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi: use common gs_table_depth code | Dave Airlie | 2018-04-24 | 1 | -31/+2
| Reviewed-by: Nicolai Hähnle <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi: don't runtime check gs table info | Dave Airlie | 2018-04-24 | 1 | -7/+7
| We can just mark this case unreachable here; this aligns with the radv code and makes it easier to move to common code.
|
| Reviewed-by: Nicolai Hähnle <[email protected]>
| Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* st/dri: Fix dangling pointer to a destroyed dri_drawable | Johan Klokkhammer Helsing | 2018-04-23 | 1 | -0/+4
| If an EGLSurface is created, made current and destroyed, and then a second EGLSurface is created, the second malloc in driCreateNewDrawable may return the same pointer address that the first surface's drawable had. Consequently, when dri_make_current later tries to determine if it should update the texture_stamp, it compares the surface's drawable pointer against the drawable in the last call to dri_make_current and assumes it's the same surface (which it isn't).
|
| When texture_stamp is left unset, dri_st_framebuffer_validate thinks it has already called update_drawable_info for that drawable, leaving it unvalidated, and this is when bad things start to happen. In my case it manifested itself by the width and height of the surface being unset.
|
| This is fixed by setting the pointer to NULL before freeing the surface.
|
| Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106126
| Signed-off-by: Johan Klokkhammer Helsing <[email protected]>
| Signed-off-by: Marek Olšák <[email protected]>
| Cc: 18.0 18.1 <[email protected]>
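The address-reuse hazard described in this message, and the "clear the cached pointer before freeing" idea, can be sketched in isolation. The `drawable` and `context` structs and all function names below are hypothetical; this is not the st/dri code or the actual patch, only an illustration of the pattern.

```c
#include <stdlib.h>

/* Hypothetical stand-ins; NOT the real dri_drawable / dri_context. */
struct drawable {
    int width, height;
};

struct context {
    struct drawable *last_drawable; /* cached pointer from the last make_current */
    int stamp_updates;              /* how often the cached state was refreshed */
};

/* Refresh per-drawable state only when the drawable appears to have changed.
 * Comparing raw pointers is what makes address reuse dangerous. */
static void make_current(struct context *ctx, struct drawable *d)
{
    if (ctx->last_drawable != d) {
        ctx->last_drawable = d;
        ctx->stamp_updates++;
    }
}

/* Clear the cached pointer before freeing, so a later allocation that
 * happens to land at the same address is not mistaken for the old one. */
static void destroy_drawable(struct context *ctx, struct drawable *d)
{
    if (ctx->last_drawable == d)
        ctx->last_drawable = NULL;
    free(d);
}
```

Without the NULL assignment in `destroy_drawable`, a second `malloc` returning the old address would make `make_current` skip the refresh for a brand-new drawable.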
* nv50/ir: make a copy of tex src if it's referenced multiple times | Ilia Mirkin | 2018-04-22 | 1 | -37/+49
| For nv50 we coalesce the srcs and defs into a single node. As such, we can end up with impossible constraints if the source is referenced after the tex operation (which, due to the coalescing of values, will have overwritten it).
|
| This logic already exists for inserting moves for MERGE/UNION sources. It's the exact same idea here, so leverage that code, which also includes a few optimizations around not extending live ranges unnecessarily.
|
| Fixes tests/spec/glsl-1.30/execution/fs-textureSize-components.shader_test
|
| Signed-off-by: Ilia Mirkin <[email protected]>