summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* nv50/ir: only take abs value when computing high resultIlia Mirkin2015-11-071-1/+1
| | | | | | | | Not reachable from TGSI since it only has UMUL, no IMUL. However it's surprising that setting argument types to s32 will cause sign to get lost. Signed-off-by: Ilia Mirkin <[email protected]>
* nouveau: avoid queueing too much work onto a single fenceIlia Mirkin2015-11-072-26/+43
| | | | | | | | | | Force the fence to get kicked off, which won't actually wait for its completion, but any additional work will be put onto a fresh list. This fixes crashes in teximage-colors --benchmark with too many active maps. Signed-off-by: Ilia Mirkin <[email protected]>
* llvmpipe: disable front updates for nowDave Airlie2015-11-081-1/+1
| | | | | | | | As pointed out by Emil, this sometimes hangs, appears to be due to threading need to rethink how this stuff works for llvmpipe. Signed-off-by: Dave Airlie <[email protected]>
* virgl: wrap ret assignment with braces to do correct thingDave Airlie2015-11-082-2/+2
| | | | | | | Coverity reported that ret could only be 0 or 1, since it was setting ret = fn() > 0, instead of doing (ret = fn()) > 0. Signed-off-by: Dave Airlie <[email protected]>
* nir: Add a nir_deref_tail helperJason Ekstrand2015-11-073-23/+13
| | | | Reviewed-by: Connor Abbott <[email protected]>
* nir/types: Add an is_vector_or_scalar helperJason Ekstrand2015-11-072-0/+7
| | | | Reviewed-by: Connor Abbott <[email protected]>
* i965/fs: Use regs_read/written for post-RA scheduling in calculate_depsJason Ekstrand2015-11-071-11/+4
| | | | | | | | | | | | Previously, we were assuming that everything read/wrote exactly 1 logical GRF (1 in SIMD8 and 2 in SIMD16). This isn't actually true. In particular, the PLN instruction reads 2 logical registers in one of the components. This commit changes post-RA scheduling to use regs_read and regs_written instead so that we add enough dependencies. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92770 Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
* nir/validate: Add better validation of load/store typesJason Ekstrand2015-11-071-2/+14
| | | | Reviewed-by: Connor Abbott <[email protected]>
* radeonsi: add register definitions for StoneyMarek Olšák2015-11-071-0/+322
| | | | | | There are a few non-stoney changes too. Reviewed-by: Alex Deucher <[email protected]>
* radeonsi: add workarounds for CP DMA to stay on the fast pathMarek Olšák2015-11-071-5/+88
| | | | | | v2: set emit_scratch_reloc, add a NULL check Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: unify CP DMA preparation logicMarek Olšák2015-11-071-37/+34
| | | | Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: unify CP DMA code determining various flagsMarek Olšák2015-11-071-28/+23
| | | | | | v2: don't call get_flush_flags twice per function Reviewed-by: Michel Dänzer <[email protected]>
* radeonsi: only enable write confirmation on the last CP DMA packetMarek Olšák2015-11-071-2/+4
| | | | | | This should improve performance for big copies that need to be split. Reviewed-by: Michel Dänzer <[email protected]>
* nv50/ir: allow emission of immediates in imul/imad opsIlia Mirkin2015-11-071-2/+8
| | | | | | | Nothing actually uses this yet (due to complications), but the emission logic is right. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: properly set the type of the constant folding resultIlia Mirkin2015-11-061-4/+4
| | | | | | | This removes the hack used for merge, which only covers a fraction of the cases. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: add support for const-folding OP_CVT with F64 source/destIlia Mirkin2015-11-063-0/+45
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: add fp64 opcode emission support for G200 (NVA0)Ilia Mirkin2015-11-061-10/+84
| | | | | | Need to emulate rcp/rsq before providing full fp64 support Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: Add support for 64bit immediates to checkSwapSrc01Hans de Goede2015-11-061-5/+6
| | | | | | | | Now that we support 64 bit immediates in insnCanLoad, we need to swap 64 bit immediate sources too for optimal effect. Signed-off-by: Hans de Goede <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0/ir: Teach insnCanLoad about double immediatesHans de Goede2015-11-061-6/+19
| | | | | | | | | | | | | | | | Teach insnCanLoad about double immediates, together with the "Add support for merge-s to the ConstantFolding pass" This turns the following (nvc0) code: 1: mov u32 $r2 0x00000000 (8) 2: mov u32 $r3 0x3fe00000 (8) 3: add f64 $r0d $r0d $r2d (8) Into: 1: add f64 $r0d $r0d 0.500000 (8) Signed-off-by: Hans de Goede <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: Add support for merge-s to the ConstantFolding passHans de Goede2015-11-061-0/+15
| | | | | | | | | | | This allows later passes like LoadPropagation to properly deal with 64 bit immediates. If the new 64 bit load this introduces does not get optimized away then split64BitOpPostRA() will split this into 2 instructions again. Signed-off-by: Hans de Goede <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: disallow 64-bit immediates on nv50 targetsIlia Mirkin2015-11-061-1/+1
| | | | | | No instructions are able to load short immediates like nvc0 can. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: allow movs with TYPE_F64 destinations to be splitIlia Mirkin2015-11-061-0/+6
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* gm107/ir: Add support for double immediatesHans de Goede2015-11-061-1/+4
| | | | | | | | Add support for encoding double immediates (up to 20 bits of precision) into the generated gm107 machine-code. Signed-off-by: Hans de Goede <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0/ir: Add support for double immediatesHans de Goede2015-11-061-0/+8
| | | | | | | | Add support for encoding double immediates (up to 20 bits of precision) into the generated nvc0 machine-code. Signed-off-by: Hans de Goede <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* i965/nir/fs: Add comment for no-op memory barrier functionsFrancisco Jerez2015-11-061-0/+19
| | | | Reviewed-by: Jordan Justen <[email protected]>
* i965/nir/fs: Implement new barrier functions for compute shadersJordan Justen2015-11-061-0/+7
| | | | | | | | | | | | | | | | | | | | | | For these nir intrinsics, we emit the same code as nir_intrinsic_memory_barrier: * nir_intrinsic_memory_barrier_atomic_counter * nir_intrinsic_memory_barrier_buffer * nir_intrinsic_memory_barrier_image We treat these nir intrinsics as no-ops: * nir_intrinsic_group_memory_barrier * nir_intrinsic_memory_barrier_shared v3: * Add comment for no-op cases (curro) v4: * Moving comment to a separate patch authored by curro Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* nir: Add new barrier functions for compute shadersJordan Justen2015-11-062-0/+26
| | | | | | | | When these functions are called in glsl-ir, we create a corresponding nir intrinsic function call. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* glsl: Add new barrier functions for compute shadersJordan Justen2015-11-061-6/+49
| | | | | | | | | | | | | | | | | | | | | When these functions are called in GLSL code, we create an intrinsic function call: * groupMemoryBarrier => __intrinsic_group_memory_barrier * memoryBarrierAtomicCounter => __intrinsic_memory_barrier_atomic_counter * memoryBarrierBuffer => __intrinsic_memory_barrier_buffer * memoryBarrierImage => __intrinsic_memory_barrier_image * memoryBarrierShared => __intrinsic_memory_barrier_shared v2: * Consolidate with memoryBarrier function/intrinsic creation (curro) v3: * Instead of add_memory_barrier_function, add an intrinsic_name parameter to _memory_barrier (curro) Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* radeon/uvd: fix VC-1 simple/main profile decode v2Boyuan Zhang2015-11-062-2/+7
| | | | | | | | | | | We just needed to set the extra width/height fields to get this working. v2 (chk): rebased, CC stable added, commit message added, fixed coding style Signed-off-by: Boyuan Zhang <[email protected]> Signed-off-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Cc: "10.6 11.0" <[email protected]>
* st/vaapi: fix vaapi VC-1 simple/main corruption v2Boyuan Zhang2015-11-061-0/+2
| | | | | | | | | | | Apply the start code fix only to advanced profile. v2 (chk): add commit message Signed-off-by: Boyuan Zhang <[email protected]> Reviewed-by: Christian König <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Cc: "10.6 11.0" <[email protected]>
* st/va: add support for RGBX and BGRX in VPPJulien Isorce2015-11-062-18/+23
| | | | | | | | | | | | | | Before it was only possible to convert a NV12 surface to RGBA or BGRA. This patch uses the same post processing function, "handleVAProcPipelineParameterBufferType", but add definitions for RGBX and BGRX. This patch also makes vlVaQuerySurfaceAttributes more generic to avoid copy and pasting the same lines. Signed-off-by: Julien Isorce <[email protected]> Reviewed-by: Christian K<C3><B6>nig <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* vl/buffers: add RGBX and BGRX to the supported formatsJulien Isorce2015-11-061-0/+18
| | | | | | | | | | Useful is one wants to create RGBX or BGRX surfaces. The infrastructure is such that it required just a few definitions to support these formats. Signed-off-by: Julien Isorce <[email protected]> Reviewed-by: Christian K<C3><B6>nig <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* st/va: properly use brackets in vlVaAcquireBufferHandle's switchJulien Isorce2015-11-061-5/+4
| | | | | | | | | In "switch (mem_type)" the brackets were surrounding "case+default" instead of "case" only. Signed-off-by: Julien Isorce <[email protected]> Reviewed-by: Christian K<C3><B6>nig <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* st/va: properly indent buffer.c, config.c, image.c and picture.cJulien Isorce2015-11-064-56/+56
| | | | | | | | Some lines were using 4 indentation spaces instead of 3. Signed-off-by: Julien Isorce <[email protected]> Reviewed-by: Christian K<C3><B6>nig <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* freedreno/a4xx: fix blend colorRob Clark2015-11-061-5/+9
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: update generated headersRob Clark2015-11-066-43/+54
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno: add a305 supportGuillaume Charifi2015-11-061-0/+1
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: Use nir_foreach_variableBoyan Ding2015-11-061-3/+3
| | | | | Signed-off-by: Boyan Ding <[email protected]> Signed-off-by: Rob Clark <[email protected]>
* nir: some small cleanupsRob Clark2015-11-062-14/+14
| | | | | | | | | The various cf nodes all get allocated w/ shader as their ralloc_parent, so lets make this more explicit. Plus couple other corrections/ clarifications. Signed-off-by: Rob Clark <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* nvc0: reintroduce BGRA4 format supportIlia Mirkin2015-11-062-3/+1
| | | | | | | | | | | | | | Commit 342e68dc60 (nvc0: remove BGRA4 format support) removed the support to fix a WoW trace. However after further experimentation, I was able to get the blit to work by using a different "fake" format in the 2d engine. The reason why this worked on nv50 is that nv50 falls back to the 3d blit path in case either the src or the dst aren't "faithfully" supported, while nvc0 only does it for the dst format. RG8 is better supported by the nvc0 2d engine than R16. Signed-off-by: Ilia Mirkin <[email protected]>
* mesa: report enum name in glClientActiveTexture() error stringBrian Paul2015-11-051-1/+2
| | | | As we do for glActiveTexture(). Trivial.
* st/va: fix memory leak on error in vlVaCreateSurfaces2Julien Isorce2015-11-051-3/+9
| | | | | | | | Found by coverity: CID #1337953 Signed-off-by: Julien Isorce <[email protected]> Reviewed-by: Christian König <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* st/va: indent vlVaQuerySurfaceAttributes and vlVaCreateSurfaces2Julien Isorce2015-11-051-283/+283
| | | | | | | | Some lines were using 4 indentation spaces instead of 3. Signed-off-by: Julien Isorce <[email protected]> Reviewed-by: Christian König <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* i965: Fix scalar VS float[] and vec2[] output arrays.Kenneth Graunke2015-11-054-2/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The scalar VS backend has never handled float[] and vec2[] outputs correctly (my original code was broken). Outputs need to be padded out to vec4 slots. In fs_visitor::nir_setup_outputs(), we tried to process each vec4 slot by looping from 0 to ALIGN(type_size_scalar(type), 4) / 4. However, this is wrong: type_size_scalar() for a float[2] would return 2, or for vec2[2] it would return 4. This looked like a single slot, even though in reality each array element would be stored in separate vec4 slots. Because of this bug, outputs[] and output_components[] would not get initialized for the second element's VARYING_SLOT, which meant emit_urb_writes() would skip writing them. Nothing used those values, and dead code elimination threw a party. To fix this, we introduce a new type_size_vec4_times_4() function which pads array elements correctly, but still counts in scalar components, generating correct indices in store_output intrinsics. Normally, varying packing avoids this problem by turning varyings into vec4s. So this doesn't actually fix any Piglit or dEQP tests today. However, if varying packing is disabled, things would be broken. Tessellation shaders can't use varying packing, so this fixes various tcs-input Piglit tests on a branch of mine. v2: Shorten the implementation of type_size_4x to a single line (caught by Connor Abbott), and rename it to type_size_vec4_times_4() (renaming suggested by Jason Ekstrand). Use type_size_vec4 rather than using type_size_vec4_times_4 and then dividing by 4. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* llvmpipe: disable texture cacheRoland Scheidegger2015-11-051-1/+1
| | | | There are some weird problems with 8-wide vectors.
* nouveau: send back a debug message when waiting for a fence to completeIlia Mirkin2015-11-0510-16/+30
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nv50,nvc0: provide debug messages with shader compilation statsIlia Mirkin2015-11-0511-9/+28
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nouveau: add support for sending debug messages via KHR_debugIlia Mirkin2015-11-055-0/+26
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* st/clover: provide a path for drivers to call through to pfn_notifyIlia Mirkin2015-11-054-4/+36
| | | | | | | Signed-off-by: Ilia Mirkin <[email protected]> [ Francisco Jerez: Clean up clover::context interface by passing around a function object. ]
* st/mesa: set debug callback for debug contextsIlia Mirkin2015-11-051-0/+57
| | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Brian Paul <[email protected]>