summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/nouveau
Commit message (Collapse)AuthorAgeFilesLines
* nv50: expose two groups of compute-related MP perf countersSamuel Pitoiset2015-11-206-2/+63
| | | | | | | This turns on GL_AMD_performance_monitor. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
* gallium: add the concept of batch queriesNicolai Hähnle2015-11-202-0/+2
| | | | | | | | | | | | | | | | | Some drivers (in particular radeon[si], but also freedreno judging from a quick grep) may want to expose performance counters that cannot be individually enabled or disabled. Allow such drivers to mark driver-specific queries as requiring a new type of batch query object that is used to start and stop a list of queries simultaneously. v3: adjust recently added nv50 queries v2: documentation for create_batch_query Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
* gallium: remove pipe_driver_query_group_info field typeNicolai Hähnle2015-11-201-4/+0
| | | | | | | | | | | | | | | | | | | This was only used to implement an unnecessarily restrictive interpretation of the spec of AMD_performance_monitor. The spec says A performance monitor consists of a number of hardware and software counters that can be sampled by the GPU and reported back to the application. I guess one could take this as a requirement that counters _must_ be sampled by the GPU, but then why are they called _software_ counters? Besides, there's not much reason _not_ to expose all counters that are available, and this simplifies the code. v3: add a missing change in the nouveau driver (thanks Samuel Pitoiset) Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Tested-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
* nv50: allow using inline vertex data submit when gl_VertexID is usedSamuel Pitoiset2015-11-195-15/+46
| | | | | | | | | | | | | | | The hardware can actually generates vertexid when vertices come from a client-side buffer like when glDrawElements is used. This doesn't fix (or break) any piglit tests but it improves the previous attempt of Ilia (c830d19 "nv50: avoid using inline vertex data submit when gl_VertexID is used") The only disadvantage is that only works on G84+, but we don't really care of that weird and old NV50 chipset. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nv50: add NV84_3D macroSamuel Pitoiset2015-11-193-3/+4
| | | | | Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nvc0/ir: actually emit AFETCH on keplerIlia Mirkin2015-11-181-0/+3
| | | | | | | | Looks like this was forgotten in the commit which added the AFETCH logic. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: mesa-stable@lists.freedesktop.org
* nv50: add missing header into the sources listEmil Velikov2015-11-161-0/+1
| | | | | | Otherwise it won't end up in the tarball. Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
* nv50,nvc0: disable render condition around clear_* functionsIlia Mirkin2015-11-144-0/+32
| | | | | | | Only the regular "clear" call is supposed to respect the render condition. The rest should ignore it. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nv50: add support for performance metrics on G84+Samuel Pitoiset2015-11-144-3/+259
| | | | | | | | Currently only one metric is exposed but more will be added later. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Tested-by: Pierre Moreau <pierre.morrow@free.fr> Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nv50: add compute-related MP perf counters on G84+Samuel Pitoiset2015-11-149-2/+548
| | | | | | | | | | | | | | | | | | These compute-related MP performance counters have been reverse engineered using CUPTI which is part of NVIDIA CUDA. As for nvc0, we use a compute kernel to read out those performance counters, and the command stream to configure them. Note that Tesla only exposes 4 MP performance counters, while Fermi has 8. Only G84+ is supported because G80 is an old and weird card. Tested on G84, G96, G200, MCP79 and GT218 with glxgears, glxspheres64, xonotic-glx, heaven and valley. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Tested-by: Pierre Moreau <pierre.morrow@free.fr> Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nv50: implement a basic compute supportSamuel Pitoiset2015-11-1410-9/+1006
| | | | | | | | | | | | | | | | | | | | | | | | | | This adds the ability to launch simple compute kernels like the one I will use to read out MP performance counters in the upcoming patch. This compute support is based on the work of Francisco Jerez (aka curro) that he did as part of his EVoC project in 2011/2012 to get OpenCL working on Tesla. His original work can be found here: https://github.com/curro/mesa/commits/nv50-compute I did some improvements on the original code, like fixing using both 3D and COMPUTE simultaneously, improving global buffers binding, and making the code closer to what nvc0 already does. This compute support has been tested by Pierre Moreau and myself with some compute kernels. This is a step towards OpenCL. Speaking about this, it seems like compute programs overlap fragment programs when they are used both. To fix this, we need to re-validate fragment programs when binding compute programs and vice versa. Note that, textures, samplers and surfaces still need to be implemented. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Tested-by: Pierre Moreau <pierre.morrow@free.fr> Acked-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nv50: free interpolation parameters in nv50_program_destroy()Samuel Pitoiset2015-11-141-1/+1
| | | | | | | | As for nvc0, we need to free memory allocated by interpolation parameters. This fixes a memory leak spotted by valgrind. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nvc0: reduce the number of GPR used when reading MP perf countersSamuel Pitoiset2015-11-141-1/+2
| | | | | | | No need to allocate more GPR than used in the compute kernel which reads MP performance counters on Fermi. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
* nouveau: don't expose HEVC decoding supportIlia Mirkin2015-11-141-0/+1
| | | | | Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: mesa-stable@lists.freedesktop.org
* nvc0/ir: add support for TGSI_SEMANTIC_HELPER_INVOCATIONIlia Mirkin2015-11-126-0/+6
| | | | Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nv50,nvc0: add ARB_clear_texture supportIlia Mirkin2015-11-115-7/+101
| | | | Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* gallium: add PIPE_CAP_CLEAR_TEXTURE and clear_texture prototypeIlia Mirkin2015-11-113-0/+3
| | | | | Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Marek Olšák <marek.olsak@amd.com>
* nvc0: enable compute support on FermiSamuel Pitoiset2015-11-081-2/+2
| | | | | | | | | | | Altough the compute support is still not complete because textures and surfaces need to be implemented, it allows to launch very simple compute kernel like one which reads reading MP performance counters. This turns on PIPE_CAP_COMPUTE and PIPE_SHADER_COMPUTE. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nv50/ir: fix emission of s[] args in certain situationsIlia Mirkin2015-11-071-2/+2
| | | | | | | | | | There might only be a single arg (e.g. cvt), so use mode rather than looking at the source directly. Also we don't want to rely on the type of the value, which can be unreliable, but instead use the instruction's. This works out well since mkSplit doesn't adjust the type. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nv50/ir: only take abs value when computing high resultIlia Mirkin2015-11-071-1/+1
| | | | | | | | Not reachable from TGSI since it only has UMUL, no IMUL. However it's surprising that setting argument types to s32 will cause sign to get lost. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nouveau: avoid queueing too much work onto a single fenceIlia Mirkin2015-11-072-26/+43
| | | | | | | | | | Force the fence to get kicked off, which won't actually wait for its completion, but any additional work will be put onto a fresh list. This fixes crashes in teximage-colors --benchmark with too many active maps. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nv50/ir: allow emission of immediates in imul/imad opsIlia Mirkin2015-11-071-2/+8
| | | | | | | Nothing actually uses this yet (due to complications), but the emission logic is right. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nv50/ir: properly set the type of the constant folding resultIlia Mirkin2015-11-061-4/+4
| | | | | | | This removes the hack used for merge, which only covers a fraction of the cases. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nv50/ir: add support for const-folding OP_CVT with F64 source/destIlia Mirkin2015-11-063-0/+45
| | | | Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nv50/ir: add fp64 opcode emission support for G200 (NVA0)Ilia Mirkin2015-11-061-10/+84
| | | | | | Need to emulate rcp/rsq before providing full fp64 support Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nv50/ir: Add support for 64bit immediates to checkSwapSrc01Hans de Goede2015-11-061-5/+6
| | | | | | | | Now that we support 64 bit immediates in insnCanLoad, we need to swap 64 bit immediate sources too for optimal effect. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nvc0/ir: Teach insnCanLoad about double immediatesHans de Goede2015-11-061-6/+19
| | | | | | | | | | | | | | | | Teach insnCanLoad about double immediates, together with the "Add support for merge-s to the ConstantFolding pass" This turns the following (nvc0) code: 1: mov u32 $r2 0x00000000 (8) 2: mov u32 $r3 0x3fe00000 (8) 3: add f64 $r0d $r0d $r2d (8) Into: 1: add f64 $r0d $r0d 0.500000 (8) Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nv50/ir: Add support for merge-s to the ConstantFolding passHans de Goede2015-11-061-0/+15
| | | | | | | | | | | This allows later passes like LoadPropagation to properly deal with 64 bit immediates. If the new 64 bit load this introduces does not get optimized away then split64BitOpPostRA() will split this into 2 instructions again. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nv50/ir: disallow 64-bit immediates on nv50 targetsIlia Mirkin2015-11-061-1/+1
| | | | | | No instructions are able to load short immediates like nvc0 can. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nv50/ir: allow movs with TYPE_F64 destinations to be splitIlia Mirkin2015-11-061-0/+6
| | | | Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* gm107/ir: Add support for double immediatesHans de Goede2015-11-061-1/+4
| | | | | | | | Add support for encoding double immediates (up to 20 bits of precision) into the generated gm107 machine-code. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nvc0/ir: Add support for double immediatesHans de Goede2015-11-061-0/+8
| | | | | | | | Add support for encoding double immediates (up to 20 bits of precision) into the generated nvc0 machine-code. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nvc0: reintroduce BGRA4 format supportIlia Mirkin2015-11-062-3/+1
| | | | | | | | | | | | | | Commit 342e68dc60 (nvc0: remove BGRA4 format support) removed the support to fix a WoW trace. However after further experimentation, I was able to get the blit to work by using a different "fake" format in the 2d engine. The reason why this worked on nv50 is that nv50 falls back to the 3d blit path in case either the src or the dst aren't "faithfully" supported, while nvc0 only does it for the dst format. RG8 is better supported by the nvc0 2d engine than R16. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nouveau: send back a debug message when waiting for a fence to completeIlia Mirkin2015-11-0510-16/+30
| | | | Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nv50,nvc0: provide debug messages with shader compilation statsIlia Mirkin2015-11-0511-9/+28
| | | | Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nouveau: add support for sending debug messages via KHR_debugIlia Mirkin2015-11-055-0/+26
| | | | Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nouveau: relax fence emit space assertIlia Mirkin2015-11-043-3/+3
| | | | | | | | | We also have the "reserved for kick" space available. Some of my earlier changes can probably be removed, but this is a quick fix for some of the rarer fallout. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: <mesa-stable@lists.freedesktop.org>
* nvc0: add missing compute parameters required by cloverSamuel Pitoiset2015-11-031-1/+10
| | | | | | | This fixes crashes with some piglit OpenCL tests. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nvc0: handle NULL pointer in nvc0_get_compute_param()Samuel Pitoiset2015-11-031-24/+21
| | | | | | | | | To get the size (in bytes) of a compute parameter, clover first calls get_compute_param() with a NULL data pointer. The RET() macro is based on nv50. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nv50: use correct heaps for FP and GP code segmentsSamuel Pitoiset2015-11-011-2/+2
| | | | | | This is just a cosmetic change. Trivial. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
* nouveau: get rid of tabsIlia Mirkin2015-10-3119-607/+607
| | | | Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nv50: do not create an invalid HW query typeSamuel Pitoiset2015-10-302-12/+30
| | | | | | | | While we are at it, store the rotate offset for occlusion queries to nv50_hw_query like on nvc0. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
* nv50: move HW queries to nv50_query_hw.c/h filesSamuel Pitoiset2015-10-308-349/+476
| | | | | Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
* nv50: move nva0_so_target_save_offset() to its correct locationSamuel Pitoiset2015-10-303-21/+18
| | | | | Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
* nv50: add a header file for nv50_querySamuel Pitoiset2015-10-306-40/+49
| | | | | | | | | Like for nvc0, this will allow to split different types of queries and to prepare the way for both global performance counters and MP counters. While we are at it, make use of nv50_query struct instead of pipe_query. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
* nv50: mark contexts shareable, compile at creation timeIlia Mirkin2015-10-292-1/+4
| | | | Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nv50: allow per-sample interpolation to be forced via rastIlia Mirkin2015-10-298-9/+52
| | | | | | | | Uses the same technique as for nvc0 of fixups before upload, and evicting in case of state change. Removes one source of variants kept by st/mesa. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nvc0: expose a group of performance metrics on FermiSamuel Pitoiset2015-10-293-3/+16
| | | | | | | This allows to monitor those performance metrics through GL_AMD_performance_monitor. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
* nv50/ir: adapt to new method for passing in cull/clip distance masksIlia Mirkin2015-10-294-14/+14
| | | | Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
* nvc0: share shaders between contexts and build immediatelyIlia Mirkin2015-10-293-1/+7
| | | | | | | Avoid deferring building shaders until draw time, should hopefully reduce any stuttering, as well as enable shader-db style analysis. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>