summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers/nouveau
Commit message (Collapse)AuthorAgeFilesLines
* gallium/drivers: Sanitize NULL checks into canonical formEdward O'Callaghan2015-12-062-3/+3
| | | | | | | | | | Use NULL tests of the form `if (ptr)' or `if (!ptr)'. They do not depend on the definition of the symbol NULL. Further, they provide the opportunity for the accidental assignment, are clear and succinct. Signed-off-by: Edward O'Callaghan <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* gallium/drivers: Trivial code-style cleanupEdward O'Callaghan2015-12-062-2/+2
| | | | | Signed-off-by: Edward O'Callaghan <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* gallium/drivers/nouveau: Make use of ARRAY_SIZE macroEdward O'Callaghan2015-12-0614-22/+20
| | | | | Signed-off-by: Edward O'Callaghan <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* nv50/ir: fold shl + mul with immediatesIlia Mirkin2015-12-051-0/+16
| | | | | | | | | | | | | | On SM20 this gives: total instructions in shared programs : 6299222 -> 6294240 (-0.08%) total gprs used in shared programs : 944139 -> 944068 (-0.01%) total local used in shared programs : 54116 -> 54116 (0.00%) local gpr inst bytes helped 0 126 2781 2781 hurt 0 55 11 11 Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: propagate indirect loads into instructionsIlia Mirkin2015-12-051-0/+52
| | | | | | | | | | | | | | | | This way $r1 = $r0 + 4; c1[$r1] becomes c1[$r0+4]. On SM35: total instructions in shared programs : 6206257 -> 6185058 (-0.34%) total gprs used in shared programs : 911045 -> 910722 (-0.04%) total local used in shared programs : 39072 -> 39072 (0.00%) local gpr inst bytes helped 0 417 4195 4195 hurt 0 280 0 0 Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: flip shl(add, imm) into add(shl, imm)Ilia Mirkin2015-12-051-4/+34
| | | | | | | | | | | | | | | | | This works when the add also has an immediate. This often happens in address calculations. These addresses can then be inlined as well. On code targeted to SM35: total instructions in shared programs : 6223346 -> 6206257 (-0.27%) total gprs used in shared programs : 911075 -> 911045 (-0.00%) total local used in shared programs : 39072 -> 39072 (0.00%) local gpr inst bytes helped 0 119 3664 3664 hurt 0 74 15 15 Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: expose a group of performance metrics for SM30 (Kepler)Samuel Pitoiset2015-12-052-2/+8
| | | | | | | This allows to monitor these performance metrics through GL_AMD_performance_monitor. Signed-off-by: Samuel Pitoiset <[email protected]>
* nvc0: re-introduce performance metrics for SM30 (Kepler)Samuel Pitoiset2015-12-052-5/+188
| | | | | | | This implements more performance metrics than the previous support, but some other metrics still need to be figured out. Signed-off-by: Samuel Pitoiset <[email protected]>
* nvc0: remove useless counting operations for MP countersSamuel Pitoiset2015-12-051-101/+5
| | | | | | Those bits were related to old performance metrics support. Signed-off-by: Samuel Pitoiset <[email protected]>
* nvc0: remove old performance metrics support on KeplerSamuel Pitoiset2015-12-052-37/+0
| | | | | | | These performance metrics will be re-introduced in an upcoming patch that will follow the same design as Fermi. Signed-off-by: Samuel Pitoiset <[email protected]>
* nvc0: remove wrong inst_issued HW SM perf counter on KeplerSamuel Pitoiset2015-12-052-3/+0
| | | | | | | inst_issued is performance metric not a hardware event on Kepler (SM30). It will be re-introduced in an upcoming patch. Signed-off-by: Samuel Pitoiset <[email protected]>
* nvc0: add missing HW SM perf counters for SM30 (Kepler)Samuel Pitoiset2015-12-053-0/+10
| | | | | | | SM30 is the compute capability version for GK104/GK106/GK107. This also introduces a new signal group selection called UNK0F. Signed-off-by: Samuel Pitoiset <[email protected]>
* nvc0: fix the comment that describe MP counters storage on KeplerSamuel Pitoiset2015-12-051-0/+5
| | | | Signed-off-by: Samuel Pitoiset <[email protected]>
* nv50/ir: replace zeros in movs as wellIlia Mirkin2015-12-031-2/+1
| | | | | | | | | The original change to put zeroes directly into instructions created conditional mov's with the zero immediate. However that can't be emitted, so make sure to replace the zero with r63. Fixes: 52a800a68 (nv50/ir: allow immediate 0 to be loaded anywhere) Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: fold fma/mad when all 3 args are immediatesIlia Mirkin2015-12-031-0/+30
| | | | | | This happens pretty rarely, but might as well do it when it does. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: avoid looking at uninitialized srcMods entriesIlia Mirkin2015-12-032-2/+2
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0 11.1" <[email protected]>
* nv50/ir: fix DCE to not generate 96-bit loadsIlia Mirkin2015-12-031-1/+31
| | | | | | | | | | A situation where there's a 128-bit load where the last component gets DCE'd causes a 96-bit load to be generated, which no GPU can actually emit. Avoid generating such instructions by scaling back to 64-bit on the first load when splitting. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0 11.1" <[email protected]>
* nv50/ir: fix moves to/from flagsIlia Mirkin2015-12-022-2/+7
| | | | | | | | Noticed this when looking at a trace that caused flags to spill to/from registers. The flags source/destination wasn't encoded correctly according to both envydis and nvdisasm. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: don't forget to mark flagsDef on cvt in txb loweringIlia Mirkin2015-12-021-1/+1
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0 11.1" <[email protected]>
* nv50/ir: fix instruction permutation logicIlia Mirkin2015-12-021-1/+1
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0 11.1" <[email protected]>
* nv50/ir: the mad source might not have a defining instructionIlia Mirkin2015-12-021-1/+1
| | | | | | | For example if it's $r63 (aka 0), there won't be a definition. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0 11.1" <[email protected]>
* nv50/ir: make sure entire graph is reachableIlia Mirkin2015-12-021-0/+1
| | | | | | | | The algorithm expects the entire CFG to be reachable, so make sure that we hit every node. Otherwise we will end up with uninitialized data, memory corruption, etc. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: deal with loops with no breaksIlia Mirkin2015-12-021-0/+6
| | | | | | | | | | | For example if there are only returns, the break bb will not end up part of the CFG. However there will have been a prebreak already emitted for it, and when hitting the RET that comes after, we will try to insert the current (i.e. break) BB into the graph even though it will be unreachable. This makes the SSA code sad. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0 11.1" <[email protected]>
* nvc0/ir: fold postfactor into immediateIlia Mirkin2015-12-021-0/+6
| | | | | | | | SM20-SM50 can't emit a post-factor in the presence of a long immediate. Make sure to fold it in. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0 11.1" <[email protected]>
* nv50/ir: allow immediate 0 to be loaded anywhereIlia Mirkin2015-12-021-0/+6
| | | | | | | There's a post-RA fixup to replace 0's with $r63 (or $r127 if too many regs are used), so just as nvc0, let an immediate 0 be loaded anywhere. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir/gk110: add memory barriers support for GK110Samuel Pitoiset2015-12-021-0/+12
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: do not call textureMask() for surface opsSamuel Pitoiset2015-12-021-1/+2
| | | | | | | | | | | That texture mask thing doesn't seem to be needed for surface ops, so just as nve4+, let do that only for texture ops. This fixes a segfault with 'test_surface_st' from gallium/tests/trivial/compute.c on Fermi because this test uses sustp. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50,nvc0: allow to create resources other than buffersSamuel Pitoiset2015-12-015-4/+9
| | | | | | | | | | | | For the compute support, we might stick buffers as surfaces. This fixes an assertion when executing src/gallium/tests/trivial/compute. To avoid using these "restricted" surfaces as render targets, these assertions have been moved. Note that it's already handled for the framebuffer thing on nvc0. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: always display the opcode number for unknown instructionsSamuel Pitoiset2015-11-292-2/+2
| | | | | | | This helps in debugging unknown instructions. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nouveau: move interlaced assert down in nouveau_vp3_video_buffer_createJulien Isorce2015-11-251-1/+1
| | | | | | | | templat->interlaced is 0 if not NV12 which is the case currently when using VPP. Signed-off-by: Julien Isorce <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: fix (un)spilling of 3-wide resultsIlia Mirkin2015-11-221-4/+42
| | | | | | | | | There is no 96-bit load/store operations, so we have to split it up into a 32-bit parts, with a split/merge around it. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90348 Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0 11.1" <[email protected]>
* nv50,nvc0: properly handle buffer storage invalidation on dsa bufferIlia Mirkin2015-11-222-15/+17
| | | | | | | | | In case that the buffer has no bind at all, assume it can be a regular buffer. This can happen on buffers created through the ARB_dsa interfaces. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0 11.1" <[email protected]>
* nouveau: use the buffer usage to determine placement when no bindingIlia Mirkin2015-11-221-2/+6
| | | | | | | | | | | With ARB_direct_state_access, buffers can be created without any binding hints at all. We still need to allocate these buffers to VRAM or GART, as we don't have logic down the line to place them into GPU-mappable space. Ideally we'd be able to shift these things around based on usage. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92438 Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0 11.1" <[email protected]>
* nv50: expose two groups of compute-related MP perf countersSamuel Pitoiset2015-11-206-2/+63
| | | | | | | This turns on GL_AMD_performance_monitor. Signed-off-by: Samuel Pitoiset <[email protected]> Acked-by: Ilia Mirkin <[email protected]>
* gallium: add the concept of batch queriesNicolai Hähnle2015-11-202-0/+2
| | | | | | | | | | | | | | | | | Some drivers (in particular radeon[si], but also freedreno judging from a quick grep) may want to expose performance counters that cannot be individually enabled or disabled. Allow such drivers to mark driver-specific queries as requiring a new type of batch query object that is used to start and stop a list of queries simultaneously. v3: adjust recently added nv50 queries v2: documentation for create_batch_query Reviewed-by: Samuel Pitoiset <[email protected]> Tested-by: Samuel Pitoiset <[email protected]>
* gallium: remove pipe_driver_query_group_info field typeNicolai Hähnle2015-11-201-4/+0
| | | | | | | | | | | | | | | | | | | This was only used to implement an unnecessarily restrictive interpretation of the spec of AMD_performance_monitor. The spec says A performance monitor consists of a number of hardware and software counters that can be sampled by the GPU and reported back to the application. I guess one could take this as a requirement that counters _must_ be sampled by the GPU, but then why are they called _software_ counters? Besides, there's not much reason _not_ to expose all counters that are available, and this simplifies the code. v3: add a missing change in the nouveau driver (thanks Samuel Pitoiset) Reviewed-by: Samuel Pitoiset <[email protected]> Tested-by: Samuel Pitoiset <[email protected]>
* nv50: allow using inline vertex data submit when gl_VertexID is usedSamuel Pitoiset2015-11-195-15/+46
| | | | | | | | | | | | | | | The hardware can actually generates vertexid when vertices come from a client-side buffer like when glDrawElements is used. This doesn't fix (or break) any piglit tests but it improves the previous attempt of Ilia (c830d19 "nv50: avoid using inline vertex data submit when gl_VertexID is used") The only disadvantage is that only works on G84+, but we don't really care of that weird and old NV50 chipset. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50: add NV84_3D macroSamuel Pitoiset2015-11-193-3/+4
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0/ir: actually emit AFETCH on keplerIlia Mirkin2015-11-181-0/+3
| | | | | | | | Looks like this was forgotten in the commit which added the AFETCH logic. Signed-off-by: Ilia Mirkin <[email protected]> Cc: [email protected]
* nv50: add missing header into the sources listEmil Velikov2015-11-161-0/+1
| | | | | | Otherwise it won't end up in the tarball. Signed-off-by: Emil Velikov <[email protected]>
* nv50,nvc0: disable render condition around clear_* functionsIlia Mirkin2015-11-144-0/+32
| | | | | | | Only the regular "clear" call is supposed to respect the render condition. The rest should ignore it. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50: add support for performance metrics on G84+Samuel Pitoiset2015-11-144-3/+259
| | | | | | | | Currently only one metric is exposed but more will be added later. Signed-off-by: Samuel Pitoiset <[email protected]> Tested-by: Pierre Moreau <[email protected]> Acked-by: Ilia Mirkin <[email protected]>
* nv50: add compute-related MP perf counters on G84+Samuel Pitoiset2015-11-149-2/+548
| | | | | | | | | | | | | | | | | | These compute-related MP performance counters have been reverse engineered using CUPTI which is part of NVIDIA CUDA. As for nvc0, we use a compute kernel to read out those performance counters, and the command stream to configure them. Note that Tesla only exposes 4 MP performance counters, while Fermi has 8. Only G84+ is supported because G80 is an old and weird card. Tested on G84, G96, G200, MCP79 and GT218 with glxgears, glxspheres64, xonotic-glx, heaven and valley. Signed-off-by: Samuel Pitoiset <[email protected]> Tested-by: Pierre Moreau <[email protected]> Acked-by: Ilia Mirkin <[email protected]>
* nv50: implement a basic compute supportSamuel Pitoiset2015-11-1410-9/+1006
| | | | | | | | | | | | | | | | | | | | | | | | | | This adds the ability to launch simple compute kernels like the one I will use to read out MP performance counters in the upcoming patch. This compute support is based on the work of Francisco Jerez (aka curro) that he did as part of his EVoC project in 2011/2012 to get OpenCL working on Tesla. His original work can be found here: https://github.com/curro/mesa/commits/nv50-compute I did some improvements on the original code, like fixing using both 3D and COMPUTE simultaneously, improving global buffers binding, and making the code closer to what nvc0 already does. This compute support has been tested by Pierre Moreau and myself with some compute kernels. This is a step towards OpenCL. Speaking about this, it seems like compute programs overlap fragment programs when they are used both. To fix this, we need to re-validate fragment programs when binding compute programs and vice versa. Note that, textures, samplers and surfaces still need to be implemented. Signed-off-by: Samuel Pitoiset <[email protected]> Tested-by: Pierre Moreau <[email protected]> Acked-by: Ilia Mirkin <[email protected]>
* nv50: free interpolation parameters in nv50_program_destroy()Samuel Pitoiset2015-11-141-1/+1
| | | | | | | | As for nvc0, we need to free memory allocated by interpolation parameters. This fixes a memory leak spotted by valgrind. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: reduce the number of GPR used when reading MP perf countersSamuel Pitoiset2015-11-141-1/+2
| | | | | | | No need to allocate more GPR than used in the compute kernel which reads MP performance counters on Fermi. Signed-off-by: Samuel Pitoiset <[email protected]>
* nouveau: don't expose HEVC decoding supportIlia Mirkin2015-11-141-0/+1
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Cc: [email protected]
* nvc0/ir: add support for TGSI_SEMANTIC_HELPER_INVOCATIONIlia Mirkin2015-11-126-0/+6
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nv50,nvc0: add ARB_clear_texture supportIlia Mirkin2015-11-115-7/+101
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* gallium: add PIPE_CAP_CLEAR_TEXTURE and clear_texture prototypeIlia Mirkin2015-11-113-0/+3
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]>