summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers
Commit message (Collapse)AuthorAgeFilesLines
* nvc0: allow fragment shader inputs to use indirect indexingIlia Mirkin2016-01-141-1/+1
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* radeonsi: don't miss changes to SPI_TMPRING_SIZEMarek Olšák2016-01-141-2/+7
| | | | | | | | | | I'm not sure about the consequences of this bug, but it's definitely dangerous. This applies to SI, CIK, VI. Cc: 11.0 11.1 <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* svga: add DXGenMips command supportCharmaine Lee2016-01-1410-26/+144
| | | | | | | | | | | | | For those formats that support hw mipmap generation, use the DXGenMips command. Otherwise fallback to the mipmap generation utility. Tested with piglit, OpenGL apps (Heaven, Turbine, Cinebench) v2: make sure the texture surface was created with the render target bind flag set relocation flag to SVGA_RELOC_WRITE for the texture surface Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* svga: add num-generate-mipmap HUD queryCharmaine Lee2016-01-143-1/+12
| | | | | | | | The actual increment of the num-generate-mipmap counter will be done in a subsequent patch when hw generate mipmap is supported. Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* gallium/st: add pipe_context::generate_mipmap()Charmaine Lee2016-01-1415-0/+51
| | | | | | | | | | | | | | | | This patch adds a new interface to support hardware mipmap generation. PIPE_CAP_GENERATE_MIPMAP is added to allow a driver to specify if this new interface is supported; if not supported, the state tracker will fallback to mipmap generation by rendering/texturing. v2: add PIPE_CAP_GENERATE_MIPMAP to the disabled section for all drivers v3: add format to the generate_mipmap interface to allow mipmap generation using a format other than the resource format v4: fix return type of trace_context_generate_mipmap() Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* gallium/radeon: do not reallocate user memory buffersNicolai Hähnle2016-01-142-8/+31
| | | | | | | | | The whole point of AMD_pinned_memory is that applications don't have to map buffers via OpenGL - but they're still allowed to, so make sure we don't break the link between buffer object and user memory unless explicitly instructed to. Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: implement PIPE_CAP_INVALIDATE_BUFFERNicolai Hähnle2016-01-145-9/+22
| | | | Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: reset valid_buffer_range on PIPE_TRANSFER_DISCARD_WHOLE_RESOURCENicolai Hähnle2016-01-141-0/+3
| | | | | | | This accomodates a streaming pattern where the discard flag is set when the application wraps back to the beginning of the buffer. Reviewed-by: Marek Olšák <[email protected]>
* gallium: add PIPE_CAP_INVALIDATE_BUFFERNicolai Hähnle2016-01-1414-0/+14
| | | | | | | | | It makes sense to re-use pipe->invalidate_resource for the purpose of glInvalidateBufferData, but this function is already implemented in vc4 where it doesn't have the expected behavior. So add a capability flag to indicate that the driver supports the expected behavior. Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: move POSITION and FACE fragment shader inputs to system valuesMarek Olšák2016-01-133-45/+25
| | | | | | And FACE becomes integer instead of float. Reviewed-by: Edward O'Callaghan <[email protected]>
* radeonsi: simplify gl_FragCoord behaviorMarek Olšák2016-01-131-23/+22
| | | | | | It will become a system value, not an input. Reviewed-by: Edward O'Callaghan <[email protected]>
* llvmpipe: (trivial) use cast wrapper for __m128d to __m128 castsRoland Scheidegger2016-01-131-2/+2
| | | | some compiler was unhappy.
* llvmpipe: avoid most 64 bit math in rasterizationRoland Scheidegger2016-01-132-65/+143
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The trick here is to recognize that in the c + n * dcdx calculations, not only can the lower FIXED_ORDER bits not change (as the dcdx values have those all zero) but that this means the sign bit of the calculations cannot be different as well, that is sign(c + n*dcdx) == sign((c >> FIXED_ORDER) + n*(dcdx >> FIXED_ORDER)). That shaves off more than enough bits to never require 64bit masks. A shifted plane c value could still easily exceed 32 bits, however since we throw out planes which are trivial accept even before binning (and similarly don't even get to see tris for which there was a trivial reject plane)) this is never a problem. The idea isnt't all that revolutionary, in fact something similar was tried ages ago (9773722c2b09d5f0615a47cecf4347859474dc56) back when the values were only 32 bit anyway. I believe now it didn't quite work then because the adjustment needed for testing trivial reject / partial masks wasn't handled correctly. This still keeps the separate 32/64 bit paths for now, as the 32 bit one still looks minimally simpler (and also because if we'd pass in dcdx/dcdy/eo unscaled from setup which would be a good reason to ditch the 32 bit path, we'd need to change the special-purpose rasterization functions for small tris). This passes piglit triangle-rasterization (-fbo -auto -max_size -subpixelbits 8) and triangle-rasterization-overdraw (with some hacks to make it work correctly with large sizes) easily (full piglit as well of course, but most tests wouldn't use triangles large enough to be affected, that is tris with a bounding box over 128x128). The profiler says indeed time spent in rast_tri functions is reduced substantially, BUT of course only if the tris are large. I measured a 3% improvement in mesa gloss demo when supersized to twice the screen size... Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: scale up bounding box planes to subpixel precisionRoland Scheidegger2016-01-133-30/+30
| | | | | | | | | Otherwise some planes we get in rasterization have subpixel precision, others not. Doesn't matter so far, but will soon. (OpenGL actually supports viewports with subpixel accuracy, so could even do bounding box calcs with that). Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* llvmpipe: add sse code for fixed position calculationRoland Scheidegger2016-01-131-8/+50
| | | | | | | | | | | | | | This is quite a few less instructions, albeit still do the 2 64bit muls with scalar c code (they'd need way more shuffles, plus fixup for the signed mul so it totally doesn't seem worth it - x86 can do 32x32->64bit signed scalar muls natively just fine after all (even on 32bit). (This still doesn't have a very measurable performance impact in reality, although profiler seems to say time spent in setup indeed has gone down by 10% or so overall. Maybe good for a 3% or so improvement in openarena.) Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* nvc0: do not force re-binding of compute constbufs on FermiSamuel Pitoiset2016-01-121-1/+1
| | | | | | | | | | Re-binding compute constant buffers after launching a grid have no effects because they are not currently validated and because dirty_cp is not updated accordingly. This might also prevent weird future behaviours when UBOs will be bound for compute. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nvc0: remove useless goto in nvc0_launch_grid()Samuel Pitoiset2016-01-121-6/+4
| | | | | | | Trivial. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: the whole point of data array is to hand out regular registersIlia Mirkin2016-01-111-1/+1
| | | | | Fixes: 0d3051f75a (nv50/ir: Fix scratch allocation size and file) Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: Fix scratch allocation size and filePierre Moreau2016-01-092-3/+3
| | | | | Signed-off-by: Pierre Moreau <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* nv50,nvc0: use a face sysval to avoid the useless back-and-forth conversionIlia Mirkin2016-01-085-9/+2
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno: add ir3_compiler to gitignoreIlia Mirkin2016-01-081-0/+1
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* gallium: add PIPE_CAP_SHADER_BUFFER_OFFSET_ALIGNMENTIlia Mirkin2016-01-0814-13/+27
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium: add PIPE_SHADER_CAP_MAX_SHADER_BUFFERSIlia Mirkin2016-01-089-0/+16
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* tgsi: add ureg support for image declsIlia Mirkin2016-01-083-7/+15
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* util/pstipple: allow fragment shader POSITION to be a system valueMarek Olšák2016-01-084-4/+8
| | | | | Reviewed-by: Edward O'Callaghan <[email protected] Reviewed-by: Brian Paul <[email protected]>
* gallium: add caps for POSITION and FACE system valuesMarek Olšák2016-01-0814-0/+28
| | | | | | | v2: document the integer behavior Reviewed-by: Edward O'Callaghan <[email protected] Reviewed-by: Brian Paul <[email protected]>
* radeon, si: Use TGSI chan name defines in lp_build_emit_fetch() callsEdward O'Callaghan2016-01-082-8/+8
| | | | | Signed-off-by: Edward O'Callaghan <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* nvc0: add ARB_indirect_parameters supportIlia Mirkin2016-01-075-6/+313
| | | | | | | I chose to make separate macros for this due to the additional complexity and extra scratch usage. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: add support for real ARB_multi_draw_indirectIlia Mirkin2016-01-074-18/+47
| | | | | | | The draw groups are now split up into groups of 32 if there's a non-packed stride, or in groups of 400-500 if the draw data is packed. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0: adjust indirect draw macros to handle multiple draws at onceIlia Mirkin2016-01-073-52/+101
| | | | | | | These are still invoked one at a time, but the underlying macro can handle multiple draws. Signed-off-by: Ilia Mirkin <[email protected]>
* gallium: add caps to expose support for multi indirect drawsIlia Mirkin2016-01-0714-0/+28
| | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* llvmpipe: do 64bit plane calculations in the sse pathRoland Scheidegger2016-01-082-50/+70
| | | | | | | | | | | | The sse path was pretty much disabled for practical purposes because the largest allowed fb size was 128x128. So, adapt it for 64bit plane calculations. This is actually not that difficult, though a problem is that we can't do a signed 32x32->64bit mul, only unsigned, so need to fix that up. Overall, the code still looks reasonable, though it's not like changes there in setup really make much of a difference in the end... Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* llvmpipe: don't store eo as 64bit intRoland Scheidegger2016-01-084-11/+16
| | | | | | | | | | | eo, just like dcdx and dcdy, cannot overflow 32bit. Store it as unsigned though just in case (it cannot be negative, but in theory twice as big as dcdx or dcdy so this gives it one more bit). This doesn't really change anything, albeit it might help minimally on 32bit archs. Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* llvmpipe: use aligned data for the assembly program in setupRoland Scheidegger2016-01-081-17/+21
| | | | | | | | | Back in the day (before 24678700edaf5bb9da9be93a1367f1a24cfaa471) the values were not actually in a struct but even then I can't see why we didn't simply align the values. Especially since it's trivial to do so. (Not that it actually matters since the code is pretty much unused for now.) Reviewed-by: Oded Gabbay <[email protected]>
* radeonsi: adjust the parameters of si_shader_dumpMarek Olšák2016-01-073-20/+11
| | | | | | | The function will be extended to dump all binaries shaders will consist of, so si_shader* makes sense here. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move si_shader_dump call out of si_compile_llvmMarek Olšák2016-01-072-2/+11
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: inline si_shader_binary_readMarek Olšák2016-01-073-11/+3
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move si_shader_dump call out of si_shader_binary_readMarek Olšák2016-01-073-20/+21
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: separate shader dumping code to si_shader_dump and *_dump_statsMarek Olšák2016-01-071-12/+30
| | | | | | | Eventually, I'd like to dump stats for several combined binaries, which is why you don't see a binary parameter in si_shader_dump_stats Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add si_shader_destroy_binaryMarek Olšák2016-01-072-5/+10
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't pass si_shader to si_compile_llvmMarek Olšák2016-01-073-18/+28
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move si_shader_binary_upload out of si_compile_llvmMarek Olšák2016-01-072-4/+9
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: always keep shader code, rodata, and relocs in memoryMarek Olšák2016-01-071-7/+3
| | | | | | | We won't compile shaders in draw calls, but we will concatenate shader binaries according to states in draw calls, so keep the binaries. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't pass si_shader to si_shader_binary_readMarek Olšák2016-01-073-14/+19
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't pass si_shader to si_shader_binary_read_configMarek Olšák2016-01-073-17/+19
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add struct si_shader_configMarek Olšák2016-01-075-64/+68
| | | | | | | There will be 1 config per variant, which will be a union of configs from {prolog, main, epilog}. For now, just add the structure. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move NULL exporting into a separate functionMarek Olšák2016-01-071-15/+22
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move MRT color exporting into a separate functionMarek Olšák2016-01-071-41/+57
| | | | | | This will be used by a fragment shader epilog. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use EXP_NULL for pixel shaders without outputsMarek Olšák2016-01-072-6/+3
| | | | | | This never happens currently. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: only use LLVMBuildLoad once when updating color outputs at the endMarek Olšák2016-01-071-47/+20
| | | | | | | | | | | without LLVMBuildStore. So: - do LLVMBuildLoad - update the values as necessary - export Reviewed-by: Nicolai Hähnle <[email protected]>