summaryrefslogtreecommitdiffstats
path: root/src/gallium/drivers
Commit message (Collapse)AuthorAgeFilesLines
* u_upload_mgr: allow specifying PIPE_USAGE_* for the upload bufferMarek Olšák2016-01-028-8/+13
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* u_upload_mgr: remove alignment parameter from u_upload_createMarek Olšák2016-01-028-10/+7
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* u_upload_mgr: pass alignment to u_upload_buffer manuallyMarek Olšák2016-01-021-1/+1
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* u_upload_mgr: pass alignment to u_upload_data manuallyMarek Olšák2016-01-028-11/+11
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* u_upload_mgr: pass alignment to u_upload_alloc manuallyMarek Olšák2016-01-0215-17/+23
| | | | | | | | | | The fixed alignment of u_upload_mgr will go away. This is the first step. The motivation is that one u_upload_mgr can have multiple users, each allocating from the same buffer, but requiring a different alignment. Reviewed-by: Nicolai Hähnle <[email protected]>
* nv50,nvc0: make sure there's pushbuf space and that we ref the bo earlyIlia Mirkin2016-01-014-6/+5
| | | | | | | | | | | First off, we can't flush in the middle of a command. Secondly requesting the extra push space might cause a flush to happen. If that flush happens, we'd have to do the PUSH_REFN again. So instead do PUSH_REFN after the push space request. This helps avoid rare crashes with supertuxkart in libdrm due to assertion failures. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0 11.1" <[email protected]>
* nvc0: Set winding order regardless of domain.Kenneth Graunke2015-12-301-2/+4
| | | | | | | | | | Quads need to respect winding order, too - not just triangles. Fixes rendering in GFXBench 4.0's tessellation benchmark. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Cc: "11.0 11.1" <[email protected]>
* nvc0: add ARB_shader_draw_parameters supportIlia Mirkin2015-12-3013-15/+73
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* gallium: add PIPE_CAP_DRAW_PARAMETERSIlia Mirkin2015-12-3014-0/+14
| | | | | | | | This allows the state tracker to know that the various draw parameters are available in vertex shaders. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* nv50/ir: attempt to do more constant folding on mad -> add conversionIlia Mirkin2015-12-301-11/+10
| | | | | | | | | The add might actually have a 0 as an argument, which would convert it into a mov. Make sure to detect that. Also avoid the hack of putting the immediate directly into the instruction, instead use a mov to put it into place and let the later LoadPropagation pass place it if possible. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: float(s32 & 0xff) = float(u8), not s8Ilia Mirkin2015-12-291-0/+3
| | | | | | | | Make sure to make conversion unsigned when we're ANDing the high bits away. Fixes corruption in dolphin. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0 11.1" <[email protected]>
* radeonsi: add RADEON_REPLACE_SHADERS debug optionNicolai Hähnle2015-12-293-5/+105
| | | | | | | | | | | This option allows replacing a single shader by a pre-compiled ELF object as generated by LLVM's llc, for example. This can be useful for debugging a deterministically occuring error in shaders (and has in fact helped find the causes of https://bugs.freedesktop.org/show_bug.cgi?id=93264). v2: drop the debug flag, use DEBUG_GET_ONCE_OPTION instead Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: count compilations in si_compile_llvmNicolai Hähnle2015-12-292-1/+2
| | | | | | | | This changes the count slightly (because of si_generate_gs_copy_shader), but this is only relevant for the driver-specific num-compilations query. It sets the stage for the next commit. Reviewed-by: Marek Olšák <[email protected]>
* r600: fix constant buffer size programmingGrazvydas Ignotas2015-12-292-2/+2
| | | | | | | | | | | | When buffer size is less than 16, zero ends up being programmed as size, which prevents the hardware from fetching the correct values. Fix it by combining shift and align so that the value is always rounded up. Cc: "11.1 11.0 10.6" <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92229 Signed-off-by: Grazvydas Ignotas <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* nir: Get rid of function overloadsJason Ekstrand2015-12-286-24/+24
| | | | | | | | | | | | | | | | | When Connor originally drafted NIR, he copied the same function+overload system that GLSL IR had with a few names changed. However, this double-indirection is not really needed and has only served to confuse people. Instead, let's just have functions which may not have unique names and may or may not have an implementation. If someone wants to do overload resolving, they can hav a hash table based function+overload system in the overload resolving pass. There's no good reason to keep it in core NIR. Reviewed-by: Connor Abbott <[email protected]> Acked-by: Kenneth Graunke <[email protected]> ir3 bits are Reviewed-by: Rob Clark <[email protected]>
* nvc0: don't forget to reset VTX_TMP bufctx slot after blit completionIlia Mirkin2015-12-271-0/+2
| | | | | | | Also release the scratch allocation if any. Signed-off-by: Ilia Mirkin <[email protected]> Cc: "11.0 11.1" <[email protected]>
* nv50,nvc0: add a note when converting vertex elements using CPUIlia Mirkin2015-12-272-0/+6
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/ir3: spelling..Rob Clark2015-12-231-6/+6
| | | | Signed-off-by: Rob Clark <[email protected]>
* nir: Add a writemask to store intrinsics.Kenneth Graunke2015-12-221-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tessellation control shaders need to be careful when writing outputs. Because multiple threads can concurrently write the same output variables, we need to only write the exact components we were told. Traditionally, for sub-vector writes, we've read the whole vector, updated the temporary, and written the whole vector back. This breaks down with concurrent access. This patch prepares the way for a solution by adding a writemask field to store_var intrinsics, as well as the other store intrinsics. It then updates all produces to emit a writemask of "all channels enabled". It updates nir_lower_io to copy the writemask to output store intrinsics. Finally, it updates nir_lower_vars_to_ssa to handle partial writemasks by doing a read-modify-write cycle (which is safe, because local variables are specific to a single thread). This should have no functional change, since no one actually emits partial writemasks yet. v2: Make nir_validate momentarily assert that writemasks cover the complete value - we shouldn't have partial writemasks yet (requested by Jason Ekstrand). v3: Fix accidental SSBO change that arose from merge conflicts. v4: Don't try to handle writemasks in ir3_compiler_nir - my code for indirects was likely wrong, and TTN doesn't generate partial writemasks today anyway. Change them to asserts as requested by Rob Clark. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> [v3]
* nvc0: remove use of deprecated sw class identifierBen Skeggs2015-12-221-3/+5
| | | | | | | | | | | Also emits a method to properly bind the class to a subchannel, which was missing previously. The kernel currently doesn't care, but this will break if it ever decides to (ie. to support multiple sw classes). Signed-off-by: Ben Skeggs <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Tested-by: Samuel Pitoiset <[email protected]>
* nv50: fix g98+ vdec class allocationBen Skeggs2015-12-221-6/+51
| | | | | | | | | | | | | | | | | | | The kernel previously exposed incorrect classes for some of the chipsets that this code supports. It no longer does, but the older object ioctls have compatibility to avoid breaking userspace. This needs to be fixed before switching over to the newer interfaces. Rather than hardcoding chipset->class like the rest of the driver does, this makes use of (new) sclass queries to determine what's available. v2. - update to use symbolic class identifier from <nvif/class.h> Signed-off-by: Ben Skeggs <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Tested-by: Samuel Pitoiset <[email protected]>
* nouveau: remove use of deprecated nouveau_device_wrap()Ben Skeggs2015-12-222-0/+5
| | | | | | | | | | | | | | Switching to the newer libdrm entry-points tells libdrm that it's OK to make use of newer kernel interfaces. We want to be able to isolate any bugs to either the interfaces changes, or the use of NVIF itself. As such, this commit has a slight hack which forces libdrm to continue using the older kernel interfaces. Signed-off-by: Ben Skeggs <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Tested-by: Samuel Pitoiset <[email protected]>
* nouveau: fix screen creation failure pathsBen Skeggs2015-12-224-19/+23
| | | | | | | | | | | | | | The winsys layer would attempt to cleanup the nouveau_device if screen init failed, however, in most paths the pipe driver would have already destroyed it, resulting in accesses to freed memory etc. This commit fixes the problem by allowing the winsys to detect whether the pipe driver's destroy function needs to be called or not. Signed-off-by: Ben Skeggs <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Tested-by: Samuel Pitoiset <[email protected]>
* nouveau: return nouveau_screen from hw-specific creation functionsBen Skeggs2015-12-224-9/+9
| | | | | | | | | Kills off a void cast. Signed-off-by: Ben Skeggs <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Tested-by: Samuel Pitoiset <[email protected]>
* nouveau: remove use of deprecated nouveau_device::drm_versionBen Skeggs2015-12-227-12/+15
| | | | | | | | | v2. update for libdrm nouveau_drm::lib_version removal Signed-off-by: Ben Skeggs <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Tested-by: Samuel Pitoiset <[email protected]>
* nouveau: remove use of deprecated nouveau_device::fdBen Skeggs2015-12-222-0/+2
| | | | | | | Signed-off-by: Ben Skeggs <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Tested-by: Samuel Pitoiset <[email protected]>
* r600: fix viewport clipping handling (v2)Dave Airlie2015-12-223-12/+15
| | | | | | | | | | | | | | If oViewport is written, vertex reuse need to be turned off. If oViewport is constant, vertex reuse is fine, and VPORT_PROVOKE_DISABLE need to be set. (we don't have enough info to program VPORT_PROVOKE). Fixes: arb_viewport_array-render-viewport-2 and some CTS tests. v2: drop vport provoke write, drop initial state writing this on evergreen, only program it on evergreen. Signed-off-by: Dave Airlie <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: fix viewport clipping handling. (v2)Dave Airlie2015-12-221-1/+4
| | | | | | | | | | | | | | | If oViewport is written, vertex reuse need to be turned off. If oViewport is constant, vertex reuse is fine, and VPORT_PROVOKE_DISABLE need to be set. (We don't know if oViewport is constant so we skip this.) Fixes: arb_viewport_array-render-viewport-2 and some CTS tests. v2: drop writing to provoke disable, drop write in initial state. Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: drop VTX_CNT_EN write from initial stateDave Airlie2015-12-221-8/+4
| | | | | | | we always program this in shader stages atom now. Reviewed-by: Marek Olšák <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* gallium/radeon: fix regression in a number of driver queriesNicolai Hähnle2015-12-211-3/+3
| | | | | | | | This rather silly mistake was introduced by commit 01910676. Cc: "11.1" <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* vc4: Do instruction scheduling on the QIR to hide texture fetch latency.Eric Anholt2015-12-184-0/+624
| | | | | | | | | | | | | | | | | | | | This is a rewrite of vc4_opt_qpu_schedule.c to operate on QIR. Texture fetch can probably take as much as the rest of the cycles of the program, so it's important to hide our other cycles during it (which is hard to do after register allocation). Also, we can queue up multiple texture requests before collecting the resulting samples, so that we keep the texture unit busy more of the time. High-settings openarena performance +2.35849% +/- 0.221154% (n=7). Also about 2-3% on the multiarb demo. 8 piglit tests (ext_framebuffer_multisample accuracy depthstencil) go from failing in rendering to failing in register allocation, but hopefully I can fix that up with some better register pressure handling here. total instructions in shared programs: 87723 -> 88448 (0.83%) instructions in affected programs: 78411 -> 79136 (0.92%) total estimated cycles in shared programs: 276583 -> 246306 (-10.95%) estimated cycles in affected programs: 265691 -> 235414 (-11.40%)
* vc4: Fix latency handling for QPU texture scheduling.Eric Anholt2015-12-181-32/+50
| | | | | | There's only high latency between a complete texture fetch setup and collecting its result, not between each step of setting up the texture fetch request.
* vc4: Keep sample mask writes from being reordered after TLB writesEric Anholt2015-12-181-1/+2
| | | | | | Fixes a regression I noticed after introducing scheduling on the QIR. Cc: "11.1" <[email protected]>
* freedreno/ir3: fix 32-bit builds with pointer-to-int-cast error enabledRob Herring2015-12-181-1/+1
| | | | | | | | | Android builds with -Werror=pointer-to-int-cast causing an error on 32-bit builds. Cc: "11.0 11.1" <[email protected]> Signed-off-by: Rob Herring <[email protected]> Signed-off-by: Rob Clark <[email protected]>
* gallium/radeon: only dispose locally created target machine in ↵Nicolai Hähnle2015-12-181-2/+3
| | | | | | | | | radeon_llvm_compile Unify the cleanup paths of the function rather than duplicating code. Cc: "11.0 11.1" <[email protected]> Reviewed-by: Michel Dänzer <[email protected]>
* configure.ac: use pkg-config for libelfJonathan Gray2015-12-171-2/+3
| | | | | | | | | | | | | Use PKG_CHECK_MODULES to get the flags to link libelf v2: keep AC_CHECK_LIB as a fallback for elfutils provided libelf that doesn't install a pkg-config file. Signed-off-by: Jonathan Gray <[email protected]> Reviewed-by: Michel Dänzer <[email protected]> Tested-by: Michel Dänzer <[email protected]> Cc: "11.0 11.1" <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* nv50: free memory allocated by the prog which reads MP perf countersSamuel Pitoiset2015-12-161-0/+5
| | | | | | | | | This fixes a memory leak introduced in 6a9c151 ("nv50: add compute-related MP perf counters on G84+") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Cc: "11.1" <[email protected]>
* svga: don't use debug code in update_state() in release buildsBrian Paul2015-12-161-0/+4
| | | | Reviewed-by: Jose Fonseca <[email protected]>
* nv50,nvc0: free memory allocated by performance metricsSamuel Pitoiset2015-12-166-4/+22
| | | | | | | | | The destroy_query() helper was actually never called. This fixes a memory leak while monitoring performance metrics. Signed-off-by: Samuel Pitoiset <[email protected]> Acked-by: Ilia Mirkin <[email protected]> Cc: "11.1" <[email protected]>
* nvc0: free memory allocated by the prog which reads MP perf countersSamuel Pitoiset2015-12-161-0/+1
| | | | | | | | | This fixes a long time ago memory leak (even before all my query related changes). Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Cc: "11.0 11.1" <[email protected]>
* nvc0: fix metric-achieved_occupancy calculation on KeplerSamuel Pitoiset2015-12-161-1/+4
| | | | | | | The maximum number of resident warps per multiprocessor is 64 on Kepler instead of 48 on Fermi. Signed-off-by: Samuel Pitoiset <[email protected]>
* nvc0: remove old comment related to metric calculationsSamuel Pitoiset2015-12-151-11/+0
| | | | | | I forgot to remove it when I refactored all performance metrics. Signed-off-by: Samuel Pitoiset <[email protected]>
* vc4: Add support for dumping executed commands to a file.Eric Anholt2015-12-153-0/+94
| | | | | | | | | | The VC4_DEBUG=cl,qpu is nice and all, but I want to be able to get more detailed dumps, and to replay the same exact commands in simulation. For that I need a dump with all of the VBOs, shaders, shader recs, etc. This dump can be parsed by vc4-gpu-tools. For now this is only doable from simulator mode, because otherwise we don't have access to the RCL contents generated by the kernel.
* vc4: Import updated vc4_drm.h with hang state.Eric Anholt2015-12-151-0/+45
|
* vc4: Only update vc4->msaa when the framebuffer changes.Eric Anholt2015-12-151-7/+0
| | | | | | Any update here should have been the same as in vc4_set_framebuffer_state(), except for the point where vc4_blit.c temporarily sets different state for its different buffers.
* vc4: Don't consider nr_samples==1 surfaces to be MSAA.Eric Anholt2015-12-156-21/+25
| | | | | | This is apparently a weirdness of gallium -- nr_samples==1 is occasionally used and means the same thing as nr_samples==0. Fixes a bunch of ARB_framebuffer_srgb blit cases in piglit.
* vc4: Fix min() wrapper definition for the simulator's kernel code.Eric Anholt2015-12-151-1/+1
|
* vc4: Warn instead of abort()ing on exec ioctl failures.Eric Anholt2015-12-151-3/+5
| | | | | | | | It's really harsh to abort() the X Server because of a momentary failure (particularly -ENOMEM). I don't see a way to pass an -ENOMEM up the stack from here, but we can at least log to stderr before proceeding on. Cc: "11.1" <[email protected]>
* radeonsi: fix perfcounter selection for SI_PC_MULTI_BLOCK layoutsNicolai Hähnle2015-12-151-1/+1
| | | | | | The incorrectly computed register count caused lockups. Reviewed-by: Edward O'Callaghan <[email protected]>
* gallium/radeon: remove unnecessary test in r600_pc_query_add_resultNicolai Hähnle2015-12-151-3/+0
| | | | | | | This test is a left-over of the initial development. It is unneeded and misleading, so let's get rid of it. Reviewed-by: Edward O'Callaghan <[email protected]>