summaryrefslogtreecommitdiffstats
path: root/src/gallium
Commit message (Collapse)AuthorAgeFilesLines
* nv30: create uploader after pipe->screen is setIlia Mirkin2017-03-191-6/+6
| | | | | | Fixes crashes after recent upload rework. Signed-off-by: Ilia Mirkin <[email protected]>
* nv50,nvc0: enable TEX_LZ and TXF_LZIlia Mirkin2017-03-183-4/+17
| | | | | | | | | There should be minimal gain, if any, for nvc0, but nv50 may end up noticing more often that the lod argument is uniform. This, in turn, will remove the need for some unnecessary transformations, which were being hit due to the checks being done pre-ssa. Signed-off-by: Ilia Mirkin <[email protected]>
* nvc0/ir: treat FMA like MAD for operand propagationKarol Herbst2017-03-181-0/+1
| | | | | | | | | | | | | | | | | | Helps mainly Feral-ported games, due to their use of fma() shader-db changes: total instructions in shared programs : 3901147 -> 3842505 (-1.50%) total gprs used in shared programs : 471258 -> 467359 (-0.83%) total local used in shared programs : 27405 -> 27361 (-0.16%) total bytes used in shared programs : 35749888 -> 35214176 (-1.50%) local gpr inst bytes helped 17 1829 4091 4091 hurt 4 44 3 3 Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]> Cc: [email protected]
* st/dri: wait for thread to finish before unbinding contextTimothy Arceri2017-03-181-0/+3
| | | | | | | | | | Fixes a bunch of piglit crashes that hit an assert() when trying to delete the framebuffer. The assert() was triggered because WinSysDrawBuffer was set to NULL before glDeleteFramebuffers() was called. Tested-by: Michel Dänzer <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: formalize that create_batch_query doesn't need pipe_contextMarek Olšák2017-03-173-13/+12
| | | | Reviewed-by: Timothy Arceri <[email protected]>
* gallium/radeon: formalize that create_query doesn't need pipe_contextMarek Olšák2017-03-173-32/+32
| | | | | | for threaded gallium Reviewed-by: Timothy Arceri <[email protected]>
* gallium/radeon: reference pipe_resource in pipe_transferMarek Olšák2017-03-172-2/+5
| | | | | | for threaded gallium Reviewed-by: Timothy Arceri <[email protected]>
* radeonsi: compile all TGSI compute shaders asynchronouslyMarek Olšák2017-03-171-44/+81
| | | | | | required by threaded gallium Reviewed-by: Timothy Arceri <[email protected]>
* radeonsi: require that compiler threads are enabledMarek Olšák2017-03-172-11/+13
| | | | | | | threaded gallium can't use pipe_context's LLVM target machine, because create_shader_selector can be called from a non-driver thread. Reviewed-by: Timothy Arceri <[email protected]>
* trace: remove leftover assertions after pipe_resource wrapping removalMarek Olšák2017-03-171-6/+0
|
* gallium/u_upload: make the first persistent mapping unsynchronizedMarek Olšák2017-03-171-0/+1
| | | | This is simpler for drivers.
* gallium: implement the backend of threaded GL dispatchMarek Olšák2017-03-164-0/+56
| | | | | | Acked-by: Timothy Arceri <[email protected]> Tested-by: Dieter Nützel <[email protected]> Tested-by: Mike Lothian <[email protected]>
* gallivm: (trivial) remove duplicated lineRoland Scheidegger2017-03-161-1/+0
| | | | pointed out by clang (stored value never read)
* draw: (trivial) remove a unnecessary lp_build_alloca()Roland Scheidegger2017-03-161-2/+0
| | | | pointed out by clang (stored value never read)
* swr: support layer output in geometry shadersIlia Mirkin2017-03-151-0/+2
| | | | | | | | | This makes bin/gl-3.2-layered-rendering-gl-layer-render fail only with 2DMS_ARRAY, which is expected given the lackluster MSAA support. However all the regular types pass. Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Bruce Cherniak <[email protected]>
* gallium/tgsi: Treat UCMP sources as floats to match the GLSL-to-TGSI pass ↵Francisco Jerez2017-03-152-16/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | expectations. Currently the GLSL-to-TGSI translation pass assumes it can use floating point source modifiers on the UCMP instruction. See the bug report linked below for an example where an unrelated change in the GLSL built-in lowering code for atan2 (e9ffd12827ac11a2d2002a42fa8eb1) caused the generation of floating-point ir_unop_neg instructions followed by ir_triop_csel, which is translated into UCMP with a negate modifier on back-ends with native integer support. Allowing floating-point source modifiers on an integer instruction seems like rather dubious design for a transport IR, since the same semantics could be represented as a sequence of MOV+UCMP instructions instead, but supposedly this matches the expectations of TGSI back-ends other than tgsi_exec, and the expectations of the DX10 API. I take no responsibility for future headaches caused by this inconsistency. Fixes a regression of piglit glsl-fs-tan-1 on softpipe introduced by the above-mentioned glsl front-end commit. Even though the commit that triggered the regression doesn't seem to have made it to any stable branches yet, this might be worth back-porting since I don't see any reason why the bug couldn't have been reproduced before that point. Suggested-by: Roland Scheidegger <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99817 Reviewed-by: Roland Scheidegger <[email protected]>
* swr: validate backend state numAttributesTim Rowley2017-03-151-0/+2
| | | | | | | | General protection and prevents us from smashing the stack on the first clear state validation (a7b8d50bcb). Fixes crash using icc. Reviewed-by: Bruce Cherniak <[email protected]>
* radeonsi: implement TGSI opcodes TEX_LZ and TXF_LZMarek Olšák2017-03-152-6/+16
| | | | | | | | | | This massively decreases VGPR spilling for DiRT Showdown, because we no longer have to use v4i32 for 2D fetches when level == 0. We now use v2i32 for those cases. DiRT Showdown - Spilled VGPRs: -26 (-81%) This surprisingly doesn't have any useful effect on performance (+ 0.05%).
* gallium: add TGSI opcodes TEX_LZ and TXF_LZMarek Olšák2017-03-154-5/+39
| | | | for better code generation in radeonsi
* gallium: add PIPE_CAP_TGSI_TEX_TXF_LZMarek Olšák2017-03-1517-0/+18
|
* radeonsi: disable sinking common instructions down to the end blockSamuel Pitoiset2017-03-151-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | Initially this was a workaround for a bug introduced in LLVM 4.0 in the SimplifyCFG pass that caused image instrinsics to disappear (because they were badly sunk). Finally, this is a win because it decreases SGPR spilling and increases the number of waves a bit. Although, shader-db results are good I think we might want to remove it in the future once the issue is fixed. For now, enable it for LLVM >= 4.0. This also fixes a rendering issue with the speedometer in Dirt Rally. More information can be found here https://reviews.llvm.org/D26348. Thanks to Dave Airlie for the patch. v2: - add a FIXME comment - use if (HAVE_LLVM >= 0x0400) instead Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99484 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97988 Signed-off-by: Samuel Pitoiset <[email protected]> Cc: 17.0 <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* tgsi: add missing compute shader entry in tgsi_get_processor_name()Samuel Pitoiset2017-03-151-0/+2
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: clean up tex_fetch_ptrs()Samuel Pitoiset2017-03-151-6/+4
| | | | | | | | Will also help when the src sampler register will be TGSI_FILE_CONSTANT for bindless. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* winsys/amdgpu: use drmGetDevice2 APIEmil Velikov2017-03-151-2/+2
| | | | | | | | | | | Analogous to previous commit Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98502 Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Michel Dänzer <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Eric Engestrom <[email protected]> Tested-by: Mike Lothian <[email protected]>
* r600: refactor binding code for attach buffer to CB.Dave Airlie2017-03-151-33/+78
| | | | | | | | | This refactors out the code and fixes it up to be used for images later. It uses the code in the current RAT binding for compute. Reviewed-by: Edward O'Callaghan <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: refactor out CB setup.Dave Airlie2017-03-151-104/+143
| | | | | | | | | This moves the code to create CB info out into a separate function so it can be reused in images code to create RATs. Reviewed-by: Edward O'Callaghan <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: refactor texture resource words setup code.Dave Airlie2017-03-151-88/+131
| | | | | | | | This refactors out the code to setup a texture resource so we can reuse it later from the images code. Reviewed-by: Edward O'Callaghan <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: factor out the code to initialise a buffer resource.Dave Airlie2017-03-151-29/+51
| | | | | | | | | | This takes the code required to initialise a buffer resource out of the texture buffer code, into it's own function. This is going to be used for the image support later. Reviewed-by: Edward O'Callaghan <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600g: make framebuffer atom rely on dual src blend state.Dave Airlie2017-03-154-2/+7
| | | | | | | | In order to make ARB_shader_image_load_store, we have to share the CB space with RATs, so we should only steal the dual src space if we have dual src enabled. Signed-off-by: Dave Airlie <[email protected]>
* nir: Rework conversion opcodesJason Ekstrand2017-03-145-22/+22
| | | | | | | | | | | | | | | | | | | | | | | | The NIR story on conversion opcodes is a mess. We've had way too many of them, naming is inconsistent, and which ones have explicit sizes was sort-of random. This commit re-organizes things and makes them all consistent: - All non-bool conversion opcodes now have the explicit size in the destination and are named <src_type>2<dst_type><size>. - Integer <-> integer conversion opcodes now only come in i2i and u2u forms (i2u and u2i have been removed) since the only difference between the different integer conversions is whether or not they sign-extend when up-converting. - Boolean conversion opcodes all have the explicit size on the bool and are named <src_type>2<dst_type>. Making things consistent also allows nir_type_conversion_op to be moved to nir_opcodes.c and auto-generated using mako. This will make adding int8, int16, and float16 versions much easier when the time comes. Reviewed-by: Eric Anholt <[email protected]>
* gallium/radeon: disable the shader cache if dumping shadersMarek Olšák2017-03-131-0/+5
| | | | | | otherwise, cached shaders aren't dumped. Reviewed-by: Timothy Arceri <[email protected]>
* radeonsi: mark all bound shader buffer ranges as initializedMarek Olšák2017-03-131-0/+3
| | | | | | | | This should prevent cases when a buffer was incorrectly mapped without synchronization just because this wasn't done. Cc: 13.0 17.0 <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* gallium/hud: check NULL return from u_upload_allocJulien Isorce2017-03-131-0/+5
| | | | | | | | | | | | | | | | | | | | Fixes the following segmentation fault: signal SIGSEGV: invalid address (fault address: 0x0) frame #0: 0x00007fffe718e117 radeonsi_dri.so hud_draw_background_quad hud_context.c:170 167 168 assert(hud->bg.num_vertices + 4 <= hud->bg.max_num_vertices); 169 -> 170 vertices[num++] = (float) x1; 171 vertices[num++] = (float) y1; 172 173 vertices[num++] = (float) x1; (lldb) bt * frame #0: 0x00007fffe718e117 radeonsi_dri.so`hud_draw_background_quad frame #1: 0x00007fffe718f458 radeonsi_dri.so`hud_draw frame #2: 0x00007fffe712967f radeonsi_dri.so`dri_flush Signed-off-by: Marek Olšák <[email protected]>
* winsys/radeon: check null return from radeon_cs_create_fence in cs_flushJulien Isorce2017-03-131-11/+13
| | | | | | | | | | | | Follow-up of patch: "radeon_cs_create_fence: check null return from radeon_winsys_bo_create" radeon_drm_cs_flush radeon_cs_create_fence radeon_winsys_bo_create Signed-off-by: Julien Isorce <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* winsys/radeon: check null in radeon_cs_create_fenceJulien Isorce2017-03-131-0/+3
| | | | | | | | | | | | | | Fixes the following segmentation fault: radeon_drm_cs_add_buffer (bo=0x0) at radeon_drm_cs.c -> if (!bo->handle) (gdb) bt 0 radeon_drm_cs_add_buffer (bo=0x0) at radeon_drm_cs.c 1 0x00007fffe73575de in radeon_cs_create_fence radeon_drm_cs.c 2 0x00007fffe7358c48 in radeon_drm_cs_flush radeon_drm_cs.c Signed-off-by: Julien Isorce <[email protected]> Signed-off-by: Marek Olšák <[email protected]>
* freedreno/ir3: fragz cannot be half precisionRob Clark2017-03-131-0/+6
| | | | Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: optimize less in glslRob Clark2017-03-131-1/+1
| | | | | | | | | | | | | | | | | | | Rely on nir for optimization, to reduce compile times. Very minimal impact on shader-db: total instructions in shared programs: 104170 -> 104199 (0.03%) total dwords in shared programs: 209664 -> 209728 (0.03%) total full registers used in shared programs: 7156 -> 7161 (0.07%) total half registers used in shader programs: 109 -> 109 (0.00%) total const registers used in shared programs: 24222 -> 24224 (0.01%) half full const instr dwords helped 12 107 103 112 98 hurt 11 104 105 115 102 But shader db runtime dropped from ~29.3s user to ~20.4s user. Signed-off-by: Rob Clark <[email protected]>
* svga: handle P016 format as wellChristian König2017-03-131-0/+1
| | | | | | Fixes: 62cff793785 ("gallium: add P016 format") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100180 Reviewed-by: Emil Velikov <[email protected]>
* st/va: add config support for 10bit decoding v2Christian König2017-03-132-4/+21
| | | | | | | | | Advertise 10bpp support if the driver supports decoding to a P016 surface. v2: Advertise 10bpp for the decoder as well. Signed-off-by: Christian König <[email protected]> Signed-off-by: Mark Thompson <[email protected]>
* st/va: add support for allocating 10bpp surfacesChristian König2017-03-131-9/+15
| | | | | | | We support P010 and P016 as targets for 10bpp video decoding. Signed-off-by: Christian König <[email protected]> Reviewed-by: Mark Thompson <[email protected]>
* st/va: add support for P010 and P016 formats v3Christian König2017-03-134-3/+22
| | | | | | | | | | | | No hardware I know off can actually support P010 natively. But we can easily support P016 and as long as nobody decodes anything into the lower 6bits it doesn't make any difference to P010. v2: allow P0160 for post processing as well v3: fix post processing once more Signed-off-by: Christian König <[email protected]> Reviewed-by: Mark Thompson <[email protected]>
* st/va: clear the video surface on allocationChristian König2017-03-131-4/+35
| | | | | | | This makes debugging of decoding problems quite a bit easier. Signed-off-by: Christian König <[email protected]> Reviewed-by: Mark Thompson <[email protected]>
* st/va: cleanup error handling in vlVaCreateSurfaces2Christian König2017-03-131-25/+27
| | | | | | | No need to have that twice. Signed-off-by: Christian König <[email protected]> Reviewed-by: Mark Thompson <[email protected]>
* radeon/uvd: enable 10bit HEVC decode v2Christian König2017-03-132-8/+20
| | | | | | | | | Just use whatever the state tracker allocated. v2: fix msb mode Signed-off-by: Christian König <[email protected]> Reviewed-by: Mark Thompson <[email protected]>
* radeon/UVD: fix the decoding target pitch calculationChristian König2017-03-131-1/+1
| | | | | | | | The firmware expects the value in pixel not bytes. Didn't made a difference so far because we only used 8bpp surfaces. Signed-off-by: Christian König <[email protected]> Reviewed-by: Mark Thompson <[email protected]>
* vl/video_buffer: add support for P016Christian König2017-03-131-0/+10
| | | | | | | Just simply the description of the planes. Signed-off-by: Christian König <[email protected]> Reviewed-by: Mark Thompson <[email protected]>
* gallium: add P016 formatChristian König2017-03-134-0/+42
| | | | | | | Same layout as NV12, but 16bit per channel instead of 8. Signed-off-by: Christian König <[email protected]> Reviewed-by: Mark Thompson <[email protected]>
* gallium/util: replace pipe_thread_setname() with u_thread_setname()Timothy Arceri2017-03-123-15/+3
| | | | | | | They do the same thing we just moved the function to be accessible to all of Mesa. Reviewed-by: Marek Olšák <[email protected]>
* gallium/util: replace pipe_thread_get_time_nano() with u_thread_get_time_nano()Timothy Arceri2017-03-121-17/+1
| | | | | | | They do the same thing we just moved the function to be accessible to all of Mesa. Reviewed-by: Marek Olšák <[email protected]>
* gallium/util: replace pipe_thread_create() with u_thread_create()Timothy Arceri2017-03-127-33/+7
| | | | | | | They do the same thing we just moved the function to be accessible to all of Mesa. Reviewed-by: Marek Olšák <[email protected]>