aboutsummaryrefslogtreecommitdiffstats
path: root/src/gallium
Commit message (Collapse)AuthorAgeFilesLines
* r600/shader: when using images always load thread id gpr at start (v2)Dave Airlie2018-02-281-15/+7
| | | | | | | | | | | | The delayed loading code was fail if we had control flow. This fixes: tests/spec/arb_shader_image_load_store/execution/image_checkerboard.shader_test v2: don't use temp_reg before setting temp_reg up. Tested-by: Gert Wollny <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: fix whitespace in recent 1d texture commit.Dave Airlie2018-02-281-1/+1
| | | | trivial fix.
* swr/rast: revert clip distance precisionGeorge Kyriazis2018-02-282-4/+17
| | | | | | Fixes piglit tests that broke with 8a64593bde Reviewed-By: Bruce Cherniak <[email protected]>
* swr/rast: Faster frustum prim cullingGeorge Kyriazis2018-02-281-3/+7
| | | | | | | | | Fix clipper validMask setting. We don't need to run frustum rejected primitives through the clipper. Perform frustum culling with only frustum clip codes. Guardband clip codes cannot be used because they overlap frustum codes. Reviewed-By: Bruce Cherniak <[email protected]>
* swr/rast: Consolidate TRANSLATE_ADDRESSGeorge Kyriazis2018-02-284-6/+28
| | | | | | | | Translate is now part of an overloaded LOAD call which required a change to the code gen to skip the load functions in order to handle them manually to make them virtual. Reviewed-By: Bruce Cherniak <[email protected]>
* swr/rast: Code generation cleanupGeorge Kyriazis2018-02-281-15/+21
| | | | | | Generate more compact code from gen_llvm.hpp. Reviewed-By: Bruce Cherniak <[email protected]>
* swr/rast: Remove draw type from event definitionsGeorge Kyriazis2018-02-283-12/+8
| | | | | | | | | | | - Have the draw type sent to DrawInfoEvent in handlers created in archrast.cpp. The draw type no longer needs to be sent during during AR_API_EVENT() call in api.cpp. - Remove draw type from event defintions in events_private.proto, no longer needed Reviewed-By: Bruce Cherniak <[email protected]>
* swr/rast: whitespace changeGeorge Kyriazis2018-02-281-1/+1
| | | | Reviewed-By: Bruce Cherniak <[email protected]>
* swr/rast: Fix index buffer overfetch issue for non-indexed drawsGeorge Kyriazis2018-02-281-0/+15
| | | | | | | | Populate pLastIndex, even for the non-indexed case. An zero pLastIndex can cause the index offsets inside the fetcher to have non-sensical values that can be either very large positive or very large negative numbers. Reviewed-By: Bruce Cherniak <[email protected]>
* softpipe: don't iterate through PIPE_MAX_SHADER_SAMPLER_VIEWSRoland Scheidegger2018-02-281-2/+2
| | | | | | | | We were setting view to NULL if the iteration was larger than i. But in fact if the view is NULL the code did nothing anyway... Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* cso: don't cycle through PIPE_MAX_SHADER_SAMPLER_VIEWS on context destroyRoland Scheidegger2018-02-281-1/+3
| | | | | | | There's no point, we know the highest non-null one. Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* draw: don't needlessly iterate through all sampler view slotsRoland Scheidegger2018-02-281-1/+1
| | | | | | | We already stored the highest (potentially) used number. Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* winsys/amdgpu: request high addressesChristian König2018-02-281-4/+12
| | | | | | | | | We now have hopefully fixed all bugs regarding high addresses on Vega10 and Raven. Start to use the high range to make room for SVM in the low range. Signed-off-by: Christian König <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* r600: partly revert disabling tiling for 1d texture.Dave Airlie2018-02-281-0/+5
| | | | | | | | | | | | Previously we had a check for 1d of narrow 2D textures, however narrow 2d textures caused gpu hangs, but it was correct for 1d textures. This fixes a bunch of 1D image piglits for me. Fixes: 7b8e1c089d (r600/texture: drop lowering 1d/2d images to linear.) Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* nir: add lower_ldexp to nir compiler optionsTimothy Arceri2018-02-282-0/+2
| | | | Reviewed-by: Marek Olšák <[email protected]>
* ac/radeonsi: add load_base_vertex() to the abiTimothy Arceri2018-02-281-0/+1
| | | | | | | | | | Fixes the following piglit tests: ./bin/arb_shader_draw_parameters-basevertex basevertex -auto -fbo ./bin/arb_shader_draw_parameters-basevertex basevertex-baseinstance -auto -fbo Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: create get_base_vertex() helperTimothy Arceri2018-02-281-14/+20
| | | | | Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* radeonsi/nir: disable vertex_id_zero_based loweringTimothy Arceri2018-02-281-1/+0
| | | | | | | | | | The lowering is incompatible with how the radeonsi backend works. Fixes piglit test: ./bin/arb_shader_draw_parameters-basevertex vertexid-zerobased -auto Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* nvc0: collapse output slots to have adjacent registersIlia Mirkin2018-02-271-2/+12
| | | | | | | | | | The hardware skips over unallocated slots, so we have to make sure those registers are packed together. Fixes KHR-GL45.enhanced_layouts.fragment_data_location_api Signed-off-by: Ilia Mirkin <[email protected]> Tested-by: Karol Herbst <[email protected]>
* nvir/gm107: consider FILE_FLAGS dependencies in SchedDataCalculatorGM107Karol Herbst2018-02-261-1/+14
| | | | | | | | | | | | | | | | | | | | currently while insterting barriers, writes and reads to FILE_FLAGS aren't considered. This can lead to WaR hazards in some situations. With the previous commit fixes shaders with intstructions like this: mad u32 $r2 $r4 $r11 $r2 mad u32 { $r5 $c0 } $r4 $r10 $r6 mad (SUBOP:1) u32 $r3 $r4 $r10 $r2 $c0 Affects OpenCL CTS tests on Maxwell+: basic/test_basic intmath_long basic/test_basic intmath_long2 basic/test_basic intmath_long4 v2: only put barriers on instructions which actually read flags Reviewed-by: Samuel Pitoiset <[email protected]> Signed-off-by: Karol Herbst <[email protected]>
* nvir/gm107: iterate over all defs in SchedDataCalculatorGM107::findFirstUseKarol Herbst2018-02-261-16/+18
| | | | | | | | | | In the sched data calculator we have to track first use of defs by iterating over all defs of an instruction, not just the first one. v2: fix minGRP and maxGRP values Reviewed-by: Samuel Pitoiset <[email protected]> Signed-off-by: Karol Herbst <[email protected]>
* radeonsi: remove 2 unused user SGPRs from merged TES-GS with 32-bit pointersMarek Olšák2018-02-264-11/+35
| | | | | | The effect of the last 13 commits on user SGPR counts: Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: make SI_SGPR_VERTEX_BUFFERS the last user SGPR inputMarek Olšák2018-02-264-20/+53
| | | | | | | | so that it can be removed and replaced with inline VBO descriptors, and the pointer can be packed in unused bits of VBO descriptors. This also removes the pointer from merged TES-GS where it's useless. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: set correct num_input_sgprs for VS prolog in merged shadersMarek Olšák2018-02-261-24/+24
| | | | | | | We need to take num_input_sgprs from VS, not the second shader. No apps suffered from this. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: allow fewer input SGPRs in 2nd shader of merged shadersMarek Olšák2018-02-261-1/+5
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't use struct si_descriptors for vertex buffer descriptorsMarek Olšák2018-02-266-33/+46
| | | | | | VBO descriptor code will change a lot one day. Reviewed-by: Nicolai Hähnle <[email protected]>
* r600: fix tgsi clock last settingDave Airlie2018-02-261-0/+1
| | | | | | | On cayman this was hitting an assert later, which probably wasn't see on non-cayman due to having the t slot. Fixes: 9041730d1 (r600: add support for ARB_shader_clock.)
* r600: add time lo/hi debugging output.Dave Airlie2018-02-262-0/+12
| | | | This just adds the these to the debug prints.
* radeonsi/nir: enable lowering of fpowTimothy Arceri2018-02-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | Lowering fpow in NIR rather than LLVM can be beneficial. Polaris results: Totals from affected shaders: SGPRS: 124928 -> 124896 (-0.03 %) VGPRS: 68616 -> 68332 (-0.41 %) Spilled SGPRs: 394 -> 413 (4.82 %) Spilled VGPRs: 0 -> 0 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 0 -> 0 (0.00 %) dwords per thread Code Size: 3668912 -> 3658368 (-0.29 %) bytes LDS: 0 -> 0 (0.00 %) blocks Max Waves: 18575 -> 18593 (0.10 %) Wait states: 0 -> 0 (0.00 %) Fixes: d6b753920677 "ac/nir: remove emission of nir_op_fpow" Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* gallium/tgsi: remove is_msaa_sampler array from tgsi_shader_infoTimothy Arceri2018-02-262-7/+0
| | | | | | | Seems to have not been used since 16be87c90429 Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Samuel Pitoiset <[email protected]>
* radeonsi/nir: fix loading of doubles for tess varyingsTimothy Arceri2018-02-261-2/+10
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi/nir: fix lds store in tcs outputs handlingTimothy Arceri2018-02-261-1/+1
| | | | | | We were ignoring the channel offset. Reviewed-by: Marek Olšák <[email protected]>
* r600: Take ALU_EXTENDED into account when evaluating jump offsetsGert Wollny2018-02-261-2/+7
| | | | | | | | | | | ALU_EXTENDED needs 4 DWORDS instead of the usual 2, hence if the last ALU clause within a IF-JUMP or ELSE branch is ALU_EXTENDED the target jump offset needs to be adjusted accordingly. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104654 Cc: <[email protected]> Signed-off-by: Gert Wollny <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* radeonsi: remove si_descriptors parameter from emit_shader_pointer functionsMarek Olšák2018-02-241-12/+13
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: preload the tess offchip ring in TESMarek Olšák2018-02-242-12/+10
| | | | | | so that it's not done multiple times in branches Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move tess ring address into TCS_OUT_LAYOUT, removes 2 TCS user SGPRsMarek Olšák2018-02-245-91/+70
| | | | | | | TCS_OUT_LAYOUT has 13 unused bits. That's enough for a 32-bit address aligned to 512KB. Hey, it's a 13-bit pointer! Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move 2nd-shader descriptor pointers into s[0:1]Marek Olšák2018-02-243-74/+140
| | | | | | | | | | | If 32-bit pointers are supported, both pointers can be moved into s[0:1] and then ESGS has exactly the same user data SGPR declarations as VS. If 32-bit pointers are not supported, only one pointer can be moved into s[0:1]. In that case, the 2nd pointer is moved before TCS constants, so that the location is the same in HS and GS. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: change si_descriptors::shader_userdata_offset type to shortMarek Olšák2018-02-242-9/+9
| | | | | | | We will want to use SH registers outside of user data SGPRs, like the GFX9 special SGPRs. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: put both tessellation rings into 1 bufferMarek Olšák2018-02-244-29/+18
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move tessellation ring info into si_screenMarek Olšák2018-02-243-45/+52
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: move TCS_OUT_LAYOUT.PatchVerticesIn to lower bitsMarek Olšák2018-02-243-5/+6
| | | | | | For a later patch. Reviewed-by: Nicolai Hähnle <[email protected]>
* nvir: dont optimize mad with subops to shladdKarol Herbst2018-02-241-1/+2
| | | | | Signed-off-by: Karol Herbst <[email protected]> Reviewed-by: Ilia Mirkin <[email protected]>
* broadcom/vc5: Fix layout of 3D textures.Eric Anholt2018-02-232-32/+81
| | | | | | Cube maps are entire miptrees repeated, while 3D textures have each level have all of its layers next to each other. Fixes tex3d and tex-miplevel-selection GL2:texture() 3D.
* broadcom/vc5: Ignore unused usage flags in is_format_supported.Eric Anholt2018-02-231-27/+16
| | | | | | | | Like for vc4, the new DISPLAY_TARGET flag ended up causing no formats to match. Just drop the whole retval == usage thing and return early when we hit a known unsupported case. Fixes: f7604d8af521 ("st/dri: only expose config formats that are display targets")
* swr: remove dead LLVM code pathsEmil Velikov2018-02-233-28/+0
| | | | | | | | | LLVM requirement was bumped to 4.0.0 with earlier commit. Hence any code tailored for older versions is now unreachable. Signed-off-by: Emil Velikov <[email protected]> Reviewed-By: George Kyriazis <[email protected]> Reviewed-by: Andres Gomez <[email protected]>
* broadcom/vc4: Remove the retval==usage check in is_format_supported().Eric Anholt2018-02-231-26/+13
| | | | This got us into trouble recently, so just remove it entirely.
* broadcom/vc4: Add support for YUV textures using unaccelerated blits.Eric Anholt2018-02-233-3/+35
| | | | | Previously we would assertion fail about having no hardware format. This is enough to get kmscube -M nv12-2img working.
* broadcom/vc4: Fix double-unrefcounting of prsc->next with shadows.Eric Anholt2018-02-231-6/+11
| | | | | | | When we set up the shadow resource we were copying the original resource as the template, including its prsc->next field. When we shadowed the first YUV plane's resource for linear-to-tiled conversion, we would end up unbalancing the refcount on the shadow resource's destruction.
* broadcom/vc4: Add pipe_reference debugging for vc4_bos.Eric Anholt2018-02-232-5/+24
| | | | | Trying to track down the YUV EGLImage use-after-free, it helps to see what the mystery objects are that are being refcounted.
* broadcom/vc4: Remove dead vc4_bo_set_reference().Eric Anholt2018-02-231-8/+0
| | | | | It would be broken if NULL was passed to it anyway, since it wouldn't participate in screen->bo_handles management.