Commit message (Author, Age, Files, Lines)
* radv: Implement WaitForFences with !waitAll. (Bas Nieuwenhuizen, 2018-03-01, 1 file, -5/+15)
| | | | | | | | | | Nothing to do except using a busy wait loop. At least for old kernels. A better implementation for newer kernels to come later. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105255 Fixes: f4e499ec79 "radv: add initial non-conformant radv vulkan driver" Reviewed-by: Dave Airlie <[email protected]>
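The busy-wait approach mentioned above can be sketched roughly as follows. This is only an illustration, not the radv code: `sketch_fence`, `sketch_wait_any` and the poll budget `max_polls` are hypothetical names, and a real driver would query the kernel for fence status and honor a timeout in nanoseconds rather than counting passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fence: a real driver would refresh this flag by asking the
 * kernel; here it is a plain boolean so the sketch stays self-contained. */
struct sketch_fence {
    bool signaled;
};

/* waitAll == false semantics: return true as soon as any fence is signaled.
 * max_polls is an assumed stand-in for a timeout. */
static bool sketch_wait_any(const struct sketch_fence *fences, size_t count,
                            uint64_t max_polls)
{
    assert(fences != NULL || count == 0);

    for (uint64_t poll = 0; poll < max_polls; ++poll) {
        for (size_t i = 0; i < count; ++i) {
            if (fences[i].signaled)
                return true;
        }
    }
    return false; /* poll budget exhausted, nothing signaled */
}
```

Calling `sketch_wait_any` on an array in which one fence is already signaled returns on the first pass; with no signaled fence it spins until the poll budget runs out.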
* ac/nir: fix shared atomic operations. (Dave Airlie, 2018-03-01, 1 file, -5/+5)
| | | | | | | | | | | The nir->llvm conversion was using the wrong srcs. Fixes: tests/spec/arb_compute_shader/execution/shared-atomics.shader_test Reviewed-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Timothy Arceri <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* ac/nir: don't apply slice rounding on txf_ms (Dave Airlie, 2018-03-01, 1 file, -1/+1)
| | | | | | | | | | | This matches the tgsi code. Fixes arb_texture_multisample texelFetch piglit tests. Reviewed-by: Timothy Arceri <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]> Fixes: f4e499ec7914 (radv: add initial non-conformant radv vulkan driver) Signed-off-by: Dave Airlie <[email protected]>
* radeonsi: set some context vars for nir path (Timothy Arceri, 2018-03-01, 1 file, -6/+10)
| | | | Reviewed-by: Marek Olšák <[email protected]>
* gallium: remove llvm from ir struct (Timothy Arceri, 2018-03-01, 1 file, -1/+0)
| | | | | | | This was added in 425dc4c4b366 but never used. Also since 100796c15c3a native has superseded llvm. Acked-by: Dave Airlie <[email protected]>
* i965: Don't emit MOVs with undefined registers for Gen4 point clipping. (Kenneth Graunke, 2018-02-28, 1 file, -1/+1)
| | | | | | | | | | | | Gen4 point clipping calls brw_clip_tri_alloc_regs with nr_verts == 0, which means that c->reg.vertex[] isn't initialized. It then emits MOVs to stomp components of those uninitialized registers to 0. This started causing assertions after Matt's recent series, when those uninitialized registers started getting BRW_REGISTER_TYPE_NF, which definitely doesn't exist on Gen4-5. Reviewed-by: Matt Turner <[email protected]>
* broadcom/vc5: Fix regression in the page-cache slice size alignment. (Eric Anholt, 2018-02-28, 1 file, -3/+6)
| | | | | | | We need to align the size of the slice, not the offset of the next slice. Fixes KHR-GLES3.texture_repeat_mode.rgba32ui_11x131_2_clamp_to_edge. Fixes: b4b4ada7616d ("broadcom/vc5: Fix layout of 3D textures.")
* i965: Only emit 3DSTATE_DRAWING_RECTANGLE once on gen8+ (Jason Ekstrand, 2018-02-28, 3 files, -2/+13)
| | | | | Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Be more clever about setting up our viewport clip (Jason Ekstrand, 2018-02-28, 1 file, -8/+12)
| | | | | | | | | | | | Before, we were trusting in the hardware to take the intersection of the viewport clip with the drawing rectangle. Unfortunately, 3DSTATE_DRAWING_RECTANGLE is fairly expensive because it implicitly does a full pipeline stall. If we're a bit more careful with our viewport clipping, we can just re-emit it once at context creation time. Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
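Taking the intersection in software, instead of relying on the drawing rectangle, could look roughly like the sketch below. The names `sketch_rect` and `sketch_viewport_clip` are hypothetical, and the sketch assumes a framebuffer anchored at the origin with half-open extents; the actual i965 state setup may use different conventions.

```c
/* Hypothetical rectangle in framebuffer coordinates. */
struct sketch_rect {
    float xmin, ymin, xmax, ymax;
};

static float sketch_maxf(float a, float b) { return a > b ? a : b; }
static float sketch_minf(float a, float b) { return a < b ? a : b; }

/* Clamp the viewport extents to the framebuffer so the resulting clip never
 * reaches outside it. Assumes the framebuffer covers
 * [0, fb_width) x [0, fb_height). */
static struct sketch_rect sketch_viewport_clip(struct sketch_rect vp,
                                               float fb_width, float fb_height)
{
    struct sketch_rect clip;

    clip.xmin = sketch_maxf(vp.xmin, 0.0f);
    clip.ymin = sketch_maxf(vp.ymin, 0.0f);
    clip.xmax = sketch_minf(vp.xmax, fb_width);
    clip.ymax = sketch_minf(vp.ymax, fb_height);
    return clip;
}
```

With the clip already confined to the framebuffer, the drawing rectangle no longer has to change with every viewport update, which is the property the commit message relies on.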
* intel/compiler: Re-add .vs_inputs_dual_locations = true (Matt Turner, 2018-02-28, 1 file, -0/+1)
| | | | | | Looks like a rebase mistake. Fixes: 89fe5190a256 ("intel/compiler: Lower flrp32 on Gen11+")
* r600/shader: when using images always load thread id gpr at start (v2) (Dave Airlie, 2018-02-28, 1 file, -15/+7)
| | | | | | | | | | | | The delayed loading code was failing if we had control flow. This fixes: tests/spec/arb_shader_image_load_store/execution/image_checkerboard.shader_test v2: don't use temp_reg before setting temp_reg up. Tested-by: Gert Wollny <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* r600: fix whitespace in recent 1d texture commit. (Dave Airlie, 2018-02-28, 1 file, -1/+1)
| | | | trivial fix.
* intel/compiler: Add ICL to test_eu_validate.cpp (Matt Turner, 2018-02-28, 1 file, -0/+1)
| | | | | | | With the Align16 tests now disabled, we can run the rest of the tests in ICL mode (and see them pass!) Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler: Disable Align16 tests on Gen11+ (Matt Turner, 2018-02-28, 1 file, -0/+16)
| | | | | | Align16 is no more. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler: Add instruction compaction support on Gen11 (Matt Turner, 2018-02-28, 1 file, -0/+42)
| | | | | | Gen11 only differs from SKL+ in that it uses a new datatype index table. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler: Mark line, pln, and lrp as removed on Gen11+ (Matt Turner, 2018-02-28, 1 file, -4/+6)
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler: Lower flrp32 on Gen11+ (Matt Turner, 2018-02-28, 5 files, -17/+26)
| | | | | | The LRP instruction is no more. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler/fs: Implement ddy without using align16 for Gen11+ (Matt Turner, 2018-02-28, 1 file, -8/+38)
| Align16 is no more. We previously generated an align16 ADD instruction to calculate DDY:
|
|     add(16) g25<1>F -g23<4>.xyxyF g23<4>.zwzwF { align16 1H };
|
| Without align16, we now implement it as:
|
|     add(4) g25<1>F -g23<0,2,1>F g23.2<0,2,1>F { align1 1N };
|     add(4) g25.4<1>F -g23.4<0,2,1>F g23.6<0,2,1>F { align1 1N };
|     add(4) g26<1>F -g24<0,2,1>F g24.2<0,2,1>F { align1 1N };
|     add(4) g26.4<1>F -g24.4<0,2,1>F g24.6<0,2,1>F { align1 1N };
|
| where only the first two instructions are needed in SIMD8 mode.
|
| Note: an earlier version of the patch implemented this in two instructions in SIMD16:
|
|     add(8) g25<2>F -g23<4,2,0>F g23.2<4,2,0>F { align1 1N };
|     add(8) g25.1<2>F -g23.1<4,2,0>F g23.3<4,2,0>F { align1 1N };
|
| but I realized that the channel enable bits will not be correct. If we knew we were under uniform control flow, we could emit only those two instructions however.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
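As a rough scalar model of what the four add(4) instructions compute, the sketch below walks the channels in groups of four. It assumes that each group of four channels holds one 2x2 quad with elements 0,1 on one row and 2,3 on the other, and that a `<0,2,1>` region read by a 4-wide instruction yields elements 0,1,0,1 from its base. `sketch_ddy` is a hypothetical helper, not code from the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar emulation of the align1 DDY sequence: one iteration of the outer
 * loop corresponds to one add(4) instruction. */
static void sketch_ddy(const float *src, float *dst, size_t channels)
{
    assert(channels % 4 == 0);

    for (size_t q = 0; q < channels; q += 4) {
        for (size_t c = 0; c < 4; ++c) {
            /* -src<0,2,1> reads elements 0,1,0,1 of the quad;
             *  src.2<0,2,1> reads elements 2,3,2,3 of the quad. */
            dst[q + c] = -src[q + (c & 1)] + src[q + 2 + (c & 1)];
        }
    }
}
```

For a SIMD8 input of {1, 2, 4, 8, 10, 20, 30, 50} this produces {3, 6, 3, 6, 20, 30, 20, 30}: each quad gets the same row difference replicated across both of its rows, matching the `.zwzw - .xyxy` form of the old align16 instruction.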
* intel/compiler/fs: Simplify ddx/ddy code generation (Matt Turner, 2018-02-28, 1 file, -42/+21)
| | | | | | The brw_reg() constructor just obfuscates things here, in my opinion. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler/fs: Pass fs_inst to generate_ddx/ddy instead of opcode (Matt Turner, 2018-02-28, 2 files, -8/+10)
| | | | | | | In a future patch, generate_ddy will want to inspect inst->exec_size. Change generate_ddx as well for consistency. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler/fs: Don't generate integer DWord multiply on Gen11 (Matt Turner, 2018-02-28, 3 files, -5/+6)
| | | | | | | Like CHV et al., Gen11 does not support 32x32 -> 32/64-bit integer multiplies. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler/fs: Implement FS_OPCODE_LINTERP with MADs on Gen11+ (Matt Turner, 2018-02-28, 2 files, -4/+46)
| The PLN instruction is no more. Its functionality is now implemented using two MAD instructions with the new native-float type. Instead of
|
|     pln(16) r20.0<1>:F r10.4<0;1,0>:F r4.0<8;8,1>:F
|
| we now have
|
|     mad(8) acc0<1>:NF r10.7<0;1,0>:F r4.0<8;8,1>:F r10.4<0;1,0>:F
|     mad(8) r20.0<1>:F acc0<8;8,1>:NF r5.0<8;8,1>:F r10.5<0;1,0>:F
|     mad(8) acc0<1>:NF r10.7<0;1,0>:F r6.0<8;8,1>:F r10.4<0;1,0>:F
|     mad(8) r21.0<1>:F acc0<8;8,1>:NF r7.0<8;8,1>:F r10.5<0;1,0>:F
|
| ... and in the case of SIMD8 only the first pair of MAD instructions is used.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
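The MAD pair can be modelled in scalar C roughly as below. The sketch assumes that MAD computes src0 + src1 * src2, so that each pair evaluates the plane equation d + u*a + v*b with a, b, d taken from r10.4, r10.5 and r10.7. The `double` intermediate is only a stand-in for the extra accumulator precision of the :NF type, and `sketch_linterp` is a hypothetical name.

```c
#include <stddef.h>

/* Evaluate dst[i] = d + u[i] * a + v[i] * b using two multiply-adds per
 * channel, mirroring the acc0-based MAD pair. */
static void sketch_linterp(const float *u, const float *v,
                           float a, float b, float d,
                           float *dst, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        /* First MAD: the accumulator receives d + u * a. */
        double acc = (double)d + (double)u[i] * (double)a;
        /* Second MAD: add v * b and round to a regular float result. */
        dst[i] = (float)(acc + (double)v[i] * (double)b);
    }
}
```

With u = {1, 2}, v = {3, 4}, a = 2, b = 0.5 and d = 1 the results are 4.5 and 7.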
* intel/compiler/fs: Return multiple_instructions_emitted from generate_linterp (Matt Turner, 2018-02-28, 2 files, -4/+8)
| | | | | | | If multiple instructions are emitted, special handling of things like conditional mod and NoDDClr/NoDDChk needs to be performed. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler/fs: Fix application of cmod and saturate to LINE/MAC pair (Matt Turner, 2018-02-28, 1 file, -2/+11)
| | | | | | | | | This isn't technically broken, but the next patch will make this function report whether it generated multiple instructions, and that information will be used to disable the application of conditional mod by the generic code. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler: Add Gen11+ native float type (Matt Turner, 2018-02-28, 6 files, -2/+32)
| | | | | | | | | | | | This new type exposes the additional precision offered by the accumulator register and will be used in the next patch to implement the functionality of the PLN instruction using a pair of MAD instructions. One weird thing to note: align1 ternary instructions may only have an accumulator in the dst or src1 normally, but when src0's type is :NF the accumulator is read. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/compiler: Add Gen11 register types (Matt Turner, 2018-02-28, 1 file, -8/+65)
| | | | | | | The hardware register types' encodings have changed on Gen11. Good thing we have that superfluous looking brw_reg_type abstraction lying around! Reviewed-by: Kenneth Graunke <[email protected]>
* intel: Disable 64-bit extensions on platforms without 64-bit types (Matt Turner, 2018-02-28, 3 files, -4/+9)
| | | | | | | | Gen11 does not support DF, Q, UQ types in hardware. As a result, we have to disable some GL extensions until they can be reimplemented. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel: Add icl pci id for INTEL_DEVID_OVERRIDE (Anuj Phogat, 2018-02-28, 1 file, -0/+1)
| | | | | Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Anuj Phogat <[email protected]>
* i965: Warn about preliminary support for Gen11 (Matt Turner, 2018-02-28, 1 file, -0/+7)
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* intel: Add a preliminary device for Ice Lake (Anuj Phogat, 2018-02-28, 1 file, -1/+56)
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Anuj Phogat <[email protected]>
* anv: remove anv_gem_set_context_priority helper (Tapani Pälli, 2018-02-28, 3 files, -12/+3)
| | | | | | | | anv_gem_set_context_param is to be used directly instead! Fixes: 6d8ab53303 "anv: implement VK_EXT_global_priority extension" Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* swr/rast: revert clip distance precision (George Kyriazis, 2018-02-28, 2 files, -4/+17)
| | | | | | Fixes piglit tests that broke with 8a64593bde Reviewed-By: Bruce Cherniak <[email protected]>
* swr/rast: Faster frustum prim culling (George Kyriazis, 2018-02-28, 1 file, -3/+7)
| | | | | | | | | Fix clipper validMask setting. We don't need to run frustum rejected primitives through the clipper. Perform frustum culling with only frustum clip codes. Guardband clip codes cannot be used because they overlap frustum codes. Reviewed-By: Bruce Cherniak <[email protected]>
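Rejecting a primitive from its frustum clip codes alone is commonly done by AND-ing the per-vertex codes, as in the sketch below. This is a generic illustration rather than the swr implementation: the `SKETCH_CLIP_*` bits and both helpers are hypothetical, and the sketch assumes GL-style clip space where a vertex is inside when -w <= x, y, z <= w.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
    SKETCH_CLIP_LEFT   = 1 << 0,
    SKETCH_CLIP_RIGHT  = 1 << 1,
    SKETCH_CLIP_BOTTOM = 1 << 2,
    SKETCH_CLIP_TOP    = 1 << 3,
    SKETCH_CLIP_NEAR   = 1 << 4,
    SKETCH_CLIP_FAR    = 1 << 5
};

/* One bit per frustum plane the vertex lies outside of; v = {x, y, z, w}. */
static uint32_t sketch_frustum_codes(const float v[4])
{
    const float x = v[0], y = v[1], z = v[2], w = v[3];
    uint32_t codes = 0;

    if (x < -w) codes |= SKETCH_CLIP_LEFT;
    if (x >  w) codes |= SKETCH_CLIP_RIGHT;
    if (y < -w) codes |= SKETCH_CLIP_BOTTOM;
    if (y >  w) codes |= SKETCH_CLIP_TOP;
    if (z < -w) codes |= SKETCH_CLIP_NEAR;
    if (z >  w) codes |= SKETCH_CLIP_FAR;
    return codes;
}

/* A triangle can be dropped without clipping when all three vertices are
 * outside the same plane, i.e. the AND of their codes is non-zero. */
static bool sketch_frustum_reject(const float v0[4], const float v1[4],
                                  const float v2[4])
{
    return (sketch_frustum_codes(v0) &
            sketch_frustum_codes(v1) &
            sketch_frustum_codes(v2)) != 0;
}
```

Triangles that fail this test still go on to the clipper; only the trivially rejected ones skip it, which is the saving the commit message describes.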
* swr/rast: Consolidate TRANSLATE_ADDRESS (George Kyriazis, 2018-02-28, 4 files, -6/+28)
| | | | | | | | Translate is now part of an overloaded LOAD call which required a change to the code gen to skip the load functions in order to handle them manually to make them virtual. Reviewed-By: Bruce Cherniak <[email protected]>
* swr/rast: Code generation cleanup (George Kyriazis, 2018-02-28, 1 file, -15/+21)
| | | | | | Generate more compact code from gen_llvm.hpp. Reviewed-By: Bruce Cherniak <[email protected]>
* swr/rast: Remove draw type from event definitions (George Kyriazis, 2018-02-28, 3 files, -12/+8)
| - Have the draw type sent to DrawInfoEvent in handlers created in archrast.cpp. The draw type no longer needs to be sent during the AR_API_EVENT() call in api.cpp.
| - Remove draw type from event definitions in events_private.proto, no longer needed
|
| Reviewed-By: Bruce Cherniak <[email protected]>
* swr/rast: whitespace change (George Kyriazis, 2018-02-28, 1 file, -1/+1)
| | | | Reviewed-By: Bruce Cherniak <[email protected]>
* swr/rast: Fix index buffer overfetch issue for non-indexed draws (George Kyriazis, 2018-02-28, 1 file, -0/+15)
| | | | | | | | Populate pLastIndex, even for the non-indexed case. A zero pLastIndex can cause the index offsets inside the fetcher to have nonsensical values that can be either very large positive or very large negative numbers. Reviewed-By: Bruce Cherniak <[email protected]>
* softpipe: don't iterate through PIPE_MAX_SHADER_SAMPLER_VIEWS (Roland Scheidegger, 2018-02-28, 1 file, -2/+2)
| | | | | | | | We were setting view to NULL if the iteration was larger than i. But in fact if the view is NULL the code did nothing anyway... Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* cso: don't cycle through PIPE_MAX_SHADER_SAMPLER_VIEWS on context destroy (Roland Scheidegger, 2018-02-28, 1 file, -1/+3)
| | | | | | | There's no point, we know the highest non-null one. Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Jose Fonseca <[email protected]>
* draw: don't needlessly iterate through all sampler view slots (Roland Scheidegger, 2018-02-28, 1 file, -1/+1)
| | | | | | | We already stored the highest (potentially) used number. Reviewed-by: Jose Fonseca <[email protected]> Reviewed-by: Brian Paul <[email protected]>
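The idea shared by the three sampler-view commits above, remembering the highest used slot and looping only up to it, can be sketched like this. `sketch_ctx`, `SKETCH_MAX_SAMPLER_VIEWS` and the helpers are hypothetical and only loosely modelled on the description; the real gallium structures differ.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_SAMPLER_VIEWS 128 /* assumed upper bound for the sketch */

struct sketch_ctx {
    const void *views[SKETCH_MAX_SAMPLER_VIEWS];
    unsigned num_views; /* highest (potentially) used slot + 1 */
};

/* Bind a view and remember the highest slot that was ever populated. */
static void sketch_set_view(struct sketch_ctx *ctx, unsigned slot,
                            const void *view)
{
    assert(slot < SKETCH_MAX_SAMPLER_VIEWS);

    ctx->views[slot] = view;
    if (view != NULL && slot + 1 > ctx->num_views)
        ctx->num_views = slot + 1;
}

/* Release all views, visiting only slots below num_views instead of all
 * SKETCH_MAX_SAMPLER_VIEWS entries. Returns the number of slots visited. */
static unsigned sketch_release_views(struct sketch_ctx *ctx)
{
    unsigned visited = 0;

    for (unsigned i = 0; i < ctx->num_views; ++i) {
        ctx->views[i] = NULL;
        ++visited;
    }
    ctx->num_views = 0;
    return visited;
}
```

After binding a single view in slot 2, `sketch_release_views` touches three slots rather than all 128.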
* anv: implement VK_EXT_global_priority extension (Tapani Pälli, 2018-02-28, 5 files, -0/+95)
| v2: add ANV_CONTEXT_REALTIME_PRIORITY (Chris)
|     use unreachable with unknown priority (Samuel)
| v3: add stubs in gem_stubs.c (Emil)
|     use priority defines from gen_defines.h
| v4: cleanup, add anv_gem_set_context_param (Jason)
|
| Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> (v2) Reviewed-by: Chris Wilson <[email protected]> (v2) Reviewed-by: Emil Velikov <[email protected]> (v3) Reviewed-by: Jason Ekstrand <[email protected]>
* i965: use context priority definitions from gen_defines.h (Tapani Pälli, 2018-02-28, 3 files, -10/+10)
| | | | | | Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Chris Wilson <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* intel: add new common header gen_defines.h (Tapani Pälli, 2018-02-28, 2 files, -0/+55)
| | | | | | Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Chris Wilson <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* winsys/amdgpu: request high addresses (Christian König, 2018-02-28, 1 file, -4/+12)
| | | | | | | | | We now have hopefully fixed all bugs regarding high addresses on Vega10 and Raven. Start to use the high range to make room for SVM in the low range. Signed-off-by: Christian König <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* ac/shader: move scanning some info about input PS declarations (Samuel Pitoiset, 2018-02-28, 5 files, -15/+26)
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Dave Airlie <[email protected]>
* glsl/linker: fix bug when checking precision qualifier (Samuel Iglesias Gonsálvez, 2018-02-28, 1 file, -8/+3)
| | | | | | | | | | | | | | | According to GLSL ES 3.2 spec, see table in 9.2.1 "Linked Shaders" section, the precision qualifier should match for uniform variables. This also applies to previous GLSL ES 3.x specs. This 'if' checks the condition for uniform variables, while for UBOs it is checked in link_interface_blocks.cpp. Fixes: b50b82b8a553 ("glsl/es31: precision qualifier doesn't need to match in shader interface block members") Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* anv: set maxResourceSize to the respective value for each generation (Samuel Iglesias Gonsálvez, 2018-02-28, 1 file, -1/+14)
| | | | | | | | v2: - Add the proper values to gen9+ (Jason) Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* r600: partly revert disabling tiling for 1d texture. (Dave Airlie, 2018-02-28, 1 file, -0/+5)
| | | | | | | | | | | | Previously we had a check for 1d or narrow 2D textures, however narrow 2d textures caused gpu hangs, but it was correct for 1d textures. This fixes a bunch of 1D image piglits for me. Fixes: 7b8e1c089d (r600/texture: drop lowering 1d/2d images to linear.) Reviewed-by: Roland Scheidegger <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* nir: fix interger divide by zero crash during constant folding (Timothy Arceri, 2018-02-28, 1 file, -2/+2)
| | | | | | | | | | | | From the GLSL 4.60 spec Section 5.9 (Expressions): "Dividing by zero does not cause an exception but does result in an unspecified value." Fixes: 89285e4d47a6 "nir: add new constant folding infrastructure" Reviewed-by: Jason Ekstrand <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105271
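Because the spec only leaves the result unspecified, a constant folder is free to pick any value as long as it does not trap. A minimal sketch of such a guard is shown below; `sketch_fold_idiv` is a hypothetical helper, returning 0 is just one permissible choice, and the INT32_MIN / -1 guard is an extra precaution of this sketch that the commit message does not mention.

```c
#include <stdint.h>

/* Fold a signed 32-bit division at compile time without ever executing a
 * trapping divide on the host. */
static int32_t sketch_fold_idiv(int32_t a, int32_t b)
{
    /* Division by zero: GLSL leaves the result unspecified, so choose 0. */
    if (b == 0)
        return 0;

    /* Not from the commit message: INT32_MIN / -1 overflows in C, so wrap
     * to INT32_MIN instead of invoking undefined behaviour. */
    if (a == INT32_MIN && b == -1)
        return INT32_MIN;

    return a / b;
}
```

Folding `7 / 0` through this helper yields 0 instead of raising SIGFPE in the compiler process.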