path: root/src
Commit message | Author | Age | Files | Lines
* mesa: add xbgr support adjacent to xrgb | Ilia Mirkin | 2018-02-19 | 7 | -2/+74
    Signed-off-by: Ilia Mirkin <[email protected]>
    Acked-by: Daniel Stone <[email protected]>
* st/shader_cache: copy nir pointer to gl_program after deserializing | Timothy Arceri | 2018-02-20 | 1 | -0/+6
    This fixes a crash when running the arb_get_program_binary-api-errors piglit test twice.
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: add nir shader cache support | Timothy Arceri | 2018-02-20 | 1 | -11/+30
    In the future we might want to try to avoid calling nir_serialize(), but this works for now.
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: rename variables tgsi_binary -> ir_binary | Timothy Arceri | 2018-02-20 | 1 | -21/+21
    This better represents that the ir could be either tgsi or nir.
    Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: fix regression from 32-bit pointers on CI | Marek Olšák | 2018-02-19 | 1 | -1/+1
    Tested-by: Michel Dänzer <[email protected]>
* radv: compact varyings after removing unused ones | Samuel Pitoiset | 2018-02-19 | 1 | -6/+3
    It makes no sense to compact before, and the description of nir_compact_varyings() confirms that.
    Polaris10:
    Totals from affected shaders:
    SGPRS: 108528 -> 108128 (-0.37 %)
    VGPRS: 74548 -> 74500 (-0.06 %)
    Spilled SGPRs: 844 -> 814 (-3.55 %)
    Code Size: 3007328 -> 2992932 (-0.48 %) bytes
    Max Waves: 16019 -> 16009 (-0.06 %)
    Vega10:
    Totals from affected shaders:
    SGPRS: 106088 -> 106232 (0.14 %)
    VGPRS: 74652 -> 74700 (0.06 %)
    Spilled SGPRs: 692 -> 658 (-4.91 %)
    Code Size: 2967708 -> 2953028 (-0.49 %) bytes
    Max Waves: 18178 -> 18162 (-0.09 %)
    Signed-off-by: Samuel Pitoiset <[email protected]>
    Reviewed-by: Timothy Arceri <[email protected]>
* radeonsi/nir: fix gl_FragCoord for pixel_center_integer | Timothy Arceri | 2018-02-19 | 1 | -0/+5
    Fixes piglit test glsl-arb-fragment-coord-conventions
    Reviewed-by: Kenneth Graunke <[email protected]>
* glsl/nir: add pixel_center_integer to shader info | Timothy Arceri | 2018-02-19 | 2 | -0/+7
    Reviewed-by: Kenneth Graunke <[email protected]>
* gm107/ir: avoid using kepler instruction capabilities | Ilia Mirkin | 2018-02-17 | 2 | -21/+45
    Split up the op properties table into generation-specific bits, and only use the kepler ones on kepler.
    Fixes some CTS images tests.
    Signed-off-by: Ilia Mirkin <[email protected]>
    Reviewed-by: Karol Herbst <[email protected]>
* nvc0: add support for bindless on maxwell+ | Ilia Mirkin | 2018-02-17 | 3 | -14/+116
    Signed-off-by: Ilia Mirkin <[email protected]>
* gm107/ir: change how SUQ works in preparation for bindless | Ilia Mirkin | 2018-02-17 | 3 | -1/+61
    All this information can be retrieved from the TIC directly. Avoid having to dip into the constbuf information about the image.
    Signed-off-by: Ilia Mirkin <[email protected]>
* i965: Use absolute addressing for constant buffer 0 on Kernel 4.16+. | Kenneth Graunke | 2018-02-17 | 2 | -1/+32
    By default, 3DSTATE_CONSTANT_* Constant Buffer 0 is relative to dynamic state base address. This makes it unusable for pushing UBOs. There is a bit in the INSTPM register (or CS_DEBUG_MODE2 on Skylake) which controls whether buffer 0 is relative to dynamic state base address, or simply a normal pointer. Setting that gives us full flexibility. This lets us push up to 4 UBO ranges.
    We can't currently write this on Haswell and earlier, and will need to update the kernel command parser, and then do the whole version checking song and dance.
    We also need a brand new kernel that supports context isolation - on older kernels, newly created contexts inherit register state from whatever happened to be running. So, setting this would have catastrophic impact on other drivers such as libva, Beignet, or older Mesa.
    See commit 8ec5a4e4a4a32f4de351c5fc2bf0eb615b6eef1b where we did this once before, but had to revert it in commit 013d33122028f2492da90a03a.
    Reviewed-by: Francisco Jerez <[email protected]>
* i965: Stop restoring the default L3 configuration on Kernel 4.16+. | Kenneth Graunke | 2018-02-17 | 3 | -2/+7
    Kernel 4.16 has proper context isolation, which means we can change the L3 configuration without worrying about that leaking to other newly created contexts, breaking the assumptions of other userspace. So, disable our workaround to reprogram it back to the default.
    Reviewed-by: Francisco Jerez <[email protected]>
* nvc0: Use GP100_COMPUTE_CLASS on GP10B | Mikko Perttunen | 2018-02-17 | 1 | -1/+2
    GP10B requires the use of GP100_COMPUTE_CLASS instead of GP104_COMPUTE_CLASS as is used for other non-GP100 chips.
    Signed-off-by: Mikko Perttunen <[email protected]>
    Reviewed-by: Ilia Mirkin <[email protected]>
* i965: Fix aux-surface size check | Daniel Stone | 2018-02-17 | 2 | -3/+12
    The previous commit reworked the checks in intel_from_planar() to check the right individual cases for regular/planar/aux buffers, and do size checks in all cases.
    Unfortunately, the aux size check was broken, and required the aux surface to be allocated with the correct aux stride, but full image height (!).
    As the ISL aux surface is not recorded in the DRIimage, we cannot easily access it to check. Instead, store the aux size from when we do have the ISL surface to hand, and check against that later when we go to access the aux surface.
    Signed-off-by: Daniel Stone <[email protected]>
    Fixes: c2c4e5bae3ba ("i965: Fix bugs in intel_from_planar")
    Reviewed-by: Jason Ekstrand <[email protected]>
* radeonsi: implement 32-bit pointers in user data SGPRs (v2) | Marek Olšák | 2018-02-17 | 7 | -59/+141
    User SGPRs changes:
    VS: 14 -> 9
    TCS: 14 -> 10
    TES: 10 -> 6
    GS: 8 -> 4
    GSCOPY: 2 -> 1
    PS: 9 -> 5
    Merged VS-TCS: 24 -> 16
    Merged VS-GS: 18 -> 11
    Merged TES-GS: 18 -> 11
    SGPRS: 2170102 -> 2158430 (-0.54 %)
    VGPRS: 1645656 -> 1641516 (-0.25 %)
    Spilled SGPRs: 9078 -> 8810 (-2.95 %)
    Spilled VGPRs: 130 -> 114 (-12.31 %)
    Scratch size: 1508 -> 1492 (-1.06 %) dwords per thread
    Code Size: 52094872 -> 52692540 (1.15 %) bytes
    Max Waves: 371848 -> 372723 (0.24 %)
    v2:
    - the shader cache needs to take address32_hi into account
    - set amdgpu-32bit-address-high-bits
    Reviewed-by: Samuel Pitoiset <[email protected]> (v1)
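A minimal sketch of the idea behind 32-bit pointers, assuming the relevant buffers live in an address range whose upper 32 bits are one fixed value (the address32_hi mentioned in the v2 notes). Under that assumption only the low half has to be passed in a user SGPR, and the full address can be rebuilt from it. The helper names are hypothetical and are not radeonsi functions.

```cpp
#include <cstdint>

// Hypothetical helper: true if a 64-bit GPU virtual address lies in the
// range whose upper 32 bits equal address32_hi.
inline bool fits_32bit_heap(uint64_t va, uint32_t address32_hi) {
    return static_cast<uint32_t>(va >> 32) == address32_hi;
}

// Hypothetical helper: keep only the low 32 bits, i.e. the part that
// would occupy a single user SGPR.
inline uint32_t to_32bit_pointer(uint64_t va) {
    return static_cast<uint32_t>(va);
}

// Hypothetical helper: rebuild the full address from the low half and
// the fixed high bits.
inline uint64_t expand_32bit_pointer(uint32_t lo, uint32_t address32_hi) {
    return (static_cast<uint64_t>(address32_hi) << 32) | lo;
}
```

Halving the size of each pointer is consistent with the drop in user SGPR counts listed in the commit message, although the exact per-stage layout is not derived here.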
* radeonsi: disallow constant buffers with a 64-bit address in slot 0 | Marek Olšák | 2018-02-17 | 2 | -1/+9
    State trackers must use a user buffer or const_uploader, or set pipe_resource::flags same as const_uploader->flags.
    Reviewed-by: Samuel Pitoiset <[email protected]>
* radeonsi: move const_uploader allocations to 32-bit address space | Marek Olšák | 2018-02-17 | 3 | -2/+7
    Reviewed-by: Samuel Pitoiset <[email protected]>
* winsys/radeon: implement and enable 32-bit VM allocations | Marek Olšák | 2018-02-17 | 3 | -8/+64
    Reviewed-by: Samuel Pitoiset <[email protected]>
* winsys/radeon: add struct radeon_vm_heap | Marek Olšák | 2018-02-17 | 3 | -36/+47
    Reviewed-by: Samuel Pitoiset <[email protected]>
* winsys/amdgpu: enable 32-bit VM allocations | Marek Olšák | 2018-02-17 | 1 | -1/+2
    Reviewed-by: Samuel Pitoiset <[email protected]>
* gallium/radeon: add 32-bit address space heaps | Marek Olšák | 2018-02-17 | 1 | -3/+44
    Reviewed-by: Samuel Pitoiset <[email protected]>
* ac: query high bits of 32-bit address space | Marek Olšák | 2018-02-17 | 2 | -0/+8
* gallium: use PIPE_CAP_CONSTBUF0_FLAGS | Marek Olšák | 2018-02-17 | 4 | -5/+27
* gallium: allow drivers to impose BO flags restrictions on constant buffer 0 | Marek Olšák | 2018-02-17 | 18 | -0/+21
    Required by radeonsi for optimal behavior.
* meson: Add Haiku platform support v4 | Alexander von Gluck IV | 2018-02-16 | 9 | -13/+189
    Reviewed-by: Dylan Baker <[email protected]>
* anv/icl: Add render target flush after uploading binding table | Anuj Phogat | 2018-02-16 | 1 | -0/+20
    The PIPE_CONTROL command description says:
    "Whenever a Binding Table Index (BTI) used by a Render Target Message points to a different RENDER_SURFACE_STATE, SW must issue a Render Target Cache Flush by enabling this bit. When render target flush is set due to new association of BTI, PS Scoreboard Stall bit must be set in this packet."
    Signed-off-by: Anuj Phogat <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* anv/icl: Enable float blend optimization | Anuj Phogat | 2018-02-16 | 1 | -1/+1
    Signed-off-by: Anuj Phogat <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* anv/icl: Use gen11 functions | Anuj Phogat | 2018-02-16 | 2 | -0/+6
    Signed-off-by: Anuj Phogat <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* anv/icl: Build anv libs for gen11 | Anuj Phogat | 2018-02-16 | 4 | -2/+32
    Signed-off-by: Anuj Phogat <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* anv/icl: Generate gen11 entry point functions | Anuj Phogat | 2018-02-16 | 1 | -1/+5
    Signed-off-by: Anuj Phogat <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* anv/icl: Don't use DISPATCH_MODE_SIMD4X2 | Anuj Phogat | 2018-02-16 | 1 | -0/+5
    Signed-off-by: Anuj Phogat <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* anv/icl: Don't use SingleVertexDispatch | Anuj Phogat | 2018-02-16 | 1 | -0/+2
    Signed-off-by: Anuj Phogat <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* anv/icl: Don't set ResetGatewayTimer | Anuj Phogat | 2018-02-16 | 1 | -0/+2
    Signed-off-by: Anuj Phogat <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* anv/icl: Add #define genX | Anuj Phogat | 2018-02-16 | 1 | -0/+3
    Signed-off-by: Anuj Phogat <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* anv/icl: Add gen11 mocs defines | Anuj Phogat | 2018-02-16 | 1 | -0/+11
    Signed-off-by: Anuj Phogat <[email protected]>
    Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Implement GenerateMipmap directly, rather than using Meta. | Kenneth Graunke | 2018-02-16 | 5 | -0/+135
    Meta is awful and we'd like to stop using it. Implementing this using BLORP allows us to stop trashing a bunch of GL state every time.
    This follows the structure of st_generate_mipmap(). compute_num_levels is lifted directly from there.
    Improves performance in Gl41HdrBloom by about 11.794% +/- 1.01919% (n=3) on Kabylake GT2 at 1280x720 (the difference seems much smaller at higher resolutions).
    v2 (idr): Don't try depth or depth-stencil blorp blits on Gen4 or Gen5 because it's not implemented yet.
    Signed-off-by: Ian Romanick <[email protected]>
    Reviewed-by: Topi Pohjolainen <[email protected]>
* mesa: Move compute_num_levels from st_gen_mipmap.c to mipmap.c. | Kenneth Graunke | 2018-02-16 | 3 | -27/+29
    I want to use compute_num_levels inside i965. Rather than duplicating it, move it from mesa/st to core Mesa, and make it non-static.
    Reviewed-by: Marek Olšák <[email protected]>
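For readers unfamiliar with what a level-count helper computes, here is a rough sketch. It is not Mesa's compute_num_levels(), whose body is not shown in this log. It only illustrates the usual rule that a full mip chain halves the largest dimension until it reaches 1. The function name and signature are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper, not Mesa's compute_num_levels(): number of levels
// in a full mip chain, i.e. 1 + floor(log2(largest dimension)).
inline unsigned full_mip_chain_levels(uint32_t width, uint32_t height,
                                      uint32_t depth) {
    uint32_t size = std::max({width, height, depth});
    unsigned levels = 0;
    // Count the base level, then keep halving until nothing is left.
    while (size > 0) {
        ++levels;
        size >>= 1;
    }
    return levels;
}
```

A real implementation presumably also has to account for the texture's base and maximum level settings, which this sketch ignores.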
* meson: freedreno depends on nir | Dylan Baker | 2018-02-16 | 1 | -0/+1
    This fixes a race condition in building targets that link in freedreno.
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105120
    Fixes: 0bbecc5a8548883f76a7 ("meson: define driver dependencies")
    Signed-off-by: Dylan Baker <[email protected]>
    Acked-by: Mark Janes <[email protected]>
* swr/rast: blend_epi32() should return Integer, not Float | George Kyriazis | 2018-02-16 | 1 | -1/+1
    Fix gcc8 compiler error for KNL.
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105029
    Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Normalize path for debug metadata | George Kyriazis | 2018-02-16 | 1 | -1/+2
    in template gen_llvm.hpp
    Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Consolidate archrast Draw events | George Kyriazis | 2018-02-16 | 4 | -26/+79
    Consolidate archrast draw events into a single draw event with an attribute that represents the type of draw.
    - Add handlers for new private proto versions of DrawInstancedEvent, DrawIndexedInstancedEvent, DrawInstancedSplitEvent, and DrawIndexedInstancedSplitEvent
    - Convert the draw events to generic DrawInfoEvents
    - parse_proto_event_fields() replaces 'AR_DRAW_TYPE' as a field type with 'uint32_t'. This draw type is actually an enum, but can be represented as an unsigned integer.
    - is_draw_or_dispatch() recognizes DrawInfoEvent as a draw event
    Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Add semantics for translating address | George Kyriazis | 2018-02-16 | 2 | -0/+5
    Added support for another full translation path in fetch jitter.
    Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Convert C Sampler intrinsics | George Kyriazis | 2018-02-16 | 2 | -0/+19
    Convert portions of the C sampler to the rasty SIMD lib.
    Also fix SRL call with a non-immediate. Don't count on the compiler automagically converting an srli call to srl if the shift count isn't an immediate.
    Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Make SIMDLib templated types easier to use | George Kyriazis | 2018-02-16 | 5 | -298/+307
    "typename SIMD_T::TypeName" --> "TypeName<SIMD_T>"
    Reviewed-by: Bruce Cherniak <[email protected]>
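The shorthand above can be obtained with C++11 alias templates. The following sketch assumes that is the mechanism; the actual SIMDLib declarations are not shown in this log. Simd4Traits and both truncate functions are hypothetical names used only for illustration.

```cpp
#include <cstdint>

// Hypothetical trait struct standing in for one SIMD width description;
// the real SIMDLib types are vector types, not scalars.
struct Simd4Traits {
    using Float = float;
    using Integer = int32_t;
};

// Alias templates: Float<SIMD_T> is shorthand for typename SIMD_T::Float.
template <typename SIMD_T> using Float = typename SIMD_T::Float;
template <typename SIMD_T> using Integer = typename SIMD_T::Integer;

// Before: dependent names need the 'typename' keyword at every use.
template <typename SIMD_T>
typename SIMD_T::Integer truncate_verbose(typename SIMD_T::Float v) {
    return static_cast<typename SIMD_T::Integer>(v);
}

// After: the same function written with the aliases.
template <typename SIMD_T>
Integer<SIMD_T> truncate_short(Float<SIMD_T> v) {
    return static_cast<Integer<SIMD_T>>(v);
}
```

Both forms name a nested type of SIMD_T, so the template argument cannot be deduced from the call and has to be given explicitly, for example truncate_short<Simd4Traits>(3.7f).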
* swr/rast: Be more explicit when fetching next component | George Kyriazis | 2018-02-16 | 2 | -4/+11
    Use a new function to denote that we want to get offset to next component and hide the fact that GEP is used underneath.
    Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Fix bug related to passing AR handle | George Kyriazis | 2018-02-16 | 1 | -1/+1
    We were passing a garbage handle. Let's not do that.
    Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Fix primitive replication issue in tesselation PA. | George Kyriazis | 2018-02-16 | 2 | -2/+3
    Reviewed-by: Bruce Cherniak <[email protected]>
* swr/rast: Use llvm intrinsic masked gather | George Kyriazis | 2018-02-16 | 2 | -0/+14
    Use llvm intrinsic masked.gather instead of manual unroll for the cases where we have vector of pointers.
    Improves llvm IR debug experience by reducing a ton of IR to a single intrinsic call. Also seems to reduce overall stack use considerably.
    Reviewed-by: Bruce Cherniak <[email protected]>
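As a rough illustration of what a masked gather computes, the scalar sketch below loads through each pointer whose mask lane is set and keeps a pass-through value for the other lanes. This resembles the manually unrolled form the commit says it replaces, written in plain C++ rather than LLVM IR. The function name and signature are hypothetical and are not part of SWR or LLVM.

```cpp
#include <array>
#include <cstddef>

// Hypothetical scalar emulation of a masked gather over N lanes:
// enabled lanes load through their pointer, disabled lanes keep the
// corresponding pass-through value and never dereference the pointer.
template <typename T, std::size_t N>
std::array<T, N> masked_gather(const std::array<const T*, N>& ptrs,
                               const std::array<bool, N>& mask,
                               const std::array<T, N>& passthru) {
    std::array<T, N> result = passthru;
    for (std::size_t i = 0; i < N; ++i) {
        if (mask[i]) {
            result[i] = *ptrs[i];
        }
    }
    return result;
}
```

Because disabled lanes are never dereferenced, their pointers may be null or otherwise invalid.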
* swr/rast: Misc cleanup | George Kyriazis | 2018-02-16 | 3 | -49/+60
    Together with correct detection of clipDistance NaNs when no cullDistance is set
    Reviewed-by: Bruce Cherniak <[email protected]>