Commit message | Author | Age | Files | Lines
* i965: Only emit 1 viewport when possible. | Kenneth Graunke | 2016-10-03 | 10 | -30/+75
  In core profile, we support up to 16 viewports. However, in the majority of cases, only 1 of them is actually used - we only need the others if the last shader stage prior to the rasterizer writes gl_ViewportIndex.

  Processing all 16 viewports adds additional CPU overhead, which hurts CPU-intensive workloads such as Glamor. This meant that switching to core profile actually penalized Glamor to an extent, which is unfortunate.

  This patch tracks the number of relevant viewports, switching between 1 and ctx->Const.MaxViewports if gl_ViewportIndex is written. A new BRW_NEW_VIEWPORT_COUNT flag tracks this. This could mean re-emitting viewport state when switching, but hopefully this is offset by doing 1/16th of the work in the common case. The new flag is also lighter weight than BRW_NEW_VUE_MAP_GEOM_OUT, which we were using in one case.

  According to Eric Anholt, x11perf -copypixwin10 performance improves by 11.5094% +/- 3.10841% (n=10) on his Skylake.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
  Acked-by: Anuj Phogat <[email protected]>
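The viewport-count rule in this message can be modeled in a few lines. This is an illustrative Python sketch, not the driver code: `MAX_VIEWPORTS`, `viewport_count`, and `update_viewport_count` are hypothetical names, and the returned boolean only stands in for the idea of raising a dirty flag such as BRW_NEW_VIEWPORT_COUNT when the count changes.

```python
MAX_VIEWPORTS = 16  # core-profile limit quoted in the commit message


def viewport_count(last_stage_writes_viewport_index):
    """Return how many viewports need to be processed."""
    return MAX_VIEWPORTS if last_stage_writes_viewport_index else 1


def update_viewport_count(state, last_stage_writes_viewport_index):
    """Store the new count in state; return True if it changed (dirty-flag analogue)."""
    new_count = viewport_count(last_stage_writes_viewport_index)
    changed = state.get("viewport_count") != new_count
    state["viewport_count"] = new_count
    return changed
```

In this model, viewport state would only need re-emitting when `update_viewport_count` returns True, which matches the trade-off the message describes: an occasional re-emit on switching, in exchange for handling one viewport instead of sixteen in the common case.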
* spirv: translate cull distance semantic. | Dave Airlie | 2016-10-04 | 1 | -1/+1
  This just translates to the correct cull distance slot.

  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* compiler: add printable values for cull distance varyings. | Dave Airlie | 2016-10-04 | 1 | -0/+2
  We need these for spir-v/nir shaders.

  Reviewed-by: Edward O'Callaghan <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
  Signed-off-by: Dave Airlie <[email protected]>
* nir/spirv/cfg: Use a nop intrinsic for tagging the ends of blocks | Jason Ekstrand | 2016-10-03 | 2 | -4/+6
  Previously, we were saving off the last nir_block in a vtn_block before moving on so that we could find the nir_block again when it came time to handle phi sources. Unfortunately, NIR's control flow modification code is inconsistent when it comes to how it splits blocks, so the block pointer we saved off may point to a block somewhere else in the shader by the time we get around to handling phi sources.

  In order to get around this, we insert a nop instruction and use that as the logical end of our block. Since the control flow manipulation code respects instructions, the nop will keep its place like any other instruction and we can easily find the end of our block when we need it.

  This fixes a bug triggered by a couple of vkQuake shaders.

  Signed-off-by: Jason Ekstrand <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97233
  Cc: "12.0" <[email protected]>
  Tested-by: Dave Airlie <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
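The reason a tag instruction survives block splitting while a saved block pointer does not can be shown with a toy model. This Python sketch is only an illustration under simplifying assumptions: blocks are plain lists, and `Instr`, `split_block`, and `find_tag` are hypothetical names, not NIR API.

```python
class Instr:
    """A stand-in for an IR instruction; identity is what matters here."""

    def __init__(self, name):
        self.name = name


def split_block(blocks, block_index, instr_index):
    """Split one block in two; instructions from instr_index on move to a new block."""
    block = blocks[block_index]
    head, tail = block[:instr_index], block[instr_index:]
    blocks[block_index:block_index + 1] = [head, tail]


def find_tag(blocks, tag):
    """Return (block_index, instr_index) of the tag, wherever it ended up."""
    for b, block in enumerate(blocks):
        for i, instr in enumerate(block):
            if instr is tag:
                return b, i
    raise LookupError("tag instruction not found")
```

After a split, a block index saved beforehand may refer to the wrong half, but `find_tag` still locates the logical end of the original block because the tag moved together with its neighbouring instructions.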
* nir: Add a nop intrinsic | Jason Ekstrand | 2016-10-03 | 1 | -0/+3
  This intrinsic has no destination, no sources, no variables, and can be eliminated. In other words, it does nothing and will always get deleted by dead code elimination. However, it does provide a quick-and-easy way to temporarily tag a particular location in a NIR shader.

  Signed-off-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Cc: "12.0" <[email protected]>
* intel/isl: Allow non-2D HiZ surfaces | Jason Ekstrand | 2016-10-03 | 1 | -2/+2
  Signed-off-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* intel/isl: Add a detailed comment about multisampling with HiZ | Jason Ekstrand | 2016-10-03 | 1 | -2/+58
  Signed-off-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Chad Versace <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* intel/isl: Remove tiling checks from choose_msaa_layout | Jason Ekstrand | 2016-10-03 | 2 | -14/+7
  We already do those checks in filter_tiling. There's no good reason to repeat them in choose_msaa_layout. If anything they should have been asserts and not "return false" checks. Also, this check was causing us to outright reject multisampled HiZ surfaces which wasn't intended.

  Signed-off-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Chad Versace <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* intel/isl: Handle HiZ and CCS tiling more directly | Jason Ekstrand | 2016-10-03 | 2 | -16/+16
  The HiZ and CCS tiling formats are always used for HiZ and CCS surfaces respectively. There's no reason why we should go through filter_tiling and it's much easier to always get HiZ and CCS right if we just handle them directly.

  Signed-off-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Chad Versace <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* intel/isl: Allow multisampling with ISL_FORMAT_HiZ | Jason Ekstrand | 2016-10-03 | 2 | -3/+12
  HiZ buffers can be multisampled and, on Broadwell and earlier, simply using interleaved multisampling with a compression block size of 8x4 samples yields the correct HiZ surface size calculations. Unfortunately, choose_msaa_layout was rejecting multisampled HiZ buffers because of format checks. Now that we have a simple helper for determining if a format supports multisampling, that's an easy enough issue to fix.

  Signed-off-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Chad Versace <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* intel/isl: Allow creation of 1-D compressed textures | Jason Ekstrand | 2016-10-03 | 2 | -3/+11
  Compressed 1-D textures are not a well-defined thing in either GL or Vulkan. However, auxiliary surfaces are treated as compressed textures in ISL and we can do HiZ and CCS with 1-D, so we need to be able to create them. In order to prevent actually using them (the docs say no), we assert in the state setup code.

  Signed-off-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* intel/isl: Fix up asserts in calc_phys_level0_extent_sa | Jason Ekstrand | 2016-10-03 | 1 | -2/+4
  The assertion that a format is uncompressed in the multisample layouts isn't quite right. What we really want to assert is that the format supports multisampling, which is a somewhat more complicated query. We also want to assert that it has a block size of 1x1 since we do nothing with the block size in the phys_level0_sa assignment.

  Signed-off-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Topi Pohjolainen <[email protected]>
  Reviewed-by: Chad Versace <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* intel/isl: Add a format_supports_multisampling helper | Jason Ekstrand | 2016-10-03 | 5 | -36/+33
  Signed-off-by: Jason Ekstrand <[email protected]>
  Reviewed-by: Chad Versace <[email protected]>
  Reviewed-by: Nanley Chery <[email protected]>
* vl/dri3: fix warning about incompatible pointer type | Nayan Deshmukh | 2016-10-03 | 1 | -1/+1
  Signed-off-by: Nayan Deshmukh <[email protected]>
  Reviewed-by: Leo Liu <[email protected]>
* swr: Removed stalling SwrWaitForIdle from queries. | Bruce Cherniak | 2016-10-03 | 4 | -119/+87
  Previous fundamental change in stats gathering added a temporary SwrWaitForIdle to begin_query and end_query. Code has been reworked to remove stall.

  Reviewed-by: George Kyriazis <[email protected]>
* swr: [rasterizer core] refactor thread creation | Tim Rowley | 2016-10-03 | 3 | -9/+29
  Create worker pool now computes number of worker threads based on things like topologies, etc. and creates the pool but doesn't actually launch the threads. Instead there is a separate start thread pool function. This allows thread resources to be constructed first before threads start.

  Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer jitter] canonicalize blend compile state | Tim Rowley | 2016-10-03 | 2 | -0/+39
  Canonicalize to prevent unnecessary JIT compiles.

  Signed-off-by: Tim Rowley <[email protected]>
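The general idea of canonicalizing a compile key, so that states differing only in irrelevant fields share one compiled result, might look like the Python sketch below. Everything in it is an assumption for illustration: the field names (`blend_enable`, `src_factor`, `dst_factor`), the rule that factors are ignored when blending is off, and the `JitCache` class are hypothetical and not taken from the swr sources.

```python
def canonicalize(state):
    """Return a hashable key with fields that cannot affect the result reset."""
    if not state["blend_enable"]:
        # Assumption: with blending disabled the factors are ignored,
        # so they are normalized away instead of fragmenting the cache.
        return (False, None, None)
    return (True, state["src_factor"], state["dst_factor"])


class JitCache:
    """Cache compiled results by canonical key and count real compiles."""

    def __init__(self, compile_fn):
        self.compile_fn = compile_fn
        self.cache = {}
        self.compiles = 0

    def get(self, state):
        key = canonicalize(state)
        if key not in self.cache:
            self.compiles += 1
            self.cache[key] = self.compile_fn(key)
        return self.cache[key]
```

Two states that differ only in unused factors map to the same key here, so the second lookup is a cache hit rather than a new compile.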
* swr: [rasterizer core] archrast fixes | Tim Rowley | 2016-10-03 | 4 | -7/+14
  - Immediately sleep threads until thread data is initialized
  - Fix some compile bugs with AR enabled

  Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer jitter] fixes for icc in vs2015 compat mode | Tim Rowley | 2016-10-03 | 12 | -1404/+1439
  - Move most jitter functionality into SwrJit namespace
  - Avoid global "using namespace llvm" in headers

  Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer core] generalize compute dispatch mechanism | Tim Rowley | 2016-10-03 | 3 | -4/+15
  Generalize compute dispatch mechanism to support other types of dispatches.

  Signed-off-by: Tim Rowley <[email protected]>
* swr: [rasterizer common] os.h portability header changes | Tim Rowley | 2016-10-03 | 1 | -0/+6
  - Fix conflict between windows MemoryFence and llvm::sys::MemoryFence
  - Declare gettid()

  Signed-off-by: Tim Rowley <[email protected]>
* anv/formats: Fix build on gcc-4 and earlier | Ville Syrjälä | 2016-10-03 | 1 | -4/+19
  gcc-4 and earlier don't allow compound literals where a constant is required in -std=c99/gnu99 mode, so we can't use ISL_SWIZZLE() when populating the anv_formats[] array.

  There are a few ways around it: First one would be -std=c89/gnu89, but the rest of the code depends on c99 so it's not really an option. The second option would be to upgrade to gcc-5+ where the compiler behaviour was relaxed a bit [1]. And the third option is just to avoid using compound literals. I chose the last option since it keeps gcc-4 and earlier working.

  [1] https://gcc.gnu.org/gcc-5/porting_to.html

  Cc: Jason Ekstrand <[email protected]>
  Cc: Topi Pohjolainen <[email protected]>
  Fixes: 7ddb21708c80 ("intel/isl: Add an isl_swizzle structure and use it for isl_view swizzles")
  Signed-off-by: Ville Syrjälä <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* egl: stop claiming support for pbuffer + msaa | Tapani Pälli | 2016-10-03 | 1 | -0/+9
  This fixes a crash in egl-create-msaa-pbuffer-surface Piglit test and same crash in many dEQP EGL tests. I also found that some Qt example did a workaround because of this crash: https://bugreports.qt.io/browse/QTBUG-47509

  v2: Ian pointed out that v1 removed support for all multisample configs, including window ones. This one removes pbuffer bit when adding configs, now only pbuffer+msaa gets rejected and window+msaa continues to work. Fixed also comment (Emil)

  Signed-off-by: Tapani Pälli <[email protected]>
  Cc: [email protected]
  Reviewed-by: Marek Olšák <[email protected]>
  Reviewed-by: Emil Velikov <[email protected]>
  Reviewed-by: Ian Romanick <[email protected]>
* i965: rename max_ds_* variable to max_tes_* | Timothy Arceri | 2016-10-03 | 7 | -32/+32
  Using consistent naming allows us to create macros more easily.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: rename max_hs_* variables to max_tcs_* | Timothy Arceri | 2016-10-03 | 7 | -32/+32
  Using consistent naming allows us to create macros more easily.

  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Drop pointless stage == MESA_SHADER_FRAGMENT checks. | Kenneth Graunke | 2016-10-02 | 1 | -5/+1
  There's an assert right above this.

  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* glsl: add missing headers to blob.h | Timothy Arceri | 2016-10-02 | 1 | -0/+2
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir/spirv/cfg: Detect switch_break after loop_break/continue | Jason Ekstrand | 2016-10-01 | 1 | -2/+2
  While the current CFG code is valid in the case where a switch break also happens to be a loop continue, it's a bit suboptimal. Since hardware is capable of handling the continue as a direct jump, it's better to use a continue instruction when we can than to bother with all of the nasty switch break lowering.

  Signed-off-by: Jason Ekstrand <[email protected]>
  Cc: "12.0" <[email protected]>
* nir/spirv/cfg: Handle switches whose break block is a loop continue | Jason Ekstrand | 2016-10-01 | 1 | -0/+13
  It is possible that the break block of a switch is actually the continue of the loop containing the switch. In this case, we need to identify the break block as a continue and break out of current level of CFG handling. If we don't, the continue portion of the loop will get handled twice, once by following after the break and a second time by the loop handling code handling it explicitly.

  This fixes 6 of the new Vulkan CTS tests:
  - dEQP-VK.spirv_assembly.instruction.graphics.opphi.out_of_order*
  - dEQP-VK.spirv_assembly.instruction.graphics.selection_block_order.out_of_order*

  Signed-off-by: Jason Ekstrand <[email protected]>
  Cc: "12.0" <[email protected]>
* nir/spirv: add spirv2nir binary to .gitignore | Eric Engestrom | 2016-10-01 | 1 | -0/+1
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir/spirv: improve mmap() error handling | Eric Engestrom | 2016-10-01 | 1 | -1/+9
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir/spirv: improve lseek() error handling | Eric Engestrom | 2016-10-01 | 1 | -2/+10
  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir/spirv: add some error checking to open() | Eric Engestrom | 2016-10-01 | 1 | -0/+9
  CovID: 1373369

  Signed-off-by: Eric Engestrom <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* mesa: use uint32_t rather than unsigned for xfb struct members | Timothy Arceri | 2016-10-01 | 1 | -10/+10
  These structs will be written to disk as part of the shader cache so use uint32_t just to be safe.

  Reviewed-by: Jason Ekstrand <[email protected]>
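The point of fixed-width members for on-disk data is that the byte layout no longer depends on the platform's integer sizes. A minimal Python sketch of the same idea uses the standard `struct` module; the three fields (`buffer_index`, `offset`, `stride`) and the little-endian layout are assumptions for illustration and are not stated in the commit.

```python
import struct

# Three unsigned 32-bit fields, little-endian, no padding (assumed layout).
XFB_FORMAT = "<3I"


def pack_xfb(buffer_index, offset, stride):
    """Serialize the hypothetical record to exactly 12 bytes."""
    return struct.pack(XFB_FORMAT, buffer_index, offset, stride)


def unpack_xfb(data):
    """Read the record back as a (buffer_index, offset, stride) tuple."""
    return struct.unpack(XFB_FORMAT, data)
```

Because the format string fixes every field at 32 bits, a record written on one machine has the same size and meaning when read on another.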
* i915/i965: remove commented out warning | Timothy Arceri | 2016-10-01 | 2 | -6/+2
  The warning was also in the wrong location; it should have been in the else.

  Reviewed-by: Ian Romanick <[email protected]>
* mesa: move _mesa_valid_to_render() to api_validate.c | Brian Paul | 2016-09-30 | 5 | -191/+195
  Almost all of the other drawing validation code is in api_validate.c so put this function there as well.

  Reviewed-by: Anuj Phogat <[email protected]>
* gallium/hud: Add support for CPU frequency monitoring | Steven Toth | 2016-09-30 | 4 | -0/+286
  Detect all of the CPUs in the system. Expose metrics for min, max and current frequency in Hz.

  Signed-off-by: Steven Toth <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
* Revert "gallium/hud: automatically print % if max_value == 100" | Marek Olšák | 2016-09-30 | 1 | -12/+6
  This reverts commit dbfeb0ec12d6550e68de1bcd164e422e79bccf2d.

  With max_value being rounded to 100, it's often wrong.

  Reviewed-by: Brian Paul <[email protected]>
* docs: update the list of Mesa major versions and API support | Brian Paul | 2016-09-30 | 1 | -0/+25
  Reviewed-by: Emil Velikov <[email protected]>
* gallium/radeon: fix crash/regression in performance counters | Nicolai Hähnle | 2016-09-30 | 1 | -0/+9
  Regression introduced by "gallium/radeon: zero all query buffers".

  Cc: Michel Dänzer <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: update documentation of buffer_get_virtual_address | Nicolai Hähnle | 2016-09-30 | 1 | -0/+3
  Reviewed-by: Marek Olšák <[email protected]>
* gallium/radeon: emit relocations for query fences | Nicolai Hähnle | 2016-09-30 | 4 | -9/+15
  This is only needed for r600 which doesn't have ARB_query_buffer_object and therefore wouldn't really need the fences, but let's be optimistic about filling in this feature gap eventually.

  Cc: Dieter Nützel <[email protected]>
  Reviewed-by: Marek Olšák <[email protected]>
* radeon/uvd: adjust the buffer offset when relocation is used | Nicolai Hähnle | 2016-09-30 | 1 | -0/+1
  We don't plan to use sub-allocated buffers with UVD, but just in case one slips through, this increases the chances of things working out anyway.

  Reviewed-by: Christian König <[email protected]>
* radeon/vce: adjust the buffer offset when relocation is used | Nicolai Hähnle | 2016-09-30 | 1 | -0/+1
  We don't plan to use sub-allocated buffers with VCE, but just in case one slips through, this increases the chances of things working out anyway.

  Reviewed-by: Christian König <[email protected]>
* radeon/video: don't use sub-allocated buffers | Nicolai Hähnle | 2016-09-30 | 2 | -1/+10
  Cc: Christian König <[email protected]>
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97976
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97969
  Reviewed-by: Christian König <[email protected]>
* gallium/hud: Add power sensor support | Steven Toth | 2016-09-29 | 4 | -5/+45
  Implement support for power based sensors, reporting units in milli-watts and watts.

  Also, minor cleanup - change the related if block to a switch.

  Tested with two different power sensors, including the nouveau 'power1' sensors on a GTX950 card.

  Signed-off-by: Steven Toth <[email protected]>
  Reviewed-by: Brian Paul <[email protected]>
* nv50/ir: teach insnCanLoad() about SHLADD | Samuel Pitoiset | 2016-09-29 | 1 | -0/+2
  Commutativity is not allowed with SHLADD, but src2 can accept loads. To allow the load propagation pass to do its job, add a special case like for SUCLAMP because src1 is always an immediate.

  This IMAD to SHLADD optimization helps a bunch of shaders from Tomb Raider, Victor Vran, UE4 demos (+15% perf with Elemental) and Shadow Warrior.

  GF100/GK104:
  total instructions in shared programs : 2838045 -> 2834712 (-0.12%)
  total gprs used in shared programs    : 396684 -> 396386 (-0.08%)
  total local used in shared programs   : 34416 -> 34416 (0.00%)

            local    gpr    inst    bytes
  helped        0    326    1105     1105
  hurt          0     55       3        3

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: optimize SHLADD(a, b, c) to MOV((a << b) + c) | Samuel Pitoiset | 2016-09-29 | 1 | -0/+3
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: optimize SHLADD(a, b, 0x0) to SHL(a, b) | Samuel Pitoiset | 2016-09-29 | 1 | -0/+8
  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
* nv50/ir: optimize IMAD to SHLADD in presence of power of 2 | Samuel Pitoiset | 2016-09-29 | 1 | -0/+7
  If and only if src1 is a power of 2, we can replace IMAD by SHLADD.

  Signed-off-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Ilia Mirkin <[email protected]>
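The arithmetic behind this rewrite is that a * 2^k + c equals (a << k) + c. The Python sketch below models the decision on plain integers; `is_power_of_two`, `fold_imad`, and `evaluate` are hypothetical helper names, the tuples are a stand-in for real IR instructions, and 32-bit wraparound is deliberately ignored.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... and False for zero or negative values."""
    return n > 0 and (n & (n - 1)) == 0


def fold_imad(a, b_imm, c):
    """Describe the instruction to emit for a * b_imm + c."""
    if is_power_of_two(b_imm):
        shift = b_imm.bit_length() - 1  # log2 of the immediate
        return ("SHLADD", a, shift, c)
    return ("IMAD", a, b_imm, c)


def evaluate(op):
    """Compute the value of either instruction form."""
    name, a, b, c = op
    if name == "SHLADD":
        return (a << b) + c
    return a * b + c
```

For example, `fold_imad(5, 8, 3)` selects the shift form with a shift of 3, and both forms evaluate to the same value, while a multiplier such as 6 is left as IMAD.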