summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* winsys/amdgpu: simplify amdgpu_cs_add_buffer() a bitSamuel Pitoiset2017-04-171-4/+3
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* i965/drm: Delete NULL check in brw_bo_unmap().Kenneth Graunke2017-04-161-3/+0
| | | | | | | | | | | | | | I accidentally moved the bo->bufmgr dereference above the NULL check when cleaning up this code. While passing NULL to free() is a common pattern...passing NULL to unmap seems pretty bad. You really ought to know whether you have a buffer or not. We don't want to paper over bugs like that. So, just drop the NULL check altogether. CID: 1405006 Reviewed-by: Chris Wilson <[email protected]>
* intel/decoder: Fix is_header_field starting condition.Kenneth Graunke2017-04-161-1/+1
| | | | | | | | | | | Starting positions >= 32 are not part of the header, rather than >. Caught by Coverity, which found that "bits <<= field->start" may shift by 32, which has undefined behavior. CID: 1404968 Reviewed-by: Lionel Landwerlin <[email protected]>
* i965/drm: Remove dead return in brw_bo_busy()Kenneth Graunke2017-04-161-3/+1
| | | | | | | | If ret is 0, we return. If ret is not 0, we return. This is dead. CID: 1405013 (Structurally dead code (UNREACHABLE)) Reviewed-by: Topi Pohjolainen <[email protected]>
* android: amd/addrlib: trivial fix for gfx9 supportMauro Rossi2017-04-171-0/+2
| | | | | | | | | | | | Fixes the following build error: external/mesa/src/amd/addrlib/gfx9/gfx9addrlib.cpp:36:10: fatal error: 'gfx9_gb_reg.h' file not found ^ 1 error generated. Fixes: 7f160ef "amd/addrlib: import gfx9 support" Reviewed-by: Nicolai Hähnle <[email protected]> Reviewed-by: Edward O'Callaghan <[email protected]>
* nir: Add GLSL_TYPE_[U]INT64 to some switch statementsJason Ekstrand2017-04-162-0/+4
| | | | | Reviewed-by: Alejandro Piñeiro <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* gallium/radeon: always flush asynchronously and wait after begin_new_csMarek Olšák2017-04-172-4/+11
| | | | | | | | | | This hides the overhead of everything in the driver after the CS flush and before returning from pipe_context::flush. Only microbenchmarks will benefit. +2% FPS for glxgears. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove local variable 'mod' from si_compile_tgsi_shaderMarek Olšák2017-04-171-5/+2
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: add si_shader_selector::vs_needs_prologMarek Olšák2017-04-173-7/+10
| | | | | | cleanup Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't set VGT_GS_MODE as part of the GS stateMarek Olšák2017-04-171-2/+0
| | | | | | The VS state sets it. Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't allow user indices with indirect drawsMarek Olšák2017-04-171-4/+4
| | | | | | | Not possible with GL and it will make future gallium rework easier. (also it's something I wouldn't like to support) Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: merge two if (indirect) statementsMarek Olšák2017-04-171-27/+25
| | | | Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: don't mark non-dirty textures with CMASK as compressedMarek Olšák2017-04-171-2/+3
| | | | | | | because the compression is skipped with non-dirty textures. Reviewed-by: Samuel Pitoiset <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* docs: Document interaction Fixes tag and stable branches.Bas Nieuwenhuizen2017-04-151-0/+4
| | | | | | | For the next time I forget. Signed-off-by: Bas Nieuwenhuizen <[email protected]> Reviewed-by: Emil Velikov <[email protected]>
* glsl: don't run the GLSL pre-processor when we are skipping compilationTimothy Arceri2017-04-152-9/+20
| | | | | | | | | | | | | | | | | | | | This moves the hashing of shader source for the cache lookup to before the preprocessor. In our experience, shaders are unlikely to hash the same after preprocessing if they didn't hash the same before, so we can skip preprocessing for cache hits. Improves Deus Ex start-up times with a warm cache from ~30 seconds to ~22 seconds. Also fixes the leaking of state. V2: fix indentation v3: add the value of MESA_EXTENSION_OVERRIDE to the hash of the shader. Tested-by (v2): Grazvydas Ignotas <[email protected]> Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* glsl: delay optimisations on individual shaders when cache is availableTimothy Arceri2017-04-154-78/+96
| | | | | | | | | | | | | | | | | | | | | | Due to a max limit of 65,536 entries on the index table that we use to decide if we can skip compiling individual shaders, it is very likely we will have collisions. To avoid doing too much work when the linked program may be in the cache this patch delays calling the optimisations until link time. Improves cold cache start-up times on Deus Ex by ~20 seconds. When deleting the cache index to simulate a worst case scenario of collisions in the index, warm cache start-up time improves by ~45 seconds. V2: fix indentation, make sure to call optimisations on cache fallback, make sure optimisations get called for XFB. Tested-by: Grazvydas Ignotas <[email protected]> Tested-by: Dieter Nützel <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* anv: Add the pci_id into the shader cache UUIDJason Ekstrand2017-04-141-5/+15
| | | | | | | | | | This prevents a user from using a cache created on one hardware generation on a different one. Of course, with Intel hardware, this requires moving their drive from one machine to another but it's still possible and we should prevent it. Reviewed-by: Chad Versace <[email protected]> Cc: [email protected]
* etnaviv: native fence fd supportPhilipp Zabel2017-04-157-7/+83
| | | | | | | | | | This adds native fence fd support to etnaviv, similarly to commit 0b98e84e9ba0 ("freedreno: native fence fd"), enabled for kernel driver version 1.1 or later. Signed-off-by: Philipp Zabel <[email protected]> Reviewed-By: Wladimir J. van der Laan <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]>
* docs: mark GL_ARB_vertex_attrib_64bit and OpenGL 4.2 as supported by i965/gen7+Francisco Jerez2017-04-142-4/+7
| | | | | | | | | | v2 (Andreas Boll): - Mark GL 4.1 as supported by i965/gen7+ - Mark GL_ARB_shader_precision as supported by i965/gen7+ - Update release notes Reviewed-by: Andreas Boll <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965: enable OpenGL 4.2 in IvybridgeJuan A. Suarez Romero2017-04-142-2/+2
| | | | | Reviewed-by: Andreas Boll <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965: enable ARB_shader_precision in gen7+Samuel Iglesias Gonsálvez2017-04-141-1/+1
| | | | | Reviewed-by: Andreas Boll <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965: enable ARB_vertex_attrib_64bit for gen7+Juan A. Suarez Romero2017-04-141-1/+1
| | | | | Reviewed-by: Andreas Boll <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* swr: Fix swr osmesa buildGeorge Kyriazis2017-04-141-1/+1
| | | | | | Use GALLIUM_SWR to standardize Reviewed-by: Emil Velikov <[email protected]>
* etnaviv: SINGLE_BUFFER support on GC3000Wladimir J. van der Laan2017-04-158-28/+63
| | | | | | | | | | | | | | | | | | | | This patch adds support for the SINGLE_BUFFER feature on GC3000 GPUs, which allows rendering to a single buffer using multiple pixel pipes. This feature is always used when it is available, which means that multi-tiled formats are no longer being used in that case, and all buffers will be normal (super)tiled. This mimics the behavior of the blob on GC3000. - Because the same format can be used to render to and texture from, this avoids an extra resolve pass when rendering to texture. - i.MX6qp includes a PRE which can scan-out directly from tiled formats, avoiding untiling overhead. Signed-off-by: Wladimir J. van der Laan <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]>
* etnaviv: Update includes from rnndbWladimir J. van der Laan2017-04-155-20/+91
| | | | | | | | | Update to etna_viv commit 8486a97. austriancoder: changed patch to include isa redefinition fix. Signed-off-by: Wladimir J. van der Laan <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]>
* etnaviv: Add chipMinorFeatures4 and 5Wladimir J. van der Laan2017-04-152-1/+15
| | | | | | | | Request chipMinorFeatures bitfields 4 and 5 from the drm driver. Signed-off-by: Wladimir J. van der Laan <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]>
* etnaviv: resolve tile status when flushing resourcePhilipp Zabel2017-04-152-0/+11
| | | | | | | | | | | | | | | | When passing render buffers from EGL clients to a wayland compositor, the resource tile status must be resolved because otherwise the tile status is lost in the transfer and cleared parts of the buffer will contain old contents. The same applies when sampling directly from a renderable resource. lst: Add seqno tracking, to skip flush when not needed. Fixes: aadcb5e94b35 ("etnaviv: enable TS, but disable autodisable") Signed-off-by: Philipp Zabel <[email protected]> Signed-off-by: Lucas Stach <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]>
* etnaviv: stop repeatedly resolving an unchanged resource into its scanout ↵Philipp Zabel2017-04-151-1/+4
| | | | | | | | | | | prime buffer Before resolving a resource into its scanout prime buffer, check that the prime resource is actually older. If it is not, the resolve is an expensive no-op, and we better skip it. Signed-off-by: Philipp Zabel <[email protected]> Reviewed-by: Christian Gmeiner <[email protected]>
* swr: Add polygon stipple supportGeorge Kyriazis2017-04-145-9/+84
| | | | | | | | | Add polygon stipple functionality to the fragment shader. Explicitly turn off polygon stipple for lines and points, since we do them using tris. Reviewed-by: Bruce Cherniak <[email protected]>
* docs/relnotes: add GL_ARB_gpu_shader_fp64 support on i965/ivybridgeSamuel Iglesias Gonsálvez2017-04-141-0/+1
| | | | | Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Acked-by: Francisco Jerez <[email protected]>
* docs: mark GL_ARB_gpu_shader_fp64 and OpenGL 4.0 as supported by i965/gen7+Samuel Iglesias Gonsálvez2017-04-141-2/+2
| | | | | Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Acked-by: Francisco Jerez <[email protected]>
* i965: enable OpenGL 4.0 to Ivybridge/BaytrailSamuel Iglesias Gonsálvez2017-04-142-5/+6
| | | | | Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965: enable ARB_gpu_shader_fp64 for Ivybridge/BaytrailSamuel Iglesias Gonsálvez2017-04-141-1/+1
| | | | | Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965: Use correct VertStride on align16 instructions.Matt Turner2017-04-141-10/+34
| | | | | | | | | | | | | | | | | | | | | | In commit c35fa7a, we changed the "width" of DF source registers to 2, which is conceptually fine. Unfortunately a VertStride of 2 is not allowed by align16 instructions on IVB/BYT, and the regular VertStride of 4 works fine in any case. See generated_tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/vs-round-double.shader_test for example: cmp.ge.f0(8) g18<1>DF g1<0>.xyxyDF -g8<2>DF { align16 1Q }; ERROR: In Align16 mode, only VertStride of 0 or 4 is allowed cmp.ge.f0(8) g19<1>DF g1<0>.xyxyDF -g9<2>DF { align16 2N }; ERROR: In Align16 mode, only VertStride of 0 or 4 is allowed v2: - Add spec quote (Curro). - Change the condition to only BRW_VERTICAL_STRIDE_2 (Curro) Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4/dce: improve track of partial flag register writesSamuel Iglesias Gonsálvez2017-04-141-1/+1
| | | | | | | | | | | | | | | This is required for correctness in presence of multiple 4-wide flag writes (e.g. 4-wide instructions with a conditional mod set) which update a different portion of the same 8-bit flag subregister. Right now we keep track of flag dataflow with 8-bit granularity and consider flag writes to have killed any previous definition of the same subregister even if the write was less than 8 channels wide, which can cause live flag register updates to be dead code-eliminated incorrectly. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4: don't do horizontal stride on some register file typesSamuel Iglesias Gonsálvez2017-04-141-2/+5
| | | | | | | | | | | | | horiz_offset() shouldn't be doing anything for scalar registers, because all channels of any SIMD instructions will end up reading or writing the same component of the register, so shifting the register offset would be wrong. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> [ Francisco Jerez: Re-implement in terms of is_uniform() for simplicity. Pass argument by const reference. Clarify commit message. ] Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4: Fix exec size for MOVs {SET,PICK}_{HIGH,LOW}_32BIT.Matt Turner2017-04-141-4/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Otherwise for a pack_double_2x32_split opcode, we emit: vec1 64 ssa_135 = pack_double_2x32_split ssa_133, ssa_134 mov(8) g5<1>UD g5<4>.xUD { align16 1Q compacted }; mov(8) g7<2>UD g5<4,4,1>UD { align1 1Q }; ERROR: When the destination spans two registers, the source must span two registers (exceptions for scalar source and packed-word to packed-dword expansion) mov(8) g8<2>UD g5.4<4,4,1>UD { align1 2N }; ERROR: The offset from the two source registers must be the same mov(8) g5<1>UD g6<4>.xUD { align16 1Q compacted }; mov(8) g7.1<2>UD g5<4,4,1>UD { align1 1Q }; ERROR: When the destination spans two registers, the source must span two registers (exceptions for scalar source and packed-word to packed-dword expansion) mov(8) g8.1<2>UD g5.4<4,4,1>UD { align1 2N }; ERROR: The offset from the two source registers must be the same The intention was to emit mov(4)s for the instructions that have ERROR annotations. See tests/spec/arb_gpu_shader_fp64/execution/vs-isinf-dvec.shader_test for example. v2 (Samuel): - Instead of setting the exec size to a fixed value, don't double it (Curro). - Add PICK_{HIGH,LOW}_32BIT to the condition. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> [ Francisco Jerez: Trivial rebase changes. ] Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4: use vec4_builder to emit instructions in setup_imm_df()Samuel Iglesias Gonsálvez2017-04-142-50/+50
| | | | | | | | Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> [ Francisco Jerez: Drop useless vec4_visitor dependencies. Demote to static stand-alone function. Don't write unused components in the result. Use vec4_builder interface for register allocation. ] Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4: consider subregister offset in live variablesJuan A. Suarez Romero2017-04-141-2/+2
| | | | | | | | | | | | | | | | | | Take into account offset values less than a full register (32 bytes) when getting the var from register. This is required when dealing with an operation that writes half of the register (like one d2x in IVB/BYT, which uses exec_size == 4). v2: - Take in account this offset < 32 in liveness analysis too (Curro) v3: - Change formula in var_from_reg() (Curro) - Remove useless changes (Curro) Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4: fix assert to detect SIMD lowered DF instructions in IVBFrancisco Jerez2017-04-141-5/+1
| | | | | | | | | | | On IVB, DF instructions have lowered the SIMD width to 4 but the exec_size will be later doubled. Fix the assert to avoid crashing in this case. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> [ Francisco Jerez: Simplify assert. Except for the 'inst->group % 4 == 0' part the assertion was redundant with the previous assertion. ] Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4: split VEC4_OPCODE_FROM_DOUBLE into one opcode per destination's typeSamuel Iglesias Gonsálvez2017-04-147-27/+60
| | | | | | | | | | This way we can set the destination type as double to all these new opcodes, avoiding any optimizer's confusion that was happening before. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> [ Francisco Jerez: Drop no_spill workaround originally needed due to the bogus destination type of VEC4_OPCODE_FROM_DOUBLE. ] Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4: split d2x conversion and data gathering from one opcode to two ↵Samuel Iglesias Gonsálvez2017-04-142-8/+1
| | | | | | | | | | | | | | | | explicit ones When doing a 64-bit to a smaller data type size conversion, the destination should be aligned to 64-bits. Because of that, we need to gather the data after the actual conversion. Until now, these two operations were done by VEC4_OPCODE_FROM_DOUBLE but now we split them explicitely in two different instructions: VEC4_OPCODE_FROM_DOUBLE just do the conversion and VEC4_OPCODE_PICK_LOW_32BIT will gather the data. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4: fix VEC4_OPCODE_FROM_DOUBLE for IVB/BYTJuan A. Suarez Romero2017-04-141-7/+19
| | | | | | | | | | | | In the generator we must generate slightly different code for Ivybridge/Baytrail, because of the way the stride works in this hardware. v2: - Use stride and don't need to fix dst (Curro) Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4: keep original type when dealing with null registersJuan A. Suarez Romero2017-04-141-0/+2
| | | | | | | | | | | | | | | | Keep the original type when dealing with null registers. Especially because we do no want to introduce an implicit conversion between types that could affect the conditional flags. This affects especially when the original type is DF, and we are working on Ivybridge/Baytrail. v2 (Curro) - Fix typo. - Use retype() instead of applying the type directly. - Remove unneeded retype. Reviewed-by: Francisco Jerez <[email protected]>
* i965/vec4: split DF instructions and later double its execsize in IVB/BYTSamuel Iglesias Gonsálvez2017-04-143-1/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to split DF instructions in two on IVB/BYT as it needs an execsize 8 to process 4 DF values (one GRF in total). v2: - Rename helper and make it static inline function (Matt). - Fix indention and add braces (Matt). v3: - Don't edit IR instruction when doubling exec_size (Curro) - Add comment into the code (Curro). - Manage ARF registers like the others (Curro) v4: - Add get_exec_type() function and use it to calculate the execution size. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> [ Francisco Jerez: Fix bogus 'type != BAD_FILE' check. Take destination type as execution type where there is no valid source. Assert-fail if the deduced execution type is byte. Clarify comment in get_lowered_simd_width(). Move SIMD width workaround outside of 'if (...inst->size_written > REG_SIZE)' conditional block, since the problem should be independent of whether the amount of data written by the instruction is greater or lower than a GRF. Drop redundant is_ivb_df definition. Drop bogus inst->exec_size < 8 check. Simplify channel group assertion. ] Reviewed-by: Francisco Jerez <[email protected]>
* i965/fs: lower all non-force_writemask_all DF instructions to SIMD4 on IVB/BYTSamuel Iglesias Gonsálvez2017-04-141-0/+9
| | | | | | | | | The hardware applies the same channel enable signals to both halves of the compressed instruction which will be just wrong under non-uniform control flow. Fix this by splitting those instructions to SIMD4. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965/fs: Get 64-bit indirect moves working on IVB.Francisco Jerez2017-04-141-2/+25
| | | | Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* i965: Use source region <1,2,0> when converting to DF.Matt Turner2017-04-142-13/+28
| | | | | | | Doing so allows us to use a single MOV in VEC4_OPCODE_TO_DOUBLE instead of two. Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
* i965/fs: fix lower SIMD width for IVB/BYT's MOV_INDIRECTJuan A. Suarez Romero2017-04-141-3/+14
| | | | | | | | | | | | | | | | | | | | | | | | According to the IVB and HSW PRMs: "2.When the destination requires two registers and the sources are indirect, the sources must use 1x1 regioning mode." So for DF instructions the execution size is not limited by the number of address registers that are available, but by the EU decompression logic not handling VxH indirect addressing correctly. This patch limits the SIMD width to 4 in this case. v2: - Fix typo (Matt). - Fix condition (Curro) v3: - Add spec quote (Curro) Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Signed-off-by: Juan A. Suarez Romero <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965/fs: fix dst stride in IVB/BYT type conversionsJuan A. Suarez Romero2017-04-141-27/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | | When converting a DF to 32-bit conversions, we set dst stride to 2, to fulfill alignment restrictions because the upper Dword of every Qword will be written with undefined value. But in IVB/BYT, this is not necessary, as each DF conversion already writes 2, the first one the real value, and the second one a 0. That is, IVB/BYT already set stride = 2 implicitly, so we must set it to 1 explicitly to avoid ending up with stride = 4. v2: - Fix typo (Matt) v3: - Fix stride in the destination's brw_reg, don't modity IR (Curro) v4: - Remove 'is_dst' argument of brw_reg_from_fs_reg() (Curro) - Fix comment (Curro). - Relax hstride assert (Curro) Signed-off-by: Juan A. Suarez Romero <[email protected]> Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> [ Francisco Jerez: Minor spelling fixes. ] Reviewed-by: Francisco Jerez <[email protected]>