summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* iris: Properly support alpha and luminance-alpha formatsKenneth Graunke2019-03-074-85/+117
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | For texturing, we map alpha formats to the corresponding red format, as many alpha formats are outright missing, and red is more efficient when sampling anyway. When rendering to A8_UNORM, we use that format directly, so the image gets the shader output's .a/.w channel, rather than the .r/.x channel. All other A* formats are non-renderable, so we can't do much and just mark them as unsupported for rendering. Fortunately, GL only requires rendering to A8_UNORM, so that works out. According to Andre Heider and Timur Kristóf, this fixes font rendering in Witcher 1 (via nine). Andre also reported that it fixes Unigine Heaven (presumably via nine). v2: Use the same swizzle for both sampler views and "render targets". BLORP expects the read swizzle, and will take the inverse when setting up the destination swizzle (and actually applying it in the shaders). We ignore the format swizzle when setting up normal rendering SURFACE_STATEs, which is necessary because it would be an illegal shader channel select combination. Thanks to Jason Ekstrand for pointing out that BLORP took an inverse swizzle. Tested-by: Timur Kristóf <[email protected]> Tested-by: Andre Heider <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* iris: Defer uploading sampler state tables until draw timeKenneth Graunke2019-03-073-20/+44
| | | | | | | | | | | | | | | | | | | | | | | | | Gallium might call us multiple times to bind subsets of the samplers, at which point we'd recreate the table a bunch of times. It doesn't really buy us anything to do it here - even if we defer to draw time, the dirty tracking ensures we'll only do it on the first draw after a bind_sampler_states() call. We now use the number of samplers specified by the shader instead of the binding count. If this number changes, we flag sampler state as dirty so we re-upload a table with the right number of entries. This also fixes a bug where ice->state.need_border_colors was never unset, so once something needed border colors, the pool would always be pinned in all future batches. v2: Explicitly flag sampler states as dirty, rather than assuming that bind_sampler_states() will be called if the program texture count changes. While this may be true for st/mesa, it isn't the case for Gallium HUD. Tested-by: Timur Kristóf <[email protected]> Tested-by: Andre Heider <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* iris: Plumb through ISL_SWIZZLE_IDENTITY in buffer surface emittersKenneth Graunke2019-03-071-6/+8
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* isl: Add a swizzle parameter to isl_buffer_fill_state()Kenneth Graunke2019-03-076-4/+15
| | | | | | | This is necessary for legacy texture buffer object formats, where we'll need to use a swizzle to fake e.g. luminance. Reviewed-by: Jason Ekstrand <[email protected]>
* iris: fix decode_get_bo callbackLionel Landwerlin2019-03-071-1/+3
| | | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Fixes: acb50d6b1ff1b7 ("intel/decoders: handle decoding MI_BBS from ring") Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]>
* virgl: remove unused variableErik Faye-Lund2019-03-071-1/+0
| | | | | | | This variable is now unused, so let's remove it. Fixes: 9c4930946a5 (virgl: add encoder functions for new protocol) Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: remove unused variableErik Faye-Lund2019-03-071-1/+0
| | | | | | | This variable is now unused, so let's remove it. Fixes: db77573d7ba (virgl: modify how we handle GL_MAP_FLUSH_EXPLICIT_BIT) Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: remove unused variableErik Faye-Lund2019-03-071-1/+0
| | | | | | | This variable is now unused, so let's remove it. Fixes: c19aedcf1a8 (virgl: don't mark unclean after a flush) Reviewed-by: Gurchetan Singh <[email protected]>
* virgl: remove unused variablesErik Faye-Lund2019-03-071-3/+0
| | | | | | | | These variables are now unused, let's remove them to get rif of a few warnings. Fixes: f0e71b10888 (virgl: use transfer queue) Reviewed-by: Gurchetan Singh <[email protected]>
* iris: fix decoder callLionel Landwerlin2019-03-071-1/+1
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Fixes: acb50d6b1ff1b7 ("intel/decoders: handle decoding MI_BBS from ring")
* intel/aub_write: factorize context image/pphwsp/ring creationLionel Landwerlin2019-03-075-178/+161
| | | | | | | | | | | We allocate GGTT entries and physical addresses are we create engines rather than having a fixed layout. Context images now receive a parameter argument which is used to setup pml4 & ring buffer addresses. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
* intel/aub_write: turn context images arrays into functionsLionel Landwerlin2019-03-074-242/+306
| | | | | | | | | | | We'll make them more parameterized in a later commit. As this is just a transitional commit, we allow ourself to leak the context images allocated in get_context_init(). We'll fix this in the next commit. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
* intel/aub_write: store the physical page allocator in structLionel Landwerlin2019-03-072-15/+33
| | | | | | | We want to use this allocator in the next commit for GGTT pages. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
* intel/aub_write: log mmio writesLionel Landwerlin2019-03-071-0/+5
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
* intel/aub_write: switch to use i915_drm engine classesLionel Landwerlin2019-03-074-44/+59
| | | | | | | | | Prepare aub write to deal with multiple engine instances. We don't pass the instance number yet this could be done in the future by having a 2 dimensional array of struct engine. Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Rafael Antognolli <[email protected]>
* intel/aub_write: break execlist write in 2Lionel Landwerlin2019-03-071-41/+67
| | | | | | | | We want to reuse the execlist submission, but won't need the ring buffer update. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
* intel/aub_write: write header in initLionel Landwerlin2019-03-074-82/+84
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
* intel/aub_write: split comment section from HW setupLionel Landwerlin2019-03-074-26/+52
| | | | | | | | In the future we'll want error2aub to reuse the context image saved by i915 instead of the default one we write in intel_dump_gpu. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
* intel/aub_read: reuse defines from gen_contextLionel Landwerlin2019-03-071-12/+13
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
* intel/decoders: limit number of decoded batchbuffersLionel Landwerlin2019-03-074-2/+27
| | | | | | | | | | | IGT has a test to hang the GPU that works by having a batch buffer jump back into itself, trigger an infinite loop on the command stream. As our implementation of the decoding is "perfectly" mimicking the hardware, our decoder also "hangs". This change limits the number of batch buffer we'll decode before we bail to 100. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
* intel/decoders: handle decoding MI_BBS from ringLionel Landwerlin2019-03-0711-17/+19
| | | | | | | | | An MI_BATCH_BUFFER_START in the ring buffer acts as a second level batchbuffer (aka jump back to ring buffer when running into a MI_BATCH_BUFFER_END). Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
* intel/decoders: add address space indicator to get BOsLionel Landwerlin2019-03-078-42/+65
| | | | | | | Some commands like MI_BATCH_BUFFER_START have this indicator. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Rafael Antognolli <[email protected]>
* vulkan/overlay: fix missing var rename in previous commitEric Engestrom2019-03-071-1/+1
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* vulkan/util: use the platform defines in vk.xml instead of hard-coding themEric Engestrom2019-03-071-9/+10
| | | | | | | See also: 3d4238d26c5de4a0f7a5 "anv: use the platform defines in vk.xml instead of hard-coding them" Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* iris: add support for tgsi_to_nirAndre Heider2019-03-071-3/+8
| | | | | | | | | | | The Gallium Nine state tracker now works on iris. Also tested with GALLIUM_HUD and Star Wars: Knights of the Old Republic on WINE (GL_ATI_fragment_shader). Signed-off-by: Andre Heider <[email protected]> Reviewed-by: Timur Kristóf <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir: free dead_ctx in case of no progressTapani Pälli2019-03-071-1/+3
| | | | | | | | | | | | | | | Fixes a leak: ==7576== 320 (48 direct, 272 indirect) bytes in 1 blocks are definitely lost in loss record 26 of 26 ==7576== at 0x4C2EE3B: malloc (vg_replace_malloc.c:309) ==7576== by 0x53EF0E4: ralloc_size (ralloc.c:119) ==7576== by 0x53EF0C2: ralloc_context (ralloc.c:113) ==7576== by 0x5471F64: nir_split_per_member_structs (nir_split_per_member_structs.c:176) ==7576== by 0x51288CF: anv_shader_compile_to_nir (anv_pipeline.c:216) Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* anv: call blob_finish when done with itTapani Pälli2019-03-071-0/+3
| | | | | | | | | | | | | | | | | Fixes leaks from anv_device_upload_nir: ==7345== 8,192 bytes in 2 blocks are definitely lost in loss record 24 of 24 ==7345== at 0x4C2ED78: malloc (vg_replace_malloc.c:308) ==7345== by 0x4C31393: realloc (vg_replace_malloc.c:836) ==7345== by 0x54E0848: grow_to_fit (blob.c:67) ==7345== by 0x54E0BE5: blob_reserve_bytes (blob.c:166) ==7345== by 0x54E0C7C: blob_reserve_intptr (blob.c:186) ==7345== by 0x54704A7: nir_serialize (nir_serialize.c:1091) ==7345== by 0x512F97D: anv_device_upload_nir (anv_pipeline_cache.c:756) Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* anv: use anv_gem_munmap in block pool cleanupTapani Pälli2019-03-071-1/+5
| | | | | | | | | | | | | | | | Use anv_gem_munmap for unmap when softpin in use, this corresponds to anv_gem_mmap used in anv_block_pool_expand_range. This fixes valgrind errors seen for each pool when softpin is in use: ==25581== 262,144 bytes in 1 blocks are definitely lost in loss record 31 of 31 ==25581== at 0x50E77E8: anv_gem_mmap (anv_gem.c:96) ==25581== by 0x50EEE2B: anv_block_pool_expand_range (anv_allocator.c:543) ==25581== by 0x50EEB51: anv_block_pool_init (anv_allocator.c:477) ==25581== by 0x50EF7EF: anv_state_pool_init (anv_allocator.c:920) ==25581== by 0x510B8EB: anv_CreateDevice (anv_device.c:2031) Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Lionel Landwerlin <[email protected]>
* iris: Fix MOCS for blits and clearsKenneth Graunke2019-03-065-20/+27
| | | | I915_MOCS_CACHED is the wrong value. Expose mocs() and use that.
* st/glsl: start spilling out common st glsl conversion codeTimothy Arceri2019-03-067-122/+222
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The NIR and TGSI paths are currently intertwined which makes it not only hard to follow but also makes it hard to take advantage of the differences in IR. Here we take the first step to splitting that path apart. With this we take the opportunity to no longer call the GLSL IR optimisation passes after the final lowering calls for NIR. We can instead just use the NIR passes which can produce better code and should also result in faster compile times. The speed-up can be measured in some dolphin uber shaders due to no longer calling lower_if_to_cond_assign() for example dolphin/ubershaders/120.shader_test goes from ~1.63 -> ~1.53 seconds on my machine. There are some code changes as a result of not calling lower_if_to_cond_assign(), this is because it flattens ifs that contain UBOs where as NIR's peephole select doesn't. This is were most of the regressions in Max Waves happens with shader-db. shader-db results (VEGA): Totals from affected shaders: SGPRS: 2349056 -> 2349640 (0.02 %) VGPRS: 1322160 -> 1323300 (0.09 %) Spilled SGPRs: 21190 -> 21527 (1.59 %) Spilled VGPRs: 99 -> 99 (0.00 %) Private memory VGPRs: 0 -> 0 (0.00 %) Scratch size: 72 -> 72 (0.00 %) dwords per thread Code Size: 57260904 -> 57270932 (0.02 %) bytes Compile Time: 1107186 -> 1022942 (-7.61 %) milliseconds LDS: 786 -> 786 (0.00 %) blocks Max Waves: 391932 -> 391619 (-0.08 %) Wait states: 0 -> 0 (0.00 %) Reviewed-by: Eric Anholt <[email protected]>
* radeonsi/nir: stop calling nir_lower_returns()Timothy Arceri2019-03-061-1/+0
| | | | | | We now call this for all drivers in glsl_to_nir() instead. Reviewed-by: Eric Anholt <[email protected]>
* i965: stop calling nir_lower_returns()Timothy Arceri2019-03-061-3/+1
| | | | | | We now call this for all drivers in glsl_to_nir() instead. Reviewed-by: Eric Anholt <[email protected]>
* glsl: use NIR function inlining for drivers that use glsl_to_nir()Timothy Arceri2019-03-066-8/+89
| | | | | | | | glsl_to_nir() is still missing support for converting certain functions to NIR, so for those we use the GLSL IR optimisations to remove the functions. Reviewed-by: Eric Anholt <[email protected]>
* glsl/freedreno/panfrost: pass gl_context to the standalone compilerTimothy Arceri2019-03-065-8/+15
| | | | | | | This allows us to use the ctx with glsl_to_nir() in a following patch. Reviewed-by: Eric Anholt <[email protected]>
* vulkan/overlay: drop dependency on validation layer headersLionel Landwerlin2019-03-064-213/+40
| | | | | | | | | v2: reimplement layer chain info getters (Eric) v3: make it compile.. (Lionel) Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* vulkan/util: generate instance/device dispatch tablesLionel Landwerlin2019-03-065-24/+148
| | | | | | | | This will be used by the overlay instead of system installed validation layers helpers. Signed-off-by: Lionel Landwerlin <[email protected]> Acked-by: Eric Engestrom <[email protected]>
* vulkan/util: make header available from c++Lionel Landwerlin2019-03-061-1/+9
| | | | | Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* iris: setup EdgeFlag Vertex Element when needed.Jose Maria Casanova Crespo2019-03-063-15/+86
| | | | | | | | | | | | | | | | If Vertex Shader uses EdgeFlag the hardware request that it is setup as the last VERTEX_ELEMENT_STATE. If SGVS are add at draw time we need to also reconfigure the last 3DSTATE_VF_INSTANCING so its VertexElementIndex points to the new Vertex Element that contains the EdgeFlag. So if draw parameters or edgeflag are not used the CSO generated at iris_create_vertex_element is sent directly in the batches. But if edge flag is used we adjust last VERTEX_ELEMENT_STATE and last 3DSTATE_VF_INSTANCING using their alternative edge flag version we generate at iris_create_vertex_element and store at the CSO. Reviewed-by: Kenneth Graunke <[email protected]>
* v3d: Include a count of register pressure in the RA failure dumps.Eric Anholt2019-03-061-1/+13
| | | | | | You usually want to go find the highest pressure and figure out why you couldn't spill or what pattern led to a bunch of pressure leading to that point.
* radv: enable lower_mul_2x32_64Samuel Pitoiset2019-03-061-0/+1
| | | | | | Fixes: 58bcebd987b ("spirv: Allow [i/u]mulExtended to use new nir opcode") Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* st/nir: Move 64-bit lowering laterJason Ekstrand2019-03-061-2/+5
| | | | | | | | | | | | | | Now that we have a loop unrolling cost function and loop unrolling isn't going to kill us the moment we have a 64-bit op in a loop, we can go ahead and move 64-bit lowering later. This gives us the opportunity to do more optimizations and actually let the full optimizer run even on 64-bit ops rather than hoping one round of opt_algebraic will fix everything. This substantially reduces both fp64 shader compile times and the resulting code size. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/nir: Move 64-bit lowering laterJason Ekstrand2019-03-061-21/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | Now that we have a loop unrolling cost function and loop unrolling isn't going to kill us the moment we have a 64-bit op in a loop, we can go ahead and move 64-bit lowering later. This gives us the opportunity to do more optimizations and actually let the full optimizer run even on 64-bit ops rather than hoping one round of opt_algebraic will fix everything. This substantially reduces both fp64 shader compile times and the resulting code size. On the vs-isnan-dvec test from piglit: Before this commit: 1684.63s user 17.29s system 99% cpu 28:28.24 total 101479 instructions. 0 loops. 802452 cycles. 79:369 spills:fills. Peak memory usage (according to massif): 1.435 GB After this commit: 179.64s user 7.75s system 99% cpu 3:07.92 total 57316 instructions. 0 loops. 459287 cycles. 0:0 spills:fills. Peak memory usage (according to massif): 531.0 MB Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir/lower_doubles: Inline functions directly in lower_doublesJason Ekstrand2019-03-069-63/+53
| | | | | | | | | | | | Instead of trusting the caller to already have created a softfp64 function shader and added all its functions to our shader, we simply take the softfp64 shader as an argument and do the function inlining ouselves. This means that there's no more nasty functions lying around that the caller needs to worry about cleaning up. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir/deref: Expose nir_opt_deref_implJason Ekstrand2019-03-062-1/+2
| | | | | | Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* nir/inline_functions: Break inlining into a builder helperJason Ekstrand2019-03-063-40/+60
| | | | | | | | | | | | | This pulls the guts of function inlining into a builder helper so that it can be used elsewhere. The rest of the infrastructure is still needed for most inlining cases to ensure that everything gets inlined and only ever once. However, there are use-cases where you just want to inline one little thing. This new helper also has a neat trick where it can seamlessly inline a function from one nir_shader into another. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* glsl/nir: Inline functions in float64_funcs_to_nirJason Ekstrand2019-03-061-0/+5
| | | | | | | | | | This doesn't really change anything as the functions will all get inlined anyway. However it does let us do a bit of the work earlier and in a common place. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* glsl/nir: Add a shared helper for building float64 shadersJason Ekstrand2019-03-067-99/+70
| | | | | | Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/nir: Drop an unneeded lower_constant_initializers callJason Ekstrand2019-03-061-2/+0
| | | | | | | | | | | Even though this is technically a step in the function inlining process as laid out in nir_inline_functions.c, it's not really needed. We already have constant initializers lowered here and no new ones are added by appending the softfp64 functions. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/debug: Add a debug flag to force software fp64Jason Ekstrand2019-03-063-2/+4
| | | | | | Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Compile the fp64 program based on nir optionsJason Ekstrand2019-03-061-1/+2
| | | | | | | | | | Instead of looking the devinfo directly, look at the lowering options we provided to NIR. This is more accurate as it's now checking for "do we need full software lowering" rather than a hardware bit. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>