path: root/src/intel
Commit message | Author | Age | Files | Lines
* intel/nir: Add a helper for getting BRW_AOP from an intrinsicJason Ekstrand2019-08-214-170/+78
| | | | | | So many duplicated switch statements.... Reviewed-by: Kenneth Graunke <[email protected]>
* nir: Add explicit signs to image min/max intrinsicsJason Ekstrand2019-08-214-22/+46
| | | | | | | | | | | This better matches all the other atomic intrinsics such as those for SSBOs and shared variables where the sign is part of the intrinsic opcode. Both generators (GLSL and SPIR-V) know the sign from the type of the image variable or handle. In SPIR-V, signed min/max are separate opcodes from unsigned. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
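The sign matters because one bit pattern orders differently under signed and unsigned comparison. The following is a minimal sketch in plain C rather than NIR code; `imin32` and `umin32` are hypothetical helper names chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical helpers, not NIR or Mesa API: the same 32-bit payload
 * compared once as signed and once as unsigned. */
static int32_t imin32(int32_t a, int32_t b)
{
    return a < b ? a : b;
}

static uint32_t umin32(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}
```

With the all-ones bit pattern on one side and 1 on the other, the signed minimum picks the all-ones value (it is -1), while the unsigned minimum picks 1. An intrinsic without a sign in its opcode cannot tell these two cases apart.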
* anv: inline uniforms blocks don't count toward descriptor set limitsArcady Goldmints-Orlov2019-08-201-0/+23
| | | | | | | | In a descriptor set, inline uniform blocks don't use up any bindings. However, the presence of any inline uniform blocks does require the use of the descriptor buffer, which takes up one binding. Reviewed-by: Jason Ekstrand <[email protected]>
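The counting rule described in this message can be sketched as follows. This is a simplified illustration, not the anv implementation: the `desc_kind` enum and `count_bindings` function are hypothetical, and the real driver tracks more descriptor types than shown here.

```c
#include <stddef.h>

/* Hypothetical descriptor kinds, for illustration only. */
enum desc_kind {
    DESC_SAMPLER,
    DESC_UNIFORM_BUFFER,
    DESC_INLINE_UNIFORM_BLOCK
};

/* Count the bindings a set consumes under the rule in the message:
 * an inline uniform block costs nothing by itself, but if at least one
 * is present the descriptor buffer is needed, and that costs one binding. */
static unsigned count_bindings(const enum desc_kind *kinds, size_t n)
{
    unsigned count = 0;
    int needs_descriptor_buffer = 0;

    for (size_t i = 0; i < n; i++) {
        if (kinds[i] == DESC_INLINE_UNIFORM_BLOCK)
            needs_descriptor_buffer = 1;
        else
            count++;
    }

    return count + (needs_descriptor_buffer ? 1u : 0u);
}
```

A set with one sampler and two inline uniform blocks therefore counts as two bindings: one for the sampler and one for the shared descriptor buffer.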
* isl: Enable Unorm Path in Color PipeKenneth Graunke2019-08-152-0/+9
| Improves performance on my Icelake 8x8 locked to 700MHz. For example, some GfxBench5 subtests have the following results:
|
| - [i965] gl_manhattan: 7.01119% +/- 0.180971% (n=5)
| - [i965] gl_4 (Car Chase): 4.24351% +/- 0.175622% (n=5)
| - [i965] gl_blending: 3.36327% +/- 0.180267% (n=5)
| - [i965] gl_5_normal (Aztec Ruins): 1.67962% +/- 0.243534% (n=10)
| - [iris] gl_manhattan: 3.92357% +/- 0.073965% (n=25)
| - [iris] gl_4 (Car Chase): 2.17746% +/- 0.0826858% (n=5)
| - [iris] gl_blending: 2.79599% +/- 0.803652% (n=15)
| - [iris] gl_5_normal (Aztec Ruins): 1.30930% +/- 0.106523% (n=25)
|
| Reviewed-by: Jason Ekstrand <[email protected]>
* anv: Properly initialize device->slice_hash.Rafael Antognolli2019-08-151-2/+2
| | | | | | | | | | | | When subslices_delta == 0 and we take the early return, device->slice_hash is not initialized on GEN11. It then causes a segfault when going through anv_DestroyDevice, if compiled with valgrind. Fixes: 7bc022b4bbc ("anv/gen11: Emit SLICE_HASH_TABLE when pipes are unbalanced.") Reviewed-by: Jason Ekstrand <[email protected]>
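The general shape of this kind of bug and its fix can be sketched as below. The structures and function names are hypothetical stand-ins and do not mirror the anv code; the point is only that a field must be put into a known-empty state before any early return if the teardown path touches it unconditionally.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the driver structures. */
struct fake_state {
    void *map;
};

struct fake_device {
    struct fake_state slice_hash;
};

static void fake_init_slice_hash(struct fake_device *dev, int subslices_delta)
{
    /* Put the field into a known-empty state before any early return. */
    dev->slice_hash = (struct fake_state){ .map = NULL };

    if (subslices_delta == 0)
        return; /* balanced pipes: no table is needed */

    /* Placeholder allocation standing in for the real table. */
    dev->slice_hash.map = malloc(64);
}

static void fake_destroy(struct fake_device *dev)
{
    /* Safe on both paths because free(NULL) is a no-op. */
    free(dev->slice_hash.map);
    dev->slice_hash.map = NULL;
}
```

Without the first assignment, the early-return path would leave `slice_hash` holding whatever happened to be in memory, and the destroy path would then act on garbage.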
* intel/compiler: Fix resource leak in error pathDanylo Piliaiev2019-08-151-0/+1
| | | | | | | | | CID: 1452261 Fixes: 04a99515 "intel/compiler: add ability to override shader's assembly" Signed-off-by: Danylo Piliaiev <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* intel/tools: Fix aub_file initialization in intel_dump_gpuCaio Marcelo de Oliveira Filho2019-08-121-0/+6
| The `device` can be set earlier, either by the command line or by intercepting an ioctl call that the application makes early to get the I915_PARAM_CHIPSET_ID. In both cases `aub_file` and `devinfo` would not be initialized.
|
| Fix by splitting the conditions:
|
| - `device == 0`: use the FD to get both device and devinfo.
| - Or `devinfo.gen == 0`: use `device` to initialize it.
|
| And separately, initialize aub_file the first time it is needed.
|
| Fixes: d594d2a0524 ("intel/tools: use device info initializer")
| Acked-by: Jason Ekstrand <[email protected]>
| Reviewed-by: Kenneth Graunke <[email protected]>
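The split conditions and the lazy `aub_file` setup can be sketched roughly as follows. Everything here is a hypothetical stand-in: the `dump_state` struct, the `fake_*` helpers, and the placeholder chipset ID and generation values are assumptions for illustration, not the intel_dump_gpu code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the tool's state. */
struct fake_devinfo {
    int gen; /* 0 means "not initialized" */
};

struct dump_state {
    uint32_t device; /* 0 means "not known yet" */
    struct fake_devinfo devinfo;
    bool aub_open;
};

/* Placeholder for asking the kernel for the chipset ID through the FD. */
static uint32_t fake_query_chipset_id(int fd)
{
    (void)fd;
    return 0x1234; /* made-up ID */
}

/* Placeholder for filling devinfo from a chipset ID. */
static void fake_devinfo_from_id(uint32_t id, struct fake_devinfo *out)
{
    (void)id;
    out->gen = 11; /* made-up generation */
}

static void ensure_device_info(struct dump_state *s, int fd)
{
    if (s->device == 0) {
        /* Nothing known yet: use the FD to get both device and devinfo. */
        s->device = fake_query_chipset_id(fd);
        fake_devinfo_from_id(s->device, &s->devinfo);
    } else if (s->devinfo.gen == 0) {
        /* Device was set earlier: derive devinfo from it. */
        fake_devinfo_from_id(s->device, &s->devinfo);
    }
}

/* Separately, open the AUB file the first time it is needed. */
static bool ensure_aub_file(struct dump_state *s)
{
    if (!s->aub_open)
        s->aub_open = true;
    return s->aub_open;
}
```

The second branch is what covers the case where `device` was already set before this code ran, so the ID is kept and only `devinfo` is filled in.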
* anv/gen11: Emit SLICE_HASH_TABLE when pipes are unbalanced.Rafael Antognolli2019-08-123-0/+78
| | | | | | | If the pixel pipes have a different number of subslices, emit a slice hashing table that will ensure proper workload distribution. v2: Don't need to set the mask - it's mbo (Ken).
* intel: Get information about pixel pipes subslices.Rafael Antognolli2019-08-122-1/+25
| | | | v2: Use 1 instead of 1UL (Ken).
* intel/gen_decoder: Decode SLICE_HASH_TABLE.Rafael Antognolli2019-08-121-0/+8
|
* intel/genxml: Update 3D_MODE and add SLICE_HASH_TABLE.Rafael Antognolli2019-08-121-1/+33
| | | | | | | Add these fields and the 3DSTATE_SLICE_TABLE_STATE_POINTERS instruction so we can properly configure the slice and subslice hashing on ICL+ v2: Make 'Mask' field a mbo (Ken).
* anv: Implement VK_KHR_pipeline_executable_propertiesJason Ekstrand2019-08-125-4/+295
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Add a ralloc context to anv_pipelineJason Ekstrand2019-08-123-0/+9
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Force a full re-compile when CAPTURE_INTERNAL_REPRESENTATION_TEXT is setJason Ekstrand2019-08-123-57/+75
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* anv/pipeline: Split setting up per-stage keys into its own loopJason Ekstrand2019-08-121-3/+8
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Record shader compile stats in the pipeline cacheJason Ekstrand2019-08-124-9/+59
| | | | | | We're going to want these to be available regardless of caching. Reviewed-by: Lionel Landwerlin <[email protected]>
* anv/pipeline: Stash generated code in the pipeline stageJason Ekstrand2019-08-121-42/+47
| | | | Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/fs: Add SLM size to brw_cs_prog_dataJason Ekstrand2019-08-122-0/+2
| | | | | | | We don't need it for state setup but it's a useful statistic we want to pass on to developers. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/compiler: Fill a compiler statistics structJason Ekstrand2019-08-1212-28/+75
| | | | | | | | | This commit is all annoying plumbing work which just adds support for a new brw_compile_stats struct. This struct provides a binary, driver-readable form of the same statistics we dump out to stderr when INTEL_DEBUG is set with a shader stage. Reviewed-by: Lionel Landwerlin <[email protected]>
* intel/fs: add 64 bit integer multiplication loweringPaulo Zanoni2019-08-122-4/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | While NIR's lower_imul64() solves the case of 64 bit integer multiplications generated early, we don't have a way to lower such instructions when they are generated by our own backend, such as the scan/reduce intrinsics. We'll need this soon, so implement it now. An easy way to test this is to simply disable nir_lower_imul64 to let those operations reach the backend. v2: - Fix Q/UQ copy/paste errors (Caio). - Transform an 'if' into 'else if' (Caio). - Add an extra comment to clarify the need for 64b = 32b * 32b (Caio). - Make private functions private (Caio). v3: - Remove ambiguity with 'b' and 'd' variables (Caio). - Allocate potentially less regs for the dwords (Caio). Cc: Jason Ekstrand <[email protected]> Cc: Matt Turner <[email protected]> Cc: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Signed-off-by: Paulo Zanoni <[email protected]>
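The arithmetic behind lowering a 64-bit multiply into 32-bit pieces, including the "64b = 32b * 32b" step mentioned in the v2 notes, can be sketched in plain C. This shows only the identity being relied on, not the instruction sequence the backend emits; `mul64_from_dwords` is a hypothetical name.

```c
#include <stdint.h>

/* Build a 64-bit product (modulo 2^64) from 32-bit halves. */
static uint64_t mul64_from_dwords(uint64_t a, uint64_t b)
{
    uint32_t al = (uint32_t)a, ah = (uint32_t)(a >> 32);
    uint32_t bl = (uint32_t)b, bh = (uint32_t)(b >> 32);

    /* 64b = 32b * 32b: the low partial product needs its full
     * 64-bit result, because its upper half feeds the high dword. */
    uint64_t low = (uint64_t)al * bl;

    /* The cross terms only affect the high dword, so they can be
     * computed with ordinary wrapping 32-bit multiplies. */
    uint32_t cross = al * bh + ah * bl;

    /* ah * bh would be shifted left by 64 bits, so it drops out. */
    return low + ((uint64_t)cross << 32);
}
```

Writing a = ah·2^32 + al and b = bh·2^32 + bl, the product is al·bl + 2^32·(al·bh + ah·bl) + 2^64·ah·bh, and only the first two terms survive in a 64-bit result.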
* intel/compiler: invert the logic of lower_integer_multiplication()Paulo Zanoni2019-08-121-13/+10
| | | | | | | | | | | | | | Invert the logic of how progress is handled: remove the continue statements and mark progress inside the places where it actually happens. We're going to add a new lowering that also looks for BRW_OPCODE_MUL, so inverting the logic here makes the resulting code much easier to follow. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Signed-off-by: Paulo Zanoni <[email protected]>
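The restructuring described here, where progress is marked at the point a change actually happens instead of after a chain of `continue` statements, can be sketched like this. The instruction type, opcodes, and `lower_muls` function are hypothetical and much simpler than the real pass.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified instruction representation. */
enum fake_opcode { FAKE_OP_MUL, FAKE_OP_ADD };

struct fake_inst {
    enum fake_opcode op;
    bool is_64bit;
    int lowering; /* 0 = untouched, 1 = 32-bit path, 2 = 64-bit path */
};

static bool lower_muls(struct fake_inst *insts, size_t n)
{
    bool progress = false;

    for (size_t i = 0; i < n; i++) {
        struct fake_inst *inst = &insts[i];

        if (inst->op == FAKE_OP_MUL) {
            if (inst->is_64bit) {
                inst->lowering = 2; /* stand-in for the 64-bit lowering */
                progress = true;
            } else {
                inst->lowering = 1; /* stand-in for the 32-bit lowering */
                progress = true;
            }
        }
        /* Other opcodes fall through untouched; no 'continue' is needed. */
    }

    return progress;
}
```

With this shape, adding another lowering that also matches the multiply opcode means adding one more branch that sets `progress` itself, rather than reworking the skip conditions at the top of the loop.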
* intel/compiler: don't instantiate a builder for each instructionPaulo Zanoni2019-08-122-12/+10
| | | | | | | | | | | | | Don't instantiate a builder for each instruction during lower_integer_multiplication(). Instantiate one only when needed. On the other hand, these unneeded builders don't seem to cost much to init, so I don't expect any significant difference in performance: this is mostly about code organization. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Signed-off-by: Paulo Zanoni <[email protected]>
* intel/compiler: extract subfunctions of lower_integer_multiplication()Paulo Zanoni2019-08-122-186/+197
| | | | | | | | | | | | | | The lower_integer_multiplication() function is already a little too big. I want to add more to it, so let's reorganize the existing code first. Let's start with just extracting the current code to subfunctions. Later we'll change them a little more. v2: Make private functions private (Caio). v3: Fix typo (Caio). Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Signed-off-by: Paulo Zanoni <[email protected]>
* nir: merge and extend nir_opt_move_comparisons and nir_opt_move_load_uboRhys Perry2019-08-121-1/+1
| | | | | | | | | | v2: add to series v3: update Makefile.sources v4: don't remove a comment and break statement v4: use nir_can_move_instr Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* anv/gen9: Optimize slice and subslice load balancing behavior.Francisco Jerez2019-08-124-0/+109
| | | | | | | | | | | | | | | See "i965/gen9: Optimize slice and subslice load balancing behavior." for the rationale. According to Jason, improves Aztec Ruins performance by 2.7%. Reviewed-by: Kenneth Graunke <[email protected]> (v1) v2: Undo CPU performance micro-optimization done in i965 and iris due to lack of data justifying it on anv. Use cmd_buffer_apply_pipe_flushes wrapper instead of emitting pipe control command directly. (Jason) Reviewed-by: Jason Ekstrand <[email protected]>
* intel/genxml: Add GT_MODE hashing defs for Gen9.Francisco Jerez2019-08-121-0/+17
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* anv: Implement VK_EXT_subgroup_size_control version 2Jason Ekstrand2019-08-122-1/+9
| | | | | | | The version bump adds a proper features struct. Fixes: d10de253097 "anv: Implement VK_EXT_subgroup_size_control" Reviewed-by: Eric Engestrom <[email protected]>
* anv: add missing `break`Eric Engestrom2019-08-091-0/+1
| | | | | | Fixes: f6e7de41d7b15185b746 ("anv: Implement VK_EXT_line_rasterization") Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: drop unused codeLionel Landwerlin2019-08-091-17/+0
| | | | | | | We stopped using this when we moved to Jason's mi_builder. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv/android: disable shared representable image support explicitlyTapani Pälli2019-08-091-0/+10
| | | | | | | | | | | | Android 9 loader conditionally advertises VK_KHR_shared_presentable_image extension based on this property and it looks like it does not initialize the struct before query. Pragmas are added to ignore warnings with Android specific structure types in same manner as commit 8d386e6eef8 did. Signed-off-by: Tapani Pälli <[email protected]> Acked-by: Jason Ekstrand <[email protected]>
* intel/perf: use MAJOR_IN_SYSMACROS/MAJOR_IN_MKDEVGreg V2019-08-081-0/+4
| | | | | Reviewed-by: Eric Engestrom <[email protected]> Fixes: 134e750e16bfc53480e0 ("i965: extract performance query metrics")
* i965/tiled_memcpy: avoid creating bswap32 if it exists as a macro (e.g. on FreeBSD)Greg V2019-08-081-0/+3
| | | | | | Reviewed-by: Eric Engestrom <[email protected]>
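A guard of the kind the title describes might look like the sketch below. This is an assumption about the general pattern, not a copy of the patch: the local definition is only compiled when the platform headers have not already provided `bswap32` as a macro.

```c
#include <stdint.h>

/* Only define a local helper when no bswap32 macro is already visible
 * from the platform headers, which avoids a name clash with the macro. */
#ifndef bswap32
static inline uint32_t bswap32(uint32_t n)
{
    return (n >> 24) |
           ((n >> 8) & 0x0000ff00u) |
           ((n << 8) & 0x00ff0000u) |
           (n << 24);
}
#endif
```

Either way, callers see a `bswap32` that reverses the four bytes of its argument.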
* anv: add MAP_POPULATE fallback define for portabilityGreg V2019-08-081-0/+4
| | | | | | FreeBSD does not have MAP_POPULATE Reviewed-by: Eric Engestrom <[email protected]>
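A portability fallback of this kind is commonly written so that the missing flag becomes a no-op. The sketch below assumes a fallback value of 0, which is a guess at the approach and not quoted from the patch; `shared_populated_flags` is a hypothetical helper.

```c
#include <sys/mman.h>

/* On platforms without MAP_POPULATE, define it to 0 so that OR-ing it
 * into mmap flags has no effect (assumed fallback value). */
#ifndef MAP_POPULATE
#define MAP_POPULATE 0
#endif

/* Flags for a shared mapping that is pre-faulted where supported. */
static int shared_populated_flags(void)
{
    return MAP_SHARED | MAP_POPULATE;
}
```

Because `MAP_POPULATE` is only a prefetching hint, dropping it changes when pages are faulted in, not whether the mapping works.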
* anv: remove unused Linux-specific includeGreg V2019-08-081-1/+0
| | | | | | Fixes: 4201cc2dd3a ("anv: Implement VK_KHX_external_semaphore_fd") Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Eric Engestrom <[email protected]>
* anv,i965,iris: deduplicate setting of total_sharedRhys Perry2019-08-082-2/+1
| | | | | | | | v5: add patch Signed-off-by: Rhys Perry <[email protected]> Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* anv: use derefs for shared memory accessRhys Perry2019-08-081-4/+19
| vkpipeline-db for my Skylake GPU:
|
| total instructions in shared programs: 8847602 -> 8847896 (<.01%)
| instructions in affected programs: 10165 -> 10459 (2.89%)
| helped: 8
| HURT: 2
|
| total cycles in shared programs: 1606273555 -> 1606251634 (<.01%)
| cycles in affected programs: 2201803 -> 2179882 (-1.00%)
| helped: 7
| HURT: 3
|
| The shaders with more instructions are caused by a loop over a shared array in Three Kingdoms being unrolled (and creating a lot of nested ifs). Not sure if that's good or bad. One of the shaders with worse cycles is only worse by 0.04% and the other two are the shaders with loops unrolled.
|
| v2: add patch
| v4: don't set spirv_options.shared_addr_format
| v4: move comment concerning the shared address format used and NULL
| v4: add vkpipeline-db results
| v5: rename to nir_lower_vars_to_explicit_types
| v5: move setting of total_shared to outside brw_compile_cs
| v6: set shared_addr_format
| v6: formatting changes
|
| Signed-off-by: Rhys Perry <[email protected]>
| Reviewed-by: Caio Marcelo de Oliveira Filho <[email protected]> (v5)
| Reviewed-by: Jason Ekstrand <[email protected]>
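The percentages in the vkpipeline-db summary are plain relative changes, which can be reproduced from the before and after counts. The helper below is a hypothetical illustration and is not part of vkpipeline-db.

```c
/* Relative change from 'before' to 'after', in percent. */
static double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

For the affected programs, 10165 -> 10459 instructions is an increase of 294, or about 2.89%, and 2201803 -> 2179882 cycles is a decrease of 21921, or about 1.00%. The same change measured against the totals in all shared programs is far below 0.01%, which is why those lines report `<.01%`.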
* anv: support GetSwapchainGrallocUsage2ANDROID for AndroidTapani Pälli2019-08-083-22/+88
| | | | | | | | | | | | | | New function supports gralloc1 usage flags that get set separately for producer and consumer. As we still need to support old method too, let's share common code and use android_convertGralloc0To1Usage helper. Bump the VK_ANDROID_native_buffer version to indicate support for the new call. Changes were tested on Android Celadon P with Basemark GPU and various Sascha Willems Vulkan demos. Signed-off-by: Tapani Pälli <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* intel/perf: fix debug typoMark Janes2019-08-071-5/+5
| | | | | | Misspelling was seen with INTEL_DEBUG=perfmon. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: make gen_perf_query_object privateMark Janes2019-08-072-72/+80
| | | | | | Encapsulate the details of this structure within the perf implementation. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: make perf context privateMark Janes2019-08-072-64/+109
| | | | | | Encapsulate the details of this data structure. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: print debug informationMark Janes2019-08-072-0/+35
| | | | | | | | | INTEL_DEBUG=perfmon will iterate over the perf queries, printing information about the state of each query. Some of this information will be private to intel/perf, and needs a dump routine that can be called from i965. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: make internal methods privateMark Janes2019-08-072-95/+62
| | | | | | | Now that all references from i965 have been moved to perf, we can make internal methods private again. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: make oa_sample_buffers privateMark Janes2019-08-072-119/+120
| | | | | | | All references to this data structure have been moved inside the perf subsystem. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: expose method to create queryMark Janes2019-08-072-0/+19
| | | | | | | By encapsulating this implementation within perf, we can eventually make struct gen_perf_ctx private. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: move initialization of pipeline statistics metrics to gen_perfMark Janes2019-08-072-124/+219
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: move get_query_data into gen_perfMark Janes2019-08-072-0/+376
| | | | | | | | | | | | | | | This refactor moves several helper functions for get_query_data as well: - accumulate_oa_reports - read_gt_frequency - get_pipeline_stats_data - get_oa_counter_data Functions which are no longer referenced in brw_performance_query.c have been removed. Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: move delete_query to gen_perfMark Janes2019-08-072-0/+93
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: move is_query_ready to gen_perfMark Janes2019-08-072-0/+31
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: move wait_query to perfMark Janes2019-08-072-0/+167
| The following methods have duplicate implementation of read_oa_samples_until in brw_performance_query.c:
|
| - read_oa_samples_for_query
| - read_oa_samples_until
|
| They are still referenced by other methods in the file and will be removed in the subsequent commit.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* intel/perf: create a vtable entry for bo_busyMark Janes2019-08-071-0/+1
| | | | | | | Iris and i965 variants of this method need to be called by perf routines. Reviewed-by: Kenneth Graunke <[email protected]>
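A vtable entry like this lets shared code call a driver-specific function without knowing which driver is behind it. The sketch below illustrates the idea with hypothetical types and names; it does not reproduce the actual perf vtable or any i965 or Iris function.

```c
#include <stdbool.h>

/* Hypothetical vtable with a single driver-provided callback. */
struct fake_perf_vtbl {
    bool (*bo_busy)(void *bo);
};

/* Hypothetical driver-side buffer object and busy check. */
struct fake_bo {
    int pending_work;
};

static bool fake_driver_bo_busy(void *bo)
{
    return ((struct fake_bo *)bo)->pending_work > 0;
}

/* Shared code only sees the vtable, never the driver's types. */
static bool fake_perf_results_ready(const struct fake_perf_vtbl *vtbl, void *bo)
{
    return !vtbl->bo_busy(bo);
}
```

Each driver fills in `bo_busy` with its own implementation when it sets up the vtable, and the shared routine stays identical for both.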