Commit message | Author | Age | Files | Lines
* radv: add get_chip_name() callback | Samuel Pitoiset | 2017-09-15 | 2 | -0/+10
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* r600: add .gitignore for egd_tables.h | Dave Airlie | 2017-09-15 | 1 | -0/+1
|
* radeonsi: enable STD430 packing of UBOs by default | Timothy Arceri | 2017-09-15 | 1 | -1/+1
| | | | | | | Before this change we were defaulting to STD140 which is slightly less efficient at packing arrays. Reviewed-by: Marek Olšák <[email protected]>
* st/mesa: set UseSTD430AsDefaultPacking const based on CAP | Timothy Arceri | 2017-09-15 | 1 | -0/+3
| | | | Reviewed-by: Marek Olšák <[email protected]>
* gallium: introduce PIPE_CAP_LOAD_CONSTBUF | Timothy Arceri | 2017-09-15 | 17 | -0/+18
| | | | Reviewed-by: Marek Olšák <[email protected]>
* radeonsi: make use of LOAD for UBOs | Timothy Arceri | 2017-09-15 | 1 | -10/+21
| | | | | | v2: always set can_speculate and allow_smem to true Reviewed-by: Marek Olšák <[email protected]>
* mesa/st: add LOAD support for UBOs | Timothy Arceri | 2017-09-15 | 1 | -79/+119
| | | | | | | This will allow us to use STD430 packing by default if the driver supports it. Reviewed-by: Marek Olšák <[email protected]>
* mesa/st: create add_buffer_to_load_and_stores() helper | Timothy Arceri | 2017-09-15 | 1 | -19/+27
| | | | | | Will be used to add LOAD support to UBOs. Reviewed-by: Marek Olšák <[email protected]>
* gallium: add CONSTBUF type to tgsi_file_type | Timothy Arceri | 2017-09-15 | 2 | -0/+2
| | | | | | | This will be used to distinguish between load types when using the TGSI_OPCODE_LOAD opcode. Reviewed-by: Marek Olšák <[email protected]>
* virgl: drop const dimensions on first block. | Dave Airlie | 2017-09-15 | 1 | -0/+27
| | | | | | | | | | | | The virgl protocol version of tgsi doesn't handle this yet, transform it back to the old ways. Thanks to Nicolai Hähnle <[email protected]> for also writing nearly the same patch. Fixes: 41e342d5 tgsi/ureg: always emit constants (and their decls) as 2D Tested-by: Rob Herring <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* st/glsl->tgsi: fix u64 to bool comparisons. | Dave Airlie | 2017-09-15 | 1 | -1/+15
| | | | | | | | | | | Otherwise we end up using a 32-bit comparison which didn't end well. Timothy caught this while playing around with some opt passes. Fixes: 278580729a (st/glsl_to_tgsi: add support for 64-bit integers) Reviewed-by: Timothy Arceri <[email protected]> Signed-off-by: Dave Airlie <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]>
* i965: Print size of validation and relocation lists in INTEL_DEBUG=flush | Kenneth Graunke | 2017-09-14 | 1 | -3/+8
| | | | | | | | It's nice to have this information. While we're at it, tweak the formatting to try and vertically align numbers in the common case. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Wilson <[email protected]>
* i965: Disentangle batch and state buffer flushing. | Kenneth Graunke | 2017-09-14 | 5 | -40/+30
| | | | | | | | | | | | | | | | | | | | | | We now flush the batch when either the batchbuffer or statebuffer reaches the original intended batch size, instead of when the sum of the two reaches a certain size (which makes no sense now that they're separate buffers). With this change, we also need to update our "are we near the end?" estimate to require separate batch and state buffer space. I obtained these estimates by looking at the size of draw calls in the Unreal 4 Elemental Demo (using INTEL_DEBUG=flush and always_flush_batch=true). This will significantly impact the size of our batches. I've adjusted both down to try and be roughly similar to what we had been doing. On various benchmarks, a 20kB batch and 16kB statebuffer seemed to be about right, but we may need to adjust this further. I tried a 16kB batch, but that regressed Synmark OglMultithread performance by a fair bit. 32kB for both would have significantly increased our batch sizes. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Wilson <[email protected]>
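The flushing change described in this message can be illustrated with a minimal sketch. It is not the driver's code: the struct, the function names, and the macro names are hypothetical, and only the 20kB and 16kB figures come from the message above. It contrasts the old "sum of both" test with the new "either buffer" test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sizes quoted in the commit message; the macro names are made up. */
#define TOY_BATCH_FLUSH_SIZE (20 * 1024)
#define TOY_STATE_FLUSH_SIZE (16 * 1024)

/* Hypothetical bookkeeping: bytes used so far in each buffer. */
struct toy_usage {
   size_t batch_used;
   size_t state_used;
};

/* Old policy: pretend both buffers share one allocation and flush on the sum. */
static bool
toy_should_flush_combined(const struct toy_usage *u, size_t limit)
{
   assert(u != NULL);
   return u->batch_used + u->state_used >= limit;
}

/* New policy: flush as soon as either buffer reaches its own threshold. */
static bool
toy_should_flush_separate(const struct toy_usage *u)
{
   assert(u != NULL);
   return u->batch_used >= TOY_BATCH_FLUSH_SIZE ||
          u->state_used >= TOY_STATE_FLUSH_SIZE;
}
```

With 19kB of commands and 15kB of state, the combined test against a 32kB limit would already flush, while the separate test would not until one of the two buffers crosses its own threshold.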
* i965: Delete BATCH_RESERVED handling. | Kenneth Graunke | 2017-09-14 | 2 | -34/+3
| | | | | | | | | Now that we can grow the batchbuffer if we absolutely need the extra space, we don't need to reserve space for the final do-or-die ending commands. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Wilson <[email protected]>
* i965: Make BLORP properly avoid batch wrapping. | Kenneth Graunke | 2017-09-14 | 1 | -14/+2
| | | | | | | | We need to set brw->no_batch_wrap to actually avoid flushing in the middle of our BLORP operation, and instead grow the batchbuffer. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Wilson <[email protected]>
* i965: Grow the batch/state buffers if we need space and can't flush. | Kenneth Graunke | 2017-09-14 | 1 | -5/+131
| | | | | | | | | | | | | | | | | | | | | | | | | Previously, we would just assert fail and die in this case. The only safeguard is the "estimated max prim size" checks when starting a draw (or compute dispatch or BLORP operation)...which are woefully broken. Growing is fairly straightforward: 1. Allocate a new larger BO. 2. memcpy the existing contents over to the new buffer 3. Set the new BO to the same GTT offset as the old BO. When emitting relocations, we write the presumed GTT offset of the target BO. If we changed it, we'd have to update all the existing values (by walking the relocation list and looking at offsets), which is more expensive. With the old BO freed, ideally the kernel could simply place the new BO at that offset anyway. 4. Update the validation list to contain the new BO. 5. Update the relocation list to have the GEM handle for the new BO (which we can skip if using I915_EXEC_HANDLE_LUT). v2: Update to handle malloc'd shadow buffers. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Wilson <[email protected]>
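The five growing steps listed in this message can be sketched in plain C. Everything here is a hypothetical stand-in (toy_bo, toy_reloc, toy_batch, toy_batch_grow are not i965 names), heap memory replaces real GEM buffer objects, and the handles are just counters, so this only shows the order of operations and why the GTT offset is carried over.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; these are not the real i965 structures. */
struct toy_bo {
   uint32_t handle;      /* made-up kernel handle */
   uint64_t gtt_offset;  /* presumed GPU address */
   size_t size;
   uint8_t *map;
};

struct toy_reloc {
   uint32_t target_handle;
   uint64_t presumed_offset;
};

struct toy_batch {
   struct toy_bo *bo;
   size_t used;
   struct toy_bo **validation_list;
   size_t validation_count;
   struct toy_reloc *relocs;
   size_t reloc_count;
};

static uint32_t toy_next_handle = 1;

static struct toy_bo *
toy_bo_alloc(size_t size)
{
   struct toy_bo *bo = calloc(1, sizeof(*bo));
   if (!bo)
      return NULL;
   bo->map = calloc(1, size);
   if (!bo->map) {
      free(bo);
      return NULL;
   }
   bo->handle = toy_next_handle++;
   bo->size = size;
   return bo;
}

static void
toy_bo_free(struct toy_bo *bo)
{
   free(bo->map);
   free(bo);
}

/* Returns 0 on success, -1 if the larger buffer could not be allocated. */
static int
toy_batch_grow(struct toy_batch *batch, size_t new_size)
{
   struct toy_bo *old_bo = batch->bo;
   assert(new_size > old_bo->size);

   /* 1. Allocate a new, larger BO. */
   struct toy_bo *new_bo = toy_bo_alloc(new_size);
   if (!new_bo)
      return -1;

   /* 2. Copy the contents written so far. */
   memcpy(new_bo->map, old_bo->map, batch->used);

   /* 3. Reuse the old presumed offset so already-written relocation
    *    values stay valid without walking the relocation list. */
   new_bo->gtt_offset = old_bo->gtt_offset;

   /* 4. Point validation list entries at the new BO. */
   for (size_t i = 0; i < batch->validation_count; i++) {
      if (batch->validation_list[i] == old_bo)
         batch->validation_list[i] = new_bo;
   }

   /* 5. Retarget relocations that referred to the old handle. */
   for (size_t i = 0; i < batch->reloc_count; i++) {
      if (batch->relocs[i].target_handle == old_bo->handle)
         batch->relocs[i].target_handle = new_bo->handle;
   }

   batch->bo = new_bo;
   toy_bo_free(old_bo);
   return 0;
}
```

Note that the presumed_offset values in the relocation entries are left untouched in step 5; that is the point of step 3.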
* i965: Use a separate state buffer, but avoid changing flushing behavior. | Kenneth Graunke | 2017-09-14 | 7 | -82/+141
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, we emitted GPU commands and indirect state into the same buffer, using a stack/heap like system where we filled in commands from the start of the buffer, and state from the end of the buffer. We then flushed before the two met in the middle. Meeting in the middle is fatal, so you have to be certain that you reserve the correct amount of space before emitting commands or state for a draw. Currently, we will assert !no_batch_wrap and die if the estimate is ever too small. This has been mercifully obscure, but has happened on a number of occasions, and could in theory happen to any application that issues a large draw at just the wrong time. Estimating the amount of batch space required is painful - it's hard to get right, and getting it right involves a lot of code that would burn CPU time, and also be painful to maintain. Rolling back to a saved state and retrying is also painful - failing to save/restore all the required state will break things, and redoing state emission burns a lot of CPU. memcpy'ing to a new batch and continuing is painful, because commands we issue for a draw depend on earlier commands as well (such as STATE_BASE_ADDRESS, or the GPU being in a particular state). The best plan is to never run out of space, which is totally doable but pretty wasteful - a pessimal draw requires a huge amount of space, and rarely occurs. Instead, we'd like to grow the batch buffer if we need more space and can't safely flush. We can't grow with a meet in the middle approach - we'd have to move the state to the end, which would mean updating every offset from dynamic state base address. Using separate batch and state buffers, where both fill starting at the beginning, makes it easy to grow either as needed. This patch separates the two concepts. We create a separate state buffer, with a second relocation list, and use that for brw_state_batch.
However, this patch tries to retain the original flushing behavior - it adds the amount of batch and state space together, as if they were still co-existing in a single buffer. The hope is to flush at the same time as before. This is necessary to avoid provoking bugs caused by broken batch wrap handling (which we'll fix shortly). It also avoids suddenly increasing the size of the batch (due to state not taking up space), which could have a significant performance impact. We'll tune it later. v2: - Mark the statebuffer with EXEC_OBJECT_CAPTURE when supported (caught by Chris). Unfortunately, we lose the ability to capture state data on older kernels. - Continue to support the malloc'd shadow buffers. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Wilson <[email protected]>
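The "meet in the middle" layout that this message replaces can be modelled with a small sketch, assuming a single buffer where commands grow up from offset 0 and state grows down from the end. The struct and function names are hypothetical and are not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the old shared buffer. */
struct toy_shared_buffer {
   size_t size;
   size_t cmd_used;     /* commands occupy [0, cmd_used) */
   size_t state_start;  /* state occupies [state_start, size) */
};

/* Bytes left between the end of the commands and the start of the state. */
static size_t
toy_shared_free_space(const struct toy_shared_buffer *buf)
{
   assert(buf->cmd_used <= buf->state_start);
   return buf->state_start - buf->cmd_used;
}

/* Reserve n bytes of command space; fails once the two regions would meet. */
static bool
toy_shared_alloc_cmd(struct toy_shared_buffer *buf, size_t n)
{
   if (toy_shared_free_space(buf) < n)
      return false;
   buf->cmd_used += n;
   return true;
}

/* Reserve n bytes of state space, allocated downward from the end. */
static bool
toy_shared_alloc_state(struct toy_shared_buffer *buf, size_t n)
{
   if (toy_shared_free_space(buf) < n)
      return false;
   buf->state_start -= n;
   return true;
}
```

In this model every state allocation shrinks the room left for commands and vice versa, and growing the buffer would move the end that state offsets are measured from. With two separate buffers that both fill from offset 0, each one can be extended at its tail independently.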
* i965: Pass screen to intel_batchbuffer_reset(). | Kenneth Graunke | 2017-09-14 | 1 | -10/+8
| | | | | | | This will let us access screen->kernel_features in the next patch. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Wilson <[email protected]>
* i965: Prepare INTEL_DEBUG=bat decoding for a separate statebuffer. | Kenneth Graunke | 2017-09-14 | 1 | -56/+54
| | | | | | | | | | | We'll need to read from both buffers when decoding state. This also drops the "failed to map" fallback - it's completely useless on LLC systems where we write directly to the mapped BO. It's not that useful on non-LLC systems either. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Wilson <[email protected]>
* i965: Split brw_emit_reloc into brw_batch_reloc and brw_state_reloc. | Kenneth Graunke | 2017-09-14 | 5 | -48/+73
| | | | | | | | | | | brw_batch_reloc emits a relocation from the batchbuffer to elsewhere. brw_state_reloc emits a relocation from the statebuffer to elsewhere. For now, they do the same thing, but when we actually split the two buffers, we'll change brw_state_reloc to use the state buffer. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Wilson <[email protected]>
* i965: Refactor relocs into a brw_reloc_list structure. | Kenneth Graunke | 2017-09-14 | 2 | -19/+32
| | | | | | | | | | I'm planning on splitting batch and state into separate buffers, at which point we'll need two relocation lists. In preparation for that, this patch refactors the relocation stuff into a structure we can replicate...which looks a lot like anv_reloc_list. Reviewed-by: Chris Wilson <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Move brw_state_batch code to intel_batchbuffer.c | Kenneth Graunke | 2017-09-14 | 4 | -97/+47
| | | | | | | | | | | | The batch buffer and state buffer code is fairly tied together, and having it in one .c file will make refactoring easier. Also, drop some commentary above brw_state_batch. The "aperture checking performance hacks" are long since gone, so that paragraph makes little sense at this point. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Wilson <[email protected]>
* i965: Drop a useless ret == 0 check. | Kenneth Graunke | 2017-09-14 | 1 | -22/+20
| | | | | | | | | Prior to the previous patch, we would pwrite the batchbuffer contents, and wanted to skip the execbuffer if that failed. Now that we memcpy, we don't set ret != 0 on failure anymore, so it will always be 0. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Wilson <[email protected]>
* i965: Use a WC map and memcpy for the batch instead of pwrite. | Kenneth Graunke | 2017-09-14 | 1 | -10/+8
| | | | | | | | | | | | We'd like to eliminate the malloc'd shadow copy eventually, but there are still unresolved performance problems. In the meantime, let's at least get rid of pwrite. On Apollolake, improves Synmark OglBatch6 performance by: 1.53581% +/- 0.269589% (n=108). Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Wilson <[email protected]>
* i965: Use batch->bo->size in brw_emit_reloc assertion. | Kenneth Graunke | 2017-09-14 | 1 | -1/+1
| | | | | | | This makes the assertion safe against batchbuffers growing. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Wilson <[email protected]>
* i965: Delete a batch size assertion that isn't very useful. | Kenneth Graunke | 2017-09-14 | 1 | -3/+0
| | | | | | | | | | | This assertion prevents you from doing intel_batchbuffer_require_space with a size so huge it won't fit in the batchbuffer. This doesn't seem like a common mistake, and I've never seen the assert to be useful. Soon, I hope to have batches grow, at which point this won't make sense. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Wilson <[email protected]>
* i965/screen: Implement queryDmaBufFormatModifierAttirbs | Jason Ekstrand | 2017-09-14 | 1 | -1/+23
| | | | | Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Daniel Stone <[email protected]>
* i965/screen: Report the correct number of image planes | Jason Ekstrand | 2017-09-14 | 1 | -1/+8
| | | | | | | | | For non-CCS images, we were reporting just one plane even though they may have multiple in the case of YUV. Reviewed-by: Ben Widawsky <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Daniel Stone <[email protected]>
* gbm: Add a gbm_device_get_format_modifier_plane_count function | Jason Ekstrand | 2017-09-14 | 4 | -0/+48
| | | | | | | | This allows the user to query the number of planes required by a given format+modifier combination without having to create a bo or surface. Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Daniel Stone <[email protected]>
* dri/image: Add a format modifier attributes query | Jason Ekstrand | 2017-09-14 | 1 | -1/+26
| | | | | Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Daniel Stone <[email protected]>
* drirc: enable glthread for more games (Civ5, CivBE, Dreamfall, Hitman, SR3) | Christoph Berliner | 2017-09-14 | 1 | -0/+15
| | | | Signed-off-by: Marek Olšák <[email protected]>
* glsl: avoid accessing invalid memory after get_variable_being_redeclared() | Iago Toral Quiroga | 2017-09-14 | 1 | -20/+19
| | | | | | | | | | | | | | After get_variable_being_redeclared() has been called, it is no longer safe to access the original variable pointer, since its memory might have been freed. Since callers of this function should only be accessing the variable pointer returned by the function, avoid potential bugs by re-assigning the original variable pointer to the result of the function call, making it impossible for the remaining code to access an invalid variable pointer. Reviewed-by: Nicolai Hähnle <[email protected]>
* glsl: make the redeclared variable NULL if it is deleted | Iago Toral Quiroga | 2017-09-14 | 1 | -3/+6
| | | | | | | | | | | | | | get_variable_being_redeclared() can delete the original variable in a specific scenario. The code sets it to NULL after this so other code in that same function doesn't try to access trashed memory after the fact, however, the copy of that variable in the caller code won't see any of this making it very easy to overlook. Make the function a bit safer by taking a pointer to the original variable so we can also make NULL the caller's pointer to the variable if this function deletes it. Reviewed-by: Nicolai Hähnle <[email protected]>
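The pattern this message describes, passing a pointer to the caller's pointer so the callee can null it after freeing, can be sketched as follows. The toy_ names are hypothetical, the real function is C++ inside the GLSL compiler, and the condition under which the original is deleted is simplified here to "an earlier declaration exists".

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a declared variable. */
struct toy_var {
   int location;
};

/*
 * Returns the declaration the caller should keep using. If an earlier
 * declaration exists, the new one is freed and the caller's own pointer
 * is set to NULL so it cannot be dereferenced by accident afterwards.
 */
static struct toy_var *
toy_get_variable_being_redeclared(struct toy_var **var_ptr,
                                  struct toy_var *earlier)
{
   assert(var_ptr != NULL);
   struct toy_var *var = *var_ptr;

   if (earlier == NULL)
      return var;          /* not a redeclaration, nothing is deleted */

   free(var);
   *var_ptr = NULL;        /* the caller's copy is cleared as well */
   return earlier;
}
```

A caller that writes `declared_var = toy_get_variable_being_redeclared(&var, earlier);` and then touches `var` would crash on a NULL dereference instead of silently reading freed memory, which makes the mistake much easier to spot.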
* glsl: use 'declared_var' instead of 'var' after checking redeclarations | Iago Toral Quiroga | 2017-09-14 | 1 | -2/+2
| | | | | | | | | Since the original 'var' might have been deleted from this point forward. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102685 Fixes: 51bf007d2c27fba (glsl: Disallow unsized array of atomic_uint) Reviewed-by: Nicolai Hähnle <[email protected]>
* dri/radeon: use ARRAY_SIZE macro | Eric Engestrom | 2017-09-14 | 1 | -1/+3
| | | | | Signed-off-by: Eric Engestrom <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* radv: dump the list of enabled options when a hang occured | Samuel Pitoiset | 2017-09-14 | 3 | -0/+46
| | | | | | | | Useful to know which debug/perftest options were enabled when a hang report is generated. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: dump last 60 lines of dmesg when a hang occured | Samuel Pitoiset | 2017-09-14 | 1 | -0/+20
| | | | | | | Copied from dd_dump_dmesg(). Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: dump descriptors when a hang occured | Samuel Pitoiset | 2017-09-14 | 1 | -0/+182
| | | | | | | | Might be useful for checking if all descriptors are set by the application. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: save all descriptor pointers into the trace BO | Samuel Pitoiset | 2017-09-14 | 2 | -0/+35
| | | | | | | To dump them when a hang is detected. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: dump annotated shaders using UMR | Samuel Pitoiset | 2017-09-14 | 1 | -0/+172
| | | | | | | | | This might be very useful in order to figure out where a shader is stuck. This uses UMR to detect which instruction is executing bad things. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radeonsi: move si_get_wave_info() to AMD common code | Samuel Pitoiset | 2017-09-14 | 3 | -93/+97
| | | | | | | | This will allow us to use it from radv. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: dump some status MMIO registers when a hang occured | Samuel Pitoiset | 2017-09-14 | 1 | -0/+57
| | | | | | | | Might report some useful information to help figure out where the hang happened. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv/winsys: add a read_registers() callback | Samuel Pitoiset | 2017-09-14 | 2 | -0/+14
| | | | | | | To dump some status MMIO registers when a hang is detected. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: dump shader stats when a hang occured | Samuel Pitoiset | 2017-09-14 | 1 | -3/+6
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add radv_shader_dump_stats() helper | Samuel Pitoiset | 2017-09-14 | 3 | -63/+77
| | | | | | | To dump the shader stats when a hang is detected. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: dump the active shaders when a hang occured | Samuel Pitoiset | 2017-09-14 | 1 | -5/+89
| | | | | | | Only the disassembly is currently dumped. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add debug flags for syncing shaders after every draw call | Samuel Pitoiset | 2017-09-14 | 3 | -0/+17
| | | | | | | To improve GPU hang detection when shaders are stuck. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add radv_cmd_buffer_after_draw() helper function | Samuel Pitoiset | 2017-09-14 | 1 | -6/+12
| | | | | | | To share common code after every draw/compute call. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: save the bound pipeline pointers into the trace BO | Samuel Pitoiset | 2017-09-14 | 2 | -7/+54
| | | | | | | | | | | | | When a GPU hang is detected in radv_gpu_hang_occured() we know which command buffer is faulty but the bound pipelines might have been updated during the execution. The pointers to the radv_pipeline objects are emitted just after the second trace ID, that way it would be easy to dump the active shaders at the moment of the hang. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>
* radv: add a comment that describes the trace BO layout | Samuel Pitoiset | 2017-09-14 | 1 | -0/+6
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Bas Nieuwenhuizen <[email protected]>