aboutsummaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/gen7_urb.c
Commit message (Collapse)AuthorAgeFilesLines
* i965: Delete unused variable.Kenneth Graunke2016-11-191-2/+0
| | | | | | I forgot to delete this in 9ef2b9277d3bead6dbfa47e95794ca61e8be4e84. Signed-off-by: Kenneth Graunke <[email protected]>
* intel: Share URB configuration code between GL and Vulkan.Kenneth Graunke2016-11-191-138/+4
| | | | | | | | | This code is far too complicated to cut and paste. v2: Update the newly added genX_gpu_memcpy.c; const a few things. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Use arrays in Gen7+ URB code.Kenneth Graunke2016-11-191-202/+134
| | | | | | | | | | | So much of this code was cut and pasted per stage. We can accomplish much of it by looping over shader stages. Improves performance of OglBatch7 (version 6) by 1.50783% +/- 0.287049% (n = 71) at 1024x768 on Cherryview. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Drop brw->urb.{nr_*_entries,*_start} assignments from gen7_urb.c.Kenneth Graunke2016-11-191-17/+8
| | | | | | | | The context fields are for Gen4-5; setting them has always been useless. There's no point in spending the cost in the hottest path in the driver. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Switch to roundf in HS/DS URB code.Kenneth Graunke2016-11-191-2/+2
| | | | | | | | | | | Matt intentionally switched the VS calculation to be float-based in commit c1da15709a0c0c2775bd9e534f67c60f7dc95ce8. Tessellation support was written before this and rebased forward, and missed the change. Now it's consistent. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Make URB code use prog_data for GS/tessellation enable checks.Kenneth Graunke2016-11-191-6/+4
| | | | | | | | If geometry/tessellation shaders are disabled, prog_data will be NULL (see brw_state_upload.c). This consolidates dirty bits a little. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* intel: Convert devinfo->urb.min_*_entries into an array.Kenneth Graunke2016-11-191-4/+5
| | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* intel: Convert devinfo->urb.max_*_entries into an array.Kenneth Graunke2016-11-191-10/+16
| | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Eliminate brw->gs.prog_data pointer.Kenneth Graunke2016-10-051-1/+3
| | | | | | | | | | | | Just say no to: - brw->gs.base.prog_data = &brw->gs.prog_data->base.base; We'll just use the brw_stage_prog_data pointer in brw_stage_state and downcast it to brw_gs_prog_data as needed. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* i965: Eliminate brw->tes.prog_data pointer.Kenneth Graunke2016-10-051-1/+3
| | | | | | | | | | | | Just say no to: - brw->tes.base.prog_data = &brw->tes.prog_data->base.base; We'll just use the brw_stage_prog_data pointer in brw_stage_state and downcast it to brw_tes_prog_data as needed. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* i965: Eliminate brw->tcs.prog_data pointer.Kenneth Graunke2016-10-051-1/+3
| | | | | | | | | | | | Just say no to: - brw->tcs.base.prog_data = &brw->tcs.prog_data->base.base; We'll just use the brw_stage_prog_data pointer in brw_stage_state and downcast it to brw_tcs_prog_data as needed. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* i965: Eliminate brw->vs.prog_data pointer.Kenneth Graunke2016-10-051-1/+3
| | | | | | | | | | | | Just say no to: - brw->vs.base.prog_data = &brw->vs.prog_data->base.base; We'll just use the brw_stage_prog_data pointer in brw_stage_state and downcast it to brw_vs_prog_data as needed. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Timothy Arceri <[email protected]>
* i965: rename max_ds_* variable to max_tes_*Timothy Arceri2016-10-031-2/+2
| | | | | | Using consistent naming allows us to create macros more easily. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: rename max_hs_* variables to max_tcs_*Timothy Arceri2016-10-031-2/+2
| | | | | | Using consistent naming allows us to create macros more easily. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: get rid of duplicated values from gen_device_infoLionel Landwerlin2016-09-231-8/+8
| | | | | | | | Now that we have gen_device_info mutable, we can update its values and drop all copies we had in brw_context. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* intel/i965: make gen_device_info mutableLionel Landwerlin2016-09-231-1/+1
| | | | | | | | | | | | Make gen_device_info a mutable structure so we can update the fields that can be refined by querying the kernel (like subslices and EU numbers). This patch does not make any functional change, it just makes gen_get_device_info() fill a structure rather than returning a const pointer. Signed-off-by: Lionel Landwerlin <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Rename intelScreen to screen.Kenneth Graunke2016-09-201-1/+1
| | | | | | | | "intelScreen" is wordy and also doesn't fit our style guidelines. "screen" is shorter, which is nice, because we use it fairly often. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Anuj Phogat <[email protected]>
* intel: s/brw_device_info/gen_device_info/Jason Ekstrand2016-09-031-1/+1
| | | | | | | | | | | | | Generated by: sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.c sed -i -e 's/brw_device_info/gen_device_info/g' src/intel/**/*.h sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.c sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.cpp sed -i -e 's/brw_device_info/gen_device_info/g' **/i965/*.h Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/urb: Allow blorp to record current settingsTopi Pohjolainen2016-07-041-40/+50
| | | | | | | | | | | | | This makes it possible to skip urb re-configuration if the subsequent renders agree with the settings. Also allows blorp to allocate the maximun amount of vs entries available. Core upload logic already knows how to calculate this. Helps one synthetic benchmark. Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/blorp/gen7+: Do not trigger push constant space reconfigTopi Pohjolainen2016-07-041-2/+1
| | | | | Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Avoid division by zero.Ardinartsev Nikita2016-06-231-11/+15
| | | | | | | | | Fixes regression introduced by af5ca43f2676bff7499f93277f908b681cb821d0 Cc: "12.0 11.2" <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95419
* i965: Fix issues with number of VS URB entries on Cherryview/Broxton.Kenneth Graunke2016-06-131-0/+2
| | | | | | | | | | | | | | | | Cherryview/Broxton annoyingly have a minimum number of VS URB entries of 34, which is not a multiple of 8. When the VS size is less than 9, the number of VS entries has to be a multiple of 8. Notably, BLORP programmed the minimum number of VS URB entries (34), with a size of 1 (less than 9), which is invalid. It seemed like this could be a problem in the regular URB code as well, so I went ahead and updated that to be safe. Cc: "12.0" <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* Revert "i965/urb: fixes division by zero"Matt Turner2016-05-181-5/+19
| | | | This reverts commit 2a8aa1e3deb99a1ae16d942318da648c1327ece5.
* i965/urb: fixes division by zeroArdinartsev Nikita2016-05-181-19/+5
| | | | | | | Fixes regression introduced by af5ca43f2676bff7499f93277f908b681cb821d0 Reviewed-by: Matt Turner <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95419
* i965/blorp: Use BRW_NEW_BLORP instead of trashing all state bitsTopi Pohjolainen2016-04-231-2/+1
| | | | | Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Make all atoms to track BRW_NEW_BLORP by defaultKenneth Graunke2016-04-231-2/+4
| | | | Reviewed-by: Topi Pohjolainen <[email protected]
* i965: Remove unnecessary brw->tess_ctrl_program assertions.Kenneth Graunke2015-12-221-1/+0
| | | | | | | | | | | | | This is trying to enforce the fact that the hardware requires HS, TE, and DS to be enabled or disabled together. But it's kind of an ad-hoc attempt, and not too useful. More importantly, we aren't going to have a gl_shader_program for the TCS which is automatically generated when none is present. (We'll just handle it in the driver backend.) So, these will trip for no reason. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Consolidate BRW_NEW_TESS_{CTRL,EVAL}_PROGRAM flags.Kenneth Graunke2015-12-221-6/+4
| | | | | | | | | | | | | | | | | | | | | | | | For several reasons, I don't think it's particularly useful to have separate flags: 1. Most of the time, tessellation shaders are paired, so both will be replaced at the same time. 2. The data layout is tightly coupled. Both need to agree on the number of per-patch slots in the VUE map. Even adding extra TCS outputs that aren't read by the TES will trigger the need for recompiles. 3. The TCS is optional from an API perspective, but required by the hardware whenever tessellation is enabled. So, atoms that deal with the TCS must check brw->tess_eval_program (BRW_NEW_TESS_EVAL_PROGRAM?) rather than brw->tess_ctrl_program to tell whether tessellation is enabled. So, not only is it unlikely to be useful, it's a bit confusing to get right. Simply using one flag for both simplifies this. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Allocate URB space for HS and DS stages when required.Chris Forbes2015-12-151-34/+143
| | | | | | | | | | | | | | | | | | | | v2: (by Ken, incorporating feedback from Matt Turner): - Rewrite the push constant allocation code to be clearer. - Only apply the minimum VS entries workaround on Gen 8. v3: (by Ken) - Fix a bug in v2 where we failed to allocate the full push constant space when the number of enabled stages didn't divide the available push constant space evenly. (Any left over space is now allocated to the PS, as it was in v1.) - Fix an off-by-one error in v2's number of enabled stages calculation. - Use DIV_ROUND_UP for nicer formatting. - Line wrapping fixes. Signed-off-by: Chris Forbes <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965: Use DIV_ROUND_UP() in gen7_urb.c code.Kenneth Graunke2015-12-141-9/+8
| | | | | | | This is a newer convention, which we prefer over ALIGN(x, n) / n. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965: Define state flag to signal that the URB size has been altered.Francisco Jerez2015-12-091-0/+3
| | | | | | | | This will make sure that we recalculate the URB layout anytime the URB size is modified by the L3 partitioning code. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965: Fix PIPE_CONTOL typo.Kenneth Graunke2015-11-171-1/+1
| | | | PIPE_CONTOL!!!
* i965: Use float calculations when double is unnecessary.Matt Turner2015-07-291-1/+1
| | | | | | | | | | | | | | | | | Literals without an f/F suffix are of type double, and implicit conversion rules specify that the float in (float op double) be converted to a double before the operation is performed. I believe float execution was intended (in nearly all cases) or is sufficient (in the case of gen7_urb.c). Removes a lot of float <-> double conversion instructions and replaces many double instructions with float instructions which are cheaper. text data bss dec hex filename 4928659 195160 26192 5150011 4e953b i965_dri.so before 4928315 195152 26192 5149659 4e93db i965_dri.so after Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/state: Don't use brw->state.dirty.brwJordan Justen2015-03-311-2/+2
| | | | | | | | | | | | | | | Now, we only use ctx->NewDriverState. I used this bash & sed command in the i965 directory: for file in *.[ch] *.[ch]pp; do sed -i -e 's/state\.dirty\.brw/ctx.NewDriverState/g' $file done Followed by manual changes to brw_state_upload.c. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Delete brw_state_flags::cache and related code.Kenneth Graunke2014-12-021-1/+0
| | | | | | | | | It's been merged into brw_state_flags::brw for simplicity and efficiency. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Move BRW_NEW_*_PROG_DATA flags to .brw (not .cache).Kenneth Graunke2014-12-021-3/+3
| | | | | | | | | | | | | | | | | | | | | | I put the BRW_NEW_*_PROG_DATA flags at the beginning so that brw_state_cache.c can still continue using 1 << brw_cache_id. I also added a comment explaining the difference between BRW_NEW_*_PROG_DATA and BRW_NEW_*_PROGRAM, as it took me a long time to remember it. Non-mechanical changes: - brw_state_cache.c and brw_ff_gs.c now signal .brw, not .cache. - brw_state_upload.c - INTEL_DEBUG=state changes. - brw_context.h - bit definition merging. v2: Correct the explanation of BRW_NEW_*_PROG_DATA to mention state-based recompiles, and nix the "proper subset" claim, as it's false. (Caught by Kristian Høgsberg). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Rename CACHE_NEW_*_PROG to BRW_NEW_*_PROG_DATA.Kenneth Graunke2014-12-021-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we've moved a bunch of CACHE_NEW_* bits to BRW_NEW_*, the only ones that are left are legitimately related to the program cache. Yet, it seems a bit wasteful to have an entire bitfield for only 7 bits. State upload is one of the hottest paths in the driver. For each atom in the list, we call check_state() to see if it needs to be emitted. Currently, this involves comparing three separate bitfields (mesa, brw, and cache). Consolidating the brw and cache bitfields would save a small amount of CPU overhead per atom. Broadwell, for example, has 57 state atoms, so this small savings can add up. CACHE_NEW_*_PROG covers the brw_*_prog_data structures, as well as the offset into the program cache BO (prog_offset). Since most uses refer to brw_*_prog_data, I decided to use BRW_NEW_*_PROG_DATA as the name. Removing "cache" completely is a bit painful, so I decided to do it in several patches for easier review, and to separate mechanical changes from manual ones. This one simply renames things, and was made via: $ for file in *.[ch]; do sed -i -e 's/CACHE_NEW_\([A-Z_\*]*\)_PROG/BRW_NEW_\1_PROG_DATA/g' \ -e 's/BRW_NEW_WM_PROG_DATA/BRW_NEW_FS_PROG_DATA/g' $file done Note that BRW_NEW_*_PROG_DATA is still in .cache, not .brw! The next patch will remedy this flaw. It will also fix the alphabetization issues. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Acked-by: Matt Turner <[email protected]>
* i965: Alphabetize brw_tracked_state flags and use a consistent style.Kenneth Graunke2014-11-291-2/+4
| | | | | | | | | | | | | | | | Most of the dirty flags were listed in some arbitrary order. Some used bonus parenthesis. Some put multiple flags on one line, others put one per line. Some used tabs instead of spaces...but only on some lines. This patch settles on one flag per line, in alphabetical order, using spaces instead of tabs, and sheds the unnecessary parentheses. Sorting was mostly done with vim's visual block feature and !sort, although I alphabetized short lists by hand; it was pretty manual. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Skip recalculating URB allocations if the entry size didn't change.Eric Anholt2014-10-241-0/+13
| | | | | | | | | We only get here if the VS/GS compiled programs change, but we can even skip it if the VS/GS size didn't change. Affects cairo runtime on glamor by -1.26471% +/- 0.674335% (n=234) Reviewed-by: Kenneth Graunke <[email protected]>
* Revert 5 i965 patches: 8e27a4d2, 373143ed, c5bdf9be, 6f56e142, 88e3d404Jordan Justen2014-09-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | Reverts * "i965: Modify state upload to allow 2 different sets of state atoms." 8e27a4d2b3e4e74e9a77446bce49607433d86be3 * "i965: Modify dirty bit handling to support 2 pipelines." 373143ed9187c4d4ce1e3c486b5dd0880d18ec8b * "i965: Create a macro for checking a dirty bit." c5bdf9be1eca190417998d548fd140c1eca37a54 Conflicts: src/mesa/drivers/dri/i965/brw_context.h * "i965: Create a macro for setting all dirty bits." 6f56e1424d923fd80c84090fbf4506c9eaaffea1 Conflicts: src/mesa/drivers/dri/i965/brw_blorp.cpp src/mesa/drivers/dri/i965/brw_state_cache.c src/mesa/drivers/dri/i965/brw_state_upload.c * "i965: Create a macro for setting a dirty bit." 88e3d404dad009d8cff5124cf8acee7daeaceb64 Signed-off-by: Jordan Justen <[email protected]>
* i965: Create a macro for setting a dirty bit.Paul Berry2014-09-011-1/+1
| | | | | | | This will make it easier to extend dirty bit handling to support compute shaders. Reviewed-by: Jordan Justen <[email protected]>
* i965: Remove unneeded VS workaround stalls on Baytrail.Greg Hunt2014-06-261-3/+3
| | | | | | | | | | | | | According to the workarounds list, these stalls aren't needed on production Baytrail systems. Piglit confirms that as well. These cause a small slowdown when we are sending a large number of small batches to the GPU. Removing these improves performance by up to 5% on some CPU bound SynMark tests (Batch[4-7], DrvState1, HdrBloom, Multithread, ShMapPcf). Signed-off-by: Gregory Hunt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Only emit VS state pipe control workaround on IVB and BYT.Kenneth Graunke2014-02-271-1/+2
| | | | | | | | | | | | | | According to the BSpec's 3D workarounds page, this is unnecessary on shipping Haswell hardware, and was never necessary on Broadwell. It unfortunately doesn't say anything about Baytrail. The workaround database confirms those results for Ivybridge, Haswell, and Broadwell. Baytrail is less clear - one page says it's necessary, while the other says it isn't. For now, be conservative and leave it enabled. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Double the push constant space multipliers on Broadwell too.Kenneth Graunke2014-01-181-2/+4
| | | | | | | Broadwell has 2Kb push constant size increments like Haswell GT3. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Disable workaround flush for push constants on Broadwell.Kenneth Graunke2014-01-141-1/+1
| | | | | | | | If it wasn't necessary for Haswell, it's likely not to be necessary for Broadwell either. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Fix 3DSTATE_PUSH_CONSTANT_ALLOC_PS packet creation.Kenneth Graunke2013-12-201-1/+1
| | | | | | | | | When adding geometry shader support, we accidentally reversed the size and offset parameters. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Paul Berry <[email protected]> Cc: "10.0" <[email protected]>
* i965/gen7: Emit workaround flush when changing GS enable state.Paul Berry2013-11-181-22/+2
| | | | | | | | | | v2: Don't go to extra work to avoid extraneous flushes. (Previous experiments in the kernel have suggested that flushing the pipeline when it is already empty is extremely cheap). Cc: "10.0" <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965/gen7.5: Fix lower bound on number of VS URB entries.Paul Berry2013-09-051-3/+4
| | | | | | | | | | | | Haswell GT2 and GT3 require the number of vertex shader URB entries to be at least 64, not 32. At the moment, we always meet this requirement automatically, because in the absence of a geometry shader, we assign all available URB space to the vertex shader. But when we turn on support for geometry shaders, this lower limit will become important. Reviewed-by: Chad Versace <[email protected]>
* i965: Make sure constants re-sent after constant buffer reallocation.Paul Berry2013-08-311-0/+13
| | | | | | | | | | | | | | | | | | | | The hardware requires that after constant buffers for a stage are allocated using a 3DSTATE_PUSH_CONSTANT_ALLOC_{VS,HS,DS,GS,PS} command, and prior to execution of a 3DPRIMITIVE, the corresponding stage's constant buffers must be reprogrammed using a 3DSTATE_CONSTANT_{VS,HS,DS,GS,PS} command. Previously we didn't need to worry about this, because we only programmed 3DSTATE_PUSH_CONSTANT_ALLOC_{VS,HS,DS,GS,PS} once on startup (or, previous to that, whenever BRW_NEW_CONTEXT was flagged). But now that we reallocate the constant buffers whenever geometry shaders are switched on and off, we need to make sure the constant buffers are reprogrammed. We do this by adding a new bit, BRW_NEW_PUSH_CONSTANT_ALLOCATION, to brw->state.dirty.brw. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gs: Allocate push constant space for use by GS.Paul Berry2013-08-311-14/+57
| | | | | | | | | | | | | | | | | | Previously, we would always use the same push constant allocation regardless of what shader programs were being run: the available push constant space was split into 2 equal size partitions, one for the vertex shader, and one for the fragment shader. Now that we are adding geometry shader support, we need to do something smarter. This patch adjusts things so that when a geometry shader is in use, we split the available push constant space into 3 nearly-equal size partitions instead of 2. Since the push constant allocation is now affected by GL state, it can no longer be set up by brw_upload_initial_gpu_state(); instead it must be set up by a state atom. Reviewed-by: Kenneth Graunke <[email protected]>