aboutsummaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_wm.c
Commit message (Collapse)AuthorAgeFilesLines
* i965: Support GL_CLAMP natively on Broadwell.Kenneth Graunke2014-06-051-1/+2
| | | | | | | | | | | | The new hardware actually supports this OpenGL 1.x feature natively, so we can finally drop our shader workarounds. Not many applications use GL_CLAMP, and most use it unintentionally, but it's trivial to do right, so we should. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: "10.2" <[email protected]>
* i965/fs: Finally kill struct brw_wm_compile (better known as 'c').Kenneth Graunke2014-05-181-11/+11
| | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/fs: Stop copying the program key.Kenneth Graunke2014-05-181-6/+4
| | | | | | | | | We already have a perfectly good copy of the program key, and nobody is going to modify it. The only reason we copied it was because the brw_wm_compile structure embedded the key rather than pointing to it. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/fs: Rip struct brw_wm_compile out of the visitors and generators.Kenneth Graunke2014-05-181-1/+2
| | | | | | | | | Instead, just pass the key and prog_data as separate parameters. This moves it up a level - one step further toward getting rid of it. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/fs: Plumb a mem_ctx all the way through the FS compile.Kenneth Graunke2014-05-181-4/+5
| | | | | | | | 'c' is going away, but we still need a memory context that lives for the duration of the compile. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/fs: Actually free program data on the error path.Kenneth Graunke2014-05-181-1/+3
| | | | | | | | | We throw away the data generated during compilation on the success path, so we really ought to on the failure path as well. The caller has no access to it anyway, so it's purely leaked. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/fs: Move total_scratch calculation into fs_visitor::run().Kenneth Graunke2014-05-181-4/+1
| | | | | | | | | | | With this one use gone, c->last_scratch is now only used inside fs_visitor. The rest of the driver uses prog_data->total_scratch. We already compute similar prog_data fields in fs_visitor, so this seems reasonable. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/fs: Move perf_debug about register spilling to a more obvious spot.Kenneth Graunke2014-05-181-4/+0
| | | | | | | | | | | | | The if (!allocated_without_spills) block is an obvious spot for this performance warning message. In the Vec4 backend, scratch is also used for indirect access of temporary arrays. The FS backend doesn't implement that yet, but if it did, this message would be inaccurate, since scratch access wouldn't necessarily mean spilling. Moving it preemptively fixes that. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* mesa: Replace use of _ReallyEnabled as a boolean with use of _Current.Eric Anholt2014-04-301-1/+1
| | | | | | | | | | | | | I'm probably not the only person that has tried to kill _ReallyEnabled. This does the mechanical part of the work, and cleans _ReallyEnabled from i965. I think that using _Current makes texture management clearer: You can't have multiple targets in use in the same texture image unit at the same time, because there's just that one pointer. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove unused sampler key fieldsTopi Pohjolainen2014-04-081-10/+0
| | | | | Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* mesa/sso: rename Shader to the pointer _ShaderGregory Hainaut2014-03-251-1/+1
| | | | | | | | | | | | | | | | Basically a sed but shaderapi.c and get.c. get.c => GL_CURRENT_PROGAM always refer to the "old" UseProgram behavior shaderapi.c => the old api stil update the Shader object directly V2: formatting improvement V3 (idr): * Rebase fixes after a block of code was moved from ir_to_mesa.cpp to shaderapi.c. * Trivial reformatting. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Use a separate variable to keep track of the last uniform index seen.Francisco Jerez2014-02-191-0/+1
| | | | | | | Like the VEC4 back-end does. It will make dynamic allocation of the param_size array easier in a future commit. Reviewed-by: Paul Berry <[email protected]>
* i965: Move up duplicated fields from stage-specific prog_data to ↵Francisco Jerez2014-02-191-17/+9
| | | | | | | | | | | | | brw_stage_prog_data. There doesn't seem to be any reason for nr_params, nr_pull_params, param, and pull_param to be duplicated in the stage-specific subclasses of brw_stage_prog_data. Moving their definition to the common base class will allow some code sharing in a future commit, the removal of brw_vec4_prog_data_compare and brw_*_prog_data_free, and the simplification of the stage-specific brw_*_prog_data_compare. Reviewed-by: Paul Berry <[email protected]>
* i965: Add Gen6 gather wa to sampler keyChris Forbes2014-02-081-0/+21
| | | | | Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Ignore 'centroid' interpolation qualifier in case of persample shadingAnuj Phogat2014-01-281-1/+2
| | | | | | | | | | | | I missed this change in commit f5cfb4a. It fixes the incorrect rendering caused in Dolphin Emulator. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73915 Cc: [email protected] Signed-off-by: Anuj Phogat <[email protected]> Tested-by: Markus Wick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Switch from BRW_MAX_TEX_UNIT to the actual limit.Kenneth Graunke2014-01-221-1/+2
| | | | | | | | | | BRW_MAX_TEX_UNIT is about to grow, but only Gen7+ will be able to support the new larger value. On older platforms, we don't want to allocate the extra space - it would just be a waste. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Use sample barycentric coordinates with per sample shadingAnuj Phogat2014-01-211-2/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current implementation of arb_sample_shading doesn't set 'Barycentric Interpolation Mode' correctly. We use pixel barycentric coordinates for per sample shading. Instead we should select perspective sample or non-perspective sample barycentric coordinates. It also enables using sample barycentric coordinates in case of a fragment shader variable declared with 'sample' qualifier. e.g. sample in vec4 pos; A piglit test to verify the implementation has been posted on piglit mailing list for review. V2: Do not interpolate all the 'in' variables at sample position if fragment shader uses 'sample' qualifier with one of them. For example we have a fragment shader: #version 330 #extension ARB_gpu_shader5: require sample in vec4 a; in vec4 b; main() { ... } Only 'a' should be sampled at sample location, not 'b'. Cc: [email protected] Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Add an option to ignore sample qualifierAnuj Phogat2014-01-211-1/+1
| | | | | | | | | | This will be useful in my next patch which depends on a functionality of _mesa_get_min_invocations_per_fragment() to ignore the sample qualifier (prog->IsSample) based on a flag passed to it. Cc: [email protected] Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* s/Tungsten Graphics/VMware/José Fonseca2014-01-171-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tungsten Graphics Inc. was acquired by VMware Inc. in 2008. Leaving the old copyright name is creating unnecessary confusion, hence this change. This was the sed script I used: $ cat tg2vmw.sed # Run as: # # git reset --hard HEAD && find include scons src -type f -not -name 'sed*' -print0 | xargs -0 sed -i -f tg2vmw.sed # # Rename copyrights s/Tungsten Gra\(ph\|hp\)ics,\? [iI]nc\.\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./g /Copyright/s/Tungsten Graphics\(,\? [iI]nc\.\)\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./ s/TUNGSTEN GRAPHICS/VMWARE/g # Rename emails s/[email protected]/[email protected]/ s/[email protected]/[email protected]/g s/jrfonseca-at-tungstengraphics-dot-com/jfonseca-at-vmware-dot-com/ s/jrfonseca\[email protected]/[email protected]/g s/keithw\[email protected]/[email protected]/g s/[email protected]/[email protected]/g s/thomas-at-tungstengraphics-dot-com/thellstom-at-vmware-dot-com/ s/[email protected]/[email protected]/ # Remove dead links s@Tungsten Graphics (http://www.tungstengraphics.com)@Tungsten Graphics@g # C string src/gallium/state_trackers/vega/api_misc.c s/"Tungsten Graphics, Inc"/"VMware, Inc"/ Reviewed-by: Brian Paul <[email protected]>
* i965: Don't flag gather quirks for Gen8+Chris Forbes2013-12-071-1/+1
| | | | | | | | | | | My understanding is that Broadwell retains the same SCS mechanism that Haswell has, so even if the underlying issue with this format is not fixed, the w/a will be applied in SCS rather than needing shader code. Signed-off-by: Chris Forbes <[email protected]> Cc: Kenneth Graunke <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/Gen7: Include bitfield in the sampler key for CMS layoutChris Forbes2013-12-071-0/+13
| | | | | | | | | | | | | | | | | | | | We need to emit extra shader code in this case to sample the MCS surface first; we can't just blindly do this all the time since IVB will sometimes try to access the MCS surface even if disabled. V3: Use actual MSAA layout from the texture's mt, rather then computing what would have been used based on the format. This is simpler and less fragile - there's at least one case where we might want to have a texture's MSAA layout change based on what the app does (CMS SINT falling back to UMS if the app ever attempts to render to it with a channel disabled.) This also obsoletes V2's 1/10 -- compute_msaa_layout can now remain an implementation detail of the miptree code. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965: Drop trailing whitespace from the rest of the driver.Kenneth Graunke2013-12-051-6/+6
| | | | | | | Performed via: $ for file in *; do sed -i 's/ *//g'; done Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Fix texture swizzling on Broadwell.Kenneth Graunke2013-12-021-1/+1
| | | | | | | | Like Haswell, we do this in SURFACE_STATE rather than shader workarounds. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Gen4-5: Include alpha func/ref in program keyChris Forbes2013-11-061-0/+16
| | | | | | | V2: Better explanation of the rationale for doing this. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Add FS backend for builtin gl_SampleIDAnuj Phogat2013-11-011-0/+6
| | | | | | | | | | | | | V2: - Update comments - Add compute_sample_id variables in brw_wm_prog_key - Add a special backend instruction to compute sample_id. V3: - Make changes to support simd16 mode. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965: Add FS backend for builtin gl_SamplePositionAnuj Phogat2013-11-011-0/+6
| | | | | | | | | | | | | V2: - Update comments. - Add compute_pos_offset variable in brw_wm_prog_key. - Add variable uses_pos_offset in brw_wm_prog_data. V3: - Make changes to support simd16 mode. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965: Make a brw_stage_prog_data for storing the SURF_INDEX information.Eric Anholt2013-10-151-1/+2
| | | | | | | | | | | It would be nice to be able to pack our binding table so that programs that use 1 render target don't upload an extra BRW_MAX_DRAW_BUFFERS - 1 binding table entries. To do that, we need the compiled program to have information on where its surfaces go. v2: Rename size to size_bytes to be more explicit. Reviewed-by: Paul Berry <[email protected]>
* i965: Remove dead arguments from prog_data_compare.Eric Anholt2013-10-151-2/+1
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965/ivb: Flag RG32F quirk for texture gather regardless of swizzlesChris Forbes2013-10-061-1/+1
| | | | | | | | | | | | As of ARB_gpu_shader5, textureGather doesn't always read the post-swizzle RED channel -- so we can't just look at the red swizzle state. Theoretically we could only flag the quirk if *some* green swizzle is in use, but that's probably more trouble than it's worth. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/hsw: Apply gather4 RG32F w/a using SCS instead of shader.Chris Forbes2013-10-031-2/+3
| | | | | | | | The new surface channel select bits allow us to avoid having to recompile the shader for this workaround. Signed-off-by: Chris Forbes <[email protected]> Reviewed-and-tested-by: Kenneth Graunke <[email protected]>
* i965: w/a for gather4 green RG32FChris Forbes2013-10-031-0/+9
| | | | | | | | | V4: Only flag quirks if there are any uses of gather in the shader, to avoid spurious recompiles just because someone happened to use RG32F. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: compute DDX in a subspan based only on top rowChia-I Wu2013-10-021-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | Consider only the top-left and top-right pixels to approximate DDX in a 2x2 subspan, unless the application requests a more accurate approximation via GL_FRAGMENT_SHADER_DERIVATIVE_HINT or this optimization is disabled from the new driconf option disable_derivative_optimization. This results in a less accurate approximation. However, it improves the performance of Xonotic with Ultra settings by 24.3879% +/- 0.832202% (at 95.0% confidence) on Haswell. No noticeable image quality difference observed. The improvement comes from faster sample_d. It seems, on Haswell, some optimizations are introduced to allow faster sample_d when all pixels in a subspan have the same derivative. I considered SAMPLE_STATE too, which allows one to control the quality of sample_d on Haswell. But it gave much worse image quality without giving better performance comparing to this change. No piglit quick.tests regression on Haswell (tested with v1). v2: better guess for precompile program key Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/fs: When >64 input components, order them to match prev pipeline stage.Paul Berry2013-09-161-1/+2
| | | | | | | | | | | | | Since the SF/SBE stage is only capable of performing arbitrary reorderings of 16 varying slots, we can't arrange the fragment shader inputs in an arbitrary order if there are more than 16 input varying slots in use. We need to make sure that slots 16-31 match the corresponding outputs of the previous pipeline stage. The easiest way to accomplish this is to just make all varying slots match up with the previous pipeline stage. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Use brw_stage_state for WM data as well.Kenneth Graunke2013-09-131-4/+4
| | | | | | | | This gets the VS, GS, and PS all using the same data structure. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965: Shorten sampler loops in key setup.Kenneth Graunke2013-08-191-2/+4
| | | | | | | | | Now that we have the number of samplers available, we don't need to iterate over all 16. This should be particularly helpful for vertex shaders. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965: Delete intel_context entirely.Kenneth Graunke2013-07-091-3/+2
| | | | | | | | | | This makes brw_context inherit directly from gl_context; that was the only thing left in intel_context. Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Chris Forbes <[email protected]> Acked-by: Paul Berry <[email protected]> Acked-by: Anuj Phogat <[email protected]>
* i965: Move intel_context::gen and gt fields to brw_context.Kenneth Graunke2013-07-091-4/+3
| | | | | | | | | | Most functions no longer use intel_context, so this patch additionally removes the local "intel" variables to avoid compiler warnings. Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Chris Forbes <[email protected]> Acked-by: Paul Berry <[email protected]> Acked-by: Anuj Phogat <[email protected]>
* i965: Move intel_context::is_<platform> flags to brw_context.Kenneth Graunke2013-07-091-2/+2
| | | | | | | Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Chris Forbes <[email protected]> Acked-by: Paul Berry <[email protected]> Acked-by: Anuj Phogat <[email protected]>
* i965: Move intel_context::perf_debug to brw_context.Kenneth Graunke2013-07-091-3/+0
| | | | | | | Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Chris Forbes <[email protected]> Acked-by: Paul Berry <[email protected]> Acked-by: Anuj Phogat <[email protected]>
* i965: Move intel_context::stats_wm to brw_context.Kenneth Graunke2013-07-091-1/+1
| | | | | | | Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Chris Forbes <[email protected]> Acked-by: Paul Berry <[email protected]> Acked-by: Anuj Phogat <[email protected]>
* i965: Move intel_context::reduced_primitive to brw_context.Kenneth Graunke2013-07-091-2/+2
| | | | | | | Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Chris Forbes <[email protected]> Acked-by: Paul Berry <[email protected]> Acked-by: Anuj Phogat <[email protected]>
* i965: Pass brw_context to functions rather than intel_context.Kenneth Graunke2013-07-091-20/+21
| | | | | | | | | | | | | | This makes brw_context available in every function that used intel_context. This makes it possible to start migrating fields from intel_context to brw_context. Surprisingly, this actually removes some code, as functions that use OUT_BATCH don't need to declare "intel"; they just use "brw." Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Chris Forbes <[email protected]> Acked-by: Paul Berry <[email protected]> Acked-by: Anuj Phogat <[email protected]>
* i965: fix alpha test for MRTChris Forbes2013-07-061-4/+6
| | | | | | | | | | | | | | | | | Include src0 alpha in the RT write message when using MRT, so it is used for the alpha test instead of the normal per-RT alpha value. Fixes broken rendering in Dota2 under Wine [FDO #62647]. No Piglit regressions on Ivybridge. V2: reuse (and simplify) existing sample_alpha_to_coverage flag in the FS key, rather than adding another redundant one. Signed-off-by: Chris Forbes <[email protected]> Reviewd-by: Paul Berry <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62647 NOTE: This is a candidate for the stable branches.
* intel: Stop doing special _NEW_STENCIL state flagging on drawbuffers.Eric Anholt2013-06-251-1/+1
| | | | | | | | 2/3 packets depending on Stencil._Enabled already checked for _NEW_BUFFERS, so just add _NEW_BUFFERS to the remaining one. Reviewed-by: Anuj Phogat <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* mesa: don't flag _NEW_DEPTH in Begin/EndQuery if driver implements the functionsMarek Olšák2013-04-241-1/+2
| | | | | | | | | | | We don't want to set the flag for Gallium. I think only swrast needs the flag to be set for occlusion queries. v2: fix stats_wm updates in i965 Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Remove BRW_NEW_WM_INPUT_DIMENSIONS dirty bit.Kenneth Graunke2013-04-041-1/+0
| | | | | | | | This was only produced by the brw_wm_input_dimensions atom, which was removed in the previous commit. So there's no need for the dirty bit. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Remove now dead brw_wm_prog_key::proj_attrib_mask field.Kenneth Graunke2013-04-041-15/+0
| | | | | | | | | The previous commit removed the last user of this field, so there's no longer any point in setting it. Removing this should eliminate state-dependent recompiles, and make the precompile more reliable. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Rename vp_outputs_written to input_slots_valid.Paul Berry2013-03-241-3/+3
| | | | | | | | | | | | | | With the introduction of geometry shaders, fragment inputs will no longer come exclusively from the vertex shader; sometimes they come from the geometry shader. So the name "vp_outputs_written" will become a misnomer. This patch renames vp_outputs_written to input_slots_valid, to reflect the true meaning of the bitfield from the fragment shader's point of view: it indicates which of the possible input slots contain valid data that was written by the previous shader stage. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Use brw.vue_map_geom_out instead of VS output VUE map where appropriate.Paul Berry2013-03-241-4/+4
| | | | | | | | | | | | | This patch modifies post-GS pipeline stages (transform feedback, clip, sf, fs) to refer to the VUE map through brw->vue_map_geom_out rather than brw->vs.prog_data->vue_map. This ensures that when geometry shader support is added, these pipeline stages will consult the geometry shader output VUE map when appropriate, rather than the vertex shader output VUE map. v2: Fixed some stale "CACHE_NEW_VS_PROG" comments. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move brw_vs_prog_data::outputs_written into VUE map.Paul Berry2013-03-241-1/+1
| | | | | | | | | | | | | | Future patches will allow for there to be separate VUE maps when both a geometry shader and a vertex shader are in use. When this happens, we will want to have correspondingly separate outputs_written bitfields. Moving outputs_written into the VUE map will make this easy. For consistency with the terminology used in the VUE map, the bitfield is renamed to "slots_valid" in the process. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>