aboutsummaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_wm.c
Commit message (Collapse)AuthorAgeFilesLines
* i965: Use _mesa_geometric_ functions appropriatelyKevin Rogovin2015-06-171-3/+4
| | | | | | | | | | | | | | Change references to gl_framebuffer::Width, Height, MaxNumLayers and Visual::samples to use the _mesa_geometry_ convenience functions for those places where the geometry of the gl_framebuffer is needed (in contrast to the geometry of the intersection of the attachments of the gl_framebuffer). This patch is to pave the way to enable GL_ARB_framebuffer_no_attachments on Gen7 and higher in i965. Reviewed-by: Ian Romanick <[email protected]> Signed-off-by: Kevin Rogovin <[email protected]>
* i965: Fill out the rest of brw_debug_recompile_sampler_key().Kenneth Graunke2015-04-251-0/+8
| | | | | | | | | | This makes INTEL_DEBUG=perf report shader recompiles due to CMS vs. UMS/IMS differences and Sandybridge textureGather workarounds. Previously, we just flagged them as "Something else". Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Rename brw_compile to brw_codegenJason Ekstrand2015-04-221-2/+2
| | | | | | | | | | | | This name better matches what it's actually used for. The patch was generated with the following command: for file in *; do sed -i -e s/brw_compile/brw_codegen/g $file done Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Rename do_<stage>_prog to brw_compile_<stage>_prog (and export)Carl Worth2015-04-021-6/+7
| | | | | | | | | | | | This is in preparation for these functions to be called from other files. This commit is intended to have no functional change. It exists in preparation for some upcoming code movement in preparation for the shader cache. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Split out per-stage dirty-bit checking into separate functionsCarl Worth2015-04-021-16/+22
| | | | | | | | | | | | The dirty-bit checking from each brw_upload_<stage>_prog function is split out into its a new brw_<stage>_state_dirty function. This commit is intended to have no functional change. It exists in preparation for some upcoming code movement in preparation for the shader cache. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Perform program state upload outside of atom handlingCarl Worth2015-02-231-25/+19
| | | | | | | | | | | | | | | | | | | | | | | | Across the board of the various generations, the intial few atoms in all of the atom lists are basically the same, (performing uploads for the various programs). The only difference is that prior to gen6 there's an ff_gs upload in place of the later gs upload. In this commit, instead of using the atom lists for this program state upload, we add a new function brw_upload_programs that calls into the per-stage upload functions which in turn check dirty bits and return immediately if nothing needs to be done. This commit is intended to have no functional change. The motivation is that future code, (such as the shader cache), wants to have a single function within which to perform various operations before and after program upload, (with some local variables holding state across the upload). It may be worth looking at whether some of the other functionality currently handled via atoms might also be more cleanly handled in a similar fashion. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Store floating point mode choice in brw_stage_prog_data.Kenneth Graunke2014-12-041-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | We use IEEE mode for GLSL programs, but need to use ALT mode for ARB programs so that 0^0 == 1. The choice is based entirely on the shader source language. Previously, our code to determine which mode we wanted was duplicated in 8 different places (VS and FS for Gen4-5, Gen6, Gen7, and Gen8). The ctx->_Shader->CurrentProgram[stage] == NULL check was confusing as well - we use CurrentProgram (non-derived state), but _Shader (derived state). It also relies on knowing that ARB programs don't use gl_shader_program structures today. The compiler already makes this assumption in a few places, but I'd rather keep that assumption out of the state upload code. With this patch, we select the mode at compile time, and store that choice in prog_data. The state upload code simply uses that decision. This eliminates a BRW_NEW_*_PROGRAM dependency in the state upload code. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Move PSCDEPTH calculations from draw time to compile time.Kenneth Graunke2014-12-041-0/+21
| | | | | | | | | | | | | | | The "Pixel Shader Computed Depth Mode" value is entirely based on the shader program, so we can easily do it at compile time. This avoids the if+switch on every 3DSTATE_WM (Gen7)/3DSTATE_PS_EXTRA (Gen8+) upload, and shares a bit more code. This also simplifies the PMA stall code, making it match the formula more closely, and drops a BRW_NEW_FRAGMENT_PROGRAM dependency. (Note that the previous comment was wrong - the code and the documentation have != PSCDEPTH_OFF, not ==.) Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Remove "disable_derivative_optimization" driconf option.Kenneth Graunke2014-12-021-7/+2
| | | | | | | | | | | | | This was added in September 2013 when we first implemented the fast (but lower quality) derivatives. A quick Google search didn't turn up anyone using or recommending the option, so I suspect no one does. Applications that want to control the quality of their derivatives can use the new GL_ARB_derivative_control extension, or use the glHint mechanism. The driconf option seems superfluous. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Add _CACHE_ in brw_cache_id enum names.Kenneth Graunke2014-11-291-3/+3
| | | | | | | | | | | | BRW_CACHE_VS_PROG is more easily associated with program caches than plain BRW_VS_PROG. While we're at it, rename BRW_WM_PROG to BRW_CACHE_FS_PROG, to move away from the outdated Windowizer/Masker name. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Alphabetize brw_tracked_state flags and use a consistent style.Kenneth Graunke2014-11-291-15/+15
| | | | | | | | | | | | | | | | Most of the dirty flags were listed in some arbitrary order. Some used bonus parenthesis. Some put multiple flags on one line, others put one per line. Some used tabs instead of spaces...but only on some lines. This patch settles on one flag per line, in alphabetical order, using spaces instead of tabs, and sheds the unnecessary parentheses. Sorting was mostly done with vim's visual block feature and !sort, although I alphabetized short lists by hand; it was pretty manual. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Set prog_data->uses_kill if simulating alpha test via discards.Kenneth Graunke2014-11-271-1/+4
| | | | | | | | | | | | | | When using MRT on Gen4-5, we have to simulate GL's alpha test feature by emitting discards in the fragment shader. In this case, it makes sense to set prog_data->uses_kill, which means the fragment shader may kill pixels via the discard mechanism. This saves us from having to look an extra key value in a couple of places, including in the generator. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Add uses_kill to brw_wm_prog_dataJordan Justen2014-09-051-0/+1
| | | | | Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move curb_read_length/total_scratch to brw_stage_prog_data.Kenneth Graunke2014-09-031-2/+2
| | | | | | | | All shader stages have these fields, so it makes sense to store them in the common base structure, rather than duplicating them in each. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Store uniform constant values in a gl_constant_value instead of floatNeil Roberts2014-08-141-2/+3
| | | | | | | | | | | | | | | | | | | | | | The brw_stage_prog_data struct previously contained an array of float pointers to the values of parameters. These were then copied into a batch buffer to upload the values using a regular assignment. However the float values were also being overloaded to store integer values for integer uniforms. This can break if x87 floating-point registers are used to do the assignment because the fst instruction tries to fix up invalid float values. If an integer constant happened to look like an invalid float value then it would get altered when it was copied into the batch buffer. This patch changes the pointers to be gl_constant_value instead so that the assignment should end up copying without any alteration. This also makes it more obvious that the values being stored here are overloaded for multiple types. There are some static asserts where the values are uploaded to ensure that the size of gl_constant_value is the same as a float. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=81150 Reviewed-by: Kenneth Graunke <[email protected]>
* util: Move ralloc to a new src/util directory.Kenneth Graunke2014-08-041-1/+1
| | | | | | | | | | | | | | | | | | For a long time, we've wanted a place to put utility code which isn't directly tied to Mesa or Gallium internals. This patch creates a new src/util directory for exactly that purpose, and builds the contents as libmesautil.la. ralloc seemed like a good first candidate. These days, it's directly used by mesa/main, i965, i915, and r300g, so keeping it in src/glsl didn't make much sense. Signed-off-by: Kenneth Graunke <[email protected]> v2 (Jason Ekstrand): More realloc uses and some scons fixes Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Marek Olšák <[email protected]>
* i965/fs: Fix gl_SampleID for 2x MSAA and SIMD16 mode.Kenneth Graunke2014-07-211-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | We might be able to do this without an extra program key field, but this is non-invasive and fixes the bug, for now. This fixes the following Piglit tests on Broadwell: - ARB_sample_shading/builtin-gl-sample-id 2 - ARB_sample_shading/builtin-gl-sample-position 2 - EXT_framebuffer_multisample/multisample-blit 2 color - EXT_framebuffer_multisample/multisample-blit 2 color linear - EXT_framebuffer_multisample/multisample-blit 2 depth - EXT_framebuffer_multisample/no-color 2 depth combined - EXT_framebuffer_multisample/no-color 2 depth separate - EXT_framebuffer_multisample/no-color 2 depth single - EXT_framebuffer_multisample/no-color 2 depth-computed combined - EXT_framebuffer_multisample/no-color 2 depth-computed separate - EXT_framebuffer_multisample/no-color 2 depth-computed single - EXT_framebuffer_multisample/unaligned-blit 2 color msaa - EXT_framebuffer_multisample/unaligned-blit 2 depth msaa Signed-off-by: Kenneth Graunke <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80991 Reviewed-by: Matt Turner <[email protected]> Cc: "10.2" <[email protected]>
* i965: Add missing persample_shading field to brw_wm_debug_recompile.Kenneth Graunke2014-07-211-0/+2
| | | | | | | | Otherwise, the performance warning for shader recompiles will just say "something else". Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Support GL_CLAMP natively on Broadwell.Kenneth Graunke2014-06-051-1/+2
| | | | | | | | | | | | The new hardware actually supports this OpenGL 1.x feature natively, so we can finally drop our shader workarounds. Not many applications use GL_CLAMP, and most use it unintentionally, but it's trivial to do right, so we should. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Cc: "10.2" <[email protected]>
* i965/fs: Finally kill struct brw_wm_compile (better known as 'c').Kenneth Graunke2014-05-181-11/+11
| | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/fs: Stop copying the program key.Kenneth Graunke2014-05-181-6/+4
| | | | | | | | | We already have a perfectly good copy of the program key, and nobody is going to modify it. The only reason we copied it was because the brw_wm_compile structure embedded the key rather than pointing to it. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/fs: Rip struct brw_wm_compile out of the visitors and generators.Kenneth Graunke2014-05-181-1/+2
| | | | | | | | | Instead, just pass the key and prog_data as separate parameters. This moves it up a level - one step further toward getting rid of it. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/fs: Plumb a mem_ctx all the way through the FS compile.Kenneth Graunke2014-05-181-4/+5
| | | | | | | | 'c' is going away, but we still need a memory context that lives for the duration of the compile. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/fs: Actually free program data on the error path.Kenneth Graunke2014-05-181-1/+3
| | | | | | | | | We throw away the data generated during compilation on the success path, so we really ought to on the failure path as well. The caller has no access to it anyway, so it's purely leaked. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/fs: Move total_scratch calculation into fs_visitor::run().Kenneth Graunke2014-05-181-4/+1
| | | | | | | | | | | With this one use gone, c->last_scratch is now only used inside fs_visitor. The rest of the driver uses prog_data->total_scratch. We already compute similar prog_data fields in fs_visitor, so this seems reasonable. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965/fs: Move perf_debug about register spilling to a more obvious spot.Kenneth Graunke2014-05-181-4/+0
| | | | | | | | | | | | | The if (!allocated_without_spills) block is an obvious spot for this performance warning message. In the Vec4 backend, scratch is also used for indirect access of temporary arrays. The FS backend doesn't implement that yet, but if it did, this message would be inaccurate, since scratch access wouldn't necessarily mean spilling. Moving it preemptively fixes that. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* mesa: Replace use of _ReallyEnabled as a boolean with use of _Current.Eric Anholt2014-04-301-1/+1
| | | | | | | | | | | | | I'm probably not the only person that has tried to kill _ReallyEnabled. This does the mechanical part of the work, and cleans _ReallyEnabled from i965. I think that using _Current makes texture management clearer: You can't have multiple targets in use in the same texture image unit at the same time, because there's just that one pointer. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove unused sampler key fieldsTopi Pohjolainen2014-04-081-10/+0
| | | | | Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* mesa/sso: rename Shader to the pointer _ShaderGregory Hainaut2014-03-251-1/+1
| | | | | | | | | | | | | | | | Basically a sed but shaderapi.c and get.c. get.c => GL_CURRENT_PROGAM always refer to the "old" UseProgram behavior shaderapi.c => the old api stil update the Shader object directly V2: formatting improvement V3 (idr): * Rebase fixes after a block of code was moved from ir_to_mesa.cpp to shaderapi.c. * Trivial reformatting. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965/fs: Use a separate variable to keep track of the last uniform index seen.Francisco Jerez2014-02-191-0/+1
| | | | | | | Like the VEC4 back-end does. It will make dynamic allocation of the param_size array easier in a future commit. Reviewed-by: Paul Berry <[email protected]>
* i965: Move up duplicated fields from stage-specific prog_data to ↵Francisco Jerez2014-02-191-17/+9
| | | | | | | | | | | | | brw_stage_prog_data. There doesn't seem to be any reason for nr_params, nr_pull_params, param, and pull_param to be duplicated in the stage-specific subclasses of brw_stage_prog_data. Moving their definition to the common base class will allow some code sharing in a future commit, the removal of brw_vec4_prog_data_compare and brw_*_prog_data_free, and the simplification of the stage-specific brw_*_prog_data_compare. Reviewed-by: Paul Berry <[email protected]>
* i965: Add Gen6 gather wa to sampler keyChris Forbes2014-02-081-0/+21
| | | | | Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Ignore 'centroid' interpolation qualifier in case of persample shadingAnuj Phogat2014-01-281-1/+2
| | | | | | | | | | | | I missed this change in commit f5cfb4a. It fixes the incorrect rendering caused in Dolphin Emulator. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73915 Cc: [email protected] Signed-off-by: Anuj Phogat <[email protected]> Tested-by: Markus Wick <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Switch from BRW_MAX_TEX_UNIT to the actual limit.Kenneth Graunke2014-01-221-1/+2
| | | | | | | | | | BRW_MAX_TEX_UNIT is about to grow, but only Gen7+ will be able to support the new larger value. On older platforms, we don't want to allocate the extra space - it would just be a waste. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Use sample barycentric coordinates with per sample shadingAnuj Phogat2014-01-211-2/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current implementation of arb_sample_shading doesn't set 'Barycentric Interpolation Mode' correctly. We use pixel barycentric coordinates for per sample shading. Instead we should select perspective sample or non-perspective sample barycentric coordinates. It also enables using sample barycentric coordinates in case of a fragment shader variable declared with 'sample' qualifier. e.g. sample in vec4 pos; A piglit test to verify the implementation has been posted on piglit mailing list for review. V2: Do not interpolate all the 'in' variables at sample position if fragment shader uses 'sample' qualifier with one of them. For example we have a fragment shader: #version 330 #extension ARB_gpu_shader5: require sample in vec4 a; in vec4 b; main() { ... } Only 'a' should be sampled at sample location, not 'b'. Cc: [email protected] Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Add an option to ignore sample qualifierAnuj Phogat2014-01-211-1/+1
| | | | | | | | | | This will be useful in my next patch which depends on a functionality of _mesa_get_min_invocations_per_fragment() to ignore the sample qualifier (prog->IsSample) based on a flag passed to it. Cc: [email protected] Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* s/Tungsten Graphics/VMware/José Fonseca2014-01-171-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tungsten Graphics Inc. was acquired by VMware Inc. in 2008. Leaving the old copyright name is creating unnecessary confusion, hence this change. This was the sed script I used: $ cat tg2vmw.sed # Run as: # # git reset --hard HEAD && find include scons src -type f -not -name 'sed*' -print0 | xargs -0 sed -i -f tg2vmw.sed # # Rename copyrights s/Tungsten Gra\(ph\|hp\)ics,\? [iI]nc\.\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./g /Copyright/s/Tungsten Graphics\(,\? [iI]nc\.\)\?\(, Cedar Park\)\?\(, Austin\)\?\(, \(Texas\|TX\)\)\?\.\?/VMware, Inc./ s/TUNGSTEN GRAPHICS/VMWARE/g # Rename emails s/[email protected]/[email protected]/ s/[email protected]/[email protected]/g s/jrfonseca-at-tungstengraphics-dot-com/jfonseca-at-vmware-dot-com/ s/jrfonseca\[email protected]/[email protected]/g s/keithw\[email protected]/[email protected]/g s/[email protected]/[email protected]/g s/thomas-at-tungstengraphics-dot-com/thellstom-at-vmware-dot-com/ s/[email protected]/[email protected]/ # Remove dead links s@Tungsten Graphics (http://www.tungstengraphics.com)@Tungsten Graphics@g # C string src/gallium/state_trackers/vega/api_misc.c s/"Tungsten Graphics, Inc"/"VMware, Inc"/ Reviewed-by: Brian Paul <[email protected]>
* i965: Don't flag gather quirks for Gen8+Chris Forbes2013-12-071-1/+1
| | | | | | | | | | | My understanding is that Broadwell retains the same SCS mechanism that Haswell has, so even if the underlying issue with this format is not fixed, the w/a will be applied in SCS rather than needing shader code. Signed-off-by: Chris Forbes <[email protected]> Cc: Kenneth Graunke <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/Gen7: Include bitfield in the sampler key for CMS layoutChris Forbes2013-12-071-0/+13
| | | | | | | | | | | | | | | | | | | | We need to emit extra shader code in this case to sample the MCS surface first; we can't just blindly do this all the time since IVB will sometimes try to access the MCS surface even if disabled. V3: Use actual MSAA layout from the texture's mt, rather then computing what would have been used based on the format. This is simpler and less fragile - there's at least one case where we might want to have a texture's MSAA layout change based on what the app does (CMS SINT falling back to UMS if the app ever attempts to render to it with a channel disabled.) This also obsoletes V2's 1/10 -- compute_msaa_layout can now remain an implementation detail of the miptree code. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965: Drop trailing whitespace from the rest of the driver.Kenneth Graunke2013-12-051-6/+6
| | | | | | | Performed via: $ for file in *; do sed -i 's/ *//g'; done Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Fix texture swizzling on Broadwell.Kenneth Graunke2013-12-021-1/+1
| | | | | | | | Like Haswell, we do this in SURFACE_STATE rather than shader workarounds. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Gen4-5: Include alpha func/ref in program keyChris Forbes2013-11-061-0/+16
| | | | | | | V2: Better explanation of the rationale for doing this. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Add FS backend for builtin gl_SampleIDAnuj Phogat2013-11-011-0/+6
| | | | | | | | | | | | | V2: - Update comments - Add compute_sample_id variables in brw_wm_prog_key - Add a special backend instruction to compute sample_id. V3: - Make changes to support simd16 mode. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965: Add FS backend for builtin gl_SamplePositionAnuj Phogat2013-11-011-0/+6
| | | | | | | | | | | | | V2: - Update comments. - Add compute_pos_offset variable in brw_wm_prog_key. - Add variable uses_pos_offset in brw_wm_prog_data. V3: - Make changes to support simd16 mode. Signed-off-by: Anuj Phogat <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965: Make a brw_stage_prog_data for storing the SURF_INDEX information.Eric Anholt2013-10-151-1/+2
| | | | | | | | | | | It would be nice to be able to pack our binding table so that programs that use 1 render target don't upload an extra BRW_MAX_DRAW_BUFFERS - 1 binding table entries. To do that, we need the compiled program to have information on where its surfaces go. v2: Rename size to size_bytes to be more explicit. Reviewed-by: Paul Berry <[email protected]>
* i965: Remove dead arguments from prog_data_compare.Eric Anholt2013-10-151-2/+1
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965/ivb: Flag RG32F quirk for texture gather regardless of swizzlesChris Forbes2013-10-061-1/+1
| | | | | | | | | | | | As of ARB_gpu_shader5, textureGather doesn't always read the post-swizzle RED channel -- so we can't just look at the red swizzle state. Theoretically we could only flag the quirk if *some* green swizzle is in use, but that's probably more trouble than it's worth. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/hsw: Apply gather4 RG32F w/a using SCS instead of shader.Chris Forbes2013-10-031-2/+3
| | | | | | | | The new surface channel select bits allow us to avoid having to recompile the shader for this workaround. Signed-off-by: Chris Forbes <[email protected]> Reviewed-and-tested-by: Kenneth Graunke <[email protected]>
* i965: w/a for gather4 green RG32FChris Forbes2013-10-031-0/+9
| | | | | | | | | V4: Only flag quirks if there are any uses of gather in the shader, to avoid spurious recompiles just because someone happened to use RG32F. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: compute DDX in a subspan based only on top rowChia-I Wu2013-10-021-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | Consider only the top-left and top-right pixels to approximate DDX in a 2x2 subspan, unless the application requests a more accurate approximation via GL_FRAGMENT_SHADER_DERIVATIVE_HINT or this optimization is disabled from the new driconf option disable_derivative_optimization. This results in a less accurate approximation. However, it improves the performance of Xonotic with Ultra settings by 24.3879% +/- 0.832202% (at 95.0% confidence) on Haswell. No noticeable image quality difference observed. The improvement comes from faster sample_d. It seems, on Haswell, some optimizations are introduced to allow faster sample_d when all pixels in a subspan have the same derivative. I considered SAMPLE_STATE too, which allows one to control the quality of sample_d on Haswell. But it gave much worse image quality without giving better performance comparing to this change. No piglit quick.tests regression on Haswell (tested with v1). v2: better guess for precompile program key Signed-off-by: Chia-I Wu <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Chris Forbes <[email protected]>