mesa.git

	Commit message	Author	Age	Files	Lines
*	i965/vec4: Delete the old vec4_vp code	Jason Ekstrand	2015-10-02	1	-1/+0
\| \| \| \|	Reviewed-by: Matt Turner <[email protected]>
*	i965/vec4: Import helpers to convert vectors into arrays and back.	Francisco Jerez	2015-09-25	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	These functions handle the conversion of a vec4 into the form expected by the dataport unit in message and message return payloads. The conversion is not always trivial because some messages don't support SIMD4x2 for some generations, in which case a strided copy may be necessary. v2: Split from the FS implementation. v3: Rewrite to avoid evil array_reg, emit_collect and emit_zip. Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965/vec4: Introduce VEC4 IR builder.	Francisco Jerez	2015-09-25	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	See "i965/fs: Introduce FS IR builder." for the rationale. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965/fs: Add a very basic validation pass	Jason Ekstrand	2015-09-15	1	-0/+1
\| \| \| \| \| \| \| \|	Currently the validation pass only validates that regs_read and regs_written are consistent with the sizes of VGRF's. We can add more as we find it to be useful. Reviewed-by: Matt Turner <[email protected]>
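A minimal sketch of the kind of consistency check this message describes. It uses hypothetical toy types (`ToyInst`, `ToyShader`) and a hypothetical `validate()` function rather than the driver's real classes, and it only checks that a write stays inside the VGRF it targets.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an IR instruction: it writes `regs_written`
// registers starting at `dst_offset` within virtual register `dst_vgrf`.
struct ToyInst {
    std::size_t dst_vgrf;
    unsigned dst_offset;
    unsigned regs_written;
};

// Hypothetical stand-in for a shader: vgrf_sizes[i] is the size of
// VGRF i, counted in registers.
struct ToyShader {
    std::vector<unsigned> vgrf_sizes;
    std::vector<ToyInst> insts;
};

// Returns true when every write stays inside its destination VGRF.
bool validate(const ToyShader &shader) {
    for (const ToyInst &inst : shader.insts) {
        if (inst.dst_vgrf >= shader.vgrf_sizes.size())
            return false;  // write to a VGRF that was never allocated
        if (inst.dst_offset + inst.regs_written >
            shader.vgrf_sizes[inst.dst_vgrf])
            return false;  // write runs past the end of the VGRF
    }
    return true;
}
```

A read-side check on regs_read would presumably follow the same shape, with one test per source operand.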
*	i965: Move compute shader code around	Kristian Høgsberg Kristensen	2015-09-14	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This moves the compute shader code around in order to make the way the code is split up more consistent. There should be no functional changes. Typically we have a few files per stage: brw_vs.c, brw_wm.c, brw_gs.c: code to drive code generation and implement precompiling and cache search; genX_<stage>_state.c: gen-specific implementation of the state emission for the shader stage. The brw_*_emit() functions are all in the same files as the visitor classes they use (with the exception of VS, which may use either vec4 or fs). To make compute follow this convention, we move the brw_cs_emit() function into brw_fs.cpp. We can then rename brw_cs.cpp to brw_cs.c and do this in C like the other similar files. Finally, move state setup and atoms to gen7_cs_state.c. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Signed-off-by: Kristian Høgsberg Kristensen <[email protected]>
*	i965/nir/gs: Handle geometry shaders inputs	Iago Toral Quiroga	2015-08-03	1	-0/+1
\| \| \| \| \| \| \|	Outputs from the vertex shader become array inputs in the geometry shader, but the arrays are interleaved, so we need to map our inputs accordingly. Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/nir/vec4: Add implementation placeholders for a new NIR->vec4 pass	Eduardo Lima Mitev	2015-08-03	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	This patch will add a brw_vec4_nir.cpp file filled with entry point methods to the main functionality, following a structure similar to brw_fs_nir.cpp. Subsequent patches in this series will be adding the implementations for these methods, incrementally. Reviewed-by: Jason Ekstrand <[email protected]>
*	i965/fs: Import surface message builder helper functions.	Francisco Jerez	2015-07-29	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \|	Implement helper functions that can be used to construct and send untyped and typed surface read, write and atomic messages to the shared dataport unit easily. v2: Drop VEC4 support. v3: Reimplement in terms of logical send opcodes. Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: Transplant PIPE_CONTROL routines to brw_pipe_control	Chris Wilson	2015-06-24	1	-0/+1
\| \| \| \| \| \| \| \| \|	Start trimming the fat from intel_batchbuffer.c. First by moving the set of routines for emitting PIPE_CONTROLS (along with the lore concerning hardware workarounds) to a separate brw_pipe_control.c Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Split VUE map handling out of brw_vs.c into brw_vue_map.c.	Kenneth Graunke	2015-06-22	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	This was originally only used by the vertex shader, but it's now used by the geometry shader as well, and will also eventually be used for tessellation control and evaluation shaders. I suspect it will be easier to find in a file named after the concept. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
*	i965/fs: Introduce FS IR builder.	Francisco Jerez	2015-06-09	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The purpose of this change is threefold: First, it improves the modularity of the compiler back-end by separating the functionality required to construct an i965 IR program from the rest of the visitor god-object, what in turn will reduce the coupling between other components and the visitor allowing a more modular design. This patch doesn't yet remove the equivalent functionality from the visitor classes, as it involves major back-end surgery. Second, it improves consistency between the scalar and vector back-ends. The FS and VEC4 builders can both be used to generate scalar code with a compatible interface or they can be used to generate natural vector width code -- 1 or 4 components respectively. Third, the approach to IR construction is somewhat different to what the visitor classes currently do. All parameters affecting code generation (execution size, half control, point in the program where new instructions are inserted, etc.) are encapsulated in a stand-alone object rather than being quasi-global state (yes, anything defined in one of the visitor classes is effectively global due to the tight coupling with virtually everything else in the compiler back-end). This object is lightweight and can be copied, mutated and passed around, making helper IR-building functions more flexible because they can now simply take a builder object as argument and will inherit its IR generation properties in exactly the same way that a discrete instruction would from the same builder object. The emit_typed_write() function from my image-load-store branch is an example that illustrates the usefulness of the latter point: Due to hardware limitations the function may have to split the untyped surface message in 8-wide chunks. 
That means that the several functions called to help with the construction of the message payload are themselves required to set the execution width and half control correctly on the instructions they emit, and to allocate all registers with half the default width. With the previous approach this would require the used helper functions to be aware of the parameters that might differ from the default state and explicitly set the instruction bits accordingly. With the new approach they would get a modified builder object as argument that would influence all instructions emitted by the helper function as if it were the default state. Another example is the fs_visitor::VARYING_PULL_CONSTANT_LOAD() method. It doesn't actually emit any instructions, they are simply created and inserted into an exec_list which is returned for the caller to emit at some location of the program. This sort of two-step emission becomes unnecessary with the builder interface because the insertion point is one more of the code generation parameters which are part of the builder object. The caller can simply pass VARYING_PULL_CONSTANT_LOAD() a modified builder object pointing at the location of the program where the effect of the constant load is desired. This two-step emission (which pervades the compiler back-end and is in most cases redundant) goes away: E.g. ADD() now actually adds two registers rather than just creating an ADD instruction in memory, emit(ADD()) is no longer necessary. v2: Drop scalarizing VEC4 builder. v3: Take a backend_shader as constructor argument. Improve handling of debug annotations and execution control flags. v4: Drop Gen6 IF with inline comparison. Rename "instr" variable. Initialize cursor to NULL by default and add method to explicitly point the builder at the end of the program. Reviewed-by: Matt Turner <[email protected]>
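The builder idea described above can be sketched as follows. This is a toy illustration under stated assumptions, not the driver's builder: `ToyInst`, `ToyProgram`, `ToyBuilder`, `with_exec_size()`, `at()` and `emit_payload()` are all hypothetical names. It shows a small copyable object that carries the execution size and the insertion point, so that a helper inherits both from whatever builder it is handed and ADD() emits directly.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Hypothetical toy instruction; not the driver's IR.
struct ToyInst {
    std::string opcode;
    unsigned exec_size;
};

using ToyProgram = std::list<ToyInst>;

// A lightweight, copyable builder. It carries the code generation
// parameters (execution size, insertion point) itself instead of
// relying on visitor-wide state.
class ToyBuilder {
public:
    ToyBuilder(ToyProgram &prog, unsigned exec_size)
        : prog_(&prog), cursor_(prog.end()), exec_size_(exec_size) {}

    // Modified copy that emits at a different execution width.
    ToyBuilder with_exec_size(unsigned exec_size) const {
        ToyBuilder b = *this;
        b.exec_size_ = exec_size;
        return b;
    }

    // Modified copy that inserts before `pos`.
    ToyBuilder at(ToyProgram::iterator pos) const {
        ToyBuilder b = *this;
        b.cursor_ = pos;
        return b;
    }

    // Both calls insert into the program directly; there is no
    // separate emit() step.
    ToyProgram::iterator ADD() const {
        return prog_->insert(cursor_, ToyInst{"add", exec_size_});
    }
    ToyProgram::iterator MOV() const {
        return prog_->insert(cursor_, ToyInst{"mov", exec_size_});
    }

private:
    ToyProgram *prog_;
    ToyProgram::iterator cursor_;
    unsigned exec_size_;
};

// A helper needs no knowledge of the width or the insertion point;
// it takes both from the builder it receives.
void emit_payload(const ToyBuilder &bld) {
    bld.MOV();
    bld.ADD();
}
```

Calling `emit_payload(bld.with_exec_size(8))` would emit the same two instructions at half width without the helper setting any instruction bits itself, which is the property the message attributes to the new approach.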
*	i965: Remove the old fragment program code	Jason Ekstrand	2015-05-28	1	-1/+0
\| \| \| \| \| \| \|	Now that everything is running through NIR, this is all dead. Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: add brw_cs.h to the sources list	Emil Velikov	2015-05-19	1	-0/+1
\| \| \| \|	Signed-off-by: Emil Velikov <[email protected]>
*	i965: Use predicate enable bit for conditional rendering w/o stalling	Neil Roberts	2015-05-12	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously whenever a primitive is drawn the driver would call _mesa_check_conditional_render which blocks waiting for the result of the query to determine whether to render. On Gen7+ there is a bit in the 3DPRIMITIVE command which can be used to disable the primitive based on the value of a state bit. This state bit can be set based on whether two registers have different values using the MI_PREDICATE command. We can load these two registers with the pixel count values stored in the query begin and end to implement conditional rendering without stalling. Unfortunately these two source registers were not in the whitelist of available registers in the kernel driver until v3.19. This patch uses the command parser version from intel_screen to detect whether to attempt to set the predicate data registers. The predicate enable bit is currently only used for drawing 3D primitives. For blits, clears, bitmaps, copypixels and drawpixels it still causes a stall. For most of these it would probably just work to call the new brw_check_conditional_render function instead of _mesa_check_conditional_render because they already work in terms of rendering primitives. However it's a bit trickier for blits because it can use the BLT ring or the blorp codepath. I think these operations are less useful for conditional rendering than rendering primitives so it might be best to leave it for a later patch. v2: Use the command parser version to detect whether we can write to the predicate data registers instead of trying to execute a register load command. v3: Simple rebase v4: Changes suggested by Kenneth Graunke: Split the load_64bit_register function out to a separate patch so it can be a shared public function. Avoid calling _mesa_check_conditional_render if we've already determined that there's no query object. Some styling fixes. 
Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Implement DispatchCompute() back-end	Paul Berry	2015-05-02	1	-0/+1
\| \| \| \| \| \| \|	brw_emit_gpgpu_walker will be implemented in a subsequent patch. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/cache: Add support for CS in program state cache	Jordan Justen	2015-05-02	1	-0/+1
\| \| \| \| \| \|	Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Create NIR during LinkShader() and ProgramStringNotify().	Kenneth Graunke	2015-04-11	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, we translated into NIR and did all the optimizations and lowering as part of running fs_visitor. This meant that we did all of that work twice for fragment shaders - once for SIMD8, and again for SIMD16. We also had to redo it every time we hit a state based recompile. We now generate NIR once at link time. ARB programs don't have linking, so we instead generate it at ProgramStringNotify time. Mesa's fixed function vertex program handling doesn't bother to inform the driver about new programs at all (which is rather mean), so we generate NIR at the last minute, if it hasn't happened already. shader-db runs ~9.4% faster on my i7-5600U, with a release build. v2: Check NirOptions != NULL in ProgramStringNotify(). Don't bother using _mesa_program_enum_to_shader_stage as we already know it. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]>
*	i965: add the remaining files to the tarball	Emil Velikov	2015-03-24	1	-0/+3
\| \| \| \| \|	Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965: Add a NIR analysis pass for determining when a boolean resolve is needed	Jason Ekstrand	2015-03-23	1	-0/+2
\| \| \| \| \| \| \| \| \| \|	v2: Fix the spelling of analyze and re-arrange code for better readability as per Connor's comments. v3: Make the naming of things more consistent and add a pile of comments v4: Stop trying to avoid vectors Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Connor Abbott <[email protected]>
*	i965/fs: Add pass to combine immediates.	Matt Turner	2015-02-17	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	total instructions in shared programs: 5885407 -> 5940958 (0.94%) instructions in affected programs: 3617311 -> 3672862 (1.54%) helped: 3 HURT: 23556 GAINED: 31 LOST: 165 ... but will allow us to always emit MAD instructions. Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Refactor tiled memcpy functions and move them into their own file	Sisinty Sasmita Patra	2015-01-26	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit refactors the tiled_memcpy code in intel_tex_subimage.c and moves it into its own intel_tiled_memcpy files. Also, xtile_copy and ytile_copy are renamed to linear_to_xtiled and linear_to_ytiled respectively. The *_faster functions are similarly renamed. There was also a bit of logic to select between the libc-provided memcpy function and our custom memcpy that does an RGBA -> BGRA swizzle. This was moved into an intel_get_memcpy function so that rgba8_copy can live (and be inlined) in intel_tiled_memcpy.c. v2: Jason Ekstrand <[email protected]> - Better commit message - Fix up the copyright on the intel_tiled_memcpy files - Various whitespace fixes - Moved a bunch of stuff that did not need to be exposed from intel_tiled_memcpy.h to intel_tiled_memcpy.c - Added proper documentation for intel_get_memcpy - Incorporated the ptrdiff_t tweaks from commit 225a09790 v3: Jason Ekstrand <[email protected]> - Fixed a comment - Move the tile size constants into the .c file Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Chad Versace <[email protected]>
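The selection between a plain copy and a swizzling copy might look roughly like the sketch below. The names `CopyFn`, `plain_copy`, `swizzle_copy` and `pick_copy` are hypothetical and are not the functions in intel_tiled_memcpy.c. The sketch assumes 4-byte pixels, a byte count that is a multiple of 4, and non-overlapping buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// memcpy-compatible signature so both copies are interchangeable.
using CopyFn = void *(*)(void *dst, const void *src, std::size_t bytes);

// Straight copy; a thin wrapper around the libc memcpy.
void *plain_copy(void *dst, const void *src, std::size_t bytes) {
    return std::memcpy(dst, src, bytes);
}

// Copy 4-byte pixels while swapping the first and third channel
// (RGBA -> BGRA). Assumes `bytes` is a multiple of 4 and that the
// buffers do not overlap.
void *swizzle_copy(void *dst, const void *src, std::size_t bytes) {
    auto *d = static_cast<std::uint8_t *>(dst);
    const auto *s = static_cast<const std::uint8_t *>(src);
    for (std::size_t i = 0; i + 3 < bytes; i += 4) {
        d[i + 0] = s[i + 2];
        d[i + 1] = s[i + 1];
        d[i + 2] = s[i + 0];
        d[i + 3] = s[i + 3];
    }
    return dst;
}

// Hypothetical selector: choose the copy once, outside the per-tile
// loop, and let the inner loops call through the returned pointer.
CopyFn pick_copy(bool needs_swizzle) {
    return needs_swizzle ? swizzle_copy : plain_copy;
}
```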
*	i965/fs: Add pass to propagate conditional modifiers.	Matt Turner	2015-01-23	1	-0/+1
\| \| \| \| \| \| \| \| \|	total instructions in shared programs: 5974160 -> 5959463 (-0.25%) instructions in affected programs: 1743737 -> 1729040 (-0.84%) GAINED: 0 LOST: 12 Reviewed-by: Kenneth Graunke <[email protected]>
*	i965/fs: add a NIR frontend	Connor Abbott	2015-01-15	1	-0/+1
\| \| \| \| \| \| \| \| \| \|	This is similar to the GLSL IR frontend, except consuming NIR. This lets us test NIR as part of an actual compiler. v2: Jason Ekstrand <[email protected]>: Make brw_fs_nir build again. Only use NIR if INTEL_USE_NIR is set. Whitespace fixes.
*	i965: Add headers to distribution.	Matt Turner	2014-12-12	1	-0/+47
\|
*	i965: Alphabetize source list.	Matt Turner	2014-12-12	1	-35/+35
\|
*	i965/vec4: Rewrite dead code elimination to use live in/out.	Matt Turner	2014-12-01	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Improves 359 shaders by >=10%, 114 shaders by >=20%, 91 shaders by >=30%, 82 shaders by >=40%, 22 shaders by >=50%, 4 shaders by >=60%, 2 shaders by >=80%. total instructions in shared programs: 5845346 -> 5822422 (-0.39%) instructions in affected programs: 364979 -> 342055 (-6.28%) Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Add functions to convert float <-> VF.	Matt Turner	2014-11-25	1	-0/+1
\| \| \| \|	Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Rename brw_vec4_gs.[ch] to brw_gs.[ch].	Kenneth Graunke	2014-10-29	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	These source files support actual geometry shaders, so using "gs" for the name makes a lot of sense. We're going to be adding SIMD8 geometry shader support as well, at which point "vec4_gs" will be a misnomer. Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Matt Turner <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Acked-by: Iago Toral Quiroga <[email protected]>
*	i965: Rename brw_gs{,_emit}.[ch] to brw_ff_gs{,_emit}.[ch].	Kenneth Graunke	2014-10-29	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The brw_gs.[ch] and brw_gs_emit.c source files contain code for emulating fixed-function unit functionality (VF primitive decomposition or SOL) using the GS unit. They do not contain code to support proper geometry shaders. We've taken to calling that code "ff_gs" (see brw_ff_gs_prog_key, brw_ff_gs_prog_data, brw_context::ff_gs, brw_ff_gs_compile, brw_ff_gs_prog). So it makes sense to make the filenames match. Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Matt Turner <[email protected]> Acked-by: Jason Ekstrand <[email protected]> Acked-by: Iago Toral Quiroga <[email protected]>
*	i965/gen6/gs: Add initial implementation for a gen6 geometry shader visitor.	Iago Toral Quiroga	2014-09-19	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Geometry shaders in gen6 are significantly different from gen7+ so it is better to have them implemented in a different file rather than adding gen6 branching paths all over brw_vec4_gs_visitor.cpp. This commit adds an initial implementation that only handles point output, which is the simplest case. Acked-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
*	i965: Split gen6 depth hiz state out from brw	Jordan Justen	2014-08-15	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \|	We will program the gen6 hiz depth state differently to enable layered rendering on gen6. v2: * Remove unneeded gen6_emit_depthbuffer as suggested by Topi Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Split gen6 renderbuffer surface state from gen5 and older	Jordan Justen	2014-08-15	1	-0/+1
\| \| \| \| \| \| \| \| \|	We will program the gen6 renderbuffer surface state differently to enable layered rendering on gen6. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Implement fast color clears using meta operations	Kristian Høgsberg	2014-08-15	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch uses the infrastructure put in place by previous patches to implement fast color clears and replicated color clears in terms of meta operations. This works all the way back to gen7 where fast clear was introduced and adds support for fast clear on gen8. It replaces the blorp path completely and improves on a few cases. Layered clears are now done using instanced rendering and multiple render-target clears use a MRT shader with rep16 writes. Signed-off-by: Kristian Høgsberg <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
*	android: dri/i9*5: remove used _INCLUDES variable	Emil Velikov	2014-08-13	1	-6/+1
\| \| \| \| \| \|	No longer needed as of last commit. Signed-off-by: Emil Velikov <[email protected]>
*	i965: Delete the Gen8 code generators.	Kenneth Graunke	2014-08-12	1	-4/+0
\| \| \| \| \| \| \| \|	We now use the brw_eu_emit.c code instead. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Reviewed-by: Matt Turner <[email protected]>
*	i965: Add support for ARB_copy_image	Jason Ekstrand	2014-08-11	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This, together with the meta path, provides a complete implementation of ARB_copy_image. v2: Add a fallback memcpy path for when the texture is too big for the blitter v3: Properly support copying between two places on the same texture in the memcpy fallback v4: Properly handle blit between the same two images in the fallback path v5: Properly handle blit between the same two compressed images in the fallback path v6: Fix a typo in a comment Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Neil Roberts <[email protected]>
*	i965: Stop using gen7_update_sampler_state; rm gen7_sampler_state.c.	Kenneth Graunke	2014-08-02	1	-1/+0
\| \| \| \| \| \| \| \|	The code in brw_sampler_state.c now handles all generations; we don't need the extra Gen7+ only code anymore. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
*	i965: Rename brw_wm_sampler_state.c to brw_sampler_state.c.	Kenneth Graunke	2014-08-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When the driver was originally written, it only supported texturing in the pixel shader backend; vertex and geometry shader texturing came much later. Originally, the pixel shader was referred to as "WM" (the Windowizer/Masker unit). So, this code happened to only be relevant for the WM stage, at the time. However, sampler state really applies to all stages, so putting "wm" in the filename doesn't make sense. I dropped it in gen7_sampler_state.c; at this point the asymmetry just trips people up. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
*	i965/vec4: Add basic common subexpression elimination.	Kenneth Graunke	2014-07-06	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	[mattst88]: Modified to perform CSE on instructions with the same writemask. Offered no improvement before. total instructions in shared programs: 1995633 -> 1995185 (-0.02%) instructions in affected programs: 14410 -> 13962 (-3.11%) Reviewed-by: Matt Turner <[email protected]> Signed-off-by: Kenneth Graunke <[email protected]>
*	i965: Rename intel_asm_printer -> intel_asm_annotation.	Matt Turner	2014-07-05	1	-1/+1
\| \| \| \| \| \|	The #ifndef include guards already said the right thing :) Reviewed-by: Topi Pohjolainen <[email protected]>
*	i965/disasm: Delete gen8_disasm.c.	Kenneth Graunke	2014-06-30	1	-1/+0
\| \| \| \| \| \| \| \|	The functionality has been merged into brw_disasm.c; use that instead. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
*	i965: Add annotation data structure and support code.	Matt Turner	2014-05-24	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Will be used to print disassembly after jump targets are set and instructions are compacted, while still retaining higher-level IR annotations and basic block information. An array of 'struct annotation' will live along side the generated assembly. The generators will populate the array with their IR annotations, and basic block pointers if the instructions began or ended a basic block pointer. We'll then update the instruction offset when we compact instructions and then using the annotations print the disassembly. Reviewed-by: Eric Anholt <[email protected]>
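One way to picture the offset update after compaction is sketched below. It is an illustration only: `ToyAnnotation` and `update_offsets()` are hypothetical, and the sketch assumes 16 bytes per uncompacted instruction and 8 bytes per compacted one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical annotation record that lives alongside the assembly.
struct ToyAnnotation {
    std::size_t first_inst;  // index of the first instruction covered
    std::size_t offset;      // byte offset of that instruction
    std::string note;        // higher-level IR annotation text
};

// Recompute each annotation's byte offset once it is known which
// instructions were compacted. Assumes 16 bytes per uncompacted and
// 8 bytes per compacted instruction.
void update_offsets(std::vector<ToyAnnotation> &anns,
                    const std::vector<bool> &compacted) {
    // start[i] is the byte offset of instruction i after compaction.
    std::vector<std::size_t> start(compacted.size() + 1, 0);
    for (std::size_t i = 0; i < compacted.size(); ++i)
        start[i + 1] = start[i] + (compacted[i] ? 8 : 16);

    // at() throws if an annotation points past the last instruction.
    for (ToyAnnotation &a : anns)
        a.offset = start.at(a.first_inst);
}
```

With the offsets refreshed, a disassembler loop can print each annotation just before the instruction at its offset.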
*	i965/meta: Stencil blits	Topi Pohjolainen	2014-05-15	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	v2: Create the intel renderbuffer with level hardcoded to zero instead of overriding it in the surface state configuration. Also moved the dimension adjustments for tiling, mip level, msaa into the render buffer creation. Finally prepares for another blit path needed for miptree updownsampling. v3 (Ken): Dropped unnecessary memory context for "ralloc_asprintf()" Cc: "10.2" <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Topi Pohjolainen <[email protected]>
*	i965/blorp: Expose coordinate scissoring and mirroring	Topi Pohjolainen	2014-05-12	1	-0/+1
\| \| \| \| \| \|	Cc: "10.2" <[email protected]> Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
*	i965: Delete the intel_regions.c code.	Eric Anholt	2014-05-01	1	-1/+0
\| \| \| \| \|	Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Chad Versace <[email protected]>
*	i965/fs: Reimplement dead_code_elimination().	Matt Turner	2014-04-15	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	total instructions in shared programs: 1653399 -> 1651790 (-0.10%) instructions in affected programs: 92157 -> 90548 (-1.75%) GAINED: 2 LOST: 2 Also significantly reduces the number of optimization loop iterations: total loop iterations in shared programs: 39724 -> 31651 (-20.32%) loop iterations in affected programs: 21617 -> 13544 (-37.35%) Including some great pathological cases, like 29 -> 3 in Strike Suit Zero and 24 -> 3 in Dota2. Reviewed-by: Eric Anholt <[email protected]>
*	i965/fs: Split fs_visitor::register_coalesce() into its own file.	Matt Turner	2014-04-05	1	-0/+1
\| \| \| \| \| \| \|	The function has gotten large, and brw_fs.cpp is the largest source file in the driver. Reviewed-by: Anuj Phogat <[email protected]>
*	i965/gen8: Change the winsys MSAA blits from blorp to meta.	Eric Anholt	2014-03-24	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This gets us equivalent code paths on BDW and pre-BDW, except for stencil (where we don't have MSAA stencil resolve code yet) Improves MSAA-forced citybench by 7.94496% +/- 2.38429% (n=16). Reduces DRI2 MSAA glxgears performance by -12.3559% +/- 1.52845% (n=9). v2: Move the new meta code to brw_meta_updownsample.c, name it brw_meta_updownsample(), add a comment about intel_rb_storage_first_mt_slice(), and rename that function and move the RB generation into it (review ideas by Ken). v3: Fix 2 src vs dst pasteos in previous change. v4: Skip this path pre-gen8 for now, until we can analyze the glxgears performance delta some more. Reviewed-by: Kenneth Graunke <[email protected]>
*	glsl/i965: move lower_offset_array up to GLSL compiler level.	Dave Airlie	2014-02-25	1	-1/+0
\| \| \| \| \| \| \| \|	This lowering pass will be useful for gallium drivers as well, in order to support the GL TG4 oddity that is textureGatherOffsets. Reviewed-by: Chris Forbes <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
*	i965: Update GS state for Broadwell.	Kenneth Graunke	2014-01-31	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is quite similar to the Gen7 code. The main changes: - 48-bit relocations - Thread count is specified as U/2-1 instead of U-1. - An extra DWord (DW9) with clip planes, URB entry output length/offsets - We need to program the "Expected Vertex Count" (VerticesIn) v2: Set the number of binding table entries so they can be prefetched (requested by Eric Anholt). v3: Add a WARN_ONCE for a missing workaround. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
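The thread-count change listed above amounts to a different encoding of the same value U. A small sketch, with a hypothetical `encode_max_threads()` helper and the assumption that U is even and at least 2:

```cpp
#include <cassert>

// Encode a maximum thread count U for the packet field.
// Per the message above: U-1 on Gen7, U/2-1 on Broadwell (Gen8).
// Hypothetical helper; assumes U is even and at least 2.
unsigned encode_max_threads(unsigned max_threads, bool is_gen8) {
    assert(max_threads >= 2);
    return is_gen8 ? max_threads / 2 - 1 : max_threads - 1;
}
```

For U = 64 this yields 63 with the Gen7 encoding and 31 with the Gen8 encoding.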