aboutsummaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/Makefile.sources
Commit message (Collapse)AuthorAgeFilesLines
* i965/blorp: introduce separate eu-emitter for blit compilerTopi Pohjolainen2014-01-231-0/+1
| | | | | | | | | | | | Prepares for presenting blorp blit programs using FS IR that allows EU-assembly generation using i965 glsl-compiler backend (fs_generator). v2: rebased on top of endif-jump counter fix (moving the added brw_set_uip_jip() into the emitter) Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965: Create a new fragment shader backend for Broadwell.Kenneth Graunke2014-01-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | This replaces the old fs_generator backend. v2: Port to the C-based representation of assembly instructions. Fix texturing after the texture-grf merge. v3: Add high quality derivative support. Fix SET_SIMD4X2_OFFSET. v4: Pass brw_context to gen8_instruction functions as required. v5: Fixes for MRT, as well as zero render targets (alpha test only). v6: Replace n-wide with SIMDn in comments and messages; port over Topi's blorp-generator changes; add missing TXF_MCS opcode, fix missing high quality derivatives for DDX; fix typo (all caught by Eric). Simplify ADDC/SUBB handling; drop "Used only on Gen6+" comment (caught by Matt). Emit SIMD16 versions of three source instructions (caught by both Eric and Matt). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Create a new vec4 backend for Broadwell.Kenneth Graunke2014-01-181-0/+1
| | | | | | | | | | | | | | | | | | | | This replaces the old vec4_generator backend. v2: Port to use the C-based instruction representation. Also, remove Geometry Shader offset hacks - the visitor will handle those instead of this code. v3: Texturing fixes (including adding textureGather support). v4: Pass brw_context to gen8_instruction functions as required. v5: Add SHADER_OPCODE_TXF_MCS support; port DUAL_INSTANCED gs fixes (caught by Eric). Simplify ADDC/SUBB handling; add comments to gen8_set_dp_message calls (suggested by Matt). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Add a new infrastructure for generating Broadwell shader assembly.Kenneth Graunke2014-01-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | This replaces the brw_eu_emit.c layer for Broadwell. It will be used by both the vector and scalar shader backends. v2: Port to use the C-based instruction representation. v3: Fix destination register type for CMP. v4: Pass brw to gen8_instruction functions (required by rebase). v5: Remove bogus assertion on math instructions (caught by Piglit). v6: Remove more restrictions on math instructions (caught by Eric). Make ADDC and SUBB helpers set accumulator writes, like MAC and MACH (caught by Matt). v7: Don't implicitly force ALU3 operations to SIMD8 (we've been able to do SIMD16 versions since Haswell, but didn't when I originally wrote this code). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Implement a disassembler for Broadwell's new instruction encoding.Kenneth Graunke2014-01-181-0/+1
| | | | | | | | | | | | | | | | | | | | | Heavily based on Keith Packard's existing brw_disasm.c code. I've tried to go through most of the pieces (like SFIDs) and update the lists to include features added in recent generations. v2: Port to use the C-based instruction emitters. This allows us to use C99 array initializers, which tidies up some of the code. v3: Improve decoding of render target write messages. v4: Update for BRW_REGISTER_TYPE becoming an abstraction. v5: Rebase on Chris Forbes' SFID message defines. v6: Fix disassembly of UV immediates; remove silly casts. Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Matt Turner <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Add a new representation for Broadwell shader instructions.Kenneth Graunke2014-01-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Broadwell significantly changes the EU instruction encoding. Many of the fields got moved to different bit positions; some even got split in two. With so many changes, it was infeasible to continue using struct brw_instruction. We needed a new representation. This new approach is a bit different: rather than a struct, I created a class that has four DWords, and helper functions that read/write various bits. This has several advantages: 1. We can create several different names for the same bits. For example, conditional modifiers, SFID for SEND instructions, and the MATH instruction's function opcode are all stored in bits 27:24. In each situation, we can use the appropriate setter function: set_sfid(), set_math_function(), or set_cond_modifier(). This is much easier to follow. 2. Since the fields are expressed using the original 128-bit numbers, the code to create the getter/setter functions follows the table in the documentation very closely. To aid in debugging, I've enabled -fkeep-inline-functions when building gen8_instruction.c. Otherwise, these functions cannot be called by gdb, making it insanely difficult to print out anything. Kenneth Graunke wrote most of this code. Damien Lespiau ported it to C99. Xiang Haihao added media fields. Zhao Yakui added indirect addressing support. Eric Anholt added an assertion to make sure that values fit in the alloted number of bits. v2: Update for brw_reg_type_to_hw_type(), which necessitates passing brw_context pointers around everywhere. Signed-off-by: Kenneth Graunke <[email protected]> Signed-off-by: Damien Lespiau <[email protected]> Signed-off-by: Xiang, Haihao <[email protected]> Signed-off-by: Zhao Yakui <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Acked-by: Matt Turner <[email protected]>
* i965/fs: New peephole optimization to flatten IF/BREAK/ENDIF.Matt Turner2013-12-041-0/+1
| | | | | | | total instructions in shared programs: 1550713 -> 1550449 (-0.02%) instructions in affected programs: 7931 -> 7667 (-3.33%) Reviewed-by: Paul Berry <[email protected]>
* i965/fs: New peephole optimization to generate SEL.Matt Turner2013-12-041-0/+1
| | | | | | | | | | | | | | | | | | | | | fs_visitor::try_replace_with_sel optimizes only if statements whose "then" and "else" bodies contain a single MOV instruction. It also could not handle constant arguments, since they cause an extra MOV immediate to be generated (since we haven't run constant propagation, there are more than the single MOV). This peephole fixes both of these and operates as a normal optimization pass. fs_visitor::try_replace_with_sel is still arguably necessary, since it runs before pull constant loads are lowered. total instructions in shared programs: 1559129 -> 1545833 (-0.85%) instructions in affected programs: 167120 -> 153824 (-7.96%) GAINED: 13 LOST: 6 Reviewed-by: Paul Berry <[email protected]>
* i965: Add basic driver hooks and plumbing for AMD_performance_monitor.Kenneth Graunke2013-11-211-0/+1
| | | | | | | These stub functions will be filled out in later patches. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Add a pass to remove dead control flow.Matt Turner2013-11-201-0/+1
| | | | | | | | | | | | | Removes IF/ENDIF and IF/ELSE/ENDIF with no intervening instructions. total instructions in shared programs: 1360393 -> 1360387 (-0.00%) instructions in affected programs: 157 -> 151 (-3.82%) (no change in vertex shaders) Reviewed-by: Paul Berry <[email protected]> Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add function to query the GPU reset status for a contextIan Romanick2013-11-071-0/+1
| | | | | | | v2: Update based on kernel interface / libdrm changes. Signed-off-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* Revert "i965: Add support for GL_AMD_performance_monitor on Ironlake."Kenneth Graunke2013-11-071-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts most of commit 0f2da773070c06b6d20ad264d3abb19c4dfd9761. (I chose to leave the additions to brw_defines.h.) My previous Ironlake implementation was somewhat broken: counter data was global, rather than per-context. This meant that performance monitors captured data from your compositor, 2D driver, and other 3D programs. Originally, I believed that Sandybridge and later had an easy way to avoid this problem (setting per-context flags in OACONTROL), while Ironlake did not. So I'd intended to leave it as a known limitation of performance monitoring support on Ironlake. However, this turned out not to be true. Unfortunately, our hardware only has one set of aggregating performance counters shared between all 3D programs, and their values are not saved or restored by hardware contexts. Also, at least on Sandybridge and Ivybridge, the counters lose their values if the GPU goes to sleep. To work around both of these problems, we have to snapshot the performance counters at the beginning and end of each batch, similar to how we handle query objects on platforms that don't support hardware contexts. For occlusion queries, this batch bookending approach is fairly simple: only one occlusion query can be active at a time, and the result is a single integer. Performance monitors are more complex: an arbitrary number of monitors can be active at a time, each monitoring some subset of our ~30 observability counters. Individual monitors can be started and stopped at any point during the batch. Tracking where each monitor started/ended relative to batch flushes ends up being a pain. And you can run out of space in the buffer. Properly supporting this required some serious rearchitecting of the code. Rather than writing patches to try and morph a broken system into a working one (which operates quite differently), I decided it would be simplest to revert the old code and start fresh. Parts will look familiar, but other parts are new. I also decided it would be best to include Sandybridge and Ivybridge support from the start, since the newer platforms have added complexity that I wanted to make sure worked. They're also what most people care about these days. Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Combine gen6_clip_state.c and gen7_clip_state.c.Kenneth Graunke2013-11-041-1/+0
| | | | | | | | | | The changes between Gen6-7 are minimal, and can easily be solved with an extra generation check. This cuts a lot of duplicated code. It also helps prevent even more duplication for Broadwell. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Add lowering pass to fold offset into unnormalized coordsChris Forbes2013-10-261-0/+1
| | | | | | | | | | | | | | | It turns out that nonzero offsets with gsampler2DRect don't work -- they just return garbage. Work around this by folding the offset into the coord. Done as an IR pass rather than yet another hack in the visitors because it's clear what's going on this way. Can possibly reuse this to replace the existing txf coord+offset hacks. V2: Use ir_builder Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add lowering pass for splitting textureGatherOffsetsChris Forbes2013-10-261-0/+1
| | | | | | | | | | | | | | | | Rewrites textureGatherOffsets(s, p, offsets) into gvec4( textureGatherOffset(s, p, offsets[0]).w, textureGatherOffset(s, p, offsets[1]).w, textureGatherOffset(s, p, offsets[2]).w, textureGatherOffset(s, p, offsets[3]).w ) V2: Use ir_builder to be slightly clearer. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Fold brwInitVtbl() into brwCreateContext().Kenneth Graunke2013-10-171-1/+0
| | | | | | | | | With most of the virtual functions gone, brwInitVtbl() is now tiny. Merging it into the caller allows us to delete the entire file. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Merge intel_context.c into brw_context.c.Kenneth Graunke2013-10-131-1/+0
| | | | | | | | | | | There's no point in having two files for context functions. This patch moves the code from intel_context.c into brw_context.c unmodified (other than whitespace fixes). Right now, this looks silly; future patches will merge functions and tidy things up. Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Add a new brw_device_info structure.Kenneth Graunke2013-10-131-0/+1
| | | | | | | | | | | | | | | The idea is that struct brw_device_info should store statically-known information about hardware features. Using the new family name in the PCI ID table, we can easily grab the right structure. This is basically the equivalent of intel_device_info in the kernel. This patch also makes the new structure available from intel_screen, but nothing uses it. Right now, it looks very redundant with existing fields, but that will change. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965: Pull out INTEL_DEBUG handling into new intel_debug.[ch] files.Kenneth Graunke2013-10-131-0/+1
| | | | | | | | | | | | | Now that there isn't an intel_context structure, the split between brw_context.[ch] and intel_context.[ch] is rather awkward and arbitrary. Removing intel_context.[ch] seems desirable, but not everything really belongs in brw_context.[ch], either. Moving INTEL_DEBUG handling into separate intel_debug.[ch] files should make them relatively easy to find. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965: Add support for GL_AMD_performance_monitor on Ironlake.Kenneth Graunke2013-09-261-0/+1
| | | | | | | | | | | | | | | | | | | Ironlake's counters are always enabled; userspace can simply send a MI_REPORT_PERF_COUNT packet to take a snapshot of them. This makes it easy to implement. The counters are documented in the source code for the intel-gpu-tools intel_perf_counters utility. v2: Adjust for core data structure changes. Add a table mapping buffer object offsets to exposed counters (which changes each generation). Finally, add report ID assertions to sanity check the BO layout (thanks to Carl Worth). v3: Update for core BeginPerfMonitor hook changes (requested by Brian). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965: Move binding table code to a new file, brw_binding_tables.c.Kenneth Graunke2013-09-191-0/+1
| | | | | | | | | | | | The code to upload the binding tables for each stage was scattered across brw_{vs,gs,wm}_surface_state.c and brw_misc_state.c, which also contain a lot of code to populate individual SURFACE_STATE structures. This patch brings all the binding table upload code together, and splits it out from the code which fills in SURFACE_STATE entries. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Rename brw_{fs,vec4}_emit.cpp to brw_{fs,vec4}_generator.cpp.Kenneth Graunke2013-09-181-2/+2
| | | | | | | | | | | | | | | | | | The previous names were really confusing to talk about: - brw_fs_visitor() contained methods named emit_whatever(). - brw_fs_generator() contained methods named generate_whatever(), but lived in brw_fs_emit.cpp. So when someone said "the emit layer", or "emit code", we weren't sure whether they meant the visitor's emit() functions or the generator in brw_fs_emit.cpp. By renaming these files, the method names, class names, and file names all match, which is much less confusing. Signed-off-by: Kenneth Graunke <[email protected]> Acked-by: Paul Berry <[email protected]> Acked-by: Eric Anholt <[email protected]>
* i965/gs: Add a state atom to set up geometry shader state.Paul Berry2013-09-111-0/+1
| | | | | | | | | | | | v2: Do not attempt to share the code that uploads 3DSTATE_BINDING_TABLE_POINTERS_GS, 3DSTATE_SAMPLER_STATE_POINTERS_GS, or 3DSTATE_GS with VS. Reviewed-by: Ian Romanick <[email protected]> v3: Add _NEW_TRANSFORM to gen7_gs_state. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Move vs-specific code out of brw_vec4_visitor.cpp.Paul Berry2013-09-051-0/+1
| | | | | | | | | | | This patch creates a new file brw_vec4_vs_visitor.cpp, to contain code that is specific to the vertex shader. Now the organization of vertex shader and geometry shader visitor code is symmetric: vs-specific code is in brw_vec4_vs_visitor.cpp, gs-specific code is in brw_vec4_gs_visitor.cpp, and code shared between vs and gs is in brw_vec4_visitor.cpp. Acked-by: Kenneth Graunke <[email protected]>
* i965/gs: make the state atom for compiling Gen7 geometry shaders.Paul Berry2013-08-311-0/+1
| | | | | | Reviewed-by: Kenneth Graunke <[email protected]> v2: Use "unsigned" rather than "GLuint".
* i965/gs: Implement support for geometry shader surfaces.Paul Berry2013-08-311-0/+1
| | | | | | | | | | | | | | This patch implements pull constant upload, binding table upload, and surface setup for geometry shaders, by re-using vertex shader code that was generalized in previous patches. Based on work by Eric Anholt <[email protected]>. v2: Update ditry bits for brw_gs_ubo_surfaces to account for commit 77d8fbc (mesa: add & use a new driver flag for UBO updates instead of _NEW_BUFFER_OBJECT). Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gs: add GS visitors.Paul Berry2013-08-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces the vec4_gs_visitor class, which translates geometry shaders from GLSL IR to back-end opcodes. This class is derived from vec4_visitor (which is also the base class for vec4_vs_visitor), so as a result most of the back end code is shared. The only parts that differ are: - Geometry shaders use a different input payload organization, since the inputs need to match up with the outputs of the previous pipeline stage (vec4_gs_visitor::setup_payload() and vec4_gs_visitor::setup_varying_inputs()). - Geometry shader input array dereferences need a special stride computation, since all geometry shader inputs are interleaved into one giant array (vec4_gs_visitor::compute_array_stride()). - There are no geometry shader system values (vec4_gs_visitor::make_reg_for_system_value()). - At the beginning of a geometry shader, extra data in R0 needs to be zeroed out, and a vertex counter needs to be initialized (vec4_gs_visitor::emit_prolog()). - When EmitVertex() appears in the shader, the current contents of output variables need to be emitted to the URB, and the vertex counter needs to be incremented (vec4_gs_visitor::visit(ir_emit_vertex *)). - When generating a URB_WRITE message to output vertex data, the current state of the vertex counter needs to be used to store a write offset in the message header (vec4_gs_visitor::emit_urb_write_header()). - The URB_WRITE message that outputs vertex data needs to be sent using GS_OPCODE_URB_WRITE, since VS_OPCODE_URB_WRITE would overwrite the offsets in the message header (vec4_gs_visitor::emit_urb_write_opcode()). - At the end of a geometry shader, the final vertex count needs to be delivered using a URB WRITE message (vec4_gs_visitor::emit_thread_end()). - EndPrimitive() functionality is not implemented yet (vec4_gs_visitor::visit(ir_end_primitive *)). - There is no support for assembly shaders (vec4_gs_visitor::emit_program_code()). v2: Make num_input_vertices const. Refer to registers as rN rather than gN, for consistency with the PRM. Fix misspelling. Improve comment in the ir_emit_vertex visitor explaining why we emit vertices inside a conditional. Enclose the conditional code in the ir_emit_vertex visitor between curly braces. Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Split intel_upload code out into a separate file.Kenneth Graunke2013-08-161-0/+1
| | | | | | | | | | This code upload performs batched uploads via a BO. By moving it out to a separate file, intel_buffer_objects.c only provides the core buffer object functionality. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965: Move GL_APPLE_object_purgeable functionality into a new file.Kenneth Graunke2013-08-161-0/+1
| | | | | | | | | | | | | | | | | | | | GL_APPLE_object_purgeable creates a mechanism for marking OpenGL objects as "purgeable" so they can be thrown away when system resources become scarce. It specifically applies to buffer objects, textures, and renderbuffers. The intel_buffer_objects.c file provides core functionality for GL buffer objects, such as MapBufferRange and CopyBufferSubData. Having texture and renderbuffer functionality in that file is a bit strange. The 2010 copyright on the new file is because Chris Wilson first added this code in January 2010 (commit 755915fa). v2: Actually remember to call the new dd table setup function. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Ian Romanick <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965 Gen4/5: Introduce 'interpolation map' alongside the VUE mapChris Forbes2013-08-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | The interpolation map (in brw->interpolation_mode) is a new auxiliary structure alongside the post-GS VUE map, which describes the interpolation modes for each VUE slot, for use by the clip and SF stages. This patch introduces a new state atom to compute the interpolation map, and adjusts the program keys for the clip and SF stages, but it is not actually used yet. [V1-2]: Signed-off-by: Olivier Galibert <galibert at pobox.com> V3: Updated for vue_map changes, intel -> brw merge, etc. (Chris Forbes) V4: Compute interpolation map as a new state atom rather than tacking it on the front of the clip setup V5: Rework commit message, make interpolation_mode_map a struct. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Paul Berry <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move the rest of intel_tex_layout.c into brw_tex_layout.c.Kenneth Graunke2013-07-031-1/+0
| | | | | | | | | | | The texture alignment unit functions are called from brw_tex_layout.c, so it makes sense to put them there. Since the only caller of intel_get_texture_alignment_unit() is in brw_tex_layout.c, it could be made into a static function. However, this patch instead simply folds it into the caller, as it's only two lines anyway. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965: Delete brw_print_reg() function.Kenneth Graunke2013-07-031-1/+0
| | | | | | | | | | This wasn't called from anywhere; presumably it was used to examine brw_regs when debugging shader assembly. However, it prints registers in a different notation than brw_disasm.c which everyone is used to...which means I doubt anyone will want to use it. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chad Versace <[email protected]>
* i965: Split surface format code into a new file (brw_surface_formats.c).Kenneth Graunke2013-06-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | brw_wm_surface_state.c has gotten rather large and unwieldy. At this point, it consists of two separate portions: 1. Surface format code This includes the giant table of surface formats and what features they support on each generation, as well as the code to translate between Mesa formats and hardware formats. This is used across all generations. 2. Binding table (SURFACE_STATE) related code. This is the code to generate SURFACE_STATE entries for renderbuffers, textures, transform feedback buffers, constant buffers, and so on, as well as the code to assemble them into binding tables. This is only used on Gen4-6; gen7_surface_state.c has Gen7+ code. Since the two are logically separate, and one is reused on every generation while the other is not, it makes a lot of sense to split them out. It should also make finding code easier. No code is changed by this patch. I simply copied the file then deleted portions of both. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Emit the depth/stencil state pointer directly, not via atoms.Kenneth Graunke2013-06-111-1/+0
| | | | | | | | | | | | | | | | | | | | | See two commits ago for the rationale. This allows us to delete the whole gen7_cc_state.c file. This does move these commands before the depth stall flushes from brw_emit_depthbuffer, which may be a problem. The documentation for 3DSTATE_DEPTH_BUFFER mentions that depth stall flushes are required before changing any depth/stencil buffer state, but explicitly lists 3DSTATE_DEPTH_BUFFER, 3DSTATE_HIER_DEPTH_BUFFER, 3DSTATE_STENCIL_BUFFER, and 3DSTATE_CLEAR_PARAMS. It does not mention this particular packet (_3DSTATE_DEPTH_STENCIL_STATE_POINTERS). No observed Piglit regressions on Sandybridge or Ivybridge. Together with the last two commits, this makes a cairo-gl benchmark faster by 0.324552% +/- 0.258355% on Ivybridge. No statistically significant change on Sandybridge. (Thanks to Eric for the numbers.) Signed-off-by: Kenneth Graunke <[email protected]>
* i965: Rely on hardware contexts for query objects on Gen6+.Kenneth Graunke2013-05-201-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Hardware contexts greatly simplify the query object code. The pipeline statistics counters get saved and restored with the context, which means that we don't need to worry about other workloads polluting them. This means that we can simply write a single pair of values (one at BeginQuery and one at EndQuery) rather than a series of pairs. This also means we don't need to worry about the BO getting full. We also don't need to delay BO allocation and starting snapshot until the first draw. The generation split here is a little off: technically, Ironlake can also support hardware contexts. However, the kernel currently doesn't, and even if it were to do so someday, we'd need to wait a while before bumping the kernel requirement to take advantage of it. v2: Incorporate Paul's feedback. - Clarify which functions are Gen4/5-only via assertions and comments. - Change how driver hook initialization happens. - Update comments. - Squash a bug fix from a later commit here where it belongs. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]> [v1] Acked-by: Paul Berry <[email protected]>
* i965: Move FS instruction scheduling to a non-FS-specific file.Eric Anholt2013-05-021-1/+1
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Implement color clears using a simple shader in blorp.Eric Anholt2013-04-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | The upside is less CPU overhead in fiddling with GL error handling, the ability to use the constant color write message in most cases, and no GLSL clear shaders appearing in MESA_GLSL=dump output. The downside is more batch flushing and a total recompute of GL state at the end of blorp. However, if we're ever going to use the fast color clear feature of CMS surfaces, we'll need this anyway since it requires very special state setup. This increases the fail rate of some the GLES3conform ARB_sync tests, because of the initial flush at the start of blorp. The tests already intermittently failed (because it's just a bad testing procedure), and we can return it to its previous fail rate by fixing the initial flush. Improves GLB2.7 performance 0.37% +/- 0.11% (n=71/70, outlier removed). v2: Rename the key member, use the core helper for sRGB, and use BRW_MASK_* enums, fix comment and indentation (review by Paul). v3: Rewrite a comment, drop a silly temporary variable (review by Ken) Reviewed-by: Kenneth Graunke <[email protected]>
* intel: Remove the last spans code!Eric Anholt2013-04-301-1/+0
| | | | | | | | The remaining bits happen to do nothing that _swrast_span_render_start()/finish() don't do. Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Brian Paul <[email protected]>
* intel: Fold the one last function intel_tex_format.c into the caller.Eric Anholt2013-04-291-1/+0
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Delete brw_vs_constval.c and the brw_wm_input_sizes atom.Kenneth Graunke2013-04-041-1/+0
| | | | | | | | This was only used to compute proj_attrib_mask, which was removed by the previous commit. That makes this dead code. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965: Remove the old brw_optimize() code.Eric Anholt2013-04-011-1/+0
| | | | | | This is now done in the VS backend before instruction emit. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Move brw_wm_lookup_iz() to fs_visitor::setup_payload_gen4().Kenneth Graunke2012-11-261-1/+1
| | | | | | | This necessitates compiling brw_wm_iz.c as C++. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* automake: Merge per-type *_FILES variables in intel drivers.Eric Anholt2012-11-121-32/+29
| | | | | Reviewed-by: Chad Versace <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: Delete the old vertex shader backend.Kenneth Graunke2012-11-011-1/+0
| | | | | | It's no longer used for anything. Reviewed-by: Eric Anholt <[email protected]>
* i965/vs: Replace brw_vs_emit.c with dumping code into the vec4_visitor.Kenneth Graunke2012-11-011-0/+1
| | | | | | | | | | | | | | | | | | | | Rather than having two separate backends, just create a small layer that translates the subset of Mesa IR used for ARB_vertex_program and fixed function programs to the Vec4 IR. This allows us to use the same optimization passes, code generator, register allocator as for GLSL. v2: Incorporate Eric's review comments. - Fix use of uninitialized src_swiz[] values in the SWIZZLE_ZERO/ONE case: just initialize it to 0 (.x) since the value doesn't matter (those channels get writemasked out anyway). - Properly reswizzle source register's swizzles, rather than overwriting the swizzle. - Port the old brw_vs_emit code for computing .x of the EXP2 opcode. - Update comments, removing mention of NV_vertex_program, etc. - Delete remaining #warning lines and debug comments. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Eric Anholt <[email protected]>
* i965/vs: Improve live interval calculation.Eric Anholt2012-10-171-0/+1
| | | | | | | | | | This is derived from the FS visitor code for the same, but tracks each channel separately (otherwise, some typical fill-a-channel-at-a-time patterns would produce excessive live intervals across loops and cause spilling). Reviewed-by: Kenneth Graunke <[email protected]> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48375 (crash -> failure, can turn into pass by forcing unrolling still)
* i965: Move brw_fs_cfg.* to brw_cfg.*.Eric Anholt2012-10-171-1/+1
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Remove the old ARB_fragment_program backend.Eric Anholt2012-10-081-6/+0
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Replace brw_wm_* with dumping code into the fs_visitor.Eric Anholt2012-10-081-0/+1
| | | | | | | | | | | This makes a giant pile of code newly dead. It also fixes TXB on newer chipsets, which has been totally broken (I now have a piglit test for that). It passes the same set of Ian's ARB_fragment_program tests. It also improves high-settings ETQW performance by 3.2 +/- 1.9% (n=3), thanks to better optimization and having 8-wide along with 16-wide shaders. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=24355 Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add support for instruction compaction.Eric Anholt2012-09-171-0/+1
| | | | | | | | | | | | | | | This reduces program size by using some smaller encodings for common bit patterns in the Gen ISA, with the hope of making programs fit in the instruction cache better. v2: Use larger bitshifts for the uncompressed field setups, in line with the way it's described in the spec. Consistently name a brw_compile "p" like all other code. Add a couple more tests. Consistently call things "compacted" not "compressed" (which is a different feature). Drop the explicit check for not compacting SENDs, which is unjustified and already implied by our lack of support for immediate values. Reviewed-by: Paul Berry <[email protected]>