aboutsummaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers
Commit message (Collapse)AuthorAgeFilesLines
* i965/fs: Move struct brw_compile (p) entirely inside fs_generator.Kenneth Graunke2012-11-266-6/+4
| | | | | | | | | | | | | | | | The brw_compile structure contains the brw_instruction store and the brw_eu_emit.c state tracking fields. These are only useful for the final assembly generation pass; the earlier compilation stages doesn't need them. This also means that the code generator for future hardware won't have access to the brw_compile structure, which is extremely desirable because it prevents accidental generation of Gen4-7 code. v2: rzalloc p, as suggested by Eric. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965/fs: Split final assembly code generation out of fs_visitor.Kenneth Graunke2012-11-263-78/+156
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Compiling shaders requires several main steps: 1. Generating FS IR from either GLSL IR or Mesa IR 2. Optimizing the IR 3. Register allocation 4. Generating assembly code This patch splits out step 4 into a separate class named "fs_generator." There are several reasons for doing so: 1. Future hardware has a different instruction encoding. Splitting this out will allow us to replace fs_generator (which relies heavily on the brw_eu_emit.c code and struct brw_instruction) with a new code generator that writes the new format. 2. It reduces the size of the fs_visitor monolith. (Arguably, a lot more should be split out, but that's left for "future work.") 3. Separate namespaces allow us to make helper functions for generating instructions in both classes: ADD() can exist in fs_visitor and create IR, while ADD() in fs_generator() can create brw_instructions. (Patches for this upcoming.) Furthermore, this patch changes the order of operations slightly. Rather than doing steps 1-4 for SIMD8, then 1-4 for SIMD16, we now: - Do steps 1-3 for SIMD8, then repeat 1-3 for SIMD16 - Generate final assembly code for both modes together This is because the frontend work can be done independently, but final assembly generation needs to pack both into a single program store to feed the GPU. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965/fs: Abort on unsupported opcodes rather than failing.Kenneth Graunke2012-11-261-1/+1
| | | | | | | | | | | | | Final code generation should never fail. This is a bug, and there should be no user-triggerable cases where this could occur. Also, we're not going to have a fail() method in a moment. v2: Just abort() rather than assert, to cover the NDEBUG case (suggested by Eric). Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965: Make it possible to create a cfg_t without a backend_visitor.Kenneth Graunke2012-11-262-3/+18
| | | | | | | | | | All we really need is a memory context and the instruction list; passing a backend_visitor is just convenient at times. This will be necessary two patches from now. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965/fs: Move uses of brw_compile from do_wm_prog to brw_wm_fs_emit.Kenneth Graunke2012-11-263-14/+20
| | | | | | | | | | | | The brw_compile structure is closely tied to the Gen4-7 hardware encoding. However, do_wm_prog is very generic: it just calls out to get a compiled program and then uploads it. This isn't ultimately where we want it, but it's a step in the right direction: it's now closer to the code generator. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965/fs: Pass the brw_context pointer into fs_visitor explicitly.Kenneth Graunke2012-11-263-5/+7
| | | | | | | | We used to steal it out of the brw_compile struct...but fs_visitor isn't going to have one of those in the future. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965/fs: Move brw_wm_compile::fp to fs_visitor.Kenneth Graunke2012-11-268-17/+19
| | | | | | | | Also change it from a brw_fragment_program to a gl_fragment_program, since that seems to be what everything wants anyway. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965/fs: Remove struct brw_shader * parameter to fs_visitor constructor.Kenneth Graunke2012-11-263-5/+8
| | | | | | | | | | We can easily recover it from prog, and this makes it clear that we aren't passing additional information in. v2: Use an if-statement rather than the ?: operator (suggested by Eric). Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965/fs: Move brw_wm_compile::dispatch_width into fs_visitor.Kenneth Graunke2012-11-269-66/+64
| | | | | | | | | | | | | | Also, rather than having brw_wm_fs_emit poke at it directly, make it a parameter to the fs_visitor constructor. All other changes generated by search and replace (with occasional whitespace fixup). v2: Make dispatch_width const (as suggested by Paul); fix doxygen mistake (pointed out by Eric); update for rebase. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965/fs: Move brw_wm_lookup_iz() to fs_visitor::setup_payload_gen4().Kenneth Graunke2012-11-265-85/+82
| | | | | | | This necessitates compiling brw_wm_iz.c as C++. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965/fs: Move brw_wm_payload_setup() to fs_visitor::setup_payload_gen6()Kenneth Graunke2012-11-264-68/+63
| | | | | | | | Now that we only have the one backend, there's no real point in keeping this separate. Moving it should allow some future simplifications. Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965/fs: Remove brw_wm_compile::computes_depth field.Kenneth Graunke2012-11-264-6/+1
| | | | | | | | | | Everybody determines this by checking if fp's OutputsWritten field contains the FRAG_RESULT_DEPTH bit. Rather than having payload setup check this and set the computes_depth flag, we can just do the check in the only place that actually used it: emit_fb_writes(). Reviewed-by: Eric Anholt <[email protected]> Reviewed-by: Paul Berry <[email protected]>
* i965: Enable ARB_vertex_type_2_10_10_10_rev on Gen4+.Chris Forbes2012-11-261-0/+1
| | | | | | | | v2 (Kayden): Move the enable into an existing intel->gen >= 4 block (as suggested by Ian). Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: emit w/a for packed attribute formats in VSChris Forbes2012-11-263-13/+126
| | | | | | | | | | | | | | | | Implements BGRA swizzle, sign recovery, and normalization as required by ARB_vertex_type_10_10_10_2_rev. V2: Ported to the new VS backend, since that's all that's left; fixed normalization. V3: Moved fixups out of the GLSL-only path, so it works for FF/VP too. V4 (Kayden): Rework ES3 normalization, don't heap allocate registers; tidy comments. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: set attribute w/a bits for packed formatsChris Forbes2012-11-261-4/+26
| | | | | | | | Flag the need for various workarounds to be applied by the vertex shader. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Generalize GL_FIXED VS w/a supportChris Forbes2012-11-263-14/+26
| | | | | | | | | | | Next few patches build on this to add other workarounds for packed formats. V2: rename BRW_ATTRIB_WA_COMPONENTS to BRW_ATTRIB_WA_COMPONENT_MASK; V3 (Kayden): remove separate bit for ES3 signed normalization Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: support 2_10_10_10 formats in get_surface_type.Chris Forbes2012-11-261-1/+19
| | | | | | | | | Always use R10G10B10A2_UINT; Most of the other formats we'd like don't actually work on the hardware. Will emit w/a for scaling, sign recovery and BGRA swizzle in the VS. Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: implement get_size for 2_10_10_10 formatsChris Forbes2012-11-261-0/+5
| | | | | Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vs: add support for emitting SHL, SHR, ASRChris Forbes2012-11-262-4/+10
| | | | | Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Fix hangs with FP KIL instructions pre-gen6.Eric Anholt2012-11-251-0/+2
| | | | | | | | | We can't support IF statements in 16-wide on these. To get back to 16-wide for these shaders, we need to support predicate on discard instructions in the backend IR, which is something we've sort of got on the list to do anyway. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55828 Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gen4: Fix memory leak each time compile_gs_prog() is called.Eric Anholt2012-11-251-1/+1
| | | | | | | | | Commit 774fb90db3e83d5e7326b7a72e05ce805c306b24 introduced a ralloc context to each user of struct brw_compile, but for this one a NULL context was used, causing the later ralloc_free(mem_ctx) to not do anything. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55175 NOTE: This is a candidate for the stable branches.
* i965/gen4: Fix LOD bias texturing since my fixed reg classes change.Eric Anholt2012-11-251-10/+18
| | | | | | | | | | We have a special case where non-shadow comparison with LOD requires using a SIMD16 vec4 in an 8-wide shader, which appears in the register allocator as a size 8 vgrf. Fixes assertions in various piglit tests and webgl conformance. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56521
* scons: Append x11 library path if linking x11 library.Vinson Lee2012-11-211-0/+1
| | | | Signed-off-by: Vinson Lee <[email protected]>
* i915: Fix wrong sizeof argument in i915_update_tex_unit.Vinson Lee2012-11-211-1/+1
| | | | | | | | | The bug was found by Coverity. NOTE: This is a candidate for the stable branches. Signed-off-by: Vinson Lee <[email protected]> Reviewed-by: Ian Romanick <[email protected]>
* i965/fs: Add helper functions for IF and CMP and use them.Eric Anholt2012-11-204-85/+90
| | | | | v2: Rebase on gen6-if fix. Reviewed-by: Kenneth Graunke <[email protected]> (v1)
* i965/fs: Add helper functions for generating ALU ops, like in the VS.Eric Anholt2012-11-204-209/+241
| | | | | | | | This gives us checking of our arguments (no more passing 1 operand to BRW_OPCODE_MUL!), at the cost of a couple of extra parens. v2: Rebase on gen6-if fix. Reviewed-by: Kenneth Graunke <[email protected]> (v1)
* i965/gen4: Fix crash with fragment programs and texture rectangle.Eric Anholt2012-11-191-1/+1
| | | | | | | | | | | | | This was a regression in the brw_fs_fp.cpp change. We just need to return something good enough to get the IR generation to the end without crashing, but ir->type isn't initialized and we wanted something of the coordinate's type anyway. Fixes around 30 piglit cases on my ilk system in drawpixels and framebuffer blit. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56962 Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Disable the GB clip test when a limited viewport is set.Eric Anholt2012-11-192-3/+19
| | | | | | | | | | | | | | The theory of the guardband is that you extend the clip volume to avoid expensive clipping computation, and just let fragments outside the viewport get clipped by the drawable's bounds. But if a smaller-than-window-size viewport is set, and we don't also happen to have a scissor set, then rendering could incorrectly extend outside of the viewport when it should have been clipped to the viewport. Fixes the new piglit triangle-guardband-viewport test. Reviewed-by: Kenneth Graunke <[email protected]> NOTE: This is a candidate for the 9.0 branch.
* i965: Use fewer temporary variables in clip setup.Eric Anholt2012-11-192-28/+18
| | | | | | | | When you're comparing to the spec, you're trying to immediately see what numbered dword of the packet your bit ends up in. Reviewed-by: Kenneth Graunke <[email protected]> NOTE: This is a candidate for the 9.0 branch.
* Revert "i965/fs: Fix conversions float->bool, int->bool"Eric Anholt2012-11-191-7/+7
| | | | | | | This reverts commit cf0bbb30f6bd9d3fa61b5207320e8f34c563a2c6. It was just papering over the bug fixed in the previous commit. Acked-by: Kenneth Graunke <[email protected]>
* i965/fs: Fix the gen6-specific if handling for 80ecb8f15b9ad7d6edcEric Anholt2012-11-191-24/+11
| | | | | | | | | Fixes oglconform shad-compiler advanced.TestLessThani. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=48629 NOTE: This is a candidate for the 9.0 branch. Acked-by: Kenneth Graunke <[email protected]>
* intel: Use designated initializers for DRI extension structsChad Versace2012-11-191-16/+20
| | | | | | | | | | | | | | | All Intel code is compiled with -std=c99. There is no excuse to not use designated initializers. As a nice benefit, the code is now more friendly to grep. Without designated initializers, psychic prowess is required to find the initialization of DRI extension function pointers with grep. I have observed several people, when they first encounter the DRI code, fail at statically chasing the DRI function pointers due to this problem. Reviewed-by: Matt Turner <[email protected]> Acked-by: Kenneth Graunke <[email protected]> Signed-off-by: Chad Versace <[email protected]>
* dri: Use designated initializers for DRI extension structsChad Versace2012-11-191-27/+30
| | | | | | | | | | | | | | | The dri directory is compiled with -std=c99. There is no excuse to not use designated initializers. As a nice benefit, the code is now more friendly to grep. Without designated initializers, psychic prowess is required to find the initialization of DRI extension function pointers with grep. I have observed several people, when they first encounter the DRI code, fail at statically chasing the DRI function pointers due to this problem. Reviewed-by: Matt Turner <[email protected]> Acked-by: Kenneth Graunke <[email protected]> Signed-off-by: Chad Versace <[email protected]>
* i965: Use the separate stencil buffer's offsets for stencil setup.Eric Anholt2012-11-191-15/+38
| | | | | | | | | | | | For a packed depth/stencil buffer on separate stencil hardware, the separate depth miptree is set up with alignment of 4,4 and the separate stencil miptree is setup with alignment of 8,8. We can't just use the irb->draw_{x,y} offsets for stencil, since that is the offset in the depth miptree. Fixes 12 piglit depthstencil testcases on ivb. Acked-by: Chad Versace <[email protected]>
* i965: Move all the depth/stencil/hiz offset logic into the workaround.Eric Anholt2012-11-193-187/+139
| | | | | | | | | | Given that we have the mask information here (assuming the rebase is to the same tiling, which is safe), we can just save a set of miptrees and offsets and the global intra-tile offset in the context and cut out a bunch of logic. This will also save emitting the next fix I need to do twice. Acked-by: Chad Versace <[email protected]>
* i965: When rebasing depth or stencil, update x/y before deciding the other.Eric Anholt2012-11-191-13/+36
| | | | | | | | | | | | Fixes a theoretical problem where we had an aligned depth buffer and a misaligned stencil buffer with a matching tile offset, so we would fail to rebase depth even after the needed tile offset changed due to the rebase of stencil. It should also fix double-rebase of a misaligned packed depth/stencil renderbuffer, which may have been a performance issue. Acked-by: Chad Versace <[email protected]>
* intel: Push face/level -> slice handling to the caller of get_image_offset().Eric Anholt2012-11-1910-44/+26
| | | | | | | We were always passing 0 for one of the two fields, and the code just used whichever one wasn't 0. Reviewed-by: Chad Versace <[email protected]>
* i965: Add some checks for array textures in unsupported paths.Eric Anholt2012-11-193-0/+14
| | | | | | | I noticed these in the next patch where these paths were using the Face of a teximage but didn't have array handling. Reviewed-by: Chad Versace <[email protected]>
* i965: Add a little bit more debug info for validate blits.Eric Anholt2012-11-191-1/+3
| | | | | | The kind of data you're copying is definitely an interesting variable. Reviewed-by: Chad Versace <[email protected]>
* intel: Remove dead function prototype.Eric Anholt2012-11-191-5/+0
| | | | Reviewed-by: Chad Versace <[email protected]>
* i965: Remove stale comment about wrapped_depth.Eric Anholt2012-11-191-14/+0
| | | | | | I removed that code almost a year ago. Reviewed-by: Chad Versace <[email protected]>
* i965/vs: Don't lose attribute type when converting ATTR to FIXED_HW_REG.Kenneth Graunke2012-11-191-0/+1
| | | | | | | | | | | | The new brw_reg always had type BRW_REGISTER_TYPE_F, rather than inheriting the original type of the ATTR file register. In the past, this hasn't been a problem since we only execute this code when fixing up GL_FIXED attributes, which always have float types. However, we'll soon be using it for ARB_vertex_type_10_10_10_2 support, which uses D and UD types. Reviewed-by: Eric Anholt <[email protected]>
* i965: Validate requested GLES context version in brwCreateContextChad Versace2012-11-191-33/+25
| | | | | | | | | | | | | | For GLES1 and GLES2, brwCreateContext neglected to validate the requested context version received from the DRI layer. If DRI requested an OpenGL ES2 context with version 3.9, we provided it one. Before this fix, the switch statement that validated the requested GL context flavor was an ugly #ifdef copy-paste mess. Instead of reproducing the copy-past-mess for GLES1 and GLES2, I first refactored it. Now the switch statement is readable. Reviewed-by: Kenneth Graunke <[email protected]> Signed-off-by: Chad Versace <[email protected]>
* i965/fs: Properly patch special values during VGRF compaction.Kenneth Graunke2012-11-171-0/+27
| | | | | | | | | | | | | | | | | | | | In addition to registers used by instructions, fs_visitor maintains direct references to certain "special" values used for inputs/outputs. When I added VGRF compaction, I overlooked these, believing that these direct references weren't used once instructions were generated. That was wrong. For example, pixel_x/y are used in virtual_grf_interferes(), which is called by optimization passes and register allocation. This patch treats all of them as used and patches them after compacting. While it's not strictly necessary to patch all of them (as some aren't used after emitting code), it seems safer to simply fix them all. Fixes oglconform's textureswizzle/advanced.shader.targets, piglit's glsl-fs-lots-of-tex, and glean's texCombine on pre-Gen6 hardware. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=56790 Reviewed-by: Eric Anholt <[email protected]>
* i965/gen4: Respect the VERTEX_PROGRAM_TWO_SIDE vertex program/shader flag.Eric Anholt2012-11-171-3/+4
| | | | | | Fixes piglit "vertex-program-two-side enabled front back" and 4 others. Reviewed-by: Kenneth Graunke <[email protected]>
* intel: Use core mesa support for determining lastLevel.Eric Anholt2012-11-171-3/+3
| | | | | | | We had similar issues with using depth in determining the lastLevel of array textures. Reviewed-by: Chad Versace <[email protected]>
* i965/fs: Unify the param pointer allocation for FP/non-FP.Eric Anholt2012-11-171-13/+7
| | | | | | | Now that we're using the new backend, we may actually put things into push constants if you have too many uniform values uploaded. Also, correctly account for texture rectangle params and drop the old special case for the 0.0/1.0 params from the old backend.
* Remove OpenVMS supportMatt Turner2012-11-169-165/+0
| | | | | | | | | | Not maintained since 2008. Doubtful that it's worked in quite a while. Also see commit 32ac8cb05 which removed VMS stuff from Makefile in 2009. Cc: Jouk Jansen <[email protected]> Reviewed-by: Brian Paul <[email protected]> Signed-off-by: Andreas Boll <[email protected]>
* i965/fs: Don't calculate_live_intervals() in opt_algebraic().Kenneth Graunke2012-11-151-2/+0
| | | | | | There's no point: opt_algebraic() doesn't use any liveness information. Reviewed-by: Eric Anholt <[email protected]>
* i965: Remove duplicate brw_opcodes table in favor of opcode_descs.Kenneth Graunke2012-11-154-65/+4
| | | | | | | | brw_optimize.c's brw_opcodes table was a copy of brw_disasm.c's opcode_descs table, but with an additional field: is_arith. Now that I've deleted that, the two are identical. Keep the one in brw_disasm.c. Reviewed-by: Eric Anholt <[email protected]>