Commit message (Author, Age, Files, Lines)
* ilo: update SF_CLIP_VIEWPORT for Gen8 (Chia-I Wu, 2015-02-12, 3 files, -14/+40)
|
* ilo: update streamout related functions for Gen8 (Chia-I Wu, 2015-02-12, 3 files, -44/+78)
|
* ilo: update 3DSTATE_{DS,HS,GS} for Gen8 (Chia-I Wu, 2015-02-12, 1 file, -8/+24)
|
* ilo: update 3DSTATE_CONSTANT_x for Gen8 (Chia-I Wu, 2015-02-12, 1 file, -3/+16)
|
* ilo: update 3DSTATE_URB_x for Gen8 (Chia-I Wu, 2015-02-12, 1 file, -1/+8)
|
* ilo: update 3DSTATE_PUSH_CONSTANT_ALLOC_x for Gen8 (Chia-I Wu, 2015-02-12, 1 file, -7/+8)
|
* ilo: update render engine common helpers for Gen8 (Chia-I Wu, 2015-02-12, 4 files, -34/+91)
|
* ilo: update BLT helpers for Gen8 (Chia-I Wu, 2015-02-12, 1 file, -25/+58)
|
* ilo: update MI helpers for Gen8 (Chia-I Wu, 2015-02-12, 2 files, -30/+59)
|
* ilo: add functions for Gen8 relocs (Chia-I Wu, 2015-02-12, 1 file, -6/+39)
|   Extend ilo_builder_writer_reloc() for Gen8 memory addressing. Add new wrappers, ilo_builder_surface_reloc64() and ilo_builder_batch_reloc64().
* ilo: update the toy compiler for Gen8 (Chia-I Wu, 2015-02-12, 5 files, -91/+501)
|   Based on what we know from the classic driver.
* ilo: update genhw headers (Chia-I Wu, 2015-02-12, 19 files, -529/+1704)
|   Accumulated changes for various renames and additions, including Gen8 definitions. Some of the dynamic state __SIZE no longer means the size of an element, but the size of an array of elements. The changes can be seen in ilo_render_dynamic.c.
* ilo: clean up ilo_gpe_init_dsa() (Chia-I Wu, 2015-02-12, 1 file, -54/+82)
|   Add dsa_get_stencil_enable_gen6(), dsa_get_depth_enable_gen6(), and dsa_get_alpha_enable_gen6() to be called from ilo_gpe_init_dsa().
* ilo: clean up ilo_gpe_init_blend() (Chia-I Wu, 2015-02-12, 3 files, -87/+106)
|   Make ilo_blend_state more space efficient and forward-looking.
* ilo: clean up sample patterns (Chia-I Wu, 2015-02-12, 5 files, -68/+71)
|   Use signed int for sample positions and add helpers to access them. Call them patterns instead of positions.
* glsl: Optimize (f2i(trunc x)) into (f2i x). (Matt Turner, 2015-02-11, 1 file, -0/+9)
|   total instructions in shared programs: 5950326 -> 5949286 (-0.02%)
|   instructions in affected programs:     88264 -> 87224 (-1.18%)
|   helped:                                692
* glsl: Optimize round-half-up pattern. (Matt Turner, 2015-02-11, 1 file, -0/+33)
|   Hurts some Psychonauts shaders, but after the next patch (which this enables) they're fewer instructions than before this patch.
* glsl: Add trunc() to ir_builder. (Matt Turner, 2015-02-11, 2 files, -0/+6)
|
* i965: Add LINTERP/CINTERP to can_do_cmod(). (Matt Turner, 2015-02-11, 1 file, -0/+2)
|   LINTERP is implemented as a PLN instruction or a LINE+MAC. PLN and MAC can do conditional mod. CINTERP is just a MOV.
|
|   total instructions in shared programs: 5952103 -> 5950284 (-0.03%)
|   instructions in affected programs:     324573 -> 322754 (-0.56%)
|   helped:                                1819
|
|   We lose the SIMD16 in one Unigine Heaven shader which appears six times in shader-db.
* program: Remove _mesa_nop_vertex_program/_mesa_nop_fragment_program. (Matt Turner, 2015-02-11, 2 files, -97/+0)
|   Dead since
|
|   commit 284ce20901b0c2cfab1d952cc129b8f3cd068f12
|   Author: Eric Anholt <eric@anholt.net>
|   Date:   Fri Aug 20 10:52:14 2010 -0700
|
|       Remove remnants of the old glsl compiler.
|
|   Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* nir: Recognize open-coded fmin/fmax. (Matt Turner, 2015-02-11, 1 file, -0/+2)
|   And unfortunately other shaders do the same thing but with >=/<= which we can't apply this optimization to because of NaNs.
|
|   instructions in affected programs: 23309 -> 22938 (-1.59%)
|
|   Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
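The NaN caveat in the message above can be illustrated with a minimal C sketch. It assumes the open-coded form is a select on a `<` comparison; the log does not show the exact pattern the pass matches, and the helper names `select_lt` and `select_ge` are hypothetical. The two selects agree for ordinary numbers but pick different operands when one input is NaN, which is why the `>=`/`<=` spelling cannot be treated as the same operation.

```c
#include <math.h>   /* only for the NAN macro */

/* Hypothetical helper: open-coded minimum written with '<'. */
static float select_lt(float a, float b)
{
    return a < b ? a : b;
}

/* Hypothetical helper: the "same" minimum written with '>='.
 * Every comparison involving NaN is false, so this form falls
 * through to 'a' where select_lt() falls through to 'b'. */
static float select_ge(float a, float b)
{
    return a >= b ? b : a;
}
```

For ordered inputs both helpers return the smaller value; with `a = NAN` and `b = 1.0f`, `select_lt` returns `1.0f` while `select_ge` returns NaN.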
* nir: Add algebraic opt for int comparisons with identical operands. (Eric Anholt, 2015-02-11, 1 file, -0/+9)
|   No change on shader-db on i965.
|
|   v2: Reword the comment due to feedback from Erik Faye-Lund
|
|   Reviewed-by: Connor Abbott <cwabbott0@gmail.com> (v1)
|   Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com> (v1)
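As a rough illustration of why integer comparisons with identical operands can be folded to constants, here is a small C sketch. The enum and function names are hypothetical and do not come from NIR; the point is only that, unlike floats (where NaN != NaN), an integer always equals itself, so the result depends on the operator alone.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical operator set for the illustration. */
typedef enum { CMP_LT, CMP_GE, CMP_EQ, CMP_NE } cmp_op;

/* Evaluate an integer comparison the ordinary way. */
static bool eval_cmp(cmp_op op, int32_t a, int32_t b)
{
    switch (op) {
    case CMP_LT: return a < b;
    case CMP_GE: return a >= b;
    case CMP_EQ: return a == b;
    case CMP_NE: return a != b;
    }
    return false;
}

/* Constant result when both operands are the same value:
 * a >= a and a == a are always true, a < a and a != a always false. */
static bool fold_same_operand(cmp_op op)
{
    return op == CMP_GE || op == CMP_EQ;
}
```

For any integer `v`, `eval_cmp(op, v, v)` equals `fold_same_operand(op)`, so the comparison instruction can be replaced by a constant.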
* nir: Fix load_const comparisons for CSE. (Eric Anholt, 2015-02-11, 1 file, -1/+1)
|   We want the size of a float per component, not the size of a whole vec4. NIR instructions on i965:
|
|   total instructions in shared programs: 1261937 -> 1261929 (-0.00%)
|   instructions in affected programs:     114 -> 106 (-7.02%)
|
|   Looking at one of these examples (tesseract), it's from vec4 load_consts for a MRT solid fill, which do get CSEed now that we don't memcmp off the end of the const value and into the SSA def. For the 1-component loads that are common in i965, we were only memcmping off into the rest of the usually zero-filled const_value.
|
|   Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
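The sizing rule described above (one float per component rather than one whole vec4 per component) can be sketched as follows. This is an illustration only: the `const_load` struct and `const_loads_equal` function are hypothetical stand-ins, not the actual NIR types or the patched code.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a constant-load instruction:
 * up to four float components, of which num_components are live. */
typedef struct {
    unsigned num_components;
    float value[4];
} const_load;

/* Two loads are interchangeable for CSE when they have the same
 * component count and the live components match. Only
 * num_components * sizeof(float) bytes are compared, so the
 * comparison never reads past the live part of the value. */
static bool const_loads_equal(const const_load *a, const const_load *b)
{
    if (a->num_components != b->num_components)
        return false;
    return memcmp(a->value, b->value,
                  a->num_components * sizeof(float)) == 0;
}
```

With this sizing, two one-component loads of the same value compare equal even if the unused trailing components differ.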
* i965/fs: Remove conditional mod when optimizing a SEL into a MOV. (Matt Turner, 2015-02-11, 1 file, -0/+1)
|   Missed in commit ca675b73, but got right in the companion commit 3c28b2c0.
* darwin: build fix (Jeremy Huddleston Sequoia, 2015-02-10, 1 file, -0/+5)
|   xfont.c:237:14: error: implicit declaration of function 'GetGLXDRIDrawable' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
|      glxdraw = GetGLXDRIDrawable(CC->currentDpy, CC->currentDrawable);
|                ^
|
|   Fixes regression from 291be28476ea60c6fb1eb2a882e2e25def5d3735
|
|   Signed-off-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
* darwin: build fix (Jeremy Huddleston Sequoia, 2015-02-10, 1 file, -0/+1)
|   ../../../src/mesa/main/compiler.h:47:10: fatal error: 'util/macros.h' file not found
|
|   Signed-off-by: Jeremy Huddleston Sequoia <jeremyhu@apple.com>
* glsl: Optimize 1/exp(x) into exp(-x). (Matt Turner, 2015-02-10, 1 file, -0/+6)
|   Lots of shaders divide by exp2(...) which we turn into a multiplication by the reciprocal. We can avoid the reciprocal by simply negating exp2's argument.
|
|   total instructions in shared programs: 5947154 -> 5946695 (-0.01%)
|   instructions in affected programs:     118661 -> 118202 (-0.39%)
|   helped:                                380
|
|   Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
|   Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
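The rewrite rests on a standard exponent identity, written out here for the exp2 case the message mentions. The symbols a and x are illustrative, not names from the patch.

```latex
\[
\frac{a}{2^{x}} \;=\; a \cdot \frac{1}{2^{x}} \;=\; a \cdot 2^{-x},
\qquad\text{and likewise}\qquad
\frac{1}{e^{x}} \;=\; e^{-x}.
\]
```

The middle form is the multiplication by a reciprocal described above; the right-hand form drops the reciprocal and negates the argument instead.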
* nir: Remove casts from void*. (Matt Turner, 2015-02-10, 4 files, -14/+13)
|   Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
* nir: Replace assert(0) with unreachable(). (Matt Turner, 2015-02-10, 1 file, -7/+7)
|   Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
* nir: Remove unused has_indirect variable. (Matt Turner, 2015-02-10, 1 file, -4/+0)
|   Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
* i965/vec4: Emit MADs from (x + abs(y * z)). (Matt Turner, 2015-02-10, 1 file, -3/+15)
|   Same as commit 3654b6d4 to the fs backend.
|
|   total instructions in shared programs: 5945788 -> 5945787 (-0.00%)
|   instructions in affected programs:     36 -> 35 (-2.78%)
|   helped:                                1
|
|   Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/vec4: Emit MADs from (x + -(y * z)). (Matt Turner, 2015-02-10, 1 file, -0/+12)
|   Same as commit c4fab711 to the fs backend.
|
|   total instructions in shared programs: 5945998 -> 5945788 (-0.00%)
|   instructions in affected programs:     74665 -> 74455 (-0.28%)
|   helped:                                399
|   HURT:                                  180
|
|   It hurts some programs because we make no attempts in the vec4 backend to avoid MADs if they have constant (or vector uniform) arguments.
|
|   Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/skl: Implement WaDisable1DDepthStencil (Neil Roberts, 2015-02-10, 1 file, -0/+12)
|   Skylake+ doesn't support setting a depth buffer to a 1D surface but it does allow pretending it's a 2D texture with a height of 1 instead.
|
|   This fixes the GL_DEPTH_COMPONENT_* tests of the copyteximage piglit test (and also seems to avoid a subsequent GPU hang).
|
|   Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89037
|   Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/gen7-8: Implement glMemoryBarrier(). (Francisco Jerez, 2015-02-10, 2 files, -0/+41)
|   Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Generalize the update_null_renderbuffer_surface vtbl hook to non-renderbuffers. (Francisco Jerez, 2015-02-10, 4 files, -56/+55)
|   Null surfaces are going to be useful to have something to point unbound image units to, as the ARB_shader_image_load_store extension requires us to behave deterministically in cases where some shader tries to access an unbound image unit: Invalid stores and atomics are supposed to be discarded and invalid loads are supposed to return zero, which is precisely what the null surface does.
|
|   Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Allocate binding table space for shader images. (Francisco Jerez, 2015-02-10, 2 files, -0/+12)
|   v2: Bump the number of supported image uniforms to 32 (Ken).
|
|   Reviewed-by: Paul Berry <stereotype441@gmail.com>
|   Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965: Don't tile 1D miptrees. (Francisco Jerez, 2015-02-10, 1 file, -0/+7)
|   It doesn't really improve locality of texture fetches; quite the opposite, it's a waste of memory bandwidth and space due to tile alignment.
|
|   v2: Check mt->logical_height0 instead of mt->target (Ken). Add short comment explaining why they shouldn't be tiled.
|
|   Reviewed-by: Neil Roberts <neil@linux.intel.com>
|   Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
* i965/vec4: Don't set any dependency control bits for F32TO16 on Gen8. (Francisco Jerez, 2015-02-10, 1 file, -0/+5)
|   It's expanded to several instructions.
|
|   Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965: Handle negated unsigned immediate values in constant propagation. (Francisco Jerez, 2015-02-10, 3 files, -19/+19)
|   Negation of UD/UW sources behaves the same as for D/W sources, taking the two's complement of the source, except for bitwise logical operations on Gen8 and up which take the one's complement.
|
|   Fixes crash in a GLSL shader with subtraction of two unsigned values.
|
|   Reviewed-by: Matt Turner <mattst88@gmail.com>
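The two negation behaviours named in the message above can be modelled in plain C. This is only an arithmetic illustration with hypothetical helper names; it does not reproduce the driver's constant-propagation code or any hardware encoding.

```c
#include <stdint.h>

/* Two's-complement negation of an unsigned 32-bit immediate.
 * Unsigned arithmetic in C wraps modulo 2^32, so -x is well defined. */
static uint32_t negate_twos_complement(uint32_t x)
{
    return (uint32_t)(0u - x);
}

/* One's-complement negation: every bit is inverted. Per the message,
 * this is what the negate modifier means for bitwise logical
 * operations on Gen8 and up. */
static uint32_t negate_ones_complement(uint32_t x)
{
    return ~x;
}
```

The two results always differ by one: `negate_twos_complement(x) == negate_ones_complement(x) + 1` in modulo 2^32 arithmetic, so folding a negated immediate has to pick the form that matches the consuming instruction.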
* i965/vec4: Take into account non-zero reg_offset during register allocation. (Francisco Jerez, 2015-02-10, 1 file, -1/+3)
|   Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: Add register classes up to MAX_VGRF_SIZE. (Francisco Jerez, 2015-02-10, 3 files, -7/+9)
|   In preparation for some send from GRF instructions that will require larger payloads.
|
|   Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: Init mlen for several send from GRF instructions. (Francisco Jerez, 2015-02-10, 3 files, -5/+11)
|   Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: Don't infer MRF dependencies for send from GRF instructions. (Francisco Jerez, 2015-02-10, 1 file, -14/+18)
|   Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: Fix the scheduler to take into account reads and writes of multiple registers. (Francisco Jerez, 2015-02-10, 3 files, -5/+29)
|   v2: Avoid nested ternary operators in vec4_instruction::regs_read(). (Matt)
|
|   Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: Make vec4_visitor::implied_mrf_writes() return zero for sends from GRF. (Francisco Jerez, 2015-02-10, 1 file, -1/+1)
|   Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: Pass dst register to the vec4_instruction constructor. (Francisco Jerez, 2015-02-10, 1 file, -7/+5)
|   So regs_written gets initialized with a sensible value.
|
|   Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: Initialize vec4_instruction::predicate and ::predicate_inverse. (Francisco Jerez, 2015-02-10, 1 file, -0/+2)
|   Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/vec4: Implement equals() method for dst_reg too. (Francisco Jerez, 2015-02-10, 2 files, -0/+18)
|   Reviewed-by: Matt Turner <mattst88@gmail.com>
* i965/fs: Fix fs_inst::regs_written calculation for instructions with scalar dst. (Francisco Jerez, 2015-02-10, 1 file, -1/+2)
|   Scalar registers are required to have zero stride. Fix the regs_written calculation not to assume that the instruction writes zero registers in that case.
|
|   v2: Rename CEILING() to DIV_ROUND_UP(). (Matt, Ken)
|
|   Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
|   Reviewed-by: Matt Turner <mattst88@gmail.com>
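A minimal sketch of the rounding involved, under stated assumptions: a 32-byte register size, a clamp of the element count to at least one, and a hypothetical `regs_written()` helper. Only the `DIV_ROUND_UP()` name comes from the log; the rest is illustrative and may not match the patched expression.

```c
/* Round-up integer division, as named in the v2 note above. */
#define DIV_ROUND_UP(a, b) (((a) + (b) - 1) / (b))

/* Assumption: one GRF register holds 32 bytes. */
#define REG_SIZE_BYTES 32u

/* Hypothetical helper: number of registers written by a destination
 * with the given width, stride and element size in bytes. A scalar
 * destination has stride 0, so width * stride is 0; clamping to at
 * least one element keeps the result from collapsing to zero. */
static unsigned regs_written(unsigned width, unsigned stride,
                             unsigned type_size)
{
    unsigned elements = width * stride;
    if (elements < 1)
        elements = 1;
    return DIV_ROUND_UP(elements * type_size, REG_SIZE_BYTES);
}
```

Under these assumptions, an 8-wide 4-byte destination with stride 1 writes one register, a 16-wide one writes two, and a zero-stride scalar destination still counts as one register.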
* i965/fs: Fix stack allocation of fs_inst and stop stealing src array provided on construction. (Francisco Jerez, 2015-02-10, 2 files, -37/+39)
|   Using 'ralloc*(this, ...)' is wrong if the object has automatic storage or was allocated through any other means. Use normal dynamic memory instead.
|
|   Reviewed-by: Matt Turner <mattst88@gmail.com>