Commit message | Author | Age | Files | Lines
* freedreno/ir3: split float/int abs/neg | Rob Clark | 2015-04-05 | 5 | -64/+213
  Even though they map to the same bits in the end, the backend needs to be able to differentiate float abs/neg from integer abs/neg. Rather than making the backend figure it out from the instruction opcode (which can be awkward when combined with mov/absneg instructions), split out different flags for each so the frontend can signal its intentions more clearly.
  Also, since (neg) for bitwise ops is actually a bitwise not, split it out into a bnot flag.
  Signed-off-by: Rob Clark <[email protected]>
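The distinction drawn above can be illustrated on raw 32-bit register values. This is only a sketch of the semantics, not ir3 code; the helper names `fneg_bits`, `sneg_bits` and `bnot_bits` are hypothetical and chosen to mirror the float/integer/bitwise split the message describes.

```python
import struct

MASK32 = 0xFFFFFFFF


def f32_bits(x):
    """Return the IEEE-754 single-precision bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def fneg_bits(bits):
    """Float negate: flip only the sign bit."""
    return bits ^ 0x80000000


def sneg_bits(bits):
    """Integer negate: two's complement of the 32-bit value."""
    return (-bits) & MASK32


def bnot_bits(bits):
    """Bitwise not: invert every bit."""
    return ~bits & MASK32
```

The three operations give different results for the same input pattern, which is why a single shared "neg" flag is not enough for a backend that needs to reason about the value.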
* freedreno/ir3: add ir3 builder helpers | Rob Clark | 2015-04-05 | 3 | -4/+162
  Add helpers for constructing SSA forms of instructions. Only partial cat5/cat6 coverage so far, but we can add more as needed.
  Signed-off-by: Rob Clark <[email protected]>
* freedreno/ir3: fix sam argument order comment | Rob Clark | 2015-04-05 | 1 | -1/+1
  Signed-off-by: Rob Clark <[email protected]>
* xa: support for drivers which use NIR | Rob Clark | 2015-04-05 | 3 | -0/+18
  We need to pull in libnir.la and its dependency libglsl_util.la. Also, _mesa_error_no_memory() must be defined. Fortunately, with libnir.la (vs. pulling in all of libglsl.la) we don't also need libstdc++.
  Signed-off-by: Rob Clark <[email protected]>
* build: add libnir.la | Rob Clark | 2015-04-05 | 1 | -1/+7
  If we want to use NIR from state trackers that don't already pull in the whole of glsl (i.e. anything other than the mesa state tracker), we need a separate, more minimal libnir. Possibly NIR should be better split out from glsl, but for now, generate a second, smaller libnir.la for those who just want NIR but not all of glsl.
  Signed-off-by: Rob Clark <[email protected]>
* gallium/ttn: MOD is an integer instruction | Rob Clark | 2015-04-05 | 1 | -1/+1
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* gallium/ttn: add UMAD | Rob Clark | 2015-04-05 | 1 | -1/+11
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Eric Anholt <[email protected]>
* nir: add lowering for idiv/udiv/umod | Rob Clark | 2015-04-05 | 3 | -0/+159
  Based on the algorithm from NV50LegalizeSSA::handleDIV() and handleMOD(). See also trans_idiv() in freedreno/ir3/ir3_compiler.c (which was an adaptation of the nv50 code from Ilia Mirkin).
  A python/numpy script which implements the same algorithm (and is possibly useful for debugging or analysis) can be found here:
  http://people.freedesktop.org/~robclark/div-lowering.py
  I've tested this on i965 hacked up to insert the idiv lowering pass, and on freedreno with the NIR frontend.
  Signed-off-by: Rob Clark <[email protected]>
  Tested-by: Eric Anholt <[email protected]> (vc4)
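As a rough illustration of what such a lowering has to preserve, the sketch below expresses signed division and unsigned modulo in terms of an unsigned divide on 32-bit values. It is a simplified model only: it does not reproduce the nv50-derived approximation steps the commit refers to, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Reinterpret a 32-bit pattern as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def udiv(n, d):
    """Unsigned 32-bit divide; stands in for whatever the hardware can do."""
    return (n & MASK32) // (d & MASK32)


def umod_via_udiv(n, d):
    """umod(n, d) = n - udiv(n, d) * d, wrapped to 32 bits."""
    n &= MASK32
    d &= MASK32
    return (n - udiv(n, d) * d) & MASK32


def idiv_via_udiv(n, d):
    """Signed divide (truncating toward zero) built on the unsigned one."""
    negative = (n < 0) != (d < 0)   # result sign from the operand signs
    q = udiv(abs(n), abs(d))        # divide the magnitudes
    return to_signed(-q if negative else q)
```

Dividing magnitudes and fixing up the sign afterwards gives C-style truncation toward zero, which differs from Python's floor division for negative operands.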
* nir: fix typo for f2b/i2b/b2i expressions (v2) | Rob Clark | 2015-04-05 | 1 | -3/+3
  v2: discovered that i2b/b2i are also confused
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
  Reviewed-by: Connor Abbott <[email protected]>
* nir: add option to lower slt/sge/seq/sne | Rob Clark | 2015-04-05 | 2 | -0/+7
  In freedreno these get implemented as the matching f* instruction plus a u2f to convert the result to float 1.0/0.0. But it takes fewer lines of code to let nir_opt_algebraic handle this for us, and it opens a small window for other optimization passes to improve things (e.g. if a shader ended up with both a flt and an slt with the same source arguments).
  v2: use b2f rather than u2f
  Signed-off-by: Rob Clark <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
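The rewrite described above amounts to composing a boolean comparison with a bool-to-float conversion. The sketch below models that on Python scalars; it is not the nir_opt_algebraic rule syntax, and the Python functions merely borrow the opcode names for readability.

```python
def b2f(b):
    """Bool to float: True -> 1.0, False -> 0.0."""
    return 1.0 if b else 0.0


def flt(a, b):
    """Float less-than producing a boolean."""
    return a < b


def fge(a, b):
    """Float greater-or-equal producing a boolean."""
    return a >= b


def slt(a, b):
    """slt lowered to b2f(flt(a, b))."""
    return b2f(flt(a, b))


def sge(a, b):
    """sge lowered to b2f(fge(a, b))."""
    return b2f(fge(a, b))
```

Once both forms are written in terms of `flt`, a shader that uses `flt(a, b)` and `slt(a, b)` on the same operands shares one comparison, which is the kind of opportunity the message mentions.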
* mesa: Remove unused variables left over from 107ae27e57d. | Mathias Froehlich | 2015-04-05 | 1 | -4/+0
  Reviewed-by: Dave Airlie <[email protected]>
  Signed-off-by: Mathias Froehlich <[email protected]>
* i965: Implement support for ARB_clip_control. | Mathias Fröhlich | 2015-04-05 | 11 | -16/+21
  Switch between the two clip space definitions already available in hardware. Update winding order dependent state according to the clip control state.
  This change did not introduce new piglit quick.test regressions on an Ivybridge Mobile and a GM45 Express chipset. Also it enables and passes the clip-control and clip-control-depth-precision tests on these two chipsets.
  Reviewed-by: Kenneth Graunke <[email protected]>
  Signed-off-by: Mathias Froehlich <[email protected]>
* mesa: Remove the _WindowMap from gl_viewport_attrib. | Mathias Froehlich | 2015-04-05 | 5 | -81/+4
  The _WindowMap can be dropped from gl_viewport_attrib now. Simplify gl_viewport_attrib handling where possible.
  Reviewed-by: Brian Paul <[email protected]>
  Signed-off-by: Mathias Froehlich <[email protected]>
* tnl: Maintain the _WindowMap matrix in TNLcontext v2. | Mathias Froehlich | 2015-04-05 | 4 | -9/+22
  This is the only real user of _WindowMap which has the depth buffer scaling multiplied in. Maintain the _WindowMap of the one and only viewport inside TNLcontext.
  v2: Remove unneeded parentheses.
  Reviewed-by: Brian Paul <[email protected]>
  Signed-off-by: Mathias Froehlich <[email protected]>
* radeon: Make use of _mesa_get_viewport_xform v2. | Mathias Froehlich | 2015-04-05 | 2 | -16/+18
  Instead of _WindowMap just use the translation and scale of the viewport transform directly. Thereby avoid dividing by _DepthMaxF again.
  v2: Change order of assignments.
  Reviewed-by: Brian Paul <[email protected]>
  Signed-off-by: Mathias Froehlich <[email protected]>
* i965: Make use of _mesa_get_viewport_xform. | Mathias Froehlich | 2015-04-05 | 4 | -32/+36
  Instead of _WindowMap just use the translation and scale of the viewport transform directly. Thereby avoid dividing by _DepthMaxF again.
  Reviewed-by: Brian Paul <[email protected]>
  Signed-off-by: Mathias Froehlich <[email protected]>
* nv50: allocate more offset space for occlusion queries | Ilia Mirkin | 2015-04-04 | 1 | -5/+5
  Commit 1a170980a09 started writing to q->data[4]/[5] but kept the per-query space at 16, which meant that in some cases we would write past the end of the buffer.
  Rotate by 32, like nvc0 does. This ensures that we always have 32 bytes in front of us, and the data writes will go within the allocated space.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89679
  Signed-off-by: Ilia Mirkin <[email protected]>
  Tested-by: Nick Tenney <[email protected]>
  Reviewed-by: Samuel Pitoiset <[email protected]>
  Reviewed-by: Tobias Klausmann <[email protected]>
  Cc: "10.4 10.5" <[email protected]>
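The overflow described above follows from simple arithmetic: writing 32-bit words up to index 5 touches 24 bytes from the slot start, which does not fit in a 16-byte slot. The sketch below checks this for a rotating offset; the total buffer size `ALLOC_SPACE` is an assumed value for illustration and is not taken from the driver.

```python
ALLOC_SPACE = 256        # assumed per-query buffer size in bytes (hypothetical)
WORD_SIZE = 4            # q->data[] holds 32-bit words
HIGHEST_WORD = 5         # the commit mentions writes to data[4] and data[5]


def slot_offsets(stride):
    """All offsets visited when rotating through the buffer by `stride`."""
    return list(range(0, ALLOC_SPACE, stride))


def bytes_needed():
    """Bytes touched from the slot start when writing up to data[5]."""
    return (HIGHEST_WORD + 1) * WORD_SIZE


def overflows(stride):
    """True if any slot's writes would run past the end of the buffer."""
    return any(off + bytes_needed() > ALLOC_SPACE
               for off in slot_offsets(stride))
```

With a stride of 16 the last slot starts 16 bytes before the end of the buffer, so the 24-byte write overruns it; with a stride of 32 every slot has room.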
* nir/lower_samplers: Use the right memory context for realloc'ing tex sources | Jason Ekstrand | 2015-04-03 | 1 | -1/+1
  As of da5ec2a, we allocate instruction sources out of the instruction itself. When we realloc the texture sources we need to use the right memory context or ralloc will get angry and assert-fail.
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Use brw_nir_cubemap_normalize for NIR shaders | Jason Ekstrand | 2015-04-03 | 2 | -1/+5
  Reviewed-by: Jordan Justen <[email protected]>
* nir: Add a cubemap normalizing pass | Jason Ekstrand | 2015-04-03 | 3 | -0/+113
  This commit adds a pass to L1-normalize cube-map coordinates. Some hardware such as i965 requires that the largest cube-map coordinate is +-1. We had a pass to perform this normalization in GLSL IR but we need it in NIR for cube maps on ARB programs to work correctly.
  Reviewed-by: Jordan Justen <[email protected]>
  v2 (Suggested by Eric):
   - Do a vector fabs and split into components later
   - Move to core NIR
  Reviewed-by: Eric Anholt <[email protected]>
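One way to satisfy the stated hardware requirement (largest coordinate equal to +-1) is to divide the coordinate by its largest absolute component. The sketch below does that on a plain tuple; it is an assumption about the arithmetic involved, not a transcription of the NIR pass, and `normalize_cube_coord` is a hypothetical name.

```python
def normalize_cube_coord(coord):
    """Scale a 3-component direction so its largest |component| is 1.0."""
    magnitudes = [abs(c) for c in coord]   # vector fabs
    largest = max(magnitudes)              # pick the major axis magnitude
    if largest == 0.0:
        return tuple(coord)                # degenerate input, leave untouched
    return tuple(c / largest for c in coord)
```

For example, `normalize_cube_coord((2.0, -4.0, 1.0))` keeps the direction and the sign of the major axis while bringing that axis to -1.0.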
* i965: Check the INTEL_USE_NIR environment variable once at context creation | Jason Ekstrand | 2015-04-03 | 3 | -4/+14
  Reviewed-by: Jordan Justen <[email protected]>
* nir/from_ssa: Don't set reg->parent_instr for ssa_undef instructions | Jason Ekstrand | 2015-04-03 | 1 | -4/+5
  Reviewed-by: Jordan Justen <[email protected]>
* nir: Add a src_get_parent_instr function | Jason Ekstrand | 2015-04-03 | 2 | -14/+12
  Reviewed-by: Jordan Justen <[email protected]>
* i965: Use the tex projector lowering pass instead of hand-rolling it. | Eric Anholt | 2015-04-03 | 1 | -10/+4
  This only impacts the ARB_fp path. We can't quite disable the GLSL-level lowering pass, because it needs to apply before brw_do_lower_unnormalized_offset().
  total instructions in shared programs: 5667857 -> 5667847 (-0.00%)
  instructions in affected programs: 1114 -> 1104 (-0.90%)
  helped: 16
  HURT: 6
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add a lowering pass for texture projectors. | Eric Anholt | 2015-04-03 | 3 | -0/+144
  Not much hardware wants them these days, and it might give us a chance to do CSE or algebraic at the NIR level.
  Reviewed-by: Jason Ekstrand <[email protected]>
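Lowering a projector generally means performing the projective divide explicitly before the texture lookup. The sketch below shows that arithmetic on plain floats, multiplying by a reciprocal so the division is computed once; it is a generic illustration rather than the pass itself, and `lower_projector` is a hypothetical name.

```python
def lower_projector(coords, projector):
    """Return coords divided by the projector, using one reciprocal."""
    inv_q = 1.0 / projector                 # single reciprocal
    return tuple(c * inv_q for c in coords)  # one multiply per component
```

Making the reciprocal and multiplies explicit is what exposes them to common-subexpression elimination and algebraic optimization, for example when several lookups share the same projector.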
* nir: Add an interface to turn a nir_src into a nir_ssa_def. | Eric Anholt | 2015-04-03 | 1 | -0/+19
  We use nir_ssa_defs for nir_builder args, so this takes a nir_src and makes one so it can be passed in.
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Add an interface for the builder to insert instructions before. | Eric Anholt | 2015-04-03 | 1 | -4/+23
  So far we'd only used nir_builder to build brand new programs. But if we're doing modifications to instructions (like in a lowering pass), then we want to generate new stuff before the instruction we're modifying.
  Reviewed-by: Jason Ekstrand <[email protected]>
* gallium: fix gcc compile errors when using _XOPEN_SOURCE=600 but not std=c99 | Jose Fonseca | 2015-04-03 | 1 | -1/+6
  The fpclassify stuff either needs std=c99 or _XOPEN_SOURCE=600 passed to gcc, but when using the latter the lrint family of functions will be defined too.
* i965: Rename do_<stage>_prog to brw_compile_<stage>_prog (and export) | Carl Worth | 2015-04-02 | 9 | -32/+51
  This is in preparation for these functions to be called from other files.
  This commit is intended to have no functional change. It exists in preparation for some upcoming code movement in preparation for the shader cache.
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Split out per-stage dirty-bit checking into separate functions | Carl Worth | 2015-04-02 | 4 | -35/+59
  The dirty-bit checking from each brw_upload_<stage>_prog function is split out into a new brw_<stage>_state_dirty function.
  This commit is intended to have no functional change. It exists in preparation for some upcoming code movement in preparation for the shader cache.
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Split out brw_<stage>_populate_key into their own functions | Carl Worth | 2015-04-02 | 3 | -40/+64
  This commit splits portions of the existing brw_upload_vs_prog and brw_upload_gs_prog functions into new brw_vs_populate_key and brw_gs_populate_key functions. This follows the same style as is already present for all other stages (see brw_wm_populate_key, etc.).
  This commit is intended to have no functional change. It exists in preparation for some upcoming code movement in preparation for the shader cache.
  Reviewed-by: Ian Romanick <[email protected]>
  Reviewed-by: Kenneth Graunke <[email protected]>
* nv50/ir: avoid folding immediates into imad operations | Ilia Mirkin | 2015-04-02 | 1 | -1/+2
  Commit 09ee907266 added logic to fold immediates into mad operations, but the emission code is only there for fmad. Only allow it on float types.
  Signed-off-by: Ilia Mirkin <[email protected]>
* nv50/ir: fix imad emission when dst == src2 | Ilia Mirkin | 2015-04-02 | 1 | -1/+1
  Commit fb63df22151f added 4-byte mad support, but only supported emission for floats. Disable it for ints for now.
  Signed-off-by: Ilia Mirkin <[email protected]>
* nir: Allocate nir_tex_instr::sources out of the instruction itself. | Kenneth Graunke | 2015-04-02 | 1 | -1/+1
  The lifetime of the sources array needs to match the nir_tex_instr itself. So, allocate it using the instruction itself as the context.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Allocate predecessor and dominance frontier sets from block itself. | Kenneth Graunke | 2015-04-02 | 1 | -2/+2
  These sets are part of the block, and their lifetime needs to match the block itself. So, allocate them using the block itself as the context.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Allocate register fields out of the register itself. | Kenneth Graunke | 2015-04-02 | 1 | -3/+3
  The lifetime of each register's use/def/if_use sets needs to match the register itself. So, allocate them using the register itself as the context.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Make nir_create_function() strdup the function name. | Kenneth Graunke | 2015-04-02 | 1 | -1/+1
  glsl_to_nir passes in the ir_function's name field; we were copying the pointer, but not duplicating the memory.
  We want to be able to free the linked GLSL IR program after translating to NIR, so we'll need to create a copy of the function name that the NIR shader actually owns.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Free dead variables when removing them. | Kenneth Graunke | 2015-04-02 | 1 | -1/+3
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* nir: Combine remove_dead_local_vars() and remove_dead_global_vars(). | Kenneth Graunke | 2015-04-02 | 1 | -14/+4
  We can just pass a pointer to the list of variables, and reuse the code.
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
* ralloc: Implement a new ralloc_adopt() API. | Kenneth Graunke | 2015-04-02 | 2 | -0/+33
  ralloc_adopt() reparents all children from one context to another. Conceptually, ralloc_adopt(new_ctx, old_ctx) behaves like this pseudocode:
      foreach child of old_ctx:
          ralloc_steal(new_ctx, child)
  However, ralloc provides no way to iterate over a memory context's children, and ralloc_adopt does this task more efficiently anyway.
  One potential use of this is to implement a memory-sweeper pass: first, steal all of a context's memory to a temporary context. Then, walk over anything that should be kept, and ralloc_steal it back to the original context. Finally, free the temporary context. This works when the context is something that can't be freed (i.e. an important structure).
  Signed-off-by: Kenneth Graunke <[email protected]>
  Reviewed-by: Jason Ekstrand <[email protected]>
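The pseudocode and the memory-sweeper idea above can be modelled with a small tree of contexts. This is a toy model only: `Ctx`, `steal`, `adopt` and `sweep` are hypothetical Python stand-ins and do not reflect how ralloc is implemented in C.

```python
class Ctx:
    """Toy stand-in for a ralloc memory context (not the real C API)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = None
        self.children = []
        if parent is not None:
            steal(parent, self)


def steal(new_ctx, child):
    """Reparent one allocation, in the spirit of ralloc_steal()."""
    if child.parent is not None:
        child.parent.children.remove(child)
    child.parent = new_ctx
    new_ctx.children.append(child)


def adopt(new_ctx, old_ctx):
    """Reparent every child of old_ctx, in the spirit of ralloc_adopt()."""
    for child in list(old_ctx.children):
        steal(new_ctx, child)


def sweep(ctx, keep):
    """Memory-sweeper pattern: keep only the children listed in `keep`."""
    tmp = Ctx("tmp")
    adopt(tmp, ctx)            # 1. move everything to a temporary context
    for child in keep:         # 2. steal the live allocations back
        steal(ctx, child)
    freed = [c.name for c in tmp.children]
    tmp.children.clear()       # 3. "free" the temporary context
    return freed


shader = Ctx("shader")
live = Ctx("live_var", shader)
dead = Ctx("dead_var", shader)
freed = sweep(shader, [live])
```

After the sweep, `shader` still exists and owns only `live_var`, which matches the case the message calls out: the context itself cannot be freed, so everything it no longer needs is released through the temporary context instead.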
* nir/opt_peephole_ffma: Fix a couple typos in a comment | Jason Ekstrand | 2015-04-02 | 1 | -2/+2
  Acked-by: Matt Turner <[email protected]>
* mesa: add ARB_depth_buffer_float to ES3.0 required extension list | Ilia Mirkin | 2015-04-02 | 1 | -0/+1
  Signed-off-by: Ilia Mirkin <[email protected]>
  Reviewed-by: Matt Turner <[email protected]>
* vc4: Add support for nir_iabs. | Eric Anholt | 2015-04-02 | 1 | -0/+5
  Tested using the GLSL 1.30 tests for integer abs(). Not currently used, but it was one of the new opcodes used by robclark's idiv lowering.
* i965/generator: Get rid of the ! in the unreachable statement | Jason Ekstrand | 2015-04-02 | 1 | -1/+1
  Reviewed-by: Mark Janes <[email protected]>
* nir/print: Correctly print swizzles for explicitly sized alu sources | Jason Ekstrand | 2015-04-02 | 1 | -12/+12
  Reviewed-by: Connor Abbott <[email protected]>
* freedreno/a3xx: add MRT support | Ilia Mirkin | 2015-04-02 | 9 | -139/+221
  The hardware only supports 4 MRTs. It should be possible to emulate support for 8, but that doesn't seem worth the trouble.
  Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno: convert blit program to array for each number of rts | Ilia Mirkin | 2015-04-02 | 12 | -21/+45
  Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno: add support for laying out MRTs in gmem | Ilia Mirkin | 2015-04-02 | 2 | -16/+43
  Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno: add core infrastructure support for MRTs | Ilia Mirkin | 2015-04-02 | 4 | -8/+14
  Signed-off-by: Ilia Mirkin <[email protected]>
* freedreno/ir3: add support for FS_COLOR0_WRITES_ALL_CBUFS property | Ilia Mirkin | 2015-04-02 | 2 | -1/+10
  This will enable the driver to tell which regids to link up to which MRT outputs.
  Signed-off-by: Ilia Mirkin <[email protected]>