path: root/src
Commit message [Author, Age, Files, Lines]
* vega: Use pipe_context::blit instead of util_blit_pixels_tex. [José Fonseca, 2013-09-18, 2 files, -20/+26]
| Only compile-tested but it seems straightforward.
|
| Reviewed-by: Zack Rusin <[email protected]>
| Reviewed-by: Marek Olšák <[email protected]>
| Reviewed-by: Roland Scheidegger <[email protected]>
* i965: Rename brw_{fs,vec4}_emit.cpp to brw_{fs,vec4}_generator.cpp. [Kenneth Graunke, 2013-09-18, 3 files, -4/+4]
| The previous names were really confusing to talk about:
| - brw_fs_visitor() contained methods named emit_whatever().
| - brw_fs_generator() contained methods named generate_whatever(),
|   but lived in brw_fs_emit.cpp.
|
| So when someone said "the emit layer", or "emit code", we weren't
| sure whether they meant the visitor's emit() functions or the
| generator in brw_fs_emit.cpp.
|
| By renaming these files, the method names, class names, and file
| names all match, which is much less confusing.
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Acked-by: Paul Berry <[email protected]>
| Acked-by: Eric Anholt <[email protected]>
* glsl: Correctly validate fma()'s types. [Matt Turner, 2013-09-17, 1 file, -0/+6]
| lrp() can take a scalar as a third argument, and fma() cannot.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* glsl: Add frexp signatures and implementation. [Matt Turner, 2013-09-17, 1 file, -0/+56]
| I initially implemented frexp() as an IR opcode with a lowering pass,
| but since it returns a value and has an out-parameter, it would break
| assumptions our optimization passes make about ir_expressions being
| pure (i.e., having no side effects).
|
| For example, if opt_tree_grafting encounters this code:
|
|     uniform float u;
|     void main()
|     {
|         int exp;
|         float f = frexp(u, out exp);
|         float g = float(exp)/256.0;
|         float h = float(exp) + 1.0;
|         gl_FragColor = vec4(f, g, h, g + h);
|     }
|
| it may try to optimize it to this:
|
|     uniform float u;
|     void main()
|     {
|         int exp;
|         float g = float(exp)/256.0;
|         float h = float(exp) + 1.0;
|         gl_FragColor = vec4(frexp(u, out exp), g, h, g + h);
|     }
|
| Some hardware has an instruction which performs frexp(), but we would
| need some other compiler infrastructure to be able to generate it,
| such as an intrinsics system that would allow backends to emit
| specific code for particular bits of IR.
|
| Reviewed-by: Paul Berry <[email protected]>
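The ordering hazard described above can be reproduced in plain C, where frexpf() from <math.h> has the same shape: a return value plus an out-parameter. This is only an illustrative sketch, not Mesa code; the function names are made up, and exp is initialized to 0 (unlike the GLSL above) so that the stale read is well defined.

```c
#include <math.h>

/* Intended order: frexpf() writes exp before anything reads it. */
float split_then_scale(float u, float *g)
{
    int exp = 0;
    float f = frexpf(u, &exp);   /* u == f * 2^exp, with f in [0.5, 1) */
    *g = (float)exp / 256.0f;    /* reads the value frexpf() just stored */
    return f;
}

/* Grafted order: the call has been moved past a read of exp. */
float scale_then_split(float u, float *g)
{
    int exp = 0;
    *g = (float)exp / 256.0f;    /* stale read: exp still holds 0 */
    return frexpf(u, &exp);      /* the write now happens too late */
}
```

Both functions return the same mantissa, but only the first one computes g from the real exponent, which is why an expression with an out-parameter cannot be treated as pure.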
* i965: Lower ldexp. [Matt Turner, 2013-09-17, 1 file, -1/+2]
| v2: Drop frexp lowering.
|
| Reviewed-by: Paul Berry <[email protected]>
* glsl: Add ldexp_to_arith lowering pass. [Matt Turner, 2013-09-17, 2 files, -0/+129]
| Reviewed-by: Paul Berry <[email protected]>
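The commit message does not describe how the pass works. As a rough idea of what lowering ldexp(x, exp) to integer arithmetic can look like, here is a sketch that adds exp to the biased exponent field of an IEEE-754 single. It is an assumption-laden illustration rather than a transcription of the Mesa pass: the function name is hypothetical, and zeros, denormals, infinities and NaNs are simply returned unchanged.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: computes x * 2^exp for normal floats only. */
float ldexp_arith_sketch(float x, int exp)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);          /* reinterpret float as bits */

    int biased = (int)((bits >> 23) & 0xffu);
    if (biased == 0 || biased == 0xff)
        return x;                            /* zero/denormal/inf/NaN: left alone */

    /* Clamp so the addition below cannot overflow int. */
    if (exp > 512)  exp = 512;
    if (exp < -512) exp = -512;

    int result = biased + exp;
    uint32_t sign = bits & 0x80000000u;

    if (result < 1)
        bits = sign;                         /* underflow: flush to signed zero */
    else if (result > 254)
        bits = sign | 0x7f800000u;           /* overflow: signed infinity */
    else
        bits = (bits & ~0x7f800000u) | ((uint32_t)result << 23);

    memcpy(&x, &bits, sizeof x);
    return x;
}
```

For example, ldexp_arith_sketch(1.5f, 3) bumps the exponent field from 127 to 130 and yields 12.0f.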
* glsl: Allow vectors to be created from ir_constant(). [Matt Turner, 2013-09-17, 3 files, -29/+41]
| Note the parameter name change in the int version of ir_constant, to
| avoid the conflict with the loop iterator.
|
| v2: Make analogous change to builtin_builder::imm().
|
| Reviewed-by: Paul Berry <[email protected]>
* glsl: Add support for ldexp. [Matt Turner, 2013-09-17, 10 files, -0/+51]
| v2: Drop frexp. Rebase on builtins rewrite.
|
| Reviewed-by: Paul Berry <[email protected]>
* i965: Add some missing bits to {mesa,brw,cache}_bits[]. [Paul Berry, 2013-09-17, 2 files, -0/+14]
| These data structures are used for debug output, so it wasn't hurting
| anything that there were missing bits. But it's good to keep things
| up to date.
|
| This patch also adds static asserts so that the {brw,cache}_bits[]
| arrays are the proper size, so that we don't forget to add to them in
| the future. Unfortunately there's no convenient way to assert that
| mesa_bits[] is the proper size.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
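The static-assert idea can be sketched in C11 as below. Everything here is hypothetical (the enum, the table, the terminator entry); it only shows the general pattern of tying a debug-name table's length to an enum count so that a forgotten entry fails at compile time.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical dirty-state bits; the last enumerator is the count. */
enum example_state_bit {
    EXAMPLE_STATE_A,
    EXAMPLE_STATE_B,
    EXAMPLE_STATE_C,
    EXAMPLE_NUM_STATE_BITS
};

struct example_bit_name {
    const char *name;
    unsigned value;
};

/* Debug table: one entry per bit, plus an assumed NULL terminator. */
static const struct example_bit_name example_bits[] = {
    { "EXAMPLE_STATE_A", 1u << EXAMPLE_STATE_A },
    { "EXAMPLE_STATE_B", 1u << EXAMPLE_STATE_B },
    { "EXAMPLE_STATE_C", 1u << EXAMPLE_STATE_C },
    { NULL, 0 }
};

/* Compilation fails if a bit is added to the enum but not the table. */
_Static_assert(ARRAY_SIZE(example_bits) == EXAMPLE_NUM_STATE_BITS + 1,
               "example_bits[] is out of sync with enum example_state_bit");

/* Number of named bits, excluding the terminator. */
size_t example_bits_count(void)
{
    return ARRAY_SIZE(example_bits) - 1;
}
```

A table whose valid entries are not described by a single count (as the message says of mesa_bits[]) has nothing convenient to compare against, so the same check cannot be written for it.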
* i965/gs: Implement basic gl_PrimitiveIDIn functionality. [Paul Berry, 2013-09-17, 4 files, -0/+11]
| If the geometry shader refers to the built-in variable
| gl_PrimitiveIDIn, we need to set a bit in 3DSTATE_GS to tell the
| hardware to dispatch primitive ID to r1, and we need to leave room
| for it when allocating registers.
|
| Note: this feature doesn't yet work properly when software primitive
| restart is in use (the primitive ID counter will incorrectly reset
| with each primitive restart, since software primitive restart works
| by performing multiple draw calls). I plan to address that in a
| future patch series.
|
| Fixes piglit test "spec/glsl-1.50/execution/geometry/primitive-id-in".
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gs: New gs primitive types are supported by HW primitive restart. [Paul Berry, 2013-09-17, 1 file, -0/+4]
| When we previously implemented primitive restart, we didn't add cases
| to brw_primitive_restart.c's can_cut_index_handle_prims() for the
| primitive types that are introduced with geometry shaders. It turns
| out that all of the new primitive types are supported by hardware
| primitive restart.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gs: Add new primitive types. [Paul Berry, 2013-09-17, 2 files, -3/+7]
| As part of its support for geometry shaders, GL 3.2 introduces four
| new primitive types: GL_LINES_ADJACENCY, GL_LINE_STRIP_ADJACENCY,
| GL_TRIANGLES_ADJACENCY, and GL_TRIANGLE_STRIP_ADJACENCY.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* gallivm: some bits of seamless cube filtering implementation [Roland Scheidegger, 2013-09-18, 3 files, -14/+29]
| Simply adjust wrap mode to clamp_to_edge. This is all that's needed
| for a correct implementation for nearest filtering, and it's way
| better than using repeat wrap for instance for linear filtering
| (though obviously this doesn't actually do seamless filtering).
|
| v2: fix s/t wrap not r/s...
|
| Reviewed-by: Brian Paul <[email protected]>
| Reviewed-by: Jose Fonseca <[email protected]>
* i965: Remove MIPLAYOUT_BELOW from Gen4-6 constant buffer surface state. [Kenneth Graunke, 2013-09-17, 1 file, -1/+0]
| Specifying a miptree layout makes no sense for constant buffers.
|
| This has no functional change since BRW_SURFACE_MIPMAPLAYOUT_BELOW is
| just a #define for 0.
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Paul Berry <[email protected]>
* i965: Use gen7_upload_constant_state for 3DSTATE_CONSTANT_PS as well. [Kenneth Graunke, 2013-09-16, 1 file, -27/+1]
| Now we use gen7_upload_constant_state() for all three shader stages.
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Paul Berry <[email protected]>
* i965: Set brw_stage_state::push_const_size for PS constants. [Kenneth Graunke, 2013-09-16, 1 file, -1/+6]
| This paves the way for using gen7_upload_constant_state for PS data.
|
| The formula is copied from gen7_wm_state.c.
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Paul Berry <[email protected]>
* i965: Introduce a prog_data temporary in gen6_upload_wm_push_constants. [Kenneth Graunke, 2013-09-16, 1 file, -8/+8]
| This saves a bit of typing and shortens a few lines.
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Paul Berry <[email protected]>
* i965/gen6+: Support 128 varying components. [Paul Berry, 2013-09-16, 1 file, -0/+3]
| GL 3.2 requires us to support 128 varying components for geometry
| shader outputs and fragment shader inputs, and 64 varying components
| otherwise. But there's no hardware limitation that restricts us to
| 64 varying components, and core Mesa doesn't currently allow
| different stages to have different maximum values, so just go ahead
| and enable 128 varying components for all stages. This gets us
| better test coverage anyway.
|
| Even though we are only working on GL 3.2 support for gen7 right now,
| gen6 also supports 128 varying components, so go ahead and switch it
| on there too.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/ff_gs: Generate URB writes using a loop. [Paul Berry, 2013-09-16, 1 file, -23/+38]
| Previously we only ever did 1 URB write, since the maximum number of
| varyings we support is small enough to fit in 1 URB write (when using
| BRW_URB_SWIZZLE_NONE, which is what the pre-Gen7 GS always uses).
| But we're about to increase the number of varying components we
| support from 64 to 128.
|
| With 128 varyings, the most URB writes we'll have to do is 2, but
| it's just as easy to write a general-purpose loop.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gen6: Fix assertions on VS/GS URB size. [Paul Berry, 2013-09-16, 1 file, -2/+2]
| The "{VS,GS} URB Entry Allocation Size" fields of 3DSTATE_URB allow
| values in the range 0-4, but they are U8-1 fields, so the range of
| possible allocation sizes is 1-5. We were erroneously prohibiting a
| size of 5.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
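A "minus one" field stores the size less one, so the valid range has to be checked on the size, not on the encoded value. The helper below is a hypothetical sketch of that check (the name and signature are not from the driver).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoder for a "size minus one" hardware field.
 * Sizes 1..5 are valid; the field itself then holds 0..4. */
uint32_t encode_urb_entry_size(unsigned size)
{
    assert(size >= 1 && size <= 5);   /* checking size <= 4 here would wrongly reject 5 */
    return size - 1;
}
```

With this encoding, a size of 5 maps to the field value 4, which is the top of the range the field allows.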
* i965/vec4: Generate URB writes using a loop. [Paul Berry, 2013-09-16, 1 file, -31/+21]
| Previously we only ever did 1 or 2 URB writes, since the maximum
| number of varyings we support is small enough to fit in 2 URB writes.
| But GL 3.2 requires the geometry shader to support 128 output varying
| components, and this could require up to 3 URB writes.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
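The general-purpose loop can be pictured as chunking a run of output slots into writes of bounded length. In this sketch the per-write limit is a plain parameter and the function name is invented; the real limit depends on the hardware message format, which the commit message does not spell out.

```c
/* Hypothetical planner: splits num_slots into writes of at most
 * max_per_write slots and records each write's length.
 * Returns the number of writes planned (never more than capacity). */
unsigned plan_urb_writes(unsigned num_slots, unsigned max_per_write,
                         unsigned lengths[], unsigned capacity)
{
    unsigned writes = 0;
    unsigned offset = 0;

    if (max_per_write == 0)
        return 0;                       /* nothing can be written */

    while (offset < num_slots && writes < capacity) {
        unsigned len = num_slots - offset;
        if (len > max_per_write)
            len = max_per_write;        /* cap this write */
        lengths[writes++] = len;
        offset += len;                  /* next write starts where this one ended */
    }
    return writes;
}
```

With made-up numbers, 35 slots and a limit of 14 per write give three writes of 14, 14 and 7 slots; a loop like this handles one, two or three writes without special cases.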
* i965/fs: When >64 input components, order them to match prev pipeline stage. [Paul Berry, 2013-09-16, 2 files, -7/+45]
| Since the SF/SBE stage is only capable of performing arbitrary
| reorderings of 16 varying slots, we can't arrange the fragment shader
| inputs in an arbitrary order if there are more than 16 input varying
| slots in use. We need to make sure that slots 16-31 match the
| corresponding outputs of the previous pipeline stage.
|
| The easiest way to accomplish this is to just make all varying slots
| match up with the previous pipeline stage.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Simplify computation of key.input_slots_valid during precompile. [Paul Berry, 2013-09-16, 1 file, -11/+1]
| The for loop was rather silly. In addition to checking brw->gen < 6
| on each loop iteration, it took pains to exclude bits from
| fp->Base.InputsRead that don't correspond to fragment shader inputs.
| But those bits would never have been set in the first place, since
| the only bits that are ever set in fp->Base.InputsRead are fragment
| shader inputs.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gs: Stop storing an input VUE map in the GS program key. [Paul Berry, 2013-09-16, 3 files, -5/+8]
| Now that the vertex shader output VUE map is determined solely by a
| 64-bit bitfield, we don't have to store it in its entirety in the
| geometry shader program key; instead, we can just store the bitfield,
| and let the geometry shader infer the VUE map at compile time.
|
| This dramatically reduces the size of the geometry shader program
| key, which we want to keep small since it gets recomputed whenever
| the active program changes.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/gen6+: Remove VUE map dependency on userclip_active. [Paul Berry, 2013-09-16, 3 files, -17/+26]
| Previously, on Gen6+, we laid out the vertex (or geometry) shader VUE
| map differently depending whether user clipping was active. If it
| was active, we put the clip distances in slots 2 and 3 (where the
| clipper expects them); if it was inactive, we assigned them in the
| order of the gl_varying_slot enum.
|
| This made for unnecessary recompiles, since turning clipping on/off
| for a shader that used gl_ClipDistance might rearrange the varyings.
| It also required extra bookkeeping, since it required the user
| clipping flag to be provided to brw_compute_vue_map() as a parameter.
|
| With this patch, we always put clip distances in slots 2 and 3 if
| they are written to. do_vs_prog() and do_gs_prog() are responsible
| for ensuring that clip distances are written to when user clipping is
| enabled (as do_vs_prog() previously did for gen4-5).
|
| This makes the only input to brw_compute_vue_map() a bitfield of
| which varyings the shader writes to, a fact that we'll take advantage
| of in forthcoming patches.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
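The property this patch establishes, that the layout is a pure function of the written-varyings bitfield, can be illustrated with a toy map builder. The enum, the function and the treatment of slots 0 and 1 (assumed here to be a header and the position) are all hypothetical; only "clip distances go in slots 2 and 3 if written" comes from the message above.

```c
#include <stdint.h>

/* Hypothetical varying enum, standing in for gl_varying_slot. */
enum example_varying {
    EX_VARYING_POS,
    EX_VARYING_COL0,
    EX_VARYING_TEX0,
    EX_VARYING_CLIP_DIST0,
    EX_VARYING_CLIP_DIST1,
    EX_VARYING_COUNT
};

#define EX_BIT(v) (UINT64_C(1) << (v))

/* Fills varying_to_slot[] (-1 = not present) from the bitfield alone
 * and returns the number of slots used. No clipping flag is needed. */
int example_compute_vue_map(uint64_t slots_written,
                            int varying_to_slot[EX_VARYING_COUNT])
{
    int next = 2;   /* assumption: slot 0 = header, slot 1 = position */

    for (int i = 0; i < EX_VARYING_COUNT; i++)
        varying_to_slot[i] = -1;
    varying_to_slot[EX_VARYING_POS] = 1;

    /* Clip distances always land in slots 2 and 3 when written. */
    if (slots_written & (EX_BIT(EX_VARYING_CLIP_DIST0) |
                         EX_BIT(EX_VARYING_CLIP_DIST1))) {
        varying_to_slot[EX_VARYING_CLIP_DIST0] = 2;
        varying_to_slot[EX_VARYING_CLIP_DIST1] = 3;
        next = 4;
    }

    /* Everything else follows in enum order. */
    for (int i = 0; i < EX_VARYING_COUNT; i++) {
        if ((slots_written & EX_BIT(i)) && varying_to_slot[i] < 0)
            varying_to_slot[i] = next++;
    }
    return next;
}
```

Because the bitfield is the only input, two shaders that write the same varyings get the same layout regardless of whether user clipping is switched on at draw time.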
* i965/fs: Stop wasting input attribute space on gl_FragCoord and gl_FrontFacing. [Paul Berry, 2013-09-16, 3 files, -9/+11]
| Previously, if a fragment shader accessed gl_FragCoord or
| gl_FrontFacing, we would assign them their own slots in the fragment
| shader input attribute array, using up space that could be made
| available to real varyings.
|
| This was not strictly necessary (since these values are not true
| varyings, and are instead computed from other data available in the
| FS payload). But we had to do it anyway because the SF/SBE setup
| code assumed that every 1 bit in the gl_program::InputsRead bitfield
| corresponded to a genuine varying variable.
|
| Now that the SF/SBE code consults brw_wm_prog_data and only sets up
| the attributes that the fragment shader actually needs, we don't have
| to do this anymore.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/sf: Consult brw_wm_prog_data when setting up SF/SBE state. [Paul Berry, 2013-09-16, 2 files, -23/+35]
| Previously, the SF/SBE setup code delivered varying inputs to the FS
| in the order in which they appear in the gl_program::InputsRead
| bitfield, since that's what the FS expects.
|
| When we add support for more than 64 varying components, this will no
| longer always be the case, because the Gen6+ SF/SBE stage is only
| capable of performing arbitrary reorderings of 16 varying slots. So,
| when there are more than 16 vec4's worth of varying inputs, the FS
| will have to adjust the order of its input varyings to partially
| match the order of outputs from the geometry or vertex shader.
|
| To allow extra flexibility in the ordering of FS varyings, this patch
| causes the SF/SBE to deliver varying inputs to the FS in exactly the
| order that the FS requests, by consulting
| brw_wm_prog_data::urb_setup and
| brw_wm_prog_data::num_varying_inputs.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/sf: Consolidate common code for setting up gen6-7 attribute overrides. [Paul Berry, 2013-09-16, 3 files, -129/+97]
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/sf: Use BRW_SF_URB_ENTRY_READ_OFFSET rather than hardcoded values. [Paul Berry, 2013-09-16, 4 files, -4/+12]
| We always program the SF unit to start reading the vertex URB entry
| at offset 1. In upcoming patches, we'll be adding FS code that
| relies on this. So consistently use the constant
| BRW_SF_URB_ENTRY_READ_OFFSET rather than hardcoding a 1.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Consult brw_wm_prog_data::num_varying_inputs when setting up WM state. [Paul Berry, 2013-09-16, 2 files, -3/+5]
| Previously, we assumed that the number of varying inputs consumed by
| the fragment shader was equal to the number of bits set in
| gl_program::InputsRead. However, we'll soon be making two changes
| that will cause that not to be true:
|
| - We'll stop wasting varying input space for gl_FragCoord and
|   gl_FrontFacing, which aren't varyings.
|
| - For fragment shaders that have more than 16 varying inputs, we'll
|   adjust the layout of the inputs to account for the fact that the
|   SF/SBE pipeline stage can't reorder inputs beyond the first 16; if
|   there are GS outputs that the FS doesn't use (or vice versa) this
|   may cause the number of FS varying inputs to change.
|
| So, instead of trying to guess the number of FS inputs from
| gl_program::InputsRead, simply read it from
| brw_wm_prog_data::num_varying_inputs, which is guaranteed to be
| correct since it's populated by fs_visitor::calculate_urb_setup().
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Change brw_wm_prog_data::urb_read_length to num_varying_inputs. [Paul Berry, 2013-09-16, 3 files, -5/+7]
| On gen4-5, the FS stage reads varying inputs from URB entries that
| were output by the SF thread, where each register stores the
| interpolation setup for two components of a vec4; therefore the FS
| urb_read_length is twice the number of FS input varyings. On gen6+,
| varying inputs are directly deposited in the FS payload by the SF/SBE
| fixed function logic, so urb_read_length is irrelevant.
|
| However, in future patches, it will be nice to be able to consult
| brw_wm_prog_data to determine how many varying inputs the FS expects
| (rather than inferring it from gl_program::InputsRead). So instead
| of storing urb_read_length, we simply store num_varying_inputs in
| brw_wm_prog_data. On gen4-5, we multiply this by 2 to recover the
| URB read length.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
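The relationship between the two quantities is small enough to write down. The helper below is a hypothetical sketch, not the driver's code: it recovers the gen4-5 URB read length from the stored input count, and returns 0 on gen6+ purely as a placeholder, since the message says the value is irrelevant there.

```c
/* Hypothetical helper: derive the FS URB read length from the number
 * of varying inputs stored in the program data. */
unsigned example_fs_urb_read_length(int gen, unsigned num_varying_inputs)
{
    if (gen < 6) {
        /* Each register holds setup data for two components of a vec4,
         * so every vec4 input takes two registers. */
        return num_varying_inputs * 2;
    }
    /* gen6+: inputs arrive in the FS payload; placeholder value. */
    return 0;
}
```

Storing the input count and deriving the read length keeps one source of truth that both the gen4-5 and gen6+ paths can consult.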
* i965/fs: Expose "urb_setup" as part of brw_wm_prog_data. [Paul Berry, 2013-09-16, 4 files, -8/+14]
| At the moment, for Gen6+, the FS assumes that all varying inputs are
| delivered to it in the order in which they appear in the
| gl_program::InputsRead bitfield, and the SF/SBE setup code ensures
| that they are delivered in this order.
|
| When we add support for more than 64 varying components, this will no
| longer always be possible, because the Gen6+ SF/SBE stage is only
| capable of performing arbitrary reorderings of 16 varying slots.
|
| To allow extra flexibility in the ordering of FS varyings, this patch
| causes the FS to advertise exactly what ordering it expects.
|
| Reviewed-by: Kenneth Graunke <[email protected]>
* ilo: make ilo_bind_sampler_states return void [Chia-I Wu, 2013-09-17, 1 file, -16/+25]
| So that it can be hooked up to pipe_context::bind_sampler_states,
| which currently lives on another branch.
* glsl/tests: Update .gitignore for new unit test. [Kenneth Graunke, 2013-09-16, 1 file, -0/+1]
| I rarely run 'git status', so I failed to notice this was missing.
|
| Signed-off-by: Kenneth Graunke <[email protected]>
* glsl/tests: Add a test for properties of sampler types. [Kenneth Graunke, 2013-09-15, 2 files, -0/+114]
| For each sampler type, this tests that:
| - The base type is GLSL_TYPE_SAMPLER.
| - The dimensionality is set correctly.
| - The returned data type is correct.
| - The sampler_array and sampler_shadow flags are set correctly.
| - sampler_coordinate_components() returns the correct value.
|
| Signed-off-by: Kenneth Graunke <[email protected]>
| Reviewed-by: Ian Romanick <[email protected]>
* st/mesa: don't dereference stObj->pt if NULL [Dave Airlie, 2013-09-16, 1 file, -1/+1]
| It seems a user app can get us into this state. I trigger the failure
| by running fbo-maxsize inside virgl: it fails to create the backing
| storage for the texture object, but then segfaults here when it
| should fail the completeness test.
|
| Cc: "9.2" <[email protected]>
| Reviewed-by: Brian Paul <[email protected]>
| Signed-off-by: Dave Airlie <[email protected]>
* nouveau: fix regression since float comparison instructions (v2) [Dave Airlie, 2013-09-16, 6 files, -14/+14]
| Fix the return type and allow src and dst types for comparison to be
| separate. This at least fixes the two test cases I've written.
|
| v2: drop the u32->s32 change
|
| Acked-by: Christoph Bumiller <[email protected]>
| Signed-off-by: Dave Airlie <[email protected]>
* vdpau/decode: Check max width and max height. [Rico Schüller, 2013-09-15, 1 file, -0/+20]
| Reviewed-by: Christian König <[email protected]>
* freedreno: PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE [Rob Clark, 2013-09-14, 1 file, -17/+43]
| When the old contents do not need to be preserved, it is faster to
| create a new backing bo rather than stall.
|
| Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: fix VFD_INDEX_MAX overflow [Rob Clark, 2013-09-14, 1 file, -1/+1]
| max_index may be 0xffffffff. The hardware does not need
| 1 + max_index (although it does not hurt unless max_index wraps
| around to zero).
|
| Signed-off-by: Rob Clark <[email protected]>
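The wraparound is ordinary unsigned 32-bit arithmetic and is easy to demonstrate. The two helpers below are hypothetical stand-ins for "what used to be programmed" and "what is programmed after the fix"; they are not the driver's functions.

```c
#include <stdint.h>

/* Old behaviour (sketch): add one, which wraps when max_index is
 * 0xffffffff because uint32_t arithmetic is modulo 2^32. */
uint32_t example_index_max_plus_one(uint32_t max_index)
{
    return max_index + 1u;
}

/* Fixed behaviour (sketch): pass max_index through unchanged. */
uint32_t example_index_max(uint32_t max_index)
{
    return max_index;
}
```

For any other input the extra 1 only makes the limit slightly looser, which matches the message's remark that it does not hurt until it wraps.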
* freedreno: add debug option to disable GMEM bypass [Rob Clark, 2013-09-14, 3 files, -1/+3]
| Useful for debugging.
|
| Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: handle front_ccw [Rob Clark, 2013-09-14, 5 files, -17/+14]
| Used by supertuxkart.
|
| Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: stencil fixes [Rob Clark, 2013-09-14, 9 files, -7/+30]
| For mem->gmem we don't sample depth/stencil as its native type, so
| we need to set up the swizzle state for the sampler based on the
| format used for sampling.
|
| Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: alpha-test [Rob Clark, 2013-09-14, 7 files, -7/+29]
| Needed by some games, like etuxracer and supertuxkart which use alpha
| test rather than blending, to handle texture transparency.
|
| Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx/compiler: implement SUB [Rob Clark, 2013-09-14, 1 file, -3/+11]
| Signed-off-by: Rob Clark <[email protected]>
* freedreno/a3xx: use INDIRECT state load for shaders [Rob Clark, 2013-09-14, 3 files, -8/+29]
| With a debug option to force DIRECT (mainly to make it easier for
| capturing cmdstream dumps). Using INDIRECT for large shaders at
| least makes a noticeable reduction in CPU load, which helps for
| CPU-limited games.
|
| Signed-off-by: Rob Clark <[email protected]>
* freedreno: avoid stalling at ringbuffer wraparound [Rob Clark, 2013-09-14, 2 files, -22/+41]
| Because of how the tiling works, we can't really flush at arbitrary
| points very easily. So wraparound is handled by resetting to top of
| ringbuffer. Previously this would stall until current rendering is
| complete. Instead cycle through multiple ringbuffers to avoid a
| stall.
|
| Signed-off-by: Rob Clark <[email protected]>
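Cycling through several ringbuffers amounts to a round-robin pool in which a wait is only needed when the next buffer is still in flight. The sketch below models that with invented names and a fixed pool of four; the busy flag and the stall counter are stand-ins for real GPU synchronization, which the commit message does not describe.

```c
#include <stdbool.h>

#define EXAMPLE_NUM_RINGS 4   /* assumed pool size, for illustration only */

struct example_ring {
    bool busy;                /* true while the GPU may still be reading it */
};

struct example_ring_pool {
    struct example_ring rings[EXAMPLE_NUM_RINGS];
    unsigned current;         /* index of the ring being filled */
    unsigned stalls;          /* how often we had to wait */
};

/* Called at wraparound: hand the full ring to the GPU and move on to
 * the next one, waiting only if that one has not been retired yet. */
struct example_ring *example_wraparound(struct example_ring_pool *pool)
{
    pool->rings[pool->current].busy = true;
    pool->current = (pool->current + 1) % EXAMPLE_NUM_RINGS;

    struct example_ring *next = &pool->rings[pool->current];
    if (next->busy) {
        pool->stalls++;       /* stand-in for blocking on the GPU */
        next->busy = false;
    }
    return next;
}
```

With a single ring every wraparound lands back on a busy buffer; with several, the CPU can keep going until it laps work the GPU has not finished.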
* freedreno: emit markers to scratch registers [Rob Clark, 2013-09-14, 4 files, -0/+33]
| Emit markers by writing to scratch registers in order to
| "triangulate" gpu lockup position from post-mortem register dump. By
| comparing register values in post-mortem dump to command-stream, it
| is possible to narrow down which DRAW_INDX caused the lockup.
|
| Signed-off-by: Rob Clark <[email protected]>
* freedreno: split out WFI helper [Rob Clark, 2013-09-14, 5 files, -10/+12]
| Mostly just to give an easy debug/instrumentation point.
|
| Signed-off-by: Rob Clark <[email protected]>
* freedreno: fd_draw helper [Rob Clark, 2013-09-14, 6 files, -52/+55]
| Have a single helper that all draws come through, mainly for a
| convenient debug and instrumentation point.
|
| Signed-off-by: Rob Clark <[email protected]>