Commit message | Author | Age | Files | Lines
* intel/fs: Protect opt_algebraic from OOB BROADCAST indices | Jason Ekstrand | 2017-11-07 | 1 | -2/+11
    Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/fs/nir: Don't stomp 64-bit values to D in get_nir_src | Jason Ekstrand | 2017-11-07 | 1 | -13/+24
    Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/fs/nir: Minor refactor of store_output | Jason Ekstrand | 2017-11-07 | 1 | -4/+3
    Stop retyping the output of shuffle_64bit_data_for_32bit_write. It's always BRW_REGISTER_TYPE_D which is perfectly fine for writing out. Also, when we change get_nir_src to return something with a 64-bit type for 64-bit values, the retyping will not be at all what we want. Also, retyping the output based on src.type before we whack it back to 32 bits is a problem because the output is always 32 bits.
    Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/fs: Return a fs_reg from shuffle_64bit_data_for_32bit_write | Jason Ekstrand | 2017-11-07 | 2 | -29/+12
    All callers of this function allocate a fs_reg expressly to pass into it. It's much easier if we just let the helper allocate the register. While we're here, we switch it to doing the MOVs with an integer type so that we don't accidentally canonicalize floats on half of a double.
    Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/fs/nir: Simplify 64-bit store_output | Jason Ekstrand | 2017-11-07 | 1 | -19/+6
    The swizzles weren't doing any good because swiz is just XYZW. Also, we were emitting an extra set of MOVs because shuffle_64bit_data_for_32bit already does a MOV for us. Finally, the temporary was only ever used inside the inner loop so there's no need for it to actually be an array.
    Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/fs: Use the original destination region for int MUL lowering | Jason Ekstrand | 2017-11-07 | 1 | -7/+9
    Some hardware (CHV, BXT) has special restrictions on register regions when doing integer multiplication. We want to respect those when we lower to DxW multiplication.
    Reviewed-by: Iago Toral Quiroga <[email protected]>
    Cc: [email protected]
* intel/fs: Fix integer multiplication lowering for src/dst hazards | Jason Ekstrand | 2017-11-07 | 1 | -2/+8
    Reviewed-by: Iago Toral Quiroga <[email protected]>
    Cc: [email protected]
* intel/fs: Fix MOV_INDIRECT for 64-bit values on little-core | Jason Ekstrand | 2017-11-07 | 1 | -36/+39
    The same workaround we need for 64-bit values on little core also takes care of the Ivy Bridge problem and does so a bit more efficiently so we can drop that code while we're here.
    Reviewed-by: Iago Toral Quiroga <[email protected]>
    Cc: [email protected]
* intel/eu: Fix broadcast instruction for 64-bit values on little-core | Jason Ekstrand | 2017-11-07 | 1 | -2/+24
    We're not using broadcast for any 64-bit types right now since we mostly use it for emit_uniformize on 32-bit buffer indices. However, SPIR-V subgroups are going to need it for 64-bit so let's make it work.
    Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/eu/reg: Add a subscript() helper | Jason Ekstrand | 2017-11-07 | 1 | -0/+16
    This is similar to the identically named fs_reg helper.
    Reviewed-by: Iago Toral Quiroga <[email protected]>
    Cc: [email protected]
* intel/eu: Just modify the offset in brw_broadcast | Jason Ekstrand | 2017-11-07 | 1 | -4/+5
    This means we have to drop const from a variable but it also means that 100% of the code which deals with the offset limit is in one place.
    Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/compiler: Add some restrictions to MOV_INDIRECT and BROADCAST | Jason Ekstrand | 2017-11-07 | 3 | -0/+20
    These restrictions effectively already existed due to the way we use indirect sources but weren't being directly enforced.
    Reviewed-by: Iago Toral Quiroga <[email protected]>
* intel/fs: Use a pair of 1-wide MOVs instead of SEL for any/all | Jason Ekstrand | 2017-11-07 | 1 | -9/+33
    For some reason, the any/all predicates don't work properly with SIMD32. In particular, it appears that a SEL with a QtrCtrl of 2H doesn't read the correct subset of the flag register and you end up getting garbage in the second half. Work around this by using a pair of 1-wide MOVs and scattering the result. This fixes the any/all instructions for SIMD32.
    Reviewed-by: Matt Turner <[email protected]>
    Cc: [email protected]
* intel/fs: Use an explicit D type for vote any/all/eq intrinsics | Jason Ekstrand | 2017-11-07 | 1 | -0/+6
    The any/all intrinsics return a boolean value so D or UD is the correct type. Unfortunately, get_nir_dest has the annoying behavior of returning a float type by default. This causes format conversion which gives us -1.0f or 0.0f in the register. If the consumer of the result does an integer comparison to zero, it will give you the right boolean value but if we do something more clever based on the 0/~0 assumption for booleans, this will give the wrong value.
    Reviewed-by: Iago Toral Quiroga <[email protected]>
    Cc: [email protected]
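The failure mode described in this message can be sketched in plain C. This is only an illustration under the assumption of IEEE 754 single-precision floats; `bool_through_float_dest` and `select_by_mask` are hypothetical names, not functions from the compiler.

```c
#include <stdint.h>
#include <string.h>

/* Bit pattern left in a register when a 0/~0 boolean is written
 * through a float-typed destination, i.e. with int -> float
 * conversion. Assumes IEEE 754 single precision. */
static uint32_t bool_through_float_dest(int32_t b)
{
    float f = (float)b;            /* ~0 (-1) becomes -1.0f, 0 becomes 0.0f */
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* A "clever" select that relies on true being all ones. */
static uint32_t select_by_mask(uint32_t b, uint32_t x)
{
    return b & x;
}
```

With a proper ~0 boolean, `select_by_mask` returns `x`. With the converted value the register holds 0xBF800000, which is still nonzero (so a compare against zero works) but masks away the low bits of `x`.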
* intel/fs: Don't stomp f0.1 in SIMD16 ballot | Jason Ekstrand | 2017-11-07 | 1 | -2/+9
    In fragment shaders f0.1 is used for discards so doing ballot after a discard can potentially cause the discard to not happen. However, we don't support SIMD32 fragment shaders yet so this isn't a problem.
    Reviewed-by: Iago Toral Quiroga <[email protected]>
    Cc: [email protected]
* intel/fs: Use ANY/ALL32 predicates in SIMD32 | Jason Ekstrand | 2017-11-07 | 1 | -12/+30
    We have ANY/ALL32 predicates and, for the most part, they work just fine. (See the next commit for more details.) Also, due to the way that flag registers are handled in hardware, instruction splitting is able to split the CMP correctly. Specifically, the hardware looks at the execution group and knows to shift its flag usage up correctly so a 2H instruction will write to f0.1 instead of f0.0.
    Reviewed-by: Matt Turner <[email protected]>
    Cc: [email protected]
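The f0.0/f0.1 split mentioned above can be pictured with a small sketch. It assumes each flag subregister holds 16 channel bits, which is what the 2H statement implies; `flag_loc` and `flag_for_channel` are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Where a channel's flag bit lives: subregister index and bit. */
struct flag_loc {
    unsigned subreg;   /* 0 for f0.0, 1 for f0.1 */
    unsigned bit;      /* bit position inside that subregister */
};

/* Assumes 16 flag bits per subregister and at most 32 channels. */
static struct flag_loc flag_for_channel(unsigned channel)
{
    assert(channel < 32);
    struct flag_loc loc = { channel / 16, channel % 16 };
    return loc;
}
```

Under that assumption, the second 16-wide half of a SIMD32 instruction (channels 16 to 31) lands entirely in subregister 1.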
* intel/fs: Be more explicit about our placement of [un]zip | Jason Ekstrand | 2017-11-07 | 1 | -3/+17
    Before, we were careful to place the zip after the last of the split instructions but did unzip on-demand. This changes things so that the unzips go before all of the split instructions and the zip comes explicitly after all the split instructions. As a side-effect of this change, we now emit the split instruction from highest SIMD group to lowest instead of low to high. We could have kept the old behavior, but it shouldn't matter and this made the code easier.
    Reviewed-by: Iago Toral Quiroga <[email protected]>
    Cc: [email protected]
* intel/fs: Pass builders instead of blocks into emit_[un]zip | Jason Ekstrand | 2017-11-07 | 1 | -26/+35
    This makes it far more explicit where we're inserting the instructions rather than the magic "before and after" stuff that the emit_[un]zip helpers did based on block and inst.
    Reviewed-by: Iago Toral Quiroga <[email protected]>
    Cc: [email protected]
* intel/fs: Use a pure vertical stride for large register strides | Jason Ekstrand | 2017-11-07 | 1 | -3/+13
    Register strides higher than 4 are uncommon but they can happen. For instance, if you have a 64-bit extract_u8 operation, we turn that into UB -> UQ MOV with a source stride of 8. Our previous calculation would try to generate a stride of <32;8,8>:ub which is invalid because the maximum horizontal stride is 4. To solve this problem, we instead use a stride of <8;1,0>. As noted in the comment, this does not work as a destination but that's ok as very few things actually generate that stride.
    Reviewed-by: Samuel Iglesias Gonsálvez <[email protected]>
    Cc: [email protected]
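A rough sketch of why <8;1,0> yields a stride of 8, assuming the usual reading of a <VertStride;Width,HorzStride> region: rows of Width elements, HorzStride elements apart within a row, VertStride elements apart between rows. The struct and function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* A register region <vstride;width,hstride>, in units of elements. */
struct region {
    unsigned vstride;
    unsigned width;
    unsigned hstride;
};

/* The message states that the maximum horizontal stride is 4; the
 * encodable values are assumed here to be 0, 1, 2 and 4. */
static bool hstride_is_encodable(unsigned hstride)
{
    return hstride == 0 || hstride == 1 || hstride == 2 || hstride == 4;
}

/* Element offset read by a given channel under the region. */
static unsigned region_element_offset(struct region r, unsigned channel)
{
    assert(r.width > 0);
    return (channel / r.width) * r.vstride +
           (channel % r.width) * r.hstride;
}
```

With width 1, every channel starts a new row, so the offset is simply `channel * vstride` and the horizontal stride never comes into play.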
* broadcom/vc5: Skip emitting textures that aren't used. | Eric Anholt | 2017-11-07 | 1 | -2/+4
    Fixes crashes when ARB_fp uses texture[1] but not 0, as in piglit's fp-fragment-position.
* broadcom/vc5: Add missing SRGBA8 ETC2 support. | Eric Anholt | 2017-11-07 | 1 | -0/+1
    Fixes piglit oes_compressed_etc2_texture-miptree srgb8-alpha8.
* broadcom/vc5: Disable early Z test when the FS writes Z. | Eric Anholt | 2017-11-07 | 1 | -1/+2
    Fixes piglit early-z.
* broadcom/vc5: Shift the min/max lod fields by the BASE_LEVEL. | Eric Anholt | 2017-11-07 | 2 | -4/+15
    The lod clamping is what limits you between base and last level, and the base level field is just there to help decide where the min/mag change happens.
    Fixes tex-miplevel-selection GL2:texture()
* broadcom/vc5: Add support for anisotropic filtering. | Eric Anholt | 2017-11-07 | 1 | -0/+9
* broadcom/vc5: Fix mipmap filtering enums. | Eric Anholt | 2017-11-07 | 2 | -8/+32
    The ordering of the values was even less obvious than I thought, with both the mip filter and the min filter being in different bits depending on whether the mip filter is none.
    Fixes piglit fs-textureLod-miplevels.shader_test
* broadcom/vc5: Fix height padding of small UIF slices. | Eric Anholt | 2017-11-07 | 1 | -1/+5
    The HW doesn't pad the slice's height to make a full 4x4 group of UIF blocks. We just need to pad to columns, and the start of the next column appears in the bottom of the previous column's last block.
    Fixes piglit fs-textureOffset-2D.
* broadcom/vc5: Print the actual offsets in HW for our resource layout debug. | Eric Anholt | 2017-11-07 | 1 | -34/+55
    The alignment of level 0 is non-obvious, so it's hard to turn a faulting address into a slice without this.
* broadcom/vc5: Set the available VS outputs to match the FS inputs. | Eric Anholt | 2017-11-07 | 1 | -1/+4
    Fixes piglit glsl-es-3.00/minimum-maximums.txt.
* broadcom/vc5: Set the max texture LOD bias. | Eric Anholt | 2017-11-07 | 1 | -1/+1
    The field is signed 8.8, so the usual 16.0f fits.
    Fixes piglit gl-2.1-minmax.
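The "signed 8.8" remark can be checked with a quick sketch. It assumes the field is a 16-bit two's-complement value with 8 fractional bits (so a range of -128.0 up to just under 128.0); `fits_s8_8` and `to_s8_8` are hypothetical helpers, not driver functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if v is representable in signed 8.8 fixed point
 * (16 bits total, 8 fractional bits). */
static bool fits_s8_8(float v)
{
    float scaled = v * 256.0f;
    return scaled >= -32768.0f && scaled <= 32767.0f;
}

/* Convert to signed 8.8, truncating toward zero. The caller is
 * expected to have checked fits_s8_8() first. */
static int16_t to_s8_8(float v)
{
    return (int16_t)(v * 256.0f);
}
```

A bias of 16.0f scales to 4096 (0x1000), comfortably inside the 16-bit range, whereas 128.0f would not fit.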
* broadcom/vc5: Fix translation of stencil ops. | Eric Anholt | 2017-11-07 | 2 | -8/+30
    They aren't quite in the same order as the gallium defines.
    Fixes piglit gl-2.0-two-sided-stencil.
* broadcom/vc5: Move stencil state packing to the CSO. | Eric Anholt | 2017-11-07 | 3 | -27/+47
    Only the stencil ref comes in as dynamic state at emit time.
* broadcom/vc5: Introduce a helper for pre-packing our V3DXX structs. | Eric Anholt | 2017-11-07 | 2 | -165/+155
    This is so much more pleasant to write than the manual V3D33_whatever_pack() calls, and will be useful for when we start doing actual per-V3D compiles.
* broadcom/vc5: Add a cl_emit() variant for merging with a pre-packed struct. | Eric Anholt | 2017-11-07 | 2 | -19/+29
    Cleans up the hand-written code, at the cost of another ugly macro.
* broadcom/vc5: Skip emitting depth offset while disabled. | Eric Anholt | 2017-11-07 | 1 | -1/+4
    The enable flag is also in the rasterizer state, so it will be emitted once it's needed.
* broadcom/vc5: Don't emit stencil config if not doing stencil test. | Eric Anholt | 2017-11-07 | 1 | -1/+2
    As with blending, we'll have the bit flagged again when it gets reenabled in CONFIGURATION_BITS, so there's no need to emit test state if we're not testing.
* broadcom/vc5: Don't emit updated blend factors/funcs while disabled. | Eric Anholt | 2017-11-07 | 1 | -1/+5
    The dirty bit will be flagged again when re-enabled. Keeps us from emitting blend state in CLs that never do blending.
* broadcom/vc5: Fix missing enum decode for indexed primitives. | Eric Anholt | 2017-11-07 | 1 | -2/+1
* broadcom/vc5: Drop padding bits from the bottom of the TSDA address. | Eric Anholt | 2017-11-07 | 1 | -1/+1
    Fixes misaligned-looking addresses in decode.
* broadcom/vc5: Make sure the TMU indirect struct is appropriately aligned. | Eric Anholt | 2017-11-07 | 1 | -0/+2
    I was hoping that this would help with fbo-generatemipmap hangs, but no luck.
* broadcom/genxml: Fix decoding of groups with small fields. | Kenneth Graunke | 2017-11-07 | 1 | -2/+4
    Groups containing fields smaller than a byte were probably not being decoded correctly. For example:

        <group count="32" start="32" size="4">
          <field name="Vertex Element Enables" start="0" end="3" type="uint"/>
        </group>

    gen_field_iterator_next would properly walk over each element of the array, incrementing group_iter. However, the code to print the actual values only considered iter->field->start/end, which are 0 and 3 in the above example. So it would always fetch bits 3:0 of the current byte, printing the same value over and over.

    Cc: Eric Anholt <[email protected]>
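The indexing this message describes can be sketched as follows. This is not the decoder's code: it assumes the packet is an array of 32-bit dwords and that fields are at most 32 bits wide, and `get_bits` and `group_field` are hypothetical helpers.

```c
#include <stdint.h>

/* Extract bits [start, end] (inclusive, end - start < 32) from a
 * packet stored as 32-bit dwords. */
static uint32_t get_bits(const uint32_t *dw, unsigned start, unsigned end)
{
    unsigned width = end - start + 1;
    uint64_t v = dw[start / 32];
    if (end / 32 != start / 32)
        v |= (uint64_t)dw[end / 32] << 32;   /* field straddles two dwords */
    v >>= start % 32;
    return (uint32_t)(v & ((1ull << width) - 1));
}

/* Value of a field inside element `index` of a group. The element's
 * base bit is group_start + index * group_size; the field's own
 * start/end are relative to that base. */
static uint32_t group_field(const uint32_t *dw,
                            unsigned group_start, unsigned group_size,
                            unsigned index,
                            unsigned field_start, unsigned field_end)
{
    unsigned base = group_start + index * group_size;
    return get_bits(dw, base + field_start, base + field_end);
}
```

Calling `get_bits(dw, 0, 3)` for every element, which is roughly what using only the field's own start/end amounts to, returns the same nibble each time; `group_field` moves the window by `group_size` bits per element.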
* broadcom/vc5: Use DEPTH24_STENCIL8 for rendering to depth-only textures. | Eric Anholt | 2017-11-07 | 1 | -1/+1
    The HW puts the pad bits at the top for DEPTH_COMPONENT24, but we need it at the bottom for texturing. Using the format with stencil probably means we won't be able to do Z24 and separate S8, but I wasn't planning on supporting that anyway.
    Fixes hiz-depth-read-fbo-d24-s0
* anv: Suffix anv-private 'VK' tokens with 'ANV' | Chad Versace | 2017-11-07 | 5 | -31/+31
    I saw VK_IMAGE_ASPECT_ANY_COLOR_BIT while hacking anv_formats.c and got confused. "Huh? What extension added that?". No extension defines it; anv_private.h defines it.

    To remove confusion, rename the anv-private VK tokens as if they were extension tokens with the ANV vendor suffix. I found only two such tokens:

        VK_IMAGE_ASPECT_ANY_COLOR_BIT
        VK_IMAGE_ASPECT_PLANES_BITS

    Reviewed-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Lionel Landwerlin <[email protected]>
* anv: Remove unused variable 'gen' | Chad Versace | 2017-11-07 | 1 | -4/+0
    In anv_physical_device_get_format_properties().
    Reviewed-by: Jason Ekstrand <[email protected]>
    Reviewed-by: Lionel Landwerlin <[email protected]>
* radeonsi: add si_screen::has_ls_vgpr_init_bug | Marek Olšák | 2017-11-07 | 4 | -3/+5
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use ac_create_target_machine | Marek Olšák | 2017-11-07 | 3 | -17/+15
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: use ac_get_llvm_processor_name | Marek Olšák | 2017-11-07 | 5 | -39/+7
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: don't set gs_table_depth | Marek Olšák | 2017-11-07 | 1 | -2/+4
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi/gfx9: limit the scissor bug workaround to Vega10 and Raven only | Marek Olšák | 2017-11-07 | 1 | -4/+4
    Reviewed-by: Nicolai Hähnle <[email protected]>
* radeonsi: remove unused field in the PCI ID table | Marek Olšák | 2017-11-07 | 4 | -232/+232
    Reviewed-by: Alex Deucher <[email protected]>
* mesa: fix deleting the dummy ATI_fs | Miklós Máté | 2017-11-07 | 1 | -1/+4
    The DummyShader is used by GenFragmentShadersATI() as a placeholder to mark IDs as allocated. Context cleanup wants to delete everything in ctx->Shared->ATIShaders, and crashes on these placeholders with this backtrace:

    ==15060== Invalid free() / delete / delete[] / realloc()
    ==15060==    at 0x482F478: free (vg_replace_malloc.c:530)
    ==15060==    by 0x57694F4: _mesa_delete_ati_fragment_shader (atifragshader.c:68)
    ==15060==    by 0x58B33AB: delete_fragshader_cb (shared.c:208)
    ==15060==    by 0x5838836: _mesa_HashDeleteAll (hash.c:295)
    ==15060==    by 0x58B365F: free_shared_state (shared.c:377)
    ==15060==    by 0x58B3BC2: _mesa_reference_shared_state (shared.c:469)
    ==15060==    by 0x578687F: _mesa_free_context_data (context.c:1366)
    ==15060==    by 0x595E9EC: st_destroy_context (st_context.c:642)
    ==15060==    by 0x5987057: st_context_destroy (st_manager.c:772)
    ==15060==    by 0x5B018B6: dri_destroy_context (dri_context.c:217)
    ==15060==    by 0x5B006D3: driDestroyContext (dri_util.c:511)
    ==15060==    by 0x4A1CBE6: dri3_destroy_context (dri3_glx.c:170)
    ==15060==  Address 0x7b5dae0 is 0 bytes inside data symbol "DummyShader"

    Also, DeleteFragmentShadersATI() should not assert on DummyShader, just remove the hash entry. Normally one would define a shader after GenFragmentShadersATI(), and BindFragmentShaderATI() replaces the placeholder with a real object. However, the specification doesn't say that one has to define a shader for each allocated ID.

    Signed-off-by: Miklós Máté <[email protected]>
    Signed-off-by: Marek Olšák <[email protected]>
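The general pattern behind this crash, a statically allocated placeholder ending up in a container whose cleanup calls free() on every entry, can be sketched as below. This is not the actual Mesa change; `struct shader`, `dummy_shader`, `new_shader` and `delete_shader` are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stdlib.h>

struct shader {
    int id;
};

/* Static placeholder marking an ID as allocated but undefined.
 * It lives in the data segment, so it must never reach free(). */
static struct shader dummy_shader;

/* Heap-allocate a real shader object; returns NULL on failure. */
static struct shader *new_shader(int id)
{
    struct shader *s = malloc(sizeof *s);
    if (s != NULL)
        s->id = id;
    return s;
}

/* Release a shader, skipping the placeholder. Returns true only
 * if heap memory was actually freed. */
static bool delete_shader(struct shader *s)
{
    if (s == NULL || s == &dummy_shader)
        return false;
    free(s);
    return true;
}
```

A cleanup callback that routes every entry through a guard like `delete_shader` can walk a table containing both real objects and placeholders without triggering the invalid free shown in the backtrace.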