summaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers
Commit message (Collapse)AuthorAgeFilesLines
* i965/fs: handle fp64 opcodes in brw_do_channel_expressionsIago Toral Quiroga2016-05-101-9/+14
| | | | | | | | | | | | | | In the case of the pack opcode we are already doing the lowering in NIR, so no need to do it here. The unpack opcode operates on scalars, so it should not be lowered. In the case of frexp_sig and frexp_exp, they are lowered in lower_instructions, so we don't have to care about them. All the remaining opcodes involve conversions from and to doubles and are business as usual. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: add support for f2d and d2fConnor Abbott2016-05-101-0/+2
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: add a pass for legalizing d2fConnor Abbott2016-05-104-0/+81
| | | | | | | | | We need to do this late, in order to avoid partial writes during the optimization loop. v2: Use subscript() instead of stride(). Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: fix dst width calculation in CSEConnor Abbott2016-05-101-1/+2
| | | | | | | | v2 (Sam): - Fix line width (Topi). Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/fs: fix regs_written in LOAD_PAYLOAD for doublesConnor Abbott2016-05-101-2/+6
| | | | | | | | v2: Account for the stride of the dst (Iago) Signed-off-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/fs: fix is_copy_payload() for doublesConnor Abbott2016-05-101-1/+1
| | | | | | | | | v2 (Sam): - LOAD_PAYLOAD treats each header source as a 32B block regardless of the datatype. Drop the change (Curro) Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965/fs: fix compares for doublesConnor Abbott2016-05-101-3/+31
| | | | | | | | | | | | The destination has to have the same source as the type, or else the simulator will complain. As a result, we need to emit a CMP that outputs a 64-bit wide result and then do a strided MOV to pick out the low 32 bits of each channel. v2: Use subscript() instead of stride() (Curro) Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: extend exec_size halving in the generatorConnor Abbott2016-05-101-6/+10
| | | | | | | | | | | | | The HW has a restriction that only vertical stride may cross register boundaries. Previously, this only mattered for SIMD16 instructions where we needed to use the same regioning parameters as the equivalent SIMD8 instruction but double the exec size. But we need to do the same splitting for 64-bit instructions as well as instructions with a stride of 2 (which effectively consume 64 bits per element). Fix up the code to do the right thing instead of special-casing SIMD16. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: fix assign_constant_locations() for doublesConnor Abbott2016-05-101-2/+6
| | | | | | | | | | | | | Uniform doubles will read two registers, in which case we need to mark both as being live. v2 (Sam): - Use a formula to get the number of registers read with proper units (Curro). Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965/fs: use byte_offset() in offset() for uniformsConnor Abbott2016-05-101-3/+1
| | | | | | | | This makes things more consistent, and also fixes the offset calculation for double uniforms. Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: handle uniforms in byte_offset()Connor Abbott2016-05-101-1/+5
| | | | | | | | v2: Do it only for uniforms (Iago) Signed-off-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: fix type_size() for doublesConnor Abbott2016-05-101-1/+2
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: optimize unpack doubleIago Toral Quiroga2016-05-101-4/+26
| | | | | | | | | | | | When we are actually unpacking from a double that we have previously packed from its 32-bit components we can bypass the pack operation and source from its arguments directly. v2 (Sam): - Fix line overflow (Topi) - Bail if the parent instruction's source is not SSA (Connor) Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: optimize pack doubleIago Toral Quiroga2016-05-101-0/+29
| | | | | | | | | | | | | | When we are actually creating a double using values obtained from a previous unpack operation we can bypass the unpack and source from the original double value directly. v2: - Style changes (Topi) - Bail is parent instruction's src is not SSA (Connor) v3: Use subscript() instead of stride() (Curro) Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs/nir: translate double pack/unpackConnor Abbott2016-05-101-0/+12
| | | | | | | | | v2 (Sam): - Fix line overflow (Topi). v3: Use subscript() instead of stride() (Curro) Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: add a pass for lowering PACK opcodesConnor Abbott2016-05-104-0/+62
| | | | | | v2: Use subscript() instead of stride() (Curro) Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: add PACK opcodeConnor Abbott2016-05-105-1/+15
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Introduce helper to extract a field from each channel of a register.Francisco Jerez2016-05-101-0/+28
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/fs: always pass the bitsize to brw_type_for_nir_type()Connor Abbott2016-05-101-3/+9
| | | | | | | | | | v2 (Sam): - Add bitsize to brw_type_for_nir_type() in optimize_extract_to_float() v3 (Sam): - Fix line width (Topi). Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: add support for printing double immediatesConnor Abbott2016-05-101-0/+3
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: don't propagate 64-bit immediatesConnor Abbott2016-05-101-0/+2
| | | | | | | | | They can only be used with 1-src instructions, which practically (since we should've constant-propagated away all 1-src instructions with 64-bit immediates in NIR) means that they must be kept in separate MOV's and can't be propagated. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: use the NIR bit size when creating registersConnor Abbott2016-05-101-8/+28
| | | | | | | | | | | | | | | | | | | | | v2 (Iago): - Squashed bits from 'support double precission constant operands for the implementation of 64-bit emit_load_const'. - Do not use BRW_REGISTER_TYPE_D for all 32-bit registers since that breaks asserts and functionality for some piglit tests. Just keep 32-bit types untouched and add 64-bit support. - Use DF instead of Q for 64-bit registers. Otherwise the code we generate will use Q sometimes and DF others and we hit unwanted DF/Q conversions, so always use DF. v3 (Sam): - Mark 'reg_type' occurrences as const (Topi). Signed-off-by: Topi Pohjolainen <[email protected]> Signed-off-by: Tapani Palli <[email protected]> Signed-off-by: Abdiel Janulgue <[email protected]> Signed-off-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: fixup uniform setup for doublesConnor Abbott2016-05-101-1/+6
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: two-argument instructions can only use 32-bit immediatesIago Toral Quiroga2016-05-101-0/+2
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: fix brw_abs_immediate() for doublesIago Toral Quiroga2016-05-101-2/+4
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: fix brw_saturate_immediate() for doublesIago Toral Quiroga2016-05-101-6/+27
| | | | | | | | | | v2 (Sam): - Mark 'size' as const (Topi). - Add comment to explain that we do copies 64-bits regardless of the type (Topi) Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: fix is_zero(), is_one() and is_negative_one() for doublesConnor Abbott2016-05-101-4/+24
| | | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965: fix brw_negate_immediate() for doublesConnor Abbott2016-05-101-2/+4
| | | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965/eu: add support for DF immediatesConnor Abbott2016-05-101-7/+21
| | | | | | | | | v2 (Sam): - Remove 'however' from the comment (Topi) Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965: add support for disassembling DF immediatesConnor Abbott2016-05-101-1/+1
| | | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965: add support for getting/setting DF immediatesConnor Abbott2016-05-101-0/+25
| | | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965: add brw_imm_dfConnor Abbott2016-05-102-0/+10
| | | | | | | | | | v2 (Iago) - Fixup accessibility in backend_reg Signed-off-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965/eu: Allow 3-src float ops with doublesTopi Pohjolainen2016-05-101-6/+18
| | | | | | | | | | v2: - set 3src_src_type for BRW_REGISTER_TYPE_DF (Connor) Signed-off-by: Topi Pohjolainen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965/disasm: fix disasm of 3-src doublesConnor Abbott2016-05-101-0/+1
| | | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965: Tell backend register about double precision typeTopi Pohjolainen2016-05-101-1/+2
| | | | | | | | | Signed-off-by: Topi Pohjolainen <[email protected]> Signed-off-by: Tapani P\344lli <[email protected]> Signed-off-by: Abdiel Janulgue <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965: Determine size of double precision float registerTopi Pohjolainen2016-05-101-0/+1
| | | | | | | | | | | | | | This is used to determine how many registers an instruction reads and writes as well as for offseting register region into a desired component. v2 (Connor): rebase on master Signed-off-by: Topi Pohjolainen <[email protected]> Signed-off-by: Tapani P\344lli <[email protected]> Signed-off-by: Abdiel Janulgue <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965: Lower DFRACEXP/DLDEXPTopi Pohjolainen2016-05-101-0/+1
| | | | | | | | | | | | v2 (Connor): rebase on master which moved this to brw_link.cpp v3 (Sam): - Only enable DFREXP_DLDEXP_TO_ARITH in process_glsl_ir(). This is used for doubles. Single floating point op is lowered by NIR. Signed-off-by: Topi Pohjolainen <[email protected]> Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965: use pack/unpackDouble loweringConnor Abbott2016-05-101-0/+1
| | | | | Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965: use double lowering passConnor Abbott2016-05-102-0/+10
| | | | | | | | | | v2: also lower trunc, ceil, floor, fract and roundEven (Iago) v3: also lower mod for doubles (Sam) Signed-off-by: Iago Toral Quiroga <[email protected]> Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* i965: enable lrp lowering for doublesSamuel Iglesias Gonsálvez2016-05-101-0/+1
| | | | | | | | | Broadwell and previous generations does not support lrp instruction operating with doubles. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> Reviewed-by: Francisco Jerez <[email protected]>
* Revert "Revert "i965: Switch to scalar TCS by default.""Kenneth Graunke2016-05-091-1/+1
| | | | | | | | This reverts commit bd326c229c528a214c9fda705e7a961cfa49ac9e. Now that we've fixed the GPU hangs, let's turn it back on. Reviewed-by: Kristian Høgsberg <[email protected]>
* i965: Actually assign binding table offsets for the TCS.Kenneth Graunke2016-05-091-0/+5
| | | | | | | | | | | | As far as I can tell, this was just entirely missing...honestly, I'm not sure how anything worked at all. Caught by noticing GPU hangs in image load store tests with scalar TCS, but probably has broader implications. Cc: [email protected] Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965: Clamp "Maximum VP Index" to 1 when gl_ViewportIndex isn't written.Kenneth Graunke2016-05-091-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fs_visitor::emit_urb_writes skips writing the VUE header for shaders that don't write gl_PointSize, gl_Layer, or gl_ViewportIndex. This leaves their values uninitialized. Kristian's nearby comment says: "But often none of the special varyings that live there are written and in that case we can skip writing to the vue header, provided the corresponding state properly clamps the values further down the pipeline." However, we were clamping gl_ViewportIndex to [0, 15], so we would end up using a random viewport. To fix this, detect when the shader doesn't write gl_ViewportIndex, and clamp it to [0, 0]. The vec4 backend always writes zeros to the VUE header, so it doesn't suffer from this problem. With vec4-style HWord writes, we can write the header and position together in a single message. In the FS world, we would need 4 extra MOVs of 0 and a longer message, or a separate OWord write. It's likely cheaper to just clamp the value. Fixes DiRT Showdown and Bioshock Infinite, which only rendered half of the screen - the lower left of two triangles. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93054 Cc: [email protected] Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/hsw: Fix brw_store_data_imm*Jordan Justen2016-05-091-10/+12
| | | | | | | | For Gen6 through Haswell dword 1 is MBZ. In gen 8 it becomes part of the 64-bit address. Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Reimplement ARB_transform_feedback2 on Haswell and later.Kenneth Graunke2016-05-095-12/+318
| | | | | | | | | | | | | | | | | | | | | | | | | | My old implementation accumulated <start, end> pairs in a buffer, and eventually processed that data on the CPU. This meant flushing the batchbuffer and waiting for it to completely execute before we could map it, resulting in really long stalls. We could also run out of space in the buffer, and have to do this early. Instead, we can use Haswell's MI_MATH command to do the (end - start) subtraction, as well as the multiplication by 2 or 3 to convert from the number of primitives written to the number of vertices written. We still need to CS stall to read the counters, but otherwise everything is completely pipelined - there's no CPU<->GPU synchronization required. It also uses only 80 bytes in the buffer, no matter what. Improves performance in Manhattan on Skylake GT3e at 800x600 by 6.1086% +/- 0.954166% (n=9). At 1920x1080, improves performance by 2.82103% +/- 0.148596% (n=84). v2: Fix number of primitives -> number of vertices calculation for GL_TRIANGLES (I was multiplying by 4 instead of 3.) Caught by Jordan Justen. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Add a brw_load_register_reg64 helper.Kenneth Graunke2016-05-092-0/+20
| | | | | | | | | It appears that we can't do this in a single command (like we do for MI_LOAD_REGISTER_IMM) - the Skylake simulator gets rather grumpy about the command length if I try to combine them. No matter. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Only enable ARB_query_buffer_object for newer kernels on Haswell.Kenneth Graunke2016-05-093-1/+15
| | | | | | | | | | | | | | On Haswell, we need version 6 of the kernel command parser in order to write the math registers. Our implementation of ARB_query_buffer_object heavily relies on MI_MATH, so we should only advertise it when MI_MATH is available. We also need MI_LOAD_REGISTER_REG, which requires version 7 of the command parser. To make these checks easier, introduce a screen->has_mi_math_and_lrr flag that will be set when both commands are supported. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* Revert "i965: Always use Y-tiled buffers on SKL+"Daniel Stone2016-05-094-30/+8
| | | | | | | | | | | | | | | | | | | This commit broke Weston, Mutter, and xf86-video-modesetting, on KMS. In order to use Y-tiled buffers, the kernel requires the tiling mode to be explicitly named through the I915_FORMAT_MOD_Y_TILED AddFB2 modifier; it disallows any attempt to infer the buffer's tiling mode. As the GBM API does not have a way to extract modifiers for a buffer, this commit broke all users of GBM on SKL+. Revert it for now, until we get a way to extract modifier information from GBM, and also let GBM users inform the implementation that it intends to use the modifiers. This reverts commit 6a0d036483caf87d43ebe2edd1905873446c9589. Signed-off-by: Daniel Stone <[email protected]> Acked-by: Ben Widawsky <[email protected]> Tested-by: Hans de Goede <[email protected]>
* Revert "i965: Switch to scalar TCS by default."Kenneth Graunke2016-05-051-1/+1
| | | | | | | This reverts commit b593737ed8349b280fa29242c35f565b59ab3025. Apparently it causes GPU hangs on some image load store tests. Let's turn it back off until we figure out why.
* i965/fs: Move handling of samples_identical into the switch statementJason Ekstrand2016-05-051-21/+19
| | | | | This is where we handle texop_texture_samples so it makes things more consistent.