aboutsummaryrefslogtreecommitdiffstats
path: root/src/mesa/drivers/dri/i965/brw_eu.h
Commit message (Collapse)AuthorAgeFilesLines
* i965: enable the emission of the DIM instructionSamuel Iglesias Gonsálvez2016-07-141-0/+1
| | | | | | | | | | v2 (Matt): - Take a DF source argument for the DIM instruction emission in the visitors. - Indentation. Signed-off-by: Samuel Iglesias Gonsálvez <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Clean up remaining uses of dispatch_width in the generator.Francisco Jerez2016-05-271-1/+0
| | | | | | | | Most of these are bugs because the intended execution size of an instruction and the dispatch width of the shader aren't necessarily the same (especially in SIMD32 programs). Reviewed-by: Jason Ekstrand <[email protected]>
* i965/eu: Remove brw_codegen::compressed and ::compressed_stack.Francisco Jerez2016-05-271-1/+0
| | | | Reviewed-by: Jason Ekstrand <[email protected]>
* i965/eu: Define alternative interface for setting compression and group ↵Francisco Jerez2016-05-271-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | controls. This implements some simple helper functions that can be used to specify the group of channel enable signals and compression enable that apply to a brw_inst instruction. It's intended to replace brw_set_default_compression_control eventually because the current interface has a number of shortcomings inherited from the Gen-4-5-centric representation of compression and group controls as a single non-orthogonal enum: On the one hand it doesn't work for specifying arbitrary group controls other than 1Q and 2Q, which are frequently useful in SIMD32 and FP64 programs. On the other hand the current interface forces you to update the compression *and* group controls simultaneously, which has been the source of a number of generator bugs (a bunch of them fixed in this series), because in many cases we would end up resetting the group controls to zero inadvertently even though everything we wanted to do was disable instruction compression -- The latter seems especially unfortunate on Gen6+ hardware which have no explicit compression control, so we would end up bashing the quarter control field of the instruction for no benefit. Instead of a single function that updates both at the same time introduce separate interfaces to update one or the other independently preserving the current value of the other (which typically comes from the back-end IR so it has to be respected). Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Add disassembler support for remaining opcodes.Matt Turner2016-05-031-2/+19
| | | | | | | | | For opcodes that changed meaning on different generations, we store a pointer to a secondary table and the table's size in a tagged union in place of the mnemonic and number of sources. Acked-by: Francisco Jerez <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Make opcode_descs and gen_from_devinfo() static.Matt Turner2016-05-031-4/+0
| | | | | | | | | The previous commit replaced direct uses of opcode_descs with calls to the wrapper function, which should be the only method of accessing opcode_descs's data. As a result gen_from_devinfo() can also be made static. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Merge inst_info and opcode_desc tables.Matt Turner2016-05-031-2/+5
| | | | | | | I merged opcode_desc into inst_info (instead of the other way around) because inst_info was sorted by opcode number. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Move inst_info from brw_eu_validate.c to brw_eu.c.Matt Turner2016-05-031-0/+8
| | | | | | | | | Drop the uses of 'enum gen' to a plain int, so that we don't have to expose the bitfield definitions and GEN_GE/GEN_LE macros to other users of brw_eu.h. As a result, s/.gen/.gens/ to avoid confusion with devinfo->gen. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/disasm: Wrap opcode_desc look-up in a function.Francisco Jerez2016-05-031-1/+5
| | | | | | | | | | | | | | The function takes a device info struct as argument in addition to the opcode number in order to disambiguate between multiple opcode_desc entries for different instructions with the same opcode number. Reviewed-by: Iago Toral Quiroga <[email protected]> [v1] [v2] mattst88: Put brw_opcode_desc() in brw_eu.c instead of moving it there in a later patch. Reviewed-by: Kenneth Graunke <[email protected]> [v2] [v3] mattst88: Return NULL if opcode >= ARRAY_SIZE(opcode_descs) Reviewed-by: Matt Turner <[email protected]>
* i965: Pass devinfo pointer to is_3src() helpers.Francisco Jerez2016-05-031-1/+1
| | | | | | | | | | | | | | This is not strictly required for the following changes because none of the three-source opcodes we support at the moment in the compiler back-end has been removed or redefined, but that's likely to change in the future. In any case having hardware instructions specified as a pair of hardware device and opcode number explicitly in all cases will simplify the opcode look-up interface introduced in a subsequent commit, since the opcode number alone is in general ambiguous. Reviewed-by: Iago Toral Quiroga <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Make brw_set_message_descriptor() non-static.Kenneth Graunke2015-12-111-0/+8
| | | | | | | | I want to use this directly from brw_vec4_generator.cpp. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Jason Ekstrand <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/gen9+: Switch thread scratch space to non-coherent stateless access.Francisco Jerez2015-11-261-0/+2
| | | | | | | | | | | | | | | | | | | | The thread scratch space is thread-local so using the full IA-coherent stateless surface index (255 since Gen8) is unnecessary and potentially expensive. On Gen8 and early steppings of Gen9 this is not a functional change because the kernel already sets bit 4 of HDC_CHICKEN0 which overrides all HDC memory access to be non-coherent in order to workaround a hardware bug. This happens to fix a full system hang when running any spilling code on a pre-production SKL GT4e machine I have on my desk (forcing all HDC access to non-coherent from the kernel up to stepping F0 might be a good idea though regardless of this patch), and improves performance of the OglPSBump2 SynMark benchmark run with INTEL_DEBUG=spill_fs by 33% (11 runs, 5% significance) on a production SKL GT2 (on which HDC IA-coherency is apparently functional so it wouldn't make sense to disable globally). Reviewed-by: Kristian Høgsberg <[email protected]>
* i965: Clean up #includes in the compiler.Matt Turner2015-11-241-2/+0
| | | | Reviewed-by: Ian Romanick <[email protected]>
* i965: Move MRF macros from brw_inst.h to brw_eu.h.Matt Turner2015-11-241-0/+9
| | | | | | brw_inst.h is only for the brw_inst/brw_compact_inst functions. Reviewed-by: Ian Romanick <[email protected]>
* i965: Add initial assembly validation pass.Matt Turner2015-11-121-0/+4
| | | | | | | Initially just checks that sources are non-NULL, which would have alerted us to the problem fixed by commit 6c846dc5. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Consolidate is_3src() functions.Matt Turner2015-11-121-0/+6
| | | | Otherwise I'll have to add another later in this series.
* i965/fs: Handle non-const sample number in interpolateAtSampleNeil Roberts2015-10-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a non-const sample number is given to interpolateAtSample it will now generate an indirect send message with the sample ID similar to how non-const sampler array indexing works. Previously non-const values were ignored and instead it ended up using a constant 0 value. The generator will try to determine if the sample ID is dynamically uniform via nir_src_is_dynamically_uniform. If not it will query the pixel interpolator in a loop, once for each different live sample number. The next live sample number is found using emit_uniformize. If multiple live channels have the same sample number then they will be handled in a single iteration of the loop. The loop is necessary because the indirect send message doesn't seem to have a way to specify a different value for each fragment. This fixes the following two Piglit tests: arb_gpu_shader5-interpolateAtSample-nonconst arb_gpu_shader5-interpolateAtSample-dynamically-nonuniform v2: Handle dynamically non-uniform sample ids. v3: Remove the BREAK instruction and predicate the WHILE directly. Make the tokens arrays const. (Matt Turner) v4: Iterate over the live channels instead of each possible sample number. v5: Don't special case immediate values in brw_pixel_interpolator_query. Make a better wrapper for the function to set up the PI send instruction. Ensure that the SHL instructions are scalar. (Francisco Jerez). Reviewed-by: Francisco Jerez <[email protected]>
* i965: Add brw_barrier to emit a Gateway Barrier SENDJordan Justen2015-06-121-0/+2
| | | | | | | | | | | | This will be used to implement the Gateway Barrier SEND needed to implement the barrier function. v2: * notify => gateway_notify (Ken) * combine short lines of brw_barrier proto/decl (mattst88) Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Chris Forbes <[email protected]>
* i965: Add brw_WAIT to emit wait instructionJordan Justen2015-06-121-0/+2
| | | | | | | | | | | This will be used to implement the barrier function. v2: * Rename to brw_WAIT (mattst88) Signed-off-by: Jordan Justen <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Introduce the FIND_LIVE_CHANNEL pseudo-opcode.Francisco Jerez2015-05-041-0/+4
| | | | | | | | | | | | | This instruction calculates the index of an arbitrary channel enabled in the current execution mask. It's expected to be used as input for the BROADCAST opcode, but it's implemented as a separate instruction rather than being baked into BROADCAST because FIND_LIVE_CHANNEL has no dependencies so it can always be CSE'ed with other instances of the same instruction within a basic block. v2: Whitespace fixes. Reviewed-by: Matt Turner <[email protected]>
* i965: Introduce the BROADCAST pseudo-opcode.Francisco Jerez2015-05-041-0/+6
| | | | | | | | | | | | | | | | | | | The BROADCAST instruction picks the channel from its first source given by an index passed in as second source. This will be used in situations where all channels from the same SIMD thread have to agree on the value of something, e.g. a surface binding table index. This is in particular the case for UBO, sampler and image arrays, which can be indexed dynamically with the restriction that all active SIMD channels access the same index, provided to the shared unit as part of a single scalar field of the message descriptor. Simply taking the index value from the first channel as we were doing until now is incorrect, because it might contain an uninitialized value if the channel had previously been disabled by non-uniform control flow. v2: Minor style fixes. Improve commit message. Reviewed-by: Matt Turner <[email protected]>
* i965: Add memory fence opcode.Francisco Jerez2015-05-041-0/+4
| | | | | Acked-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Add typed surface access opcodes.Francisco Jerez2015-05-041-0/+24
| | | | | Acked-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Add untyped surface write opcode.Francisco Jerez2015-05-041-0/+7
| | | | | Reviewed-by: Topi Pohjolainen <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* i965: Fix the untyped surface opcodes to deal with indirect surface access.Francisco Jerez2015-05-041-5/+5
| | | | | | | | | | | | Change brw_untyped_atomic() and brw_untyped_surface_read() to take the surface index as a register instead of a constant and to use brw_send_indirect_message() to emit the indirect variant of send with a dynamically calculated message descriptor. This will be required to support variable indexing of image arrays for ARB_shader_image_load_store. Acked-by: Kenneth Graunke <[email protected]> Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Rename brw_compile to brw_codegenJason Ekstrand2015-04-221-64/+64
| | | | | | | | | | | | This name better matches what it's actually used for. The patch was generated with the following command: for file in *; do sed -i -e s/brw_compile/brw_codegen/g $file done Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Remove the context field from brw_compilerJason Ekstrand2015-04-221-3/+2
| | | | Reviewed-by: Matt Turner <[email protected]>
* i965: Make the disassembler take a device_info instead of a contextJason Ekstrand2015-04-221-2/+2
| | | | Reviewed-by: Matt Turner <[email protected]>
* i965: Make instruction compaction take a device_info instead of a contextJason Ekstrand2015-04-221-6/+6
| | | | Reviewed-by: Matt Turner <[email protected]>
* i965: Make the brw_inst helpers take a device_info instead of a contextJason Ekstrand2015-04-221-5/+5
| | | | Reviewed-by: Matt Turner <[email protected]>
* i965/eu: Add a devinfo parameter to brw_compileJason Ekstrand2015-04-221-0/+1
| | | | Reviewed-by: Matt Turner <[email protected]>
* i965: Replace guess_execution_size with something simpler.Matt Turner2015-04-211-0/+1
| | | | | | | | | | | | | | | | | | | | | | | guess_execution_size() does two things: 1. Cope with small destination registers. 2. Cope with SIMD8 vs SIMD16 mode. This patch replaces the first with a simple if block in brw_set_dest: if the destination register width is less than 8, you probably want the execution size to match. (I didn't put this in the 3src block because it doesn't seem to matter.) Since only the FS compiler cares about SIMD16 mode, it's easy to just set the default execution size there. This pattern was already been proven in the Gen8+ generator, but we didn't port it back to the existing generator when we combined the two. This is based on a patch from Ken from about a year ago. I've rebased it and and fixed a few bugs. Reviewed-by: Jason Ekstrand <[email protected]>
* i965: Pass number of components explicitly to brw_untyped_atomic and ↵Francisco Jerez2015-03-201-2/+2
| | | | | | | | | | | | _surface_read. And calculate the message response size based on the number of components rather than the other way around. This simplifies their interface somewhat and allows the caller to request a writeback message with more than one vector component in SIMD4x2 mode. Reviewed-by: Topi Pohjolainen <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* i965: Factor out logic to build a send message instruction with indirect ↵Francisco Jerez2015-03-201-5/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | descriptor. This is going to be useful because the Gen7+ uniform and varying pull constant, texturing, typed and untyped surface read, write, and atomic generation code on the vec4 and fs back-end all require the same logic to handle conditionally indirect surface indices. In pseudocode: | if (surface.file == BRW_IMMEDIATE_VALUE) { | inst = brw_SEND(p, dst, payload); | set_descriptor_control_bits(inst, surface, ...); | } else { | inst = brw_OR(p, addr, surface, 0); | set_descriptor_control_bits(inst, ...); | inst = brw_SEND(p, dst, payload); | set_indirect_send_descriptor(inst, addr); | } This patch abstracts out this frequently recurring pattern so we can now write: | inst = brw_send_indirect_message(p, sfid, dst, payload, surface) | set_descriptor_control_bits(inst, ...); without worrying about handling the immediate and indirect surface index cases explicitly. v2: Rebase. Improve documentatation and commit message. (Topi) Preserve UW destination type cargo-cult. (Topi, Ken, Matt) Reviewed-by: Topi Pohjolainen <[email protected]> Acked-by: Kenneth Graunke <[email protected]>
* i965/fs: Implement SIMD16 dual source blending.Iago Toral Quiroga2015-03-091-0/+1
| | | | | | | | | | | | From the SNB PRM, volume 4, part 1, page 193: "The dual source render target messages only have SIMD8 forms due to maximum message length limitations. SIMD16 pixel shaders must send two of these messages to cover all of the pixels. Each message contains two colors (4 channels each) for each pixel in the message payload." Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82831 Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: Introduce brw_negate_cmod().Kenneth Graunke2015-02-271-0/+1
| | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/emit: Do the sampler index adjustment directly in header.0.3Jason Ekstrand2015-01-221-2/+1
| | | | | | | | | | | Prior to this commit, the adjust_sampler_state_pointer function took an extra register that it could use as scratch space. The usual candidate was the destination of the sampler instruction. However, if that register ever aliased anything important such as the sampler index, this would scratch all over important data. Fortunately, the calculation is such that we can just do it in place and we don't need the scratch space at all. Reviewed-by: Chris Forbes <[email protected]>
* i965/fs: Add a an optional source to the FS_OPCODE_FB_WRITE instructionJason Ekstrand2014-09-301-2/+2
| | | | | | | | | Previously, we were use the base_mrf parameter of fs_inst to store the MRF location. In preparation for doing FB writes from the GRF, we now also allow you to set inst->base_mrf to -1 and provide a source register. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: Use the GRF for UNTYPED_ATOMIC instructionsJason Ekstrand2014-09-301-1/+1
| | | | | Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Use BRW_MATH_DATA_SCALAR when source regioning is scalar.Matt Turner2014-09-291-1/+0
| | | | | | Notice the mistaken (but harmless) argument swapping in brw_math_invert(). Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Extract helper function for surface state pointer adjustmentChris Forbes2014-08-151-0/+5
| | | | | Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add low-level support for indirect sendsChris Forbes2014-08-151-0/+5
| | | | | | | This provides a reasonable place to enforce the hardware restriction that indirect descriptors must be in a0.0 Signed-off-by: Chris Forbes <[email protected]>
* i965/eu: Update jump distance scaling for Broadwell.Kenneth Graunke2014-08-101-0/+4
| | | | | | | | | | Broadwell measures jump distances in bytes, so we need to scale by 16. v2: Update the function in brw_eu.h, not in brw_eu_emit.c. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/eu: Refactor jump distance scaling to use a helper function.Kenneth Graunke2014-08-101-0/+20
| | | | | | | | | | | | | | | | | | | Different generations of hardware measure jump distances in different units. Previously, every function that needed to set a jump target open coded this scaling, or made a hardcoded assumption (i.e. just used 2). Most functions start with the number of instructions to jump, and scale up to the hardware-specific value. So, I made the function match that. Others start with a byte offset, and divide by a constant (8) to obtain the jump distance. This is actually 16 / 2 (the jump scale for Gen5-7). v2: Make the helper a static inline defined in brw_eu.h, instead of an actual function in brw_eu_emit.c (as suggested by Matt). Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/eu: Merge brw_CONT and gen6_CONT.Kenneth Graunke2014-08-081-1/+0
| | | | | | | The only difference is setting PopCount on Gen4-5. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: add low-level support for send to pixel interpolatorChris Forbes2014-07-131-0/+10
| | | | | Signed-off-by: Chris Forbes <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Rename intel_asm_printer -> intel_asm_annotation.Matt Turner2014-07-051-1/+1
| | | | | | The #ifndef include guards already said the right thing :) Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Make a brw_conditional_mod enum.Matt Turner2014-07-051-2/+2
| | | | Reviewed-by: Topi Pohjolainen <[email protected]>
* i965: Pass brw to brw_try_compact_instruction().Matt Turner2014-06-261-1/+1
| | | | | Signed-off-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Replace struct brw_compact_instruction with brw_compact_inst.Matt Turner2014-06-261-3/+2
| | | | | Signed-off-by: Matt Turner <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>