summaryrefslogtreecommitdiffstats
path: root/src
Commit message (Collapse)AuthorAgeFilesLines
* gallium: add a cap for max vertex streamsIlia Mirkin2014-07-0115-1/+28
| | | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallium: add an index argument to create_queryIlia Mirkin2014-07-0125-33/+52
| | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallium: add support for stream in so infoIlia Mirkin2014-07-013-0/+3
| | | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* gallium: add vertex stream argument to EMIT/ENDPRIMIlia Mirkin2014-07-013-8/+8
| | | | | | | Signed-off-by: Ilia Mirkin <[email protected]> Reviewed-by: Marek Olšák <[email protected]> Reviewed-by: Brian Paul <[email protected]> Reviewed-by: Roland Scheidegger <[email protected]>
* i965/fs: Mark predicated PLN instructions with dependency hints.Matt Turner2014-06-301-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | To implement the unlit_centroid_workaround, previously we emitted (+f0) pln(8) g20<1>F g16.4<0,1,0>F g4<8,8,1>F { align1 1Q }; (-f0) pln(8) g20<1>F g16.4<0,1,0>F g2<8,8,1>F { align1 1Q }; where the flag register contains the channel enable bits from g0. Since the predicates are complementary, the pair of pln instructions write to non-overlapping components of the destination, which is the case that the dependency control hints are designed for. Typically setting dependency control hints on predicated instructions isn't safe (if an instruction doesn't execute due to the predicate, it won't update the scoreboard, leaving it in a bad state) but since we must have at least one channel executing (i.e., +f0 is true for some channel) by virtue of the fact that the thread is running, we can put the +f0 pln instruction last and set the hints: (-f0) pln(8) g20<1>F g16.4<0,1,0>F g2<8,8,1>F { align1 NoDDClr 1Q }; (+f0) pln(8) g20<1>F g16.4<0,1,0>F g4<8,8,1>F { align1 NoDDChk 1Q }; Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/fs: Predicate PLN instructions used in unlit centroid WA.Matt Turner2014-06-301-6/+14
| | | | | | Maybe lets us skip some PLN instructions if whole subspans are disabled? Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/fs: Add no_dd_{clear,check} fields to fs_inst.Matt Turner2014-06-302-6/+10
| | | | | | | And plumb them through. Also make the assert in the generator look like the vec4 one. Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/fs: Let sat-prop ignore live ranges if producer already has sat.Matt Turner2014-06-301-4/+7
| | | | | | | | | | | | | | | | | | | This sequence (where both x and w are used afterwards) wasn't handled. mul.sat x, y, z ... mov.sat w, x We assumed that if x was used after the mov.sat, that we couldn't propagate the saturate modifier, but in fact x was already saturated. So ignore the live range check if the producing instruction already saturates its result. Cuts one instruction from hundreds of TF2 shaders. total instructions in shared programs: 1995631 -> 1994951 (-0.03%) instructions in affected programs: 155248 -> 154568 (-0.44%) Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Pass const references to emit functions.Matt Turner2014-06-302-12/+14
| | | | Cuts 10k of .text and saves a bunch of useless struct copies.
* i965/vec4: Pass const references to instruction functions.Matt Turner2014-06-302-41/+67
| | | | | | | | | | text data bss dec hex filename 4231165 123200 39648 4394013 430c1d i965_dri.so 4186277 123200 39648 4349125 425cc5 i965_dri.so Cuts 43k of .text and saves a bunch of useless struct copies. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Pass const references to vec4_instruction().Matt Turner2014-06-302-6/+7
| | | | | | | | | | text data bss dec hex filename 4244821 123200 39648 4407669 434175 i965_dri.so 4231165 123200 39648 4394013 430c1d i965_dri.so Cuts 13k of .text and saves a bunch of useless struct copies. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Pass const references to instruction functions.Matt Turner2014-06-302-34/+41
| | | | | | | | | | text data bss dec hex filename 4270747 123200 39648 4433595 43a6bb i965_dri.so 4244821 123200 39648 4407669 434175 i965_dri.so Cuts 25k of .text and saves a bunch of useless struct copies. Reviewed-by: Kenneth Graunke <[email protected]>
* radeonsi: Use dma_copy when possible for si_blit.Axel Davy2014-07-011-0/+19
| | | | | | | | | | | | This improves GLX DRI3 GPU offloading significantly on CPU bound benchmarks particularly. No performance impact for DRI2 GPU offloading. v2: Add missing tests Signed-off-by: Axel Davy <[email protected]> Reviewed-by: Marek Olšák<[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* glx/dri3: add GPU offloading support.Axel Davy2014-07-012-32/+188
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The differences with DRI2 GPU offloading are: a) There's no logic for GPU offloading needed in the Xserver b) for DRI2, the card would render to a back buffer, and the content would be copied to the front buffer (the same buffers everytime). Here we can potentially use several back buffers and copy to buffers with no tiling to share with X. We send them with the Present extension. That means than the DRI2 solution is forced to have tearings with GPU offloading. In the ideal scenario, this DRI3 solution doesn't have this problem. However without dma-buf fences, a race can appear (if the card is slow and the rendering hasn't finished before the server card reads the buffer), and then old content is displayed. If a user hits this, he should probably revert to the DRI2 solution (LIBGL_DRI3_DISABLE). Users with cards fast enough seem to not hit this in practice (I have an Amd hd 7730m, and I don't hit this, except if I force a low dpm mode) c) for non-fullscreen apps, the DRI2 GPU offloading solution requires compositing. This DRI3 solution doesn't have this requirement. Rendering to a pixmap also works. d) There is no need to have a DDX loaded for the secondary card. V4: Fixes some piglit tests Signed-off-by: Axel Davy <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* loader: Use drirc device_id parameter in complement to DRI_PRIMEAxel Davy2014-07-014-4/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | DRI_PRIME is not very handy, because you have to launch the executable with it set, which is not always easy to do. By using drirc, the user specifies the target executable and the device to use. After that the program will be launched everytime on the target device. For example if .drirc contains: <driconf> <device driver="loader"> <application name="Glmark2" executable="glmark2"> <option name="device_id" value="pci-0000_01_00_0" /> </application> </device> </driconf> Then glmark2 will use if possible the render-node of ID_PATH_TAG pci-0000_01_00_0. v2: Fix compilation issue v3: Add "-lm" and rebase. Signed-off-by: Axel Davy <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* loader: add gpu selection code via DRI_PRIME.Axel Davy2014-07-012-0/+192
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | v2: Fix the leak of device_name v3: Rebased It enables to use the DRI_PRIME env var to specify which gpu to use. Two syntax are supported: If DRI_PRIME is 1 it means: take any other gpu than the default one. If DRI_PRIME is the ID_PATH_TAG of a device: choose this device if possible. The ID_PATH_TAG is a tag filled by udev. You can check it with 'udevadm info' on the device node. For example it can be "pci-0000_01_00_0". Render-nodes need to be enabled to choose another gpu, and they need to have the ID_PATH_TAG advertised. It is possible for not very recent udev that the tag is not advertised for render-nodes, then ones need to add a file containing: SUBSYSTEM=="drm", IMPORT{builtin}="path_id" in /etc/udev/rules.d/ Signed-off-by: Axel Davy <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* drirc: Add string supportAxel Davy2014-07-012-1/+35
| | | | | Reviewed-by: Dave Airlie <[email protected]> Signed-off-by: Axel Davy <[email protected]>
* dri: remove GL types from config queriesDave Airlie2014-07-011-3/+3
| | | | | | | | This in theory changes ABI for the boolean->bool I think, but nothing in the tree uses configQueryb AFAICS. Reviewed-by: Axel Davy <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* dri/xmlconfig: remove GL types.Dave Airlie2014-07-012-100/+100
| | | | | | | | | | This just drops all the GL types from the xmlconfig and use std C types from stdint and stdbool. v2: drop further double and header include. Reviewed-by: Eric Anholt <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
* dri3: cache pointer to back instead of looking up.Dave Airlie2014-07-011-8/+9
| | | | | | This is just prep work for the dri3 prime patches. Signed-off-by: Dave Airlie <[email protected]>
* targets/egl-static: use inline_drm_helper and Automake.inc helpersEmil Velikov2014-06-307-268/+54
| | | | | | | | | | | | | Update all three build systems, and add freedreno to the android build. Pending future work on the ST we can convert egl-static to provide either static or dynamic access to the pipe-drivers. There is no functional change with this patch. v2: Don't add freedreno to android build, drop the wrapper winsys. Cc: Chia-I Wu <[email protected]> Signed-off-by: Emil Velikov <[email protected]>
* targets/gbm: convert to static/shared pipe-driverEmil Velikov2014-06-306-121/+160
| | | | | | | | | | Move the gbm "target" code to the state-tracker, similar to other - dri, omx, vdpau... ST. v2: Drop inclusion of the wrapper winsys and softpipe/llvmpipe. Cc: Chia-I Wu <[email protected]> Signed-off-by: Emil Velikov <[email protected]>
* targets/xa: provide alternative(static) xa targetEmil Velikov2014-06-304-15/+107
| | | | | | | | | | | | | | | | Now we can build the xa target (libxatracker) with either static pipe-drivers or shared ones. Currently we default to static. - Remove the unused CFLAGS/CPPFLAGS. - Use GALLIUM_TARGET_CFLAGS where applicable. v2: Update the printout messages at configure. v3: Drop inclusion of the wrapper winsys and softpipe/llvmpipe. Cc: Jakob Bornecrantz <[email protected]> Cc: Rob Clark <[email protected]> Cc: Thomas Hellstrom <[email protected]> Signed-off-by: Emil Velikov <[email protected]>
* i965/disasm: Fix INTEL_DEBUG=fs on Broadwell for ARB_fp applications.Kenneth Graunke2014-06-301-1/+1
| | | | | | | | | | | Apparently INTEL_DEBUG=fs has crashed on Broadwell for anything using ARB_fragment_program since commit 9cee3ff5. We need to NULL-check the right field. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]> Cc: "10.2" <[email protected]>
* i965/disasm: Delete gen8_disasm.c.Kenneth Graunke2014-06-303-1031/+0
| | | | | | | | The functionality has been merged into brw_disasm.c; use that instead. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/disasm: Stop using gen8_disassemble in favor of brw_disassemble.Kenneth Graunke2014-06-308-42/+8
| | | | | | | | | At this point, brw_disassemble can do everything gen8_disassemble can do - and, thanks to the new brw_inst API, it supports all generations. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/disasm: Improve render target write message disassembly.Kenneth Graunke2014-06-301-30/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, we decoded render target write messages as: render ( RT write, 0, 16, 12, 0) mlen 8 rlen 0 which made you remember (or look up) what the numbers meant: 1. The binding table index 2. The raw message control, undecoded: - Last Render Target Select - Slot Group Select - Message Type (SIMD8, normal SIMD16, SIMD16 replicate data, ...) 3. The dataport message type, again (already decoded as "RT write") 4. The write commit bit (0 or 1) Needless to say, having to decipher that yourself is annoying. Now, we do: render RT write SIMD16 LastRT Surface = 0 mlen 8 rlen 0 with optional "Hi" and "WriteCommit" for slot group/write commit. Thanks to the new brw_inst API, we can also stop duplicating code on a per-generation basis. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/disasm: Rename msg_target to SFID.Kenneth Graunke2014-06-301-12/+8
| | | | | | | | | | We haven't used the name "message target" in a while - there are a lot of things called "target", and it gets confusing. SFID ("Shared Function ID") is the term commonly used in the modern documentation. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/disasm: Fix typo in RT UNORM write message.Kenneth Graunke2014-06-301-1/+1
| | | | | | | | | The name of this message is "Render Target UNORM Write" (Sandybridge PRM, Volume 4 Part 1, Page 210). Drop the bogus 'c'. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/disasm: Use Gen6+ SFID case labels.Kenneth Graunke2014-06-301-2/+4
| | | | | | | | | | Most developers will recognize the Gen6+ SFID names more quickly than the Gen4-5 ones. Given that they're the same values, just use the new names. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/disasm: "Handle" Gen8+ HF/DF immediate cases.Kenneth Graunke2014-06-301-0/+7
| | | | | | | | | | | We should print something properly, but I'm not sure how to properly print an HF, and we don't have any DFs today to test with. This is at least better than the current Gen8 disassembler, which would simply assert fail. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/disasm: Cut piles of duplicate swizzle printing.Kenneth Graunke2014-06-301-89/+26
| | | | | | | | Making a helper function saves us from cut and pasting this four times. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/disasm: Properly decode negate source modifiers on Broadwell.Kenneth Graunke2014-06-301-4/+49
| | | | | | | | | This is a port of Abdiel's 6f9f916b9b042a294813ab0542390846a38739da to brw_disasm.c. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/disasm: Improve disassembly of atomic messages on Haswell+.Kenneth Graunke2014-06-301-7/+21
| | | | | | | | | | This backports the atomic message disassembly support from gen8_disasm.c, which additionally offers support for decoding atomic surface read/write messages, and showing SIMD modes and other details. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/disasm: Actually disassemble Gen7+ URB opcodes.Kenneth Graunke2014-06-301-3/+19
| | | | | | | | | | | I never bothered implementing the disassembler for Gen7+ URB opcodes, so we were just disassembling them as Ironlake/Sandybridge ones. This looked pretty bad when running Paul's GS EndPrimitive tests, as the "write OWord" message was decoded at ff_sync, which doesn't exist. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/disasm: Decode Broadwell's invm/rsqrtm math functions.Kenneth Graunke2014-06-301-0/+2
| | | | | | | | We don't use these yet, but we may as well disassemble them. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/disasm: Properly disassemble the "atomic" ThreadCtrl value.Kenneth Graunke2014-06-301-2/+3
| | | | | | Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/disasm: Properly disassemble all32h/any32h align1 predicates.Kenneth Graunke2014-06-301-11/+13
| | | | | | | | | While we're adding things, use symbolic constants rather than magic numbers. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965: Add #defines for any32h/all32h predication.Kenneth Graunke2014-06-301-0/+2
| | | | | | | | | | These have existed since Ivybridge. We don't use them today, but the Gen8+ disassembler supports them, and I'd like to use symbolic names rather than magic numbers. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/disasm: Mark ELSE as having UIP on Gen8+.Kenneth Graunke2014-06-301-0/+1
| | | | | | | | | This makes brw_disasm.c able to disassemble ELSE instructions correctly on Broadwell. (gen8_disasm.c already handles this correctly.) Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/disasm: Properly disassemble jump targets on Gen4-5.Kenneth Graunke2014-06-301-0/+15
| | | | | | | | | | | | | | | | | | | Previously, our dissasembly for flow control instructions looked like: 0x00000040: else(8) ip 65540D { align16 switch }; It didn't print InstCount properly for ELSE/ENDIF, and didn't even attempt to disassemble PopCount. Now it looks like: 0x00000040: else(8) Jump: 4 Pop: 1 { align16 switch }; which is much more readable. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/disasm: Improve disassembly of jump targets on Gen6+.Kenneth Graunke2014-06-301-18/+41
| | | | | | | | | | | | | | | | | | | | | | | | | | Previously, flow control instructions generated output like: (+f0) if(8) 12 8 null 0x000c0008UD { align16 WE_normal 1Q }; which included a dissasembly of the register fields, even though those are meaningless for flow control instructions---those bits are reused for another purpose. It also wasn't immediately obvious which number was UIP and which was JIP. With this patch, we instead output: (+f0) if(8) JIP: 8 UIP: 12 { align16 WE_normal 1Q }; which is much clearer. The patch also introduces has_uip/has_jip helper functions which clear up a some generation/opcode checking mess. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/disasm: Add support for new Gen8+ register types.Kenneth Graunke2014-06-301-16/+24
| | | | | | | | | While we're at it, use proper names rather than magic numbers for the existing fields. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965: Restyle brw_disasm.c.Kenneth Graunke2014-06-301-1234/+1231
| | | | | | | | | | | | | | | | | | | | brw_disasm.c basically wasn't following the Mesa coding style at all. It used 4-space indent instead of 3-space, didn't cuddle braces, didn't put function return types on a separate line, put extra spaces in function calls (between the name and parenthesis), and a number of other things. This made it fairly obnoxious to work on, since my editor is configured to follow Mesa style in the Mesa source repository. Fixing it to follow a consistent style now should save time dealing with it later. These modifications were originally generated by: $ indent -br -i3 -npcs -ce -cs -l80 --no-tabs with some manual changes afterwards to fit our style better. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/disasm: Create an "opcode" temporary.Kenneth Graunke2014-06-301-31/+30
| | | | | | | | This saves typing brw_inst_opcode(brw, inst) everywhere. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* i965/disasm: Eliminate opcode pointer.Kenneth Graunke2014-06-301-8/+7
| | | | | | | | | opcode is just a pointer to opcode_descs; we may as well use that directly. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]> Reviewed-by: Kristian Høgsberg <[email protected]>
* Remove the ATI_envmap_bumpmap extensionJason Ekstrand2014-06-3031-860/+7
| | | | | | | | | | | As far as I can tell, the Intel mesa driver is the only driver in the world still supporting this legacy extension. If someone wants to do bump mapping, they can use shaders. Signed-off-by: Jason Ekstrand <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]> [v1] Reviewed-by: Chris Forbes <[email protected]> [v2] Reviewed-by: Ian Romanick <[email protected]> [v3]
* meta: Use AMD_vertex_shader_layer instead of a GS for layered clears.Kenneth Graunke2014-06-301-37/+16
| | | | | | | | | | | | | | | | | | | | | | | On i965, enabling and disabling the GS is not free: you have to do a full pipeline stall, reconfigure the URB and push constant space, and emit a bunch of state. Most clears aren't layered, so the GS isn't needed in the common case. But we turned it on universally. Using AMD_vertex_shader_layer allows us to skip setting up the GS altogether, while achieving the same effect. According to Ilia, current nVidia GPUs can't do AMD_vertex_shader_layer. However, since nouveau is Gallium-based, they're unlikely to ever care about this path. Intel and AMD GPUs both support the extension. Since i965 is the only driver using this path which does layered rendering, we may as well target it at that. v2: Improve commit message. No code changes. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Chris Forbes <[email protected]> Reviewed-by: Jordan Justen <[email protected]>
* i965: Enable vertex streams up to MAX_VERTEX_STREAMS.Iago Toral Quiroga2014-06-301-0/+4
| | | | Reviewed-by: Ian Romanick <[email protected]>
* mesa: Enable simultaneous queries on different streams.Iago Toral Quiroga2014-06-302-10/+11
| | | | | | | | It should be possible to query the number of primitives written to each individual stream by a geometry shader in a single draw call. For that we need to have up to MAX_VERTEX_STREAM separate query objects. Reviewed-by: Ian Romanick <[email protected]>