summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* i965: enable ARB_shader_clock on gen7+Emil Velikov2015-10-302-0/+2
| | | | | Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965: Implement nir_intrinsic_shader_clockEmil Velikov2015-10-302-0/+19
| | | | | | | | | | | | | | | v2: - Add a few const qualifiers for good measure. - Drop unneeded retype()s (Matt) - Convert timestamp to SIMD8/16, as fs_visitor::get_timestamp() returns SIMD4 (Connor) v3: - Remove unneeded temporary + MOV (Connor) Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Connor Abbott <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/fs: move the fs_reg::smear() from get_timestamp() to the callersEmil Velikov2015-10-301-12/+17
| | | | | | | | | | | | | We're about to reuse get_timestamp() for the nir_intrinsic_shader_clock. In the latter the generalisation does not apply, so move the smear() where needed. This also makes the function analogous to the vec4 one. v2: Tweak the comment - The caller -> We (Matt, Connor). v3: More comment tweaks (Connor) Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Connor Abbott <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* nir: add shader_clock intrinsicEmil Velikov2015-10-302-0/+14
| | | | | | | | | | | | v2: Add flags and inline comment/description. v3: None of the input/outputs are variables v4: Drop clockARB reference, relate code motion barrier comment wrt intrinsic flag. v5: Drop the "thus we can eliminate..." comment (Connor) Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Connor Abbott <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: add support for the clock2x32ARB functionEmil Velikov2015-10-301-0/+43
| | | | | | | v2: correctly set the return type Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* glsl: add ARB_shader_clock infrastructureEmil Velikov2015-10-303-0/+6
| | | | | Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* mesa: add infra for ARB_shader_clockEmil Velikov2015-10-302-0/+2
| | | | | Signed-off-by: Emil Velikov <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* nv50: do not create an invalid HW query typeSamuel Pitoiset2015-10-302-12/+30
| | | | | | | | While we are at it, store the rotate offset for occlusion queries to nv50_hw_query like on nvc0. Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Pierre Moreau <[email protected]>
* nv50: move HW queries to nv50_query_hw.c/h filesSamuel Pitoiset2015-10-308-349/+476
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Pierre Moreau <[email protected]>
* nv50: move nva0_so_target_save_offset() to its correct locationSamuel Pitoiset2015-10-303-21/+18
| | | | | Signed-off-by: Samuel Pitoiset <[email protected]> Reviewed-by: Pierre Moreau <[email protected]>
* nv50: add a header file for nv50_querySamuel Pitoiset2015-10-306-40/+49
| | | | | | | | | Like for nvc0, this will allow to split different types of queries and to prepare the way for both global performance counters and MP counters. While we are at it, make use of nv50_query struct instead of pipe_query. Signed-off-by: Samuel Pitoiset <[email protected]>
* st/va: add support to export a surface as dmabufJulien Isorce2015-10-303-2/+144
| | | | | | | | | | | | | | | | | | | | I.e. implements: VaAcquireBufferHandle VaReleaseBufferHandle for memory of type VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME And apply relatives change to: vlVaMapBuffer vlVaUnMapBuffer vlVaDestroyBuffer Implementation inspired from cgit.freedesktop.org/vaapi/intel-driver Tested with gstreamer-vaapi with nouveau driver. Signed-off-by: Julien Isorce <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Christian König <[email protected]>
* st/va: implement VaDeriveImageJulien Isorce2015-10-303-5/+138
| | | | | | | | | | | | | | | | | | | | | And apply relatives change to: vlVaBufferSetNumElements vlVaCreateBuffer vlVaMapBuffer vlVaUnmapBuffer vlVaDestroyBuffer vlVaPutImage It is unfortunate that there is no proper va buffer type and struct for this. Only possible to use VAImageBufferType which is normally used for normal user data array. On of the consequences is that it is only possible VaDeriveImage is only useful on surfaces backed with contiguous planes. Implementation inspired from cgit.freedesktop.org/vaapi/intel-driver Signed-off-by: Julien Isorce <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Christian König <[email protected]>
* st/va: add more errors checks in vlVaBufferSetNumElements and vlVaMapBufferJulien Isorce2015-10-301-0/+6
| | | | | Signed-off-by: Julien Isorce <[email protected]> Reviewed-by: Christian König <[email protected]>
* st/va: add headless support, i.e. VA_DISPLAY_DRMJulien Isorce2015-10-302-6/+73
| | | | | | | | | This patch allows to use gallium vaapi without requiring a X server running for your second graphic card. Signed-off-by: Julien Isorce <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Christian König <[email protected]>
* st/va: handle Video Post Processing for configsJulien Isorce2015-10-302-2/+25
| | | | | | | | | | | | | | Add support for VA_PROFILE_NONE and VAEntrypointVideoProc in the 4 following functions: vlVaQueryConfigProfiles vlVaQueryConfigEntrypoints vlVaCreateConfig vlVaQueryConfigAttributes Signed-off-by: Julien Isorce <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Christian König <[email protected]>
* st/va: add colospace conversion through Video Post ProcessingJulien Isorce2015-10-304-52/+259
| | | | | | | | | | | | | | | | | | | | | | | Add support for VPP in the following functions: vlVaCreateContext vlVaDestroyContext vlVaBeginPicture vlVaRenderPicture vlVaEndPicture Add support for VAProcFilterNone in: vlVaQueryVideoProcFilters vlVaQueryVideoProcFilterCaps vlVaQueryVideoProcPipelineCaps Add handleVAProcPipelineParameterBufferType helper. One application is: VASurfaceNV12 -> gstvaapipostproc -> VASurfaceRGBA Signed-off-by: Julien Isorce <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Christian König <[email protected]>
* st/va: implement dmabuf import for VaCreateSurfaces2Julien Isorce2015-10-301-1/+96
| | | | | | | | For now it is limited to RGBA, BGRA, RGBX, BGRX surfaces. Signed-off-by: Julien Isorce <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Christian König <[email protected]>
* st/va: implement VaCreateSurfaces2 and VaQuerySurfaceAttributesJulien Isorce2015-10-303-52/+249
| | | | | | | | | | | | | | | | | | Inspired from http://cgit.freedesktop.org/vaapi/intel-driver/ especially src/i965_drv_video.c::i965_CreateSurfaces2. This patch is mainly to support gstreamer-vaapi and tools that uses this newer libva API. The first advantage of using VaCreateSurfaces2 over existing VaCreateSurfaces, is that it is possible to select which the pixel format for the surface. Indeed with the simple VaCreateSurfaces function it is only possible to create a NV12 surface. It can be useful to create a RGBA surface to use with video post processing. The avaible pixel formats can be query with VaQuerySurfaceAttributes. Signed-off-by: Julien Isorce <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Christian König <[email protected]>
* st/va: do not destroy old buffer when new one failedJulien Isorce2015-10-301-6/+13
| | | | | | | | | | | If formats are not the same vlVaPutImage re-creates the video buffer with the right format. But if the creation of this new video buffer fails then the surface looses its current buffer. Let's just destroy the previous buffer on success. Signed-off-by: Julien Isorce <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Christian König <[email protected]>
* st/va: properly defines VAImageFormat formats and improve VaCreateImageJulien Isorce2015-10-303-11/+57
| | | | | | | | | | | | | | | | | Added PIPE_VIDEO_CHROMA_FORMAT_NONE in p_format.h and return it by default in ChromaToPipe. Renamed YCbCrToPipe to VaFourccToPipeFormat because it now contains RGB. Implemented PipeFormatToVaFourcc which will be used later in VlVaDeriveImage. Note that gstreamer-vaapi check all the VAImageFormat fields. Signed-off-by: Julien Isorce <[email protected]> Reviewed-by: Emil Velikov <[email protected]> Reviewed-by: Christian König <[email protected]>
* main: fix basename match's check if it's an array or structSamuel Iglesias Gonsalvez2015-10-301-1/+2
| | | | | | | | | | | | | | | | | | Commit 4565b6f did not update the basename match's check for the case that string would exactly match the name of the variable if the suffix "[0]" were appended to it. Fixes two dEQP-GLES31 tests: dEQP-GLES31.functional.program_interface_query.shader_storage_block.resource_list.block_array dEQP-GLES31.functional.program_interface_query.shader_storage_block.resource_list.block_array_single_element v2: - Change the position of rname_has_array_index_zero to avoid an out-of-bounds read. Reported by Tapani Pälli. Signed-off-by: Samuel Iglesias Gonsalvez <[email protected]> Reviewed-by: Tapani Pälli <[email protected]>
* i965: Fix invalid memory accesses after resizing brw_codegen's store tableKristian Høgsberg2015-10-301-4/+13
| | | | Reviewed-by: Iago Toral Quiroga <[email protected]>
* i965/sched: use liveness analysis for computing register pressureConnor Abbott2015-10-301-56/+229
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, we were using some heuristics to try and detect when a write was about to begin a live range, or when a read was about to end a live range. We never used the liveness analysis information used by the register allocator, though, which meant that the scheduler's and the allocator's ideas of when a live range began and ended were different. Not only did this make our estimate of the register pressure benefit of scheduling an instruction wrong in some cases, but it was preventing us from knowing the actual register pressure when scheduling each instruction, which we want to have in order to switch to register pressure scheduling only when the register pressure is too high. This commit rewrites the register pressure tracking code to use the same model as our register allocator currently uses. We use the results of liveness analysis, as well as the compute_payload_ranges() function that we split out in the last commit. This means that we compute live ranges twice on each round through the register allocator, although we could speed it up by only recomputing the ranges and not the live in/live out sets after scheduling, since we only shuffle around instructions within a single basic block when we schedule. Shader-db results on bdw: total instructions in shared programs: 7130187 -> 7129880 (-0.00%) instructions in affected programs: 1744 -> 1437 (-17.60%) helped: 1 HURT: 1 total cycles in shared programs: 172535126 -> 172473226 (-0.04%) cycles in affected programs: 11338636 -> 11276736 (-0.55%) helped: 876 HURT: 873 LOST: 8 GAINED: 0 v2: use regs_read() in more places. Reviewed-by: Jason Ekstrand <[email protected]>
* i965/fs: split out calculation of payload live rangesConnor Abbott2015-10-302-22/+31
| | | | | | | | We'll need this for the scheduler too, since it wants to know when the live ranges of payload registers end in order to model them in our register pressure calculations. Reviewed-by: Jason Ekstrand <[email protected]>
* i965: dump scheduling cycle estimatesConnor Abbott2015-10-304-10/+36
| | | | | | | | | | | | | The heuristic we're using is rather lame, since it assumes everything is non-uniform and loops execute 10 times, but it should be enough for measuring improvements in the scheduler that don't result in a change in the number of instructions. v2: - Switch loops and cycle counts to be compatible with older shader-db. - Make loop heuristic 10x to match with spilling code. Reviewed-by: Jason Ekstrand <[email protected]>
* i965: always run the post-RA schedulerConnor Abbott2015-10-301-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Before, we would only do scheduling after register allocation if we spilled, despite the fact that the pre-RA scheduler was only supposed to be for register pressure and set the latencies of every instruction to 1. This meant that unless we spilled, which we rarely do, then we never considered instruction latencies at all, and we usually never bothered to try and hide texture fetch latency. Although a later commit removes the setting the latency to 1 part, we still want to always run the post-RA scheduler since it's able to take the false dependencies that the register allocator creates into account, and it can be more aggressive than the pre-RA scheduler since it doesn't have to worry about register pressure at all. Test master post-ra-sched diff %diff bench_OglPSBump2 396.730 402.386 5.656 +1.400% bench_OglPSBump8 244.370 247.591 3.221 +1.300% bench_OglPSPhong 241.117 242.002 0.885 +0.300% bench_OglPSPom 59.555 59.725 0.170 +0.200% bench_OglShMapPcf 86.149 102.346 16.197 +18.800% bench_OglVSTangent 388.849 395.489 6.640 +1.700% bench_trex 65.471 65.862 0.390 +0.500% bench_trexoff 69.562 70.150 0.588 +0.800% bench_heaven 25.179 25.254 0.074 +0.200% Reviewed-by: Jason Ekstrand <[email protected]>
* i965/sched: write-after-read dependencies are freeConnor Abbott2015-10-301-5/+5
| | | | | | | | | | | | Although write-after-write dependencies have the same latency as read-after-write dependencies due to how the register scoreboard works, write-after-read dependencies aren't checked by the EU at all, so they're purely a constraint on how the scheduler can order the instructions. v2: fix accumulator dependencies too. Reviewed-by: Jason Ekstrand <[email protected]>
* i965: fix cycle estimates when there's a pipeline stallConnor Abbott2015-10-301-7/+8
| | | | | | | | | | | | The issue time for an instruction is how many cycles it takes to actually put it into the pipeline. If there's a pipeline stall that causes the instruction to be delayed, we should first take that into account to figure out when the instruction would start executing and *then* add the issue time. The old code had it backwards, and so we would underestimate the total time whenever we thought there would be a pipeline stall by up to the issue time of the instruction. Reviewed-by: Jason Ekstrand <[email protected]>
* vc4: Allow user index buffers, to avoid slow readback for shadow IBs.Eric Anholt2015-10-294-10/+25
| | | | | Improves low-settings openarena performance by 31.9975% +/- 0.659931% (n=7).
* nv50: mark contexts shareable, compile at creation timeIlia Mirkin2015-10-292-1/+4
| | | | Signed-off-by: Ilia Mirkin <[email protected]>
* nv50: allow per-sample interpolation to be forced via rastIlia Mirkin2015-10-298-9/+52
| | | | | | | | Uses the same technique as for nvc0 of fixups before upload, and evicting in case of state change. Removes one source of variants kept by st/mesa. Signed-off-by: Ilia Mirkin <[email protected]>
* i965: Add INTEL_DEBUG=nocompact to disable instruction compaction.Matt Turner2015-10-293-0/+5
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Add INTEL_DEBUG=hex to print the hex with the disassembly.Matt Turner2015-10-293-1/+3
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Print the type and writemask on null destinations.Matt Turner2015-10-291-1/+1
| | | | | | | | These are often useful in debugging, and the writemask (actually "Channel Enables") determines more than just what goes into the destination. Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Test fixed_hw_reg.file against BRW_IMMEDIATE_VALUE, not IMM.Matt Turner2015-10-291-4/+4
| | | | | | | | No functional change, since they were both 3, but BRW_IMMEDIATE_VALUE is the hardware value and IMM was the IR value -- and you can see that BRW_IMMEDIATE_VALUE was correctly used in the context of this patch. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Test against BRW_IMMEDIATE_VALUE, not IMM.Matt Turner2015-10-291-1/+1
| | | | | | | | No functional change, since they were both 3, but BRW_IMMEDIATE_VALUE is the hardware value and IMM was the IR value -- and you can see that BRW_IMMEDIATE_VALUE was correctly used in the context of this patch. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/fs: Use group(4, 0) to emit an exec-size 4 MOV.Matt Turner2015-10-291-2/+3
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/cfg: Handle no-idom case in cfg_t::dump_domtree().Matt Turner2015-10-291-1/+3
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965/disasm: Remove unused _addr_mode argument from src_ia1().Matt Turner2015-10-291-3/+0
| | | | Reviewed-by: Kenneth Graunke <[email protected]>
* i965: Set correct field for indirect align16 addrimm.Matt Turner2015-10-291-1/+1
| | | | | | This has been wrong since the initial import of the i965 driver. Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Drop brw_set_default_* before popping insn state.Matt Turner2015-10-291-3/+0
| | | | | Reviewed-by: Ben Widawsky <[email protected]> Reviewed-by: Kenneth Graunke <[email protected]>
* i965/vec4: Remove unnecessary #includes from the generator.Matt Turner2015-10-291-8/+0
| | | | Reviewed-by: Iago Toral Quiroga <[email protected]>
* r600: enable SB for geom shaders on pre-evergreenDave Airlie2015-10-301-4/+0
| | | | | | | | | | I've checked with piglit and one tests fails, but it fails on evergreen as well, so will get fixed later. Otherwise SB seems to be working fine for geom shaders on my rv635. Signed-off-by: Dave Airlie <[email protected]>
* i965/vec4: Eliminate the vec4_generator class altogether.Kenneth Graunke2015-10-291-284/+180
| | | | | | | | | | We really weren't taking advantage of vec4_generator being a class. By adding a "p" parameter to the helper methods, and "prog_data" to ones which need binding table information, we can convert everything to static functions. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Move vec4_generator class definition into the .cpp file.Kenneth Graunke2015-10-292-111/+110
| | | | | | | | | The public API for the generator is brw_vec4_generate_code(); nobody actually needs to use the class. This means we can extend it without triggering the recompiles associated with altering brw_vec4.h. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Wrap vec4_generator in a C function.Kenneth Graunke2015-10-294-9/+37
| | | | | | | | | vec4_generator is a class for convenience, but only exports a single method as its public API. It makes much more sense to just export a single function. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* i965/vec4: Convert src_reg/dst_reg to brw_reg at the end of the visitor.Kenneth Graunke2015-10-294-110/+92
| | | | | | | | | | | | | | | | | This patch makes the visitor convert registers to the HW_REG file at the very end, after register allocation, post-RA scheduling, and dependency control flagging. After that, everything is in fixed brw_regs. This simplifies the code generator, as it can just use the hardware registers rather than having to interpret our abstract files. In particular, interpreting the UNIFORM file meant reading prog_data to figure out where push constants are supposed to start. Having the part of the code that performs register allocation also translate everything to hardware registers seems sensible. Signed-off-by: Kenneth Graunke <[email protected]> Reviewed-by: Matt Turner <[email protected]>
* r600g: Fix special negative immediate constants when using ABS modifier.Ivan Kalvachev2015-10-293-6/+6
| | | | | | | | | | | | | | | | | | | | | Some constants (like 1.0 and 0.5) could be inlined as immediate inputs without using their literal value. The r600_bytecode_special_constants() function emulates the negative of these constants by using NEG modifier. However some shaders define -1.0 constant and want to use it as 1.0. They do so by using ABS modifier. But r600_bytecode_special_constants() set NEG in addition to ABS. Since NEG modifier have priority over ABS one, we get -|1.0| as result, instead of |1.0|. The patch simply prevents the additional switching of NEG when ABS is set. [According to Ivan Kalvachev, this bug was fond via https://github.com/iXit/Mesa-3D/issues/126 and https://github.com/iXit/Mesa-3D/issues/127] Signed-off-by: Ivan Kalvachev <[email protected]> Reviewed-by: Nicolai Hähnle <[email protected]> CC: <[email protected]>
* st/mesa: fix mipmap generation for immutable textures with incomplete pyramidsNicolai Hähnle2015-10-291-32/+36
| | | | | | | | | | | | | | | Without the clamping by NumLevels, the state tracker would reallocate the texture storage (incorrect) and even fail to copy the base level image after reallocation, leading to the graphical glitch of https://bugs.freedesktop.org/show_bug.cgi?id=91993 . A piglit test has been submitted for review as well (subtest of arb_texture_storage-texture-storage). v2: also bypass all calls to st_finalize_texture (suggested by Marek Olšák) Cc: [email protected] Reviewed-by: Marek Olšák <[email protected]>