| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
See commit e07c9a288.
Reviewed-by: Kristian Høgsberg <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Nowadays GCC assumes stack pointer is 16-byte aligned even on 32-bits, but that is an assumption OpenGL drivers (or any dynamic library for that matter) can't afford to make as there are many closed- and open- source application binaries out there that only assume 4-byte stack alignment.
V4: fix comment and indentation
V3: move all sse4.1 build flag config to the same location
and add comment as to why we need to do the realign
V2: use $target_cpu rather than $host_cpu
and setup build flags in config rather than makefile
https://bugs.freedesktop.org/show_bug.cgi?id=86788
Signed-off-by: Timothy Arceri <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
CC: "10.4" <[email protected]>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Nowadays GCC assumes stack pointer is 16-byte aligned even on 32-bits,
but that is an assumption OpenGL drivers (or any dynamic library for
that matter) can't afford to make as there are many closed- and open-
source application binaries out there that only assume 4-byte stack
alignment.
This fix uses force_align_arg_pointer GCC attribute, and is only a
stop-gap measure.
The right fix would be to pass -mstackrealign or
-mincoming-stack-boundary=2 to all source fails that use any -msse*
option, as there is no way to guarantee if/when GCC will decide to spill
SSE registers to the stack.
https://bugs.freedesktop.org/show_bug.cgi?id=86788
Reviewed-by: Brian Paul <[email protected]>
|
|
Makes use of SSE 4.1 to speed up compute of min and max elements.
Callgrind cpu usage results from pts benchmarks:
Openarena 0.8.8: 3.67% -> 1.03%
UrbanTerror: 2.36% -> 0.81%
V5:
- actually make use of the optimisation in android (Emil Velikov)
- set a better array size limit for using SSE and added TODO
V4:
- fixed bugs with incrementing pointer and updating counters
V3:
- Removed sse_minmax.c from Makefile.sources
- handle the first few values without SSE until the pointer is aligned
and use _mm_load_si128 rather than _mm_loadu_si128
- guard the call to the SSE code better at build time
V2:
- removed GL* types
- use _mm_store_si128() rather than _mm_store_ps()
- add runtime check for SSE
- use aligned attribute for local mix/max
- bunch of tidyups
Reviewed-by: Juha-Pekka Heikkila <[email protected]>
Reviewed-by: Matt Turner <[email protected]>
Signed-off-by: Timothy Arceri <[email protected]>
|