This functionality is better accessed through tools like oprofile.
Originally committed as revision 23808 to svn://svn.ffmpeg.org/ffmpeg/trunk
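
For readers who want to see what is being removed: the per-function bookkeeping done by the POWERPC_PERF_* macros amounts to tracking the minimum, maximum, sum and count of counter deltas. The following is a minimal portable C sketch of that idea, not the removed code itself. The names PerfStat, perf_stat_init, perf_stat_record and perf_stat_avg are hypothetical, and the start/stop values are assumed to come from whatever counter source is available.

```c
#include <stdint.h>

/* Hypothetical illustration; these names do not exist in FFmpeg. */
typedef struct PerfStat {
    uint64_t min; /* smallest delta recorded so far */
    uint64_t max; /* largest delta recorded so far */
    uint64_t sum; /* sum of all recorded deltas */
    uint64_t num; /* number of recorded deltas */
} PerfStat;

/* Reset the statistics; min starts at the largest value so that the
 * first recorded delta always replaces it. */
static void perf_stat_init(PerfStat *s)
{
    s->min = UINT64_MAX;
    s->max = 0;
    s->sum = 0;
    s->num = 0;
}

/* Record one start/stop pair. Pairs where the counter appears to have
 * gone backwards (stop < start) are skipped, mirroring the check in the
 * removed POWERPC_PERF_STOP_COUNT macro. */
static void perf_stat_record(PerfStat *s, uint64_t start, uint64_t stop)
{
    uint64_t diff;

    if (stop < start)
        return;
    diff = stop - start;
    if (diff < s->min)
        s->min = diff;
    if (diff > s->max)
        s->max = diff;
    s->sum += diff;
    s->num++;
}

/* Average delta, or 0.0 if nothing was recorded. */
static double perf_stat_avg(const PerfStat *s)
{
    return s->num ? (double)s->sum / (double)s->num : 0.0;
}
```

An external profiler such as oprofile gathers this kind of data without instrumenting each function by hand, which is the motivation for the removal.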
--arch=ARCH select architecture [$arch]
--cpu=CPU select the minimum required CPU (affects
instruction selection, may crash on older CPUs)
- --enable-powerpc-perf enable performance report on PPC
- (requires enabling PMC)
--disable-asm disable all assembler optimizations
--disable-altivec disable AltiVec optimizations
--disable-amd3dnow disable 3DNow! optimizations
nonfree
pic
postproc
- powerpc_perf
rdft
runtime_cpudetect
shared
echo "AltiVec enabled ${altivec-no}"
echo "PPC 4xx optimizations ${ppc4xx-no}"
echo "dcbzl available ${dcbzl-no}"
- echo "performance report ${powerpc_perf-no}"
fi
if enabled sparc; then
echo "VIS enabled ${vis-no}"
+++ /dev/null
-FFmpeg & evaluating performance on the PowerPC Architecture HOWTO
-
-(c) 2003-2004 Romain Dolbeau <romain@dolbeau.org>
-
-
-
-I - Introduction
-
-The PowerPC architecture and its SIMD extension AltiVec offer some
-interesting tools to evaluate performance and improve the code.
-This document tries to explain how to use those tools with FFmpeg.
-
-The architecture itself offers two ways to evaluate the performance of
-a given piece of code:
-
-1) The Time Base Registers (TBL)
-2) The Performance Monitor Counter Registers (PMC)
-
-The first ones are always available, always active, but they're not very
-accurate: the registers increment by one every four *bus* cycles. On
-my 667 Mhz tiBook (ppc7450), this means once every twenty *processor*
-cycles. So we won't use that.
-
-The PMC are much more useful: not only can they report cycle-accurate
-timing, but they can also be used to monitor many other parameters,
-such as the number of AltiVec stalls for every kind of instruction,
-or instruction cache misses. The downside is that not all processors
-support the PMC (all G3, all G4 and the 970 do support them), and
-they're inactive by default - you need to activate them with a
-dedicated tool. Also, the number of available PMC depends on the
-processor: the various 604 have 2, the various 75x (aka. G3) have 4,
-and the various 74xx (aka G4) have 6.
-
-*WARNING*: The PowerPC 970 is not very well documented, and its PMC
-registers are 64 bits wide. To properly notify the code, you *must*
-tune for the 970 (using --tune=970), or the code will assume 32 bit
-registers.
-
-
-II - Enabling FFmpeg PowerPC performance support
-
-This needs to be done by hand. First, you need to configure FFmpeg as
-usual, but add the "--powerpc-perf-enable" option. For instance:
-
-#####
-./configure --prefix=/usr/local/ffmpeg-svn --cc=gcc-3.3 --tune=7450 --powerpc-perf-enable
-#####
-
-This will configure FFmpeg to install inside /usr/local/ffmpeg-svn,
-compiling with gcc-3.3 (you should try to use this one or a newer
-gcc), and tuning for the PowerPC 7450 (i.e. the newer G4; as a rule of
-thumb, those at 550Mhz and more). It will also enable the PMC.
-
-You may also edit the file "config.h" to enable the following line:
-
-#####
-// #define ALTIVEC_USE_REFERENCE_C_CODE 1
-#####
-
-If you enable this line, then the code will not make use of AltiVec,
-but will use the reference C code instead. This is useful to compare
-performance between two versions of the code.
-
-Also, the number of enabled PMC is defined in "libavcodec/ppc/dsputil_ppc.h":
-
-#####
-#define POWERPC_NUM_PMC_ENABLED 4
-#####
-
-If you have a G4 CPU, you can enable all 6 PMC. DO NOT enable more
-PMC than available on your CPU!
-
-Then, simply compile FFmpeg as usual (make && make install).
-
-
-
-III - Using FFmpeg PowerPC performance support
-
-This FFmpeg can be used exactly as usual. But before exiting, FFmpeg
-will dump a per-function report that looks like this:
-
-#####
-PowerPC performance report
- Values are from the PMC registers, and represent whatever the
- registers are set to record.
- Function "gmc1_altivec" (pmc1):
- min: 231
- max: 1339867
- avg: 558.25 (255302)
- Function "gmc1_altivec" (pmc2):
- min: 93
- max: 2164
- avg: 267.31 (255302)
- Function "gmc1_altivec" (pmc3):
- min: 72
- max: 1987
- avg: 276.20 (255302)
-(...)
-#####
-
-In this example, PMC1 was set to record CPU cycles, PMC2 was set to
-record AltiVec Permute Stall Cycles, and PMC3 was set to record AltiVec
-Issue Stalls.
-
-The function "gmc1_altivec" was monitored 255302 times, and the
-minimum execution time was 231 processor cycles. The max and average
-aren't much use, as it's very likely the OS interrupted execution for
-reasons of its own :-(
-
-With the exact same settings and source file, but using the reference C
-code we get:
-
-#####
-PowerPC performance report
- Values are from the PMC registers, and represent whatever the
- registers are set to record.
- Function "gmc1_altivec" (pmc1):
- min: 592
- max: 2532235
- avg: 962.88 (255302)
- Function "gmc1_altivec" (pmc2):
- min: 0
- max: 33
- avg: 0.00 (255302)
- Function "gmc1_altivec" (pmc3):
- min: 0
- max: 350
- avg: 0.03 (255302)
-(...)
-#####
-
-592 cycles, so the fastest AltiVec execution is about 2.5x faster than
-the fastest C execution in this example. It's not perfect but it's not
-bad (well I wrote this function so I can't say otherwise :-).
-
-Once you have that kind of report, you can try to improve things by
-finding what goes wrong and fixing it; in the example above, one
-should try to diminish the number of AltiVec stalls, as this *may*
-improve performance.
-
-
-
-IV) Enabling the PMC in Mac OS X
-
-This is easy. Use "Monster" and "monster". Those tools come from
-Apple's CHUD package, and can be found hidden in the developer web
-site & FTP site. "MONster" is the graphical application, use it to
-generate a config file specifying what each register should
-monitor. Then use the command-line application "monster" to use that
-config file, and enjoy the results.
-
-Note that "MONster" can be used for many other things, but it's
-documented by Apple, it's not my subject.
-
-If you are using CHUD 4.4.2 or later, you'll notice that MONster is
-no longer available. It's been superseded by Shark, where
-configuration of PMCs is available as a plugin.
-
-
-
-V) Enabling the PMC on Linux
-
-On linux you may use oprofile from http://oprofile.sf.net, depending on the
-version and the cpu you may need to apply a patch[1] to access a set of the
-possible counters from the userspace application. You can always define them
-using the kernel interface /dev/oprofile/* .
-
-[1] http://dev.gentoo.org/~lu_zero/development/oprofile-g4-20060423.patch
-
---
-Romain Dolbeau <romain@dolbeau.org>
-Luca Barbato <lu_zero@gentoo.org>
av_free(video_standard);
-#if CONFIG_POWERPC_PERF
- void powerpc_display_perf_report(void);
- powerpc_display_perf_report();
-#endif /* CONFIG_POWERPC_PERF */
-
for (i=0;i<AVMEDIA_TYPE_NB;i++)
av_free(avcodec_opts[i]);
av_free(avformat_opts);
#include <altivec.h>
#endif
#include "libavcodec/dsputil.h"
-#include "dsputil_ppc.h"
#include "util_altivec.h"
#include "types_altivec.h"
#include "dsputil_altivec.h"
/* next one assumes that ((line_size % 16) == 0) */
void put_pixels16_altivec(uint8_t *block, const uint8_t *pixels, int line_size, int h)
{
-POWERPC_PERF_DECLARE(altivec_put_pixels16_num, 1);
register vector unsigned char pixelsv1, pixelsv2;
register vector unsigned char pixelsv1B, pixelsv2B;
register vector unsigned char pixelsv1C, pixelsv2C;
register int line_size_3 = line_size + line_size_2;
register int line_size_4 = line_size << 2;
-POWERPC_PERF_START_COUNT(altivec_put_pixels16_num, 1);
// hand-unrolling the loop by 4 gains about 15%
// mininum execution time goes from 74 to 60 cycles
// it's faster than -funroll-loops, but using
block +=line_size_4;
}
#endif
-POWERPC_PERF_STOP_COUNT(altivec_put_pixels16_num, 1);
}
/* next one assumes that ((line_size % 16) == 0) */
#define op_avg(a,b) a = ( ((a)|(b)) - ((((a)^(b))&0xFEFEFEFEUL)>>1) )
void avg_pixels16_altivec(uint8_t *block, const uint8_t *pixels, int line_size, int h)
{
-POWERPC_PERF_DECLARE(altivec_avg_pixels16_num, 1);
register vector unsigned char pixelsv1, pixelsv2, pixelsv, blockv;
register vector unsigned char perm = vec_lvsl(0, pixels);
int i;
-POWERPC_PERF_START_COUNT(altivec_avg_pixels16_num, 1);
-
for (i = 0; i < h; i++) {
pixelsv1 = vec_ld( 0, pixels);
pixelsv2 = vec_ld(16,pixels);
pixels+=line_size;
block +=line_size;
}
-
-POWERPC_PERF_STOP_COUNT(altivec_avg_pixels16_num, 1);
}
/* next one assumes that ((line_size % 8) == 0) */
static void avg_pixels8_altivec(uint8_t * block, const uint8_t * pixels, int line_size, int h)
{
-POWERPC_PERF_DECLARE(altivec_avg_pixels8_num, 1);
register vector unsigned char pixelsv1, pixelsv2, pixelsv, blockv;
int i;
-POWERPC_PERF_START_COUNT(altivec_avg_pixels8_num, 1);
-
for (i = 0; i < h; i++) {
/* block is 8 bytes-aligned, so we're either in the
left block (16 bytes-aligned) or in the right block (not) */
pixels += line_size;
block += line_size;
}
-
-POWERPC_PERF_STOP_COUNT(altivec_avg_pixels8_num, 1);
}
/* next one assumes that ((line_size % 8) == 0) */
static void put_pixels8_xy2_altivec(uint8_t *block, const uint8_t *pixels, int line_size, int h)
{
-POWERPC_PERF_DECLARE(altivec_put_pixels8_xy2_num, 1);
register int i;
register vector unsigned char pixelsv1, pixelsv2, pixelsavg;
register vector unsigned char blockv, temp1, temp2;
(vector unsigned short)pixelsv2);
pixelssum1 = vec_add(pixelssum1, vctwo);
-POWERPC_PERF_START_COUNT(altivec_put_pixels8_xy2_num, 1);
for (i = 0; i < h ; i++) {
int rightside = ((unsigned long)block & 0x0000000F);
blockv = vec_ld(0, block);
block += line_size;
pixels += line_size;
}
-
-POWERPC_PERF_STOP_COUNT(altivec_put_pixels8_xy2_num, 1);
}
/* next one assumes that ((line_size % 8) == 0) */
static void put_no_rnd_pixels8_xy2_altivec(uint8_t *block, const uint8_t *pixels, int line_size, int h)
{
-POWERPC_PERF_DECLARE(altivec_put_no_rnd_pixels8_xy2_num, 1);
register int i;
register vector unsigned char pixelsv1, pixelsv2, pixelsavg;
register vector unsigned char blockv, temp1, temp2;
(vector unsigned short)pixelsv2);
pixelssum1 = vec_add(pixelssum1, vcone);
-POWERPC_PERF_START_COUNT(altivec_put_no_rnd_pixels8_xy2_num, 1);
for (i = 0; i < h ; i++) {
int rightside = ((unsigned long)block & 0x0000000F);
blockv = vec_ld(0, block);
block += line_size;
pixels += line_size;
}
-
-POWERPC_PERF_STOP_COUNT(altivec_put_no_rnd_pixels8_xy2_num, 1);
}
/* next one assumes that ((line_size % 16) == 0) */
static void put_pixels16_xy2_altivec(uint8_t * block, const uint8_t * pixels, int line_size, int h)
{
-POWERPC_PERF_DECLARE(altivec_put_pixels16_xy2_num, 1);
register int i;
register vector unsigned char pixelsv1, pixelsv2, pixelsv3, pixelsv4;
register vector unsigned char blockv, temp1, temp2;
register const vector unsigned char vczero = (const vector unsigned char)vec_splat_u8(0);
register const vector unsigned short vctwo = (const vector unsigned short)vec_splat_u16(2);
-POWERPC_PERF_START_COUNT(altivec_put_pixels16_xy2_num, 1);
-
temp1 = vec_ld(0, pixels);
temp2 = vec_ld(16, pixels);
pixelsv1 = vec_perm(temp1, temp2, vec_lvsl(0, pixels));
block += line_size;
pixels += line_size;
}
-
-POWERPC_PERF_STOP_COUNT(altivec_put_pixels16_xy2_num, 1);
}
/* next one assumes that ((line_size % 16) == 0) */
static void put_no_rnd_pixels16_xy2_altivec(uint8_t * block, const uint8_t * pixels, int line_size, int h)
{
-POWERPC_PERF_DECLARE(altivec_put_no_rnd_pixels16_xy2_num, 1);
register int i;
register vector unsigned char pixelsv1, pixelsv2, pixelsv3, pixelsv4;
register vector unsigned char blockv, temp1, temp2;
register const vector unsigned short vcone = (const vector unsigned short)vec_splat_u16(1);
register const vector unsigned short vctwo = (const vector unsigned short)vec_splat_u16(2);
-POWERPC_PERF_START_COUNT(altivec_put_no_rnd_pixels16_xy2_num, 1);
-
temp1 = vec_ld(0, pixels);
temp2 = vec_ld(16, pixels);
pixelsv1 = vec_perm(temp1, temp2, vec_lvsl(0, pixels));
block += line_size;
pixels += line_size;
}
-
-POWERPC_PERF_STOP_COUNT(altivec_put_no_rnd_pixels16_xy2_num, 1);
}
static int hadamard8_diff8x8_altivec(/*MpegEncContext*/ void *s, uint8_t *dst, uint8_t *src, int stride, int h){
-POWERPC_PERF_DECLARE(altivec_hadamard8_diff8x8_num, 1);
int sum;
register const vector unsigned char vzero =
(const vector unsigned char)vec_splat_u8(0);
register vector signed short temp0, temp1, temp2, temp3, temp4,
temp5, temp6, temp7;
-POWERPC_PERF_START_COUNT(altivec_hadamard8_diff8x8_num, 1);
{
register const vector signed short vprod1 =(const vector signed short)
{ 1,-1, 1,-1, 1,-1, 1,-1 };
vsum = vec_splat(vsum, 3);
vec_ste(vsum, 0, &sum);
}
-POWERPC_PERF_STOP_COUNT(altivec_hadamard8_diff8x8_num, 1);
return sum;
}
}
static int hadamard8_diff16_altivec(/*MpegEncContext*/ void *s, uint8_t *dst, uint8_t *src, int stride, int h){
-POWERPC_PERF_DECLARE(altivec_hadamard8_diff16_num, 1);
int score;
-POWERPC_PERF_START_COUNT(altivec_hadamard8_diff16_num, 1);
score = hadamard8_diff16x8_altivec(s, dst, src, stride, 8);
if (h==16) {
dst += 8*stride;
src += 8*stride;
score += hadamard8_diff16x8_altivec(s, dst, src, stride, 8);
}
-POWERPC_PERF_STOP_COUNT(altivec_hadamard8_diff16_num, 1);
return score;
}
/* next one assumes that ((line_size % 8) == 0) */
static void avg_pixels8_xy2_altivec(uint8_t *block, const uint8_t *pixels, int line_size, int h)
{
-POWERPC_PERF_DECLARE(altivec_avg_pixels8_xy2_num, 1);
register int i;
register vector unsigned char pixelsv1, pixelsv2, pixelsavg;
register vector unsigned char blockv, temp1, temp2, blocktemp;
(vector unsigned short)pixelsv2);
pixelssum1 = vec_add(pixelssum1, vctwo);
-POWERPC_PERF_START_COUNT(altivec_avg_pixels8_xy2_num, 1);
for (i = 0; i < h ; i++) {
int rightside = ((unsigned long)block & 0x0000000F);
blockv = vec_ld(0, block);
block += line_size;
pixels += line_size;
}
-
-POWERPC_PERF_STOP_COUNT(altivec_avg_pixels8_xy2_num, 1);
}
void dsputil_init_altivec(DSPContext* c, AVCodecContext *avctx)
*/
#include "libavcodec/dsputil.h"
-
-#include "dsputil_ppc.h"
-
#include "dsputil_altivec.h"
int mm_flags = 0;
return result;
}
-#if CONFIG_POWERPC_PERF
-unsigned long long perfdata[POWERPC_NUM_PMC_ENABLED][powerpc_perf_total][powerpc_data_total];
-/* list below must match enum in dsputil_ppc.h */
-static unsigned char* perfname[] = {
- "ff_fft_calc_altivec",
- "gmc1_altivec",
- "dct_unquantize_h263_altivec",
- "fdct_altivec",
- "idct_add_altivec",
- "idct_put_altivec",
- "put_pixels16_altivec",
- "avg_pixels16_altivec",
- "avg_pixels8_altivec",
- "put_pixels8_xy2_altivec",
- "put_no_rnd_pixels8_xy2_altivec",
- "put_pixels16_xy2_altivec",
- "put_no_rnd_pixels16_xy2_altivec",
- "hadamard8_diff8x8_altivec",
- "hadamard8_diff16_altivec",
- "avg_pixels8_xy2_altivec",
- "clear_blocks_dcbz32_ppc",
- "clear_blocks_dcbz128_ppc",
- "put_h264_chroma_mc8_altivec",
- "avg_h264_chroma_mc8_altivec",
- "put_h264_qpel16_h_lowpass_altivec",
- "avg_h264_qpel16_h_lowpass_altivec",
- "put_h264_qpel16_v_lowpass_altivec",
- "avg_h264_qpel16_v_lowpass_altivec",
- "put_h264_qpel16_hv_lowpass_altivec",
- "avg_h264_qpel16_hv_lowpass_altivec",
- ""
-};
-#include <stdio.h>
-#endif
-
-#if CONFIG_POWERPC_PERF
-void powerpc_display_perf_report(void)
-{
- int i, j;
- av_log(NULL, AV_LOG_INFO, "PowerPC performance report\n Values are from the PMC registers, and represent whatever the registers are set to record.\n");
- for(i = 0 ; i < powerpc_perf_total ; i++) {
- for (j = 0; j < POWERPC_NUM_PMC_ENABLED ; j++) {
- if (perfdata[j][i][powerpc_data_num] != (unsigned long long)0)
- av_log(NULL, AV_LOG_INFO,
- " Function \"%s\" (pmc%d):\n\tmin: %"PRIu64"\n\tmax: %"PRIu64"\n\tavg: %1.2lf (%"PRIu64")\n",
- perfname[i],
- j+1,
- perfdata[j][i][powerpc_data_min],
- perfdata[j][i][powerpc_data_max],
- (double)perfdata[j][i][powerpc_data_sum] /
- (double)perfdata[j][i][powerpc_data_num],
- perfdata[j][i][powerpc_data_num]);
- }
- }
-}
-#endif /* CONFIG_POWERPC_PERF */
-
/* ***** WARNING ***** WARNING ***** WARNING ***** */
/*
clear_blocks_dcbz32_ppc will not work properly on PowerPC processors with a
*/
static void clear_blocks_dcbz32_ppc(DCTELEM *blocks)
{
-POWERPC_PERF_DECLARE(powerpc_clear_blocks_dcbz32, 1);
register int misal = ((unsigned long)blocks & 0x00000010);
register int i = 0;
-POWERPC_PERF_START_COUNT(powerpc_clear_blocks_dcbz32, 1);
#if 1
if (misal) {
((unsigned long*)blocks)[0] = 0L;
#else
memset(blocks, 0, sizeof(DCTELEM)*6*64);
#endif
-POWERPC_PERF_STOP_COUNT(powerpc_clear_blocks_dcbz32, 1);
}
/* same as above, when dcbzl clear a whole 128B cache line
#if HAVE_DCBZL
static void clear_blocks_dcbz128_ppc(DCTELEM *blocks)
{
-POWERPC_PERF_DECLARE(powerpc_clear_blocks_dcbz128, 1);
register int misal = ((unsigned long)blocks & 0x0000007f);
register int i = 0;
-POWERPC_PERF_START_COUNT(powerpc_clear_blocks_dcbz128, 1);
#if 1
if (misal) {
// we could probably also optimize this case,
#else
memset(blocks, 0, sizeof(DCTELEM)*6*64);
#endif
-POWERPC_PERF_STOP_COUNT(powerpc_clear_blocks_dcbz128, 1);
}
#else
static void clear_blocks_dcbz128_ppc(DCTELEM *blocks)
}
}
-#if CONFIG_POWERPC_PERF
- {
- int i, j;
- for (i = 0 ; i < powerpc_perf_total ; i++) {
- for (j = 0; j < POWERPC_NUM_PMC_ENABLED ; j++) {
- perfdata[j][i][powerpc_data_min] = 0xFFFFFFFFFFFFFFFFULL;
- perfdata[j][i][powerpc_data_max] = 0x0000000000000000ULL;
- perfdata[j][i][powerpc_data_sum] = 0x0000000000000000ULL;
- perfdata[j][i][powerpc_data_num] = 0x0000000000000000ULL;
- }
- }
- }
-#endif /* CONFIG_POWERPC_PERF */
}
#endif /* HAVE_ALTIVEC */
}
+++ /dev/null
-/*
- * Copyright (c) 2003-2004 Romain Dolbeau <romain@dolbeau.org>
- *
- * This file is part of FFmpeg.
- *
- * FFmpeg is free software; you can redistribute it and/or
- * modify it under the terms of the GNU Lesser General Public
- * License as published by the Free Software Foundation; either
- * version 2.1 of the License, or (at your option) any later version.
- *
- * FFmpeg is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * Lesser General Public License for more details.
- *
- * You should have received a copy of the GNU Lesser General Public
- * License along with FFmpeg; if not, write to the Free Software
- * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
- */
-
-#ifndef AVCODEC_PPC_DSPUTIL_PPC_H
-#define AVCODEC_PPC_DSPUTIL_PPC_H
-
-#include "config.h"
-
-#if CONFIG_POWERPC_PERF
-void powerpc_display_perf_report(void);
-/* the 604* have 2, the G3* have 4, the G4s have 6,
- and the G5 are completely different (they MUST use
- ARCH_PPC64, and let's hope all future 64-bit PPC
- will use the same PMCs... */
-#define POWERPC_NUM_PMC_ENABLED 6
-/* if you add to the enum below, also add to the perfname array
- in dsputil_ppc.c */
-enum powerpc_perf_index {
- altivec_fft_num = 0,
- altivec_gmc1_num,
- altivec_dct_unquantize_h263_num,
- altivec_fdct,
- altivec_idct_add_num,
- altivec_idct_put_num,
- altivec_put_pixels16_num,
- altivec_avg_pixels16_num,
- altivec_avg_pixels8_num,
- altivec_put_pixels8_xy2_num,
- altivec_put_no_rnd_pixels8_xy2_num,
- altivec_put_pixels16_xy2_num,
- altivec_put_no_rnd_pixels16_xy2_num,
- altivec_hadamard8_diff8x8_num,
- altivec_hadamard8_diff16_num,
- altivec_avg_pixels8_xy2_num,
- powerpc_clear_blocks_dcbz32,
- powerpc_clear_blocks_dcbz128,
- altivec_put_h264_chroma_mc8_num,
- altivec_avg_h264_chroma_mc8_num,
- altivec_put_h264_qpel16_h_lowpass_num,
- altivec_avg_h264_qpel16_h_lowpass_num,
- altivec_put_h264_qpel16_v_lowpass_num,
- altivec_avg_h264_qpel16_v_lowpass_num,
- altivec_put_h264_qpel16_hv_lowpass_num,
- altivec_avg_h264_qpel16_hv_lowpass_num,
- powerpc_perf_total
-};
-enum powerpc_data_index {
- powerpc_data_min = 0,
- powerpc_data_max,
- powerpc_data_sum,
- powerpc_data_num,
- powerpc_data_total
-};
-extern unsigned long long perfdata[POWERPC_NUM_PMC_ENABLED][powerpc_perf_total][powerpc_data_total];
-
-#if !ARCH_PPC64
-#define POWERP_PMC_DATATYPE unsigned long
-#define POWERPC_GET_PMC1(a) __asm__ volatile("mfspr %0, 937" : "=r" (a))
-#define POWERPC_GET_PMC2(a) __asm__ volatile("mfspr %0, 938" : "=r" (a))
-#if (POWERPC_NUM_PMC_ENABLED > 2)
-#define POWERPC_GET_PMC3(a) __asm__ volatile("mfspr %0, 941" : "=r" (a))
-#define POWERPC_GET_PMC4(a) __asm__ volatile("mfspr %0, 942" : "=r" (a))
-#else
-#define POWERPC_GET_PMC3(a) do {} while (0)
-#define POWERPC_GET_PMC4(a) do {} while (0)
-#endif
-#if (POWERPC_NUM_PMC_ENABLED > 4)
-#define POWERPC_GET_PMC5(a) __asm__ volatile("mfspr %0, 929" : "=r" (a))
-#define POWERPC_GET_PMC6(a) __asm__ volatile("mfspr %0, 930" : "=r" (a))
-#else
-#define POWERPC_GET_PMC5(a) do {} while (0)
-#define POWERPC_GET_PMC6(a) do {} while (0)
-#endif
-#else /* ARCH_PPC64 */
-#define POWERP_PMC_DATATYPE unsigned long long
-#define POWERPC_GET_PMC1(a) __asm__ volatile("mfspr %0, 771" : "=r" (a))
-#define POWERPC_GET_PMC2(a) __asm__ volatile("mfspr %0, 772" : "=r" (a))
-#if (POWERPC_NUM_PMC_ENABLED > 2)
-#define POWERPC_GET_PMC3(a) __asm__ volatile("mfspr %0, 773" : "=r" (a))
-#define POWERPC_GET_PMC4(a) __asm__ volatile("mfspr %0, 774" : "=r" (a))
-#else
-#define POWERPC_GET_PMC3(a) do {} while (0)
-#define POWERPC_GET_PMC4(a) do {} while (0)
-#endif
-#if (POWERPC_NUM_PMC_ENABLED > 4)
-#define POWERPC_GET_PMC5(a) __asm__ volatile("mfspr %0, 775" : "=r" (a))
-#define POWERPC_GET_PMC6(a) __asm__ volatile("mfspr %0, 776" : "=r" (a))
-#else
-#define POWERPC_GET_PMC5(a) do {} while (0)
-#define POWERPC_GET_PMC6(a) do {} while (0)
-#endif
-#endif /* ARCH_PPC64 */
-#define POWERPC_PERF_DECLARE(a, cond) \
- POWERP_PMC_DATATYPE \
- pmc_start[POWERPC_NUM_PMC_ENABLED], \
- pmc_stop[POWERPC_NUM_PMC_ENABLED], \
- pmc_loop_index;
-#define POWERPC_PERF_START_COUNT(a, cond) do { \
- POWERPC_GET_PMC6(pmc_start[5]); \
- POWERPC_GET_PMC5(pmc_start[4]); \
- POWERPC_GET_PMC4(pmc_start[3]); \
- POWERPC_GET_PMC3(pmc_start[2]); \
- POWERPC_GET_PMC2(pmc_start[1]); \
- POWERPC_GET_PMC1(pmc_start[0]); \
- } while (0)
-#define POWERPC_PERF_STOP_COUNT(a, cond) do { \
- POWERPC_GET_PMC1(pmc_stop[0]); \
- POWERPC_GET_PMC2(pmc_stop[1]); \
- POWERPC_GET_PMC3(pmc_stop[2]); \
- POWERPC_GET_PMC4(pmc_stop[3]); \
- POWERPC_GET_PMC5(pmc_stop[4]); \
- POWERPC_GET_PMC6(pmc_stop[5]); \
- if (cond) { \
- for(pmc_loop_index = 0; \
- pmc_loop_index < POWERPC_NUM_PMC_ENABLED; \
- pmc_loop_index++) { \
- if (pmc_stop[pmc_loop_index] >= pmc_start[pmc_loop_index]) { \
- POWERP_PMC_DATATYPE diff = \
- pmc_stop[pmc_loop_index] - pmc_start[pmc_loop_index]; \
- if (diff < perfdata[pmc_loop_index][a][powerpc_data_min]) \
- perfdata[pmc_loop_index][a][powerpc_data_min] = diff; \
- if (diff > perfdata[pmc_loop_index][a][powerpc_data_max]) \
- perfdata[pmc_loop_index][a][powerpc_data_max] = diff; \
- perfdata[pmc_loop_index][a][powerpc_data_sum] += diff; \
- perfdata[pmc_loop_index][a][powerpc_data_num] ++; \
- } \
- } \
- } \
-} while (0)
-#else /* CONFIG_POWERPC_PERF */
-// those are needed to avoid empty statements.
-#define POWERPC_PERF_DECLARE(a, cond) int altivec_placeholder __attribute__ ((unused))
-#define POWERPC_PERF_START_COUNT(a, cond) do {} while (0)
-#define POWERPC_PERF_STOP_COUNT(a, cond) do {} while (0)
-#endif /* CONFIG_POWERPC_PERF */
-
-#endif /* AVCODEC_PPC_DSPUTIL_PPC_H */
#endif
#include "libavutil/common.h"
#include "libavcodec/dsputil.h"
-#include "dsputil_ppc.h"
#include "dsputil_altivec.h"
#define vs16(v) ((vector signed short)(v))
void fdct_altivec(int16_t *block)
{
-POWERPC_PERF_DECLARE(altivec_fdct, 1);
vector signed short *bp;
vector float *cp;
vector float b00, b10, b20, b30, b40, b50, b60, b70;
vector float mzero, cnst, cnsts0, cnsts1, cnsts2;
vector float x0, x1, x2, x3, x4, x5, x6, x7, x8;
- POWERPC_PERF_START_COUNT(altivec_fdct, 1);
-
-
/* setup constants {{{ */
/* mzero = -0.0 */
mzero = ((vector float)vec_splat_u32(-1));
#undef CTS
/* }}} */
-
-POWERPC_PERF_STOP_COUNT(altivec_fdct, 1);
}
/* vim:set foldmethod=marker foldlevel=0: */
* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
*/
#include "libavcodec/fft.h"
-#include "dsputil_ppc.h"
#include "util_altivec.h"
#include "dsputil_altivec.h"
*/
static void ff_fft_calc_altivec(FFTContext *s, FFTComplex *z)
{
-POWERPC_PERF_DECLARE(altivec_fft_num, s->nbits >= 6);
register const vector float vczero = (const vector float)vec_splat_u32(0.);
int ln = s->nbits;
FFTComplex *cptr, *cptr1;
int k;
-POWERPC_PERF_START_COUNT(altivec_fft_num, s->nbits >= 6);
-
np = 1 << ln;
{
nblocks = nblocks >> 1;
nloops = nloops << 1;
} while (nblocks != 0);
-
-POWERPC_PERF_STOP_COUNT(altivec_fft_num, s->nbits >= 6);
}
av_cold void ff_fft_init_altivec(FFTContext *s)
*/
#include "libavcodec/dsputil.h"
-#include "dsputil_ppc.h"
#include "util_altivec.h"
#include "types_altivec.h"
#include "dsputil_altivec.h"
altivec-enhanced gmc1. ATM this code assume stride is a multiple of 8,
to preserve proper dst alignment.
*/
-#define GMC1_PERF_COND (h==8)
void gmc1_altivec(uint8_t *dst /* align 8 */, uint8_t *src /* align1 */, int stride, int h, int x16, int y16, int rounder)
{
-POWERPC_PERF_DECLARE(altivec_gmc1_num, GMC1_PERF_COND);
const DECLARE_ALIGNED(16, unsigned short, rounder_a) = rounder;
const DECLARE_ALIGNED(16, unsigned short, ABCD)[8] =
{
unsigned long dst_odd = (unsigned long)dst & 0x0000000F;
unsigned long src_really_odd = (unsigned long)src & 0x0000000F;
-
-POWERPC_PERF_START_COUNT(altivec_gmc1_num, GMC1_PERF_COND);
-
tempA = vec_ld(0, (unsigned short*)ABCD);
Av = vec_splat(tempA, 0);
Bv = vec_splat(tempA, 1);
dst += stride;
src += stride;
}
-
-POWERPC_PERF_STOP_COUNT(altivec_gmc1_num, GMC1_PERF_COND);
}
#include "libavcodec/h264data.h"
#include "libavcodec/h264dsp.h"
-#include "dsputil_ppc.h"
#include "dsputil_altivec.h"
#include "util_altivec.h"
#include "types_altivec.h"
static void PREFIX_h264_chroma_mc8_altivec(uint8_t * dst, uint8_t * src,
int stride, int h, int x, int y) {
- POWERPC_PERF_DECLARE(PREFIX_h264_chroma_mc8_num, 1);
DECLARE_ALIGNED(16, signed int, ABCD)[4] =
{((8 - x) * (8 - y)),
(( x) * (8 - y)),
vec_s16 vsrc2ssH, vsrc3ssH, psum;
vec_u8 vdst, ppsum, vfdst, fsum;
- POWERPC_PERF_START_COUNT(PREFIX_h264_chroma_mc8_num, 1);
-
if (((unsigned long)dst) % 16 == 0) {
fperm = (vec_u8){0x10, 0x11, 0x12, 0x13,
0x14, 0x15, 0x16, 0x17,
}
}
}
- POWERPC_PERF_STOP_COUNT(PREFIX_h264_chroma_mc8_num, 1);
}
/* this code assume that stride % 16 == 0 */
/* this code assume stride % 16 == 0 */
static void PREFIX_h264_qpel16_h_lowpass_altivec(uint8_t * dst, uint8_t * src, int dstStride, int srcStride) {
- POWERPC_PERF_DECLARE(PREFIX_h264_qpel16_h_lowpass_num, 1);
register int i;
LOAD_ZERO;
vec_u8 sum, vdst, fsum;
- POWERPC_PERF_START_COUNT(PREFIX_h264_qpel16_h_lowpass_num, 1);
-
for (i = 0 ; i < 16 ; i ++) {
vec_u8 srcR1 = vec_ld(-2, src);
vec_u8 srcR2 = vec_ld(14, src);
src += srcStride;
dst += dstStride;
}
- POWERPC_PERF_STOP_COUNT(PREFIX_h264_qpel16_h_lowpass_num, 1);
}
/* this code assume stride % 16 == 0 */
static void PREFIX_h264_qpel16_v_lowpass_altivec(uint8_t * dst, uint8_t * src, int dstStride, int srcStride) {
- POWERPC_PERF_DECLARE(PREFIX_h264_qpel16_v_lowpass_num, 1);
-
register int i;
LOAD_ZERO;
vec_u8 sum, vdst, fsum, srcP3a, srcP3b, srcP3;
- POWERPC_PERF_START_COUNT(PREFIX_h264_qpel16_v_lowpass_num, 1);
-
for (i = 0 ; i < 16 ; i++) {
srcP3a = vec_ld(0, srcbis += srcStride);
srcP3b = vec_ld(16, srcbis);
dst += dstStride;
}
- POWERPC_PERF_STOP_COUNT(PREFIX_h264_qpel16_v_lowpass_num, 1);
}
/* this code assume stride % 16 == 0 *and* tmp is properly aligned */
static void PREFIX_h264_qpel16_hv_lowpass_altivec(uint8_t * dst, int16_t * tmp, uint8_t * src, int dstStride, int tmpStride, int srcStride) {
- POWERPC_PERF_DECLARE(PREFIX_h264_qpel16_hv_lowpass_num, 1);
register int i;
LOAD_ZERO;
const vec_u8 permM2 = vec_lvsl(-2, src);
vec_u8 fsum, sumv, sum, vdst;
vec_s16 ssume, ssumo;
- POWERPC_PERF_START_COUNT(PREFIX_h264_qpel16_hv_lowpass_num, 1);
src -= (2 * srcStride);
for (i = 0 ; i < 21 ; i ++) {
vec_u8 srcM2, srcM1, srcP0, srcP1, srcP2, srcP3;
dst += dstStride;
}
- POWERPC_PERF_STOP_COUNT(PREFIX_h264_qpel16_hv_lowpass_num, 1);
}
#endif
#include "libavcodec/dsputil.h"
#include "types_altivec.h"
-#include "dsputil_ppc.h"
#include "dsputil_altivec.h"
#define IDCT_HALF \
void idct_put_altivec(uint8_t* dest, int stride, int16_t *blk)
{
-POWERPC_PERF_DECLARE(altivec_idct_put_num, 1);
vec_s16 *block = (vec_s16*)blk;
vec_u8 tmp;
-#if CONFIG_POWERPC_PERF
-POWERPC_PERF_START_COUNT(altivec_idct_put_num, 1);
-#endif
IDCT
#define COPY(dest,src) \
COPY (dest, vx5) dest += stride;
COPY (dest, vx6) dest += stride;
COPY (dest, vx7)
-
-POWERPC_PERF_STOP_COUNT(altivec_idct_put_num, 1);
}
void idct_add_altivec(uint8_t* dest, int stride, int16_t *blk)
{
-POWERPC_PERF_DECLARE(altivec_idct_add_num, 1);
vec_s16 *block = (vec_s16*)blk;
vec_u8 tmp;
vec_s16 tmp2, tmp3;
vec_u8 perm1;
vec_u8 p0, p1, p;
-#if CONFIG_POWERPC_PERF
-POWERPC_PERF_START_COUNT(altivec_idct_add_num, 1);
-#endif
-
IDCT
p0 = vec_lvsl (0, dest);
ADD (dest, vx5, perm1) dest += stride;
ADD (dest, vx6, perm0) dest += stride;
ADD (dest, vx7, perm1)
-
-POWERPC_PERF_STOP_COUNT(altivec_idct_add_num, 1);
}
#include "libavcodec/dsputil.h"
#include "libavcodec/mpegvideo.h"
-#include "dsputil_ppc.h"
#include "util_altivec.h"
#include "types_altivec.h"
#include "dsputil_altivec.h"
static void dct_unquantize_h263_altivec(MpegEncContext *s,
DCTELEM *block, int n, int qscale)
{
-POWERPC_PERF_DECLARE(altivec_dct_unquantize_h263_num, 1);
int i, level, qmul, qadd;
int nCoeffs;
assert(s->block_last_index[n]>=0);
-POWERPC_PERF_START_COUNT(altivec_dct_unquantize_h263_num, 1);
-
qadd = (qscale - 1) | 1;
qmul = qscale << 1;
block[0] = backup_0;
}
}
-POWERPC_PERF_STOP_COUNT(altivec_dct_unquantize_h263_num, nCoeffs == 63);
}