System evaluation based on past performance: Random Signals Test

Critical Values

The level of significance is the probability of rejecting the null hypothesis when it is true. In our case, it’s the probability that the Random Signals Test mistakenly picks a system when it shouldn’t. If the Random Signals Test is used to test just one system, then the critical value of 5% significance is just the 95th percentile of a performance distribution. However, a trader could be data mining – he could run the hypothesis test on several different systems and then pick the system that passes the test.

In this case, the conventional critical values cannot be used. Intuitively, if there are 100 systems that trade randomly, about 5 of them will have performance greater than the 95th percentile of a performance distribution. Thus, if 100 random systems are tested, about 5 will be picked with conventional critical values, even though none should be picked.
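The arithmetic behind this intuition can be sketched in a few lines of Python. The function names are hypothetical, and the second function additionally assumes that the systems are independent of each other.

```python
def expected_false_picks(n_systems, percentile=95.0):
    """Expected number of random systems whose performance exceeds the percentile."""
    return n_systems * (100.0 - percentile) / 100.0


def prob_at_least_one_false_pick(n_systems, percentile=95.0):
    """Chance that at least one random system exceeds the percentile.

    Assumes the systems are independent of each other.
    """
    return 1.0 - (percentile / 100.0) ** n_systems


print(expected_false_picks(100))          # 5.0
print(prob_at_least_one_false_pick(100))  # close to 1
```

Under the independence assumption, a trader who tests 100 random systems against the conventional 95th percentile is almost certain (about 99%) to find at least one that passes.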

If all the systems under consideration are a priori independent of each other, the correct percentiles for critical values are easily calculated from the definition of level of significance.

Let a be the desired level of significance, Perf_i the performance of the ith system, N the total number of systems being tested, c the appropriate critical value, and p the corresponding percentile of a performance distribution. Then, from the definition of level of significance, a is the probability that at least one of the N randomly trading systems has performance greater than c.
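The derivation can be written out as follows. This is a reconstruction from the definitions above, offered as a sketch: the split into five lines is an assumption, chosen so that lines 3, 4, and 5 correspond to the transitions discussed in the text.

```latex
\begin{align*}
a &= P(\text{at least one } \mathrm{Perf}_i > c)                      && \text{(line 1)} \\
  &= 1 - P(\text{no } \mathrm{Perf}_i > c)                             && \text{(line 2)} \\
  &= 1 - P(\mathrm{Perf}_1 \le c,\ \ldots,\ \mathrm{Perf}_N \le c)     && \text{(line 3)} \\
  &= 1 - \prod_{i=1}^{N} P(\mathrm{Perf}_i \le c)                      && \text{(line 4)} \\
  &= 1 - \left(\frac{p}{100}\right)^{N}                                && \text{(line 5)}
\end{align*}
```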

The transition from line 3 to line 4 follows from the assumption that the N systems are independent of each other. The transition from line 4 to line 5 uses the fact that the probability that a random variable is less than some value is the percentile (divided by 100) corresponding to that value.

Solving for the percentile p,

p = 100 (1 - a)^{1/N}    (5)

For a given level of significance a, as the number of systems N increases, so does the appropriate percentile p.

Table 2 shows the percentiles for critical values for selected levels of significance a and numbers of systems N. For example, if 10 systems are being tested, a system should only be picked at the 5% significance level if its performance is greater than the 99.49th percentile of the performance distribution.
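Equation (5) is straightforward to evaluate directly. The sketch below uses a hypothetical function name and a hypothetical grid of a and N values; the rows and columns of Table 2 may differ.

```python
def critical_percentile(a, n):
    """Percentile p = 100 * (1 - a)**(1/N) for N a priori independent systems."""
    if not 0.0 < a < 1.0:
        raise ValueError("a must be strictly between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 100.0 * (1.0 - a) ** (1.0 / n)


# Hypothetical grid of significance levels and numbers of systems.
for a in (0.10, 0.05, 0.01):
    row = "  ".join(
        f"N={n}: {critical_percentile(a, n):.2f}" for n in (1, 2, 5, 10, 100)
    )
    print(f"a={a:.2f}  {row}")
```

With a = 0.05 and N = 10 this gives approximately 99.49, the percentile quoted in the text.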

Table 2: Percentiles corresponding to selected levels of significance a and numbers of candidate systems N. When one system is tested, the critical value of 5% significance is taken from the 95th percentile of a performance distribution. When more than one system is tested, critical values must be adjusted upward to account for data mining. For example, if 10 a priori independent systems are tested, the critical value at 5% significance is from the 99.49th percentile. In the case of curve-fitting, the systems are not independent of each other; N is thus somewhere between one and the number of parameter sets considered in the fitting. When one system is tested on different price series, N is the number of those price series.

Because the percentiles of interest are so large and so close together, many runs of the random system are needed to obtain accurate critical values. In this paper, the random system is run 100,000 times. Table 3 shows the appropriate critical values for S&P 500 March 2002 futures based on the performance distribution in Figure 2 and percentiles in Table 2. Table 3 states that, for example, if 10 systems are tested, a system should only be picked at the 5% significance level if its Rate of Return is greater than 81.4%.
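As a sketch of how a critical value might be read off a simulated performance distribution, consider the following. The random system here is only a placeholder: performances are drawn from a normal distribution with arbitrary parameters, which is an assumption for illustration and not the trade-based simulation used in this paper. The percentile convention (the smallest simulated value with at least p percent of runs at or below it) and the function name are also assumptions.

```python
import math
import random


def empirical_critical_value(performances, percentile):
    """Smallest simulated value with at least `percentile` percent of runs at or below it."""
    ordered = sorted(performances)
    k = math.ceil(percentile * len(ordered) / 100.0) - 1
    k = min(max(k, 0), len(ordered) - 1)  # keep the index inside the list
    return ordered[k]


# Placeholder for 100,000 runs of a random system (NOT the paper's simulation).
rng = random.Random(0)
runs = [rng.gauss(0.0, 0.30) for _ in range(100_000)]

p_one = 95.0                              # one system, a = 5%
p_ten = 100.0 * (1.0 - 0.05) ** (1 / 10)  # ten independent systems, a = 5%

print(empirical_critical_value(runs, p_one))
print(empirical_critical_value(runs, p_ten))

# Count the runs above the stricter critical value. Only about 0.5% of all
# runs land there, which is why many runs are needed to pin the value down.
c_ten = empirical_critical_value(runs, p_ten)
print(sum(r > c_ten for r in runs))
```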

Table 3: Critical values for Rate of Return for selected levels of significance a and numbers of candidate systems N. Values calculated for S&P 500 March 2002 futures based on trade characteristics in Table 1 and percentiles in Table 2. For example, if 10 a priori independent systems are being tested, pick a system at the 5% significance level if its Rate of Return is 81.4% or more.

In our example, the Rate of Return (72%) is greater than all the Rates of Return in the N=1 column. This means that, assuming we’re testing only this one system, the Random Signals Test rejects the null hypothesis of random trading even at the 1% significance level.

Curve-Fitting

Traders often define a system in terms of a set of parameters and then find the set that maximizes past performance. For example, a trader dealing with the moving average crossover system might find the look-back period that maximizes historical Rate of Return. This is known as curve-fitting or optimization.

For the purpose of finding percentiles for critical values, each set of parameters defines a different system. However, the performances of these different systems are a priori correlated. For example, the performance of the moving average crossover system with the look-back period of 20 days is very similar to the performance of the system with the look-back period of 21 days.

Because of this correlation, the formula for percentiles mentioned above is no longer valid. If the performance of all the systems is independent, then N in Table 2 should be set to the number of systems. If the performance of all the systems is identical, then it does not matter how many systems there are and N should be set to 1. If the performance across systems is a priori positively correlated, as is the case with curve-fitting, N should be somewhere between 1 and the number of parameter sets considered. However, the exact value of N is unknown and depends on the system being tested.

We can approximate N by counting the number of clusters of parameter sets that a priori have similar performance within each cluster but approximately independent performance across clusters. For example, consider the moving average crossover system. A trader who is testing it for look-back periods of 10 to 50 days might believe that there are two such clusters: from 10 to about 30 days and from about 30 to 50 days. That is, for example, he believes that the performance of the system with look-back period of 35 days is

1. Highly positively correlated to the performance of the system with look-back period of 40 days; but

2. Independent of the performance of the system with look-back period of 20 days.

In this case, the trader sets N = 2.
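The effect of counting clusters rather than parameter sets can be illustrated with a small simulation. This is a sketch under strong, stated assumptions: performance under the null hypothesis is taken to be standard normal, systems within a cluster are taken to perform identically (the extreme case of positive correlation), and clusters are independent. The cluster sizes of 21 and 20 correspond to look-back periods of 10 to 30 and 31 to 50 days; all names are hypothetical.

```python
import random
from statistics import NormalDist


def critical_percentile(a, n):
    """Percentile p = 100 * (1 - a)**(1/N) from equation (5)."""
    return 100.0 * (1.0 - a) ** (1.0 / n)


def false_pick_rate(cluster_sizes, n_for_threshold, a=0.05, trials=20000, seed=0):
    """Share of trials in which at least one random system exceeds the critical value."""
    rng = random.Random(seed)
    # Critical value of a standard normal performance distribution (assumption).
    c = NormalDist().inv_cdf(critical_percentile(a, n_for_threshold) / 100.0)
    false_picks = 0
    for _ in range(trials):
        # One draw per cluster: every system in a cluster shares that draw,
        # so the best system overall is simply the best cluster draw.
        best = max(rng.gauss(0.0, 1.0) for _ in cluster_sizes)
        if best > c:
            false_picks += 1
    return false_picks / trials


clusters = [21, 20]  # 41 parameter sets in two clusters
print(false_pick_rate(clusters, n_for_threshold=2))   # near the intended 5%
print(false_pick_rate(clusters, n_for_threshold=41))  # far below 5%
```

Under these assumptions, setting N to the number of clusters keeps the false-pick rate near the chosen level of significance, while setting N to the number of parameter sets makes the test much stricter than intended.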

Multiple Markets

A trader might also test a system on several different price series and pick it if the null hypothesis of random trading is rejected on at least one of these series. In this case, the derivation of the formula for percentiles is the same as discussed above, with N now being the number of price series on which the system is tested. As long as all N price series are independent of each other, performances on each of these series are also independent and the above derivation applies.
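The decision rule for several price series might be sketched as follows. The function names, the dictionary layout, and the toy data are all hypothetical. The sketch assumes that a separate random-system performance distribution is available for each series and that the series are independent of each other.

```python
import math


def critical_percentile(a, n):
    """Percentile p = 100 * (1 - a)**(1/N) from equation (5)."""
    return 100.0 * (1.0 - a) ** (1.0 / n)


def empirical_critical_value(performances, percentile):
    """Smallest simulated value with at least `percentile` percent of runs at or below it."""
    ordered = sorted(performances)
    k = math.ceil(percentile * len(ordered) / 100.0) - 1
    return ordered[min(max(k, 0), len(ordered) - 1)]


def series_rejecting_null(system_perf, random_perf, a=0.05):
    """Names of the price series on which random trading is rejected.

    system_perf: series name -> performance of the system on that series
    random_perf: series name -> simulated performances of the random system
    N is the number of price series on which the system is tested.
    """
    p = critical_percentile(a, len(system_perf))
    return [
        name
        for name, perf in system_perf.items()
        if perf > empirical_critical_value(random_perf[name], p)
    ]


# Toy data: two series whose random-system performances are simply 0..999.
random_perf = {"A": list(range(1000)), "B": list(range(1000))}
print(series_rejecting_null({"A": 980, "B": 960}, random_perf))  # ['A']
```

In the toy data, a performance of 960 would pass against the unadjusted 95th percentile, but it does not pass once the percentile is adjusted for N = 2.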

Prof. Alex Strashny
