Pacific initially developed a statistical approach, structured around three central principles, for determining compliance with TA96's nondiscriminatory access standard. First, the remedy plan must not impose payments on Pacific when nondiscriminatory or parity treatment is provided.11 However, Pacific conceded that, given the nature of the statistical models applied, it was difficult to drive the parity payment amount closer to zero without lowering the out-of-parity payments substantially. (Pacific's 1999 Opening Brief on Performance Remedies at 2-3.)
Second, if Pacific does not provide parity treatment, then payment amounts to the CLEC should have some reasonable relationship to the level of performance provided.12 Pacific argued that remedy amounts should not be enormous when the level of performance deviates from parity by only small amounts or in isolated incidents. Thus, the levels of remedies should start relatively low and increase commensurately with the level of nonperformance. Id. at 3.
Third, remedy payments should motivate Pacific to provide nondiscriminatory service, but should not motivate the CLECs to favor receiving large remedy payments.13 Therefore, the remedy amounts must not be so high that a CLEC would be more desirous of receiving poor service and collecting large payments than receiving nondiscriminatory service. Id.
The CLECs also based their initial incentive proposal on three principles. First, they declared that the incentives must be in an amount sufficient to cause Pacific to meet its parity obligations. Second, the incentives must be self-executing, without broad opportunity for circumvention or lengthy delay in the payment of the consequences. Finally, the CLECs asserted that the structure of the plan must be fairly simple to implement and monitor.
In its initial performance incentive proposal, Pacific defines parity to mean delivering services to CLEC customers from the same processes used to deliver services to ILEC customers. When it is not organizationally possible to use the same processes, Pacific defines parity to mean that the ILEC must deliver services to the CLEC with the same properties as those it delivers to itself. The definition of parity and the test for parity appear to be the same, i.e., 1.645 standard deviations from the mean.14 (Pacific 1999 Opening Brief at 5-6 and 13-15.)
Verizon CA contends that parity only requires that CLEC ordering processes be performed in "substantially the same time and manner" as the ILEC's like processes. It claims that ILECs have unavoidable variations in their own processes, and as long as the ILEC and CLEC distributions are substantially the same, parity is present. Verizon CA also considers the appropriate test for parity to be average performance within 1.645 standard deviations of the mean. (Verizon CA 1999 Opening Brief at 5.)
The CLECs define parity as equal service for the ILEC and the CLEC. The CLECs want zero (0) standard deviations from the mean for the definition of parity, but have offered that a test for determining parity could be one (1) standard deviation from the mean. (CLECs' 1999 Opening Brief at 4-15.)
In its May 3, 1999 preliminary statement, Verizon CA embraced each of the core principles Pacific and the CLECs set forth, and asserted that the concepts need not be mutually exclusive. Moreover, it added the following seven principles of its own for the "ideal" incentive plan. First, a design objective of the plan should be that no incentive payments are made when parity exists. Second, consequences should be economically significant, not just statistically significant. Third, the incentive structure should provide that the incentive payment equals the resource cost of meeting the standard. Fourth, regular review periods are necessary. Fifth, the incentive mechanism should not result in large administrative costs. Sixth, there must be some "off-ramps" in a self-executing incentive system to deal with certain circumstances. Finally, with an eye to the future, the plan should be symmetrical across all parties. (Verizon CA Brief on OSS Performance Incentives at 2-5.)
Pacific originally proposed using a standard Z-test15 for purposes of determining compliance with parity. The CLECs objected to the standard Z-test, which utilizes the individual variances of the Pacific and CLEC samples, arguing that Pacific could manipulate the variance of the CLEC sample. Pacific responded that the standard Z-test was adequate because any alleged manipulation of the CLEC sample variance would be readily apparent.
The CLECs speculated that Pacific could increase the variance of the CLEC sample, which would reduce the probability that Pacific would be found out-of-parity.16 In response, they proposed the "Modified Z-test,"17 which modifies the standard Z-test by using only Pacific's sample variance. In the "spirit of collaboration," Pacific offered to use the CLECs' proposed Modified Z-test on a trial basis, and then test it in order to evaluate whether the Modified Z-test yielded "fair and accurate results." Verizon CA agreed to use the Modified Z-test to assess parity subject to review and modification following a six-month interim implementation period.
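For illustration only, the following Python sketch implements the two test statistics as they are written out in footnotes 15 and 17; the argument names and the use of the sample (n-1) variance are assumptions made for this sketch, not terms of any party's plan.

from math import sqrt
from statistics import mean, variance

def standard_z(pacific, clec):
    # Standard Z-test (footnote 15): the denominator uses both samples'
    # variances.  Sign convention follows the footnote: Pacific average
    # minus CLEC average.
    diff = mean(pacific) - mean(clec)
    se = sqrt(variance(pacific) / len(pacific) + variance(clec) / len(clec))
    return diff / se

def modified_z(pacific, clec):
    # Modified Z-test (footnote 17): the denominator uses only Pacific's
    # sample variance, so an inflated CLEC variance cannot shrink the
    # resulting statistic.
    diff = mean(pacific) - mean(clec)
    se = sqrt(variance(pacific) * (1 / len(pacific) + 1 / len(clec)))
    return diff / se

The only difference between the two functions is the denominator: the Modified Z-test substitutes Pacific's variance for the CLEC variance, which is why the CLECs viewed it as insulating the test from manipulation of the CLEC sample variance.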
Pacific initially desired a minimum sample size of thirty occurrences.18 In the "spirit of cooperation," Pacific was willing to lower the sample size to twenty, with the caveat that the impact of smaller sample sizes be evaluated during a review period in the not too distant future. Pacific also accepted benchmark measures for a specific list of rare submeasures.19 That is, parity measures with rarely occurring activity were essentially to be converted to benchmark measures.
The CLECs acknowledged that many of their number will have fewer than thirty observations (e.g., orders) in a month for some measures. They wanted to ensure that a requirement of a larger sample size does not passively provide an acceptable level of performance to the ILEC. Therefore, the CLECs preferred sample sizes as small as one, but suggested a minimum sample size of five for parity submeasures. The CLECs also accepted the benchmark measures for the specific list of rare submeasures.
Verizon CA supported the use of "table lookup"20 for sample sizes exceeding 50 CLEC transactions. Noting that there is a lack of experience using the Modified "t" statistic21 for non-normal samples, Verizon CA advocated using permutation tests for sample sizes between 20 and 50. (Verizon CA 1999 Opening Brief at 33-34.) For sample sizes less than 20, Verizon CA originally proposed that the CLECs and it should explore, during the interim development period, use of: (1) permutation tests; (2) aggregation of results across sub-measures; (3) aggregation of results across CLECs; and (4) possible exclusion of a given measure from performance incentive assessment. During the interim period, Verizon CA stated that it would also rely, to the extent practicable, on "exact methods"22 to determine achieved significant levels for small sample tests on proportions. (Id. at 34.)
Pacific and Verizon CA proposed a Z statistic of greater than 1.645 standard deviations (the critical value) to determine "out-of-parity." A critical value of 1.645 standard deviations corresponds to a five percent (one-tailed) Type I error, or "alpha." A Type I error is rejecting the null hypothesis (i.e., parity service)23 when it should not be rejected. A Type II error is accepting the null hypothesis when it should not be accepted. "Alpha" is the probability of a Type I error and "beta" is the probability of a Type II error. Alpha levels of 1, 5, and 10 percent are the most common "textbook" values.
The null hypothesis in this application poses that ILEC and CLEC performance are in parity. A Type I error is identifying the ILEC as not providing parity service (i.e., the ILEC is providing worse service to CLECs than to itself) when in fact the ILEC is providing parity service. A Type II error is identifying the ILEC as providing parity service when in fact it is not providing parity service. Pacific wanted to be limited to a five-percent probability of being identified as not providing parity service when in fact it is providing parity service.
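These error concepts can be made concrete with a short sketch. The critical value fixes the Type I error directly; the Type II error also depends on how far the true out-of-parity difference lies from zero, so the alternative used below (2.5 standard errors) is purely an illustrative assumption.

from statistics import NormalDist

std_normal = NormalDist()

critical_value = 1.645
# Type I error (alpha): probability of flagging non-parity when parity holds.
alpha = 1 - std_normal.cdf(critical_value)                    # about 0.05
# Type II error (beta): probability of missing non-parity, here under an
# assumed alternative 2.5 standard errors away (illustrative only).
assumed_alternative = 2.5
beta = std_normal.cdf(critical_value - assumed_alternative)   # about 0.20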
The CLECs recommended an equal error methodology be employed for setting the errors. This essentially calculates and equates the Type I and Type II errors for each submeasure each month. The CLECs ultimately suggested that a Z statistic of greater than 1.04 standard deviations (critical value) should identify "out-of-parity" conditions. A 1.04 standard deviation corresponds to a fifteen percent (one-tailed) Type I alpha level. The CLECs were concerned with Type II errors, not just Type I errors. By making the critical alpha level larger, the CLECs worried less about the beta error.24 Thus, the CLECs wanted at least a fifteen-percent probability limit for identifying Pacific and Verizon CA as not providing parity service when in fact they are providing parity service, because they believed that this would correspond more closely to an equal probability of identifying non-parity service as parity service.
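A minimal sketch of the equal-error idea follows, under the simplifying assumption that the Z statistic is approximately standard normal under parity and shifted by a fixed amount under the out-of-parity alternative; the chosen alternative of 2.08 standard errors is an assumption made only for illustration.

from statistics import NormalDist

std_normal = NormalDist()

def equal_error_cutoff(effect_in_se_units):
    # Under the null (parity), Z is approximately N(0, 1); under the assumed
    # out-of-parity alternative, Z is approximately N(effect, 1).  Setting the
    # one-tailed Type I error, 1 - Phi(c), equal to the Type II error,
    # Phi(c - effect), gives c = effect / 2 by symmetry of the normal curve.
    c = effect_in_se_units / 2
    alpha = 1 - std_normal.cdf(c)
    beta = std_normal.cdf(c - effect_in_se_units)
    return c, alpha, beta

# An alternative roughly 2.08 standard errors from parity (an illustrative
# assumption) gives c = 1.04 and alpha = beta of about 15 percent.
print(equal_error_cutoff(2.08))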
By ruling issued November 22, 1999, the assigned Commissioner assessed the submitted proposed plans and set forth his concerns about them (the ACR). The ACR noted that the existing ILEC models and the CLECs' model appeared distinct and incompatible. In addition, the parties revealed considerable misunderstanding and confusion about each other's model assumptions and calculations. It was difficult to sort out the relative impacts of each of the components of the two differing model approaches. Moreover, the end results of the two models were highly uncertain because both modeling approaches were trying simultaneously to design and implement the total model (both the performance assessment model elements and the incentive plan elements) without the benefit of an implementation and data calibration structure.
While the plans' proponents had articulated numerous core concepts, no distilled set of principles supported both plans. There also appeared to be little rationale for the incentive levels implicit in either plan. It is unlikely that either plan could be implemented as designed. Moreover, both models might impose costs when evidence suggests parity service, and both models might not impose costs when evidence suggests non-parity service. During the February 1999 technical workshop, each proposed plan produced dramatically different payments due to different input assumptions. Both plans were also very sensitive to minor changes in assumptions. These problems were not due to an attempt to keep the plans simple; both the ILECs' and CLECs' plans were very complex. Accordingly, we affirm the ACR's evaluation of the initial ILECs' and CLECs' plans.
The ACR expressed the need to have one common interim model framework of analyses for review and discussion, and for use by all concerned parties in order to implement the performance remedies plan. One interim performance remedies plan model and set of explicit assumptions would allow common quantitative analyses to be performed and estimates to be developed. All key model assumptions would be explicit, and the policy ramifications of these assumptions would be clear.
The ACR proposed that a common and feasible approach to implement the necessary performance remedies plan25 be developed with the assistance of the ILECs and the CLECs. It noted that to achieve the single common model framework, there needed to be an unwinding of the performance assessment model elements and the incentive plan elements that the parties merged together from the outset. To that end, the ACR proposed an initial conceptual performance measurement statistical model, and asked the parties to respond to specific questions about the model. Further, it proposed that the Commission implement a fully functioning, self-executing performance remedies plan during a six-month pilot test period.
We concur with the ACR assessment that a single model approach would allow the Commission to make informed policy decisions about the performance remedies plan. A single model approach focuses on the goal of parity service by the ILECs, economic incentives paid by the ILECs, and/or a change in ILECs' operations support to the CLECs. The end goal is certainly not just to have complex statistical measurement theory applications. There may be a variety of statistical measurement approaches that can all achieve the same basic economic and operations incentives by using different incentive plan structures and amounts, in combination with different measurement approaches.
A single common interim model and a single set of explicit assumptions should allow calibration of end result economic outcomes both before and after a six-month pilot test period using actual empirical data. The interim pilot test period can assist the Commission in determining the appropriate levels of long-term economic incentives. Long-term incentive impacts can be calibrated in relation to one model, one common set of assumptions, and actual test period empirical data. Penalty amounts and structures can still be set and paid during the pilot test period, and they can be applicable only during this interim period, unless otherwise determined.
Noting the ILECs' and CLECs' distinct views on the standard and Modified Z-tests, the ACR questioned whether there would be a way to determine if the Modified Z-test yields "fair and accurate results." Of interest are differences in the results if the standard Z-test were used rather than the Modified Z-test. Such differences would be due to disparities between the variances of Pacific and the CLECs. Regarding the CLEC position that the variance of the CLEC sample could potentially be manipulated, the ACR stated that concern about the possibility of manipulation should not direct the test procedure.
The ACR suggested that the optimal course might be for the Commission to proceed with the standard Z-test on a trial basis, to be evaluated after a six-month test period. The proposed Modified Z-test26 applies an experimental argument27 to an observational situation. There are no other academic precedents for our application of this particular modified calculation. The ACR stated that it was doubtful at this point that the benefits of any further complicating modifications to the statistical methodology for determining compliance with parity would justify the added complexity without first trying the standard Z-test.
The standard Z-test is the most common method to compare two population means, under the following key assumptions:
1. Underlying distributions are not too skewed (i.e., they are not too different from a normal bell-shaped curve).
2. Sample sizes are reasonably large.
3. Observations are independent measurements from the same processes (e.g., phone service installation operations).
If the variances are known to be equal, then a pooled, or common, variance estimate is used. If the variances are known to be unequal, then both separate variances are used. If it is unknown, a priori, whether the population variances are equal or not, then an initial test compares the variances. Based on this first test, either the separate or pooled variance estimate is used.
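The sequence just described can be sketched as follows; the use of an F-test for the preliminary variance comparison, the 5 percent level for that test, and the variable names are assumptions made for illustration rather than elements of any adopted methodology.

from math import sqrt
from statistics import mean, variance
from scipy.stats import f as f_dist

def z_after_variance_check(pacific, clec, variance_test_alpha=0.05):
    # First test whether the two variances appear equal, then use either the
    # pooled or the separate-variance form of the standard Z-test.
    n1, n2 = len(pacific), len(clec)
    v1, v2 = variance(pacific), variance(clec)
    ratio = v1 / v2
    p = 2 * min(f_dist.cdf(ratio, n1 - 1, n2 - 1),
                f_dist.sf(ratio, n1 - 1, n2 - 1))
    if p > variance_test_alpha:
        # No evidence the variances differ: use a pooled estimate.
        pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = sqrt(pooled * (1 / n1 + 1 / n2))
    else:
        # Variances appear unequal: keep the separate variances.
        se = sqrt(v1 / n1 + v2 / n2)
    return (mean(pacific) - mean(clec)) / se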
The Modified Z-test has its genesis in the contention that Pacific could manipulate the variance of the CLEC sample. While such manipulation might be possible, it seems equally likely that Pacific could simultaneously manipulate the mean of the CLEC sample, and the variance and mean of the corresponding Pacific sample. The ACR proposed to first test for variance equality between Pacific and CLEC results. If the variances prove to be unequal, the ACR suggested that it might be necessary to use the standard Z-test with both variances. In either case, parity will be assumed to exist when the differences in the measured results for both the ILECs and the CLECs in a single month, for the same measurements, are less than the critical value28 of the Z-test.
Early on, the CLECs implied that the difference between the standard Z-test and the Modified Z-test could measure Pacific's ability to manipulate the data. Since both Pacific and the CLECs have agreed to use the Modified Z-test during a pilot test period, the ACR raised the possibility that both the standard and Modified Z-tests might be calculated and evaluated over the six-month pilot test period. However, the ACR further proposed that if both tests were run, actual calculations during the trial test period would be based on the standard Z-test. The results of the evaluation might suggest that the decision as to which form of Z-test to use might be moot, since all choices might identify the same situations as being out-of-parity.
The ACR also suggested that during the six-month pilot test period, sample distributions could be reviewed to explore whether the distributions meet the above-stated underlying assumptions of the Z-test. At the end of this six-month pilot test period, there could be a reconsideration of whether any variety of Z-test should be used, or whether nonparametric tests29 might be more appropriate. All of the Z-tests described by Pacific and the CLECs are parametric tests. They assume observations are independent and are generated from the same process with a relatively well-behaved distribution.30 However, the ACR questioned the independence of the observations and the shapes of the distributions, especially the CLEC distributions. The ACR suggested that if these characterizations were accurate, over the long-term it might be better to use nonparametric tests.
Finally, the ACR noted that there appeared to be some confusion regarding the concept of samples versus entire populations. If, as the ACR surmised, it would be appropriate to assume we had the entire population of measurements during a time period, as with production output, then it might make sense to ultimately utilize concepts of statistical process control to monitor and modify the procedures when they appear to have gone, or likely will be going, out of control. For example, a production monitoring and control methodology31 could utilize the mean and variance of the ILEC (essentially as a benchmark against which CLEC measurements are compared). This could be performed using a Z-test-based chart set only on the mean of CLEC measurements against the historic mean and variance32 or other statistics of the ILEC. Or similarly, a permutation test could be used.
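A minimal sketch of such a control-chart style check, assuming for illustration that larger values indicate worse service and that the ILEC's historic mean and variance are available to serve as the benchmark, might look like the following; the function name, argument names, and the 1.645 cutoff are illustrative only.

from math import sqrt

def month_out_of_control(clec_monthly_values, ilec_historic_mean,
                          ilec_historic_variance, critical_z=1.645):
    # Compare the month's CLEC mean against the ILEC's historic mean and
    # variance, which act as a fixed benchmark rather than a second sample.
    n = len(clec_monthly_values)
    clec_mean = sum(clec_monthly_values) / n
    z = (clec_mean - ilec_historic_mean) / sqrt(ilec_historic_variance / n)
    return z > critical_z   # True flags a possible out-of-control month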
The ACR suggested that the real problem here might be that many performance measures ostensibly constructed from "samples" really are constructed from the complete set of actual observations. The ACR reasoned that, frequently, one month's observations are really a "sample" of the entire length of the production process, but not a random sample, unless selected from among all of the months of production using some random procedure. In many instances, the proper statistical application may be statistical quality control viewing the data as a time series. At the end of the six-month pilot test period, the confusion surrounding the sample versus population issue should be resolved. The ACR indicated that it would be very important to analyze the key underlying assumptions during the six-month pilot test period in order to establish the reasonableness of these assumptions and to understand the potential impact of any divergences from them.
Initially, the ACR plan did not contemplate a Z-test, or any other statistical test, for benchmark measures. It proposed to regard any measure that exceeds the benchmark value as a performance failure. Consequently, it envisioned that any performance worse than a benchmark would not be tolerated, and if exceeded, at least some penalty would be assessed. The ACR recommended monitoring the number of observations (e.g., orders) and improving benchmark measures over time taking into account the actual number of observations realistically expected to occur. For the immediate future, the ACR suggested treating benchmarks as absolutes, but moderating the impact of exceeding the benchmarks by means of smaller penalties for each occurrence. It also suggested that penalties should be greater for larger deviations from the benchmark.
Treating benchmarks as absolutes assumes that the parties established the benchmark values with some knowledge of the anticipated ability to meet them and/or the relative frequency of time they reasonably could be met. The frequency and value of the ILECs' inability to provide service meeting the benchmarks could be monitored and re-evaluated during the initial six-month pilot test period. Any dramatic differences between assessing performance with parity versus benchmark measures could eventually be resolved either by readjusting the alpha values, or benchmarks, or the incentives.
The ACR concurred with the concept of converting parity submeasures with rare activity to benchmarks. It suggested that additional rare activity submeasures should be converted to benchmarks. The ACR stated a preference for benchmark measures over parity measures for performance remedies, because benchmark measures do not require any complicating summary statistics. Early estimates indicated approximately forty percent of all measures were benchmarks, and that sixty percent were parity measures. Approximately fifteen percent of all measures had both parity and benchmark submeasures. The ACR expressed the hope that over time, the parties would agree to convert even more parity measures to benchmark measures.
The ACR surmised that sample size proposals were justified more by pragmatic concerns than by statistical principles. Proposed sample size specifications reflect negotiated values more than statistical criteria. For example, selecting a minimum sample size of five suggested one of two things: (1) collecting each observation is extremely expensive, or (2) there is an insufficient population from which to sample.33 The issue of minimum sample size is relevant only for the first situation.
If all five observations occur during a particular time period, this is the entire population of measurements instead of a sample. The only sampling analog is to assume that the five observations are a sample of the potential observations that could have occurred during that same time period. Usually, measurements are made with sufficient frequency to allow for corrective action if the process is beginning to "go out of control," or because management prefers to review data on a set periodic basis (e.g., hourly, daily, weekly, or monthly). Such "periodicity" of measurement is usually established independent of sample size concerns. The ACR suggested that if too few observations occur in an established time interval, either the time interval can be lengthened, or the test can be performed using an aggregated measure incorporating more than one measurement. Or, the consistency of measurements could be tracked over time (e.g., number of "misses" for percent success measures) using statistical quality control charts.
The current assumption is that the time period for measurement is monthly. The ACR proposed lengthening the time period when the number of observations (e.g., sample size) is very small. However, the ACR recommended that this time period should not be so long as to enable the ILEC to manipulate results, and/or escape detection for providing non-parity service to the CLECs.
The ACR proposed to proceed with a minimum sample size of thirty, which could be accumulated over a period of up to three months. Thus, a minimum sample size of at least thirty would be obtained through an accumulation of up to three months of observations, if necessary. If any sample size, aggregated or not, were to reach thirty in one, two, or three months, then the test would be performed when the number of observations first reached thirty. If, at the end of three months, the sample size had still not reached a minimum of thirty, the test would be performed using whatever sample size was achieved, however small. Ultimately, the measurement probably would be included in the rare occurrence benchmark list if fewer than thirty measurements occurred during three months.
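The aggregation rule described above can be sketched as follows; the function and argument names are illustrative only.

def observations_to_test(monthly_observations, minimum=30, max_months=3):
    # Accumulate consecutive months of observations and test as soon as the
    # minimum of thirty is reached; if three months pass without reaching
    # thirty, test with whatever has accumulated.
    pooled = []
    for month in monthly_observations[:max_months]:
        pooled.extend(month)
        if len(pooled) >= minimum:
            break
    return pooled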
The ACR also advised that the appropriate length of time period for aggregation would be evaluated during the six-month pilot test period to better understand the frequency of measurements. Such an evaluation would aid in answering the question: "How many of each type of measurements can reasonably be expected to be made during any one month?" Any additional rare submeasures that could become benchmarks would also be evaluated during this pilot test period.34 The ACR proposed to analyze any relatively large CLEC or ILEC values that skew the general tendency of the other values. (ACR at 24-25.)
The ACR observed that it appears not to matter which critical value is actually employed, since the amount of the penalty can be adjusted to provide equivalent expected outcomes for the different possible critical values. The ACR proposed to track the actual alpha level outcomes, and ultimately calibrate the size of payments as a function of the actual values. The greater the Z-statistic value (which corresponds to a smaller Type I alpha error), the larger the penalty. The ACR proposed that in this proceeding there should be no single critical cutoff value, but rather a range of values. However, the ACR proposed that if one discrete cutoff value must be selected, it be a ten-percent Type I alpha level for parity tests. Preliminarily, ten percent was a compromise between the suggested five and fifteen-percent values, and it is a commonly used critical value. This alpha level corresponds to 1.282 standard deviations.
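For illustration, the correspondence between the alpha levels under discussion and their one-tailed critical values, together with a purely hypothetical graduated payment keyed to the size of the Z statistic, can be sketched as follows; the base amount and the linear scaling are not figures or formulas from any party's plan.

from statistics import NormalDist

std_normal = NormalDist()

# One-tailed critical values implied by the alpha levels discussed above:
# 5 percent -> 1.645, 10 percent -> 1.282, 15 percent -> roughly 1.04.
cutoffs = {alpha: round(std_normal.inv_cdf(1 - alpha), 3)
           for alpha in (0.05, 0.10, 0.15)}

def graduated_payment(z, base_amount=100.0, critical_z=1.282):
    # Hypothetical schedule: no payment when the Z statistic is within the
    # critical value, and a payment that grows with the exceedance beyond it.
    return 0.0 if z <= critical_z else base_amount * (z - critical_z)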
The ACR described the CLECs' critical value proposal as more of an "equal error" proposal than the "equal risk" proposal the CLECs had called it. Equal error refers to decisions with the same Type I and Type II error probabilities. Equal risk refers to decisions where the consequences of the decisions are equal, such as equal dollar losses. The CLECs' ultimate proposal does not equate the two expected dollar losses. In addition, the significance level that equates Type I and Type II errors varies with sample sizes and underlying distributions. The ACR also noted that the CLECs indicated concern with the Type II error, not just the Type I error. While fifteen-percent alpha levels are not commonly used for hypothesis testing, they are sometimes used for monitoring.
In their initial brief, the CLECs suggested that a performance payment be made for any occurrence beyond the acceptable level in a benchmark. (CLECs' 1999 Opening Brief at 3.) The ACR offered a similar recommendation, and pointed out that the CLECs also proposed that a specific table35 be used to detail the small sample size benchmark standard comparable to the table agreed upon for large sample sizes (i.e., thirty or more observations). The ACR noted that the proposed table was negotiated, and did not systematically adopt the "closest" percentage possible compared to what would be expected from a large sample. It was unclear whether Pacific accepted this particular CLEC proposal.
The ACR remarked that while the concept of payments for all missed benchmark measures is easy to implement, it assumes accurate measurements. The ACR proposed discarding the benchmark table entirely at this juncture, and going with some level of graduated penalty for any measurement over the benchmark. For example, very small benchmark penalties could be assessed for very small frequencies of occurrences, and much larger penalties could be set for larger frequencies of occurrences.
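A purely hypothetical sketch of such a graduated benchmark remedy follows; the unit amount and the linear scaling with the excess miss rate are illustrative assumptions, not elements of any party's proposal.

def benchmark_payment(misses, observations, benchmark_miss_rate,
                      unit_amount=1.0):
    # Nothing is owed at or below the benchmark miss rate; the payment grows
    # with how far the observed miss rate exceeds the benchmark.
    observed_rate = misses / observations
    excess = max(0.0, observed_rate - benchmark_miss_rate)
    return unit_amount * excess * observations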
For small sample sizes, the CLECs suggested permutation-testing procedures to compute the exact alpha and beta calculations.36 (CLECs' 1999 Opening Brief at 30.) Pacific accepted this suggestion, specifying that the sample size should not be less than ten, if and when the Commission orders permutation testing. The company commented that permutation testing "is not an intuitive process for most people." Pacific recommended studying the validity and feasibility of utilizing permutation testing and that the approach be revisited after a trial test period. (Pacific 1999 Opening Brief at 2.) The ACR suggested that permutation-testing procedures might be a reasonable application.
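For illustration, an exact one-tailed permutation test of the sort described can be sketched as follows; it assumes, for this sketch only, that larger values indicate worse service, and enumerating every relabeling (footnote 22's "exact methods") is practical only for small samples.

from itertools import combinations
from statistics import mean

def exact_permutation_p_value(pacific, clec):
    # The achieved significance level is the share of all relabelings of the
    # combined data whose CLEC-minus-Pacific mean difference is at least as
    # large as the one actually observed.
    combined = list(pacific) + list(clec)
    observed = mean(clec) - mean(pacific)
    at_least_as_extreme = total = 0
    for idx in combinations(range(len(combined)), len(clec)):
        chosen = set(idx)
        perm_clec = [combined[i] for i in chosen]
        perm_pacific = [combined[i] for i in range(len(combined))
                        if i not in chosen]
        if mean(perm_clec) - mean(perm_pacific) >= observed:
            at_least_as_extreme += 1
        total += 1
    return at_least_as_extreme / total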
Preferring larger numbers of observations so that sample size alone would rarely require permutation-testing procedures, the ACR outlined its concern: the proposed statistical procedures use one-tailed tests to indicate when penalties should be assessed against the ILEC for poorer service to the CLECs, but do not yield any incentives to the ILEC for providing exceptional service. Still, the ACR acknowledged that permutation-testing procedures could have some role in assessing more exact measures of error. The ACR recommended that during the pilot test period, there be an evaluation of this application of permutation testing.
The ACR asked the parties to respond to four specific questions37 and to submit comments on the overall statistical model approach presented in the ruling. The parties38 filed opening and reply comments on January 7, and January 27, 2000, respectively.
11 "The expected cost for parity treatment should be zero." 12 "Payments should bear a reasonable relationship to level of performance." 13 "CLECs should not be motivated to receive large remedy payments." 14 A standard deviation is a standardized statistic measuring how dispersed scores are. A low standard deviation indicates scores are grouped closer to the mean than scores with a higher standard deviation. When applied to a normal or "bell-shaped" curve, the standard deviation provides helpful information about the dispersion of scores: 68.3 percent of all scores lie within one standard deviation of the mean (plus or minus one standard deviation, 95.4 percent lie within 2 standard deviations, 99.7 lie within 3 standard deviations, and so forth. In the present application, 1.645 standard deviations above the mean encompass 95 percent of the scores. So under conditions of random selection, a score greater than 1.645 standard deviation would be selected 5 percent or less of the time.15 Standard Z-test : Z = Difference/Standard deviation of the difference
Where: Difference = Pacific Average - CLEC Average. Standard deviation of the difference = Square root of ((Variance of Pacific x 1/Pacific sample size) + (Variance of CLEC x 1/CLEC sample size)). Or, assuming the variances for Pacific and the CLEC are equal, the variances are pooled together: Standard deviation of the difference = Square root of ((Pooled variance of Pacific and CLEC samples) x (1/Pacific sample size + 1/CLEC sample size)). 16 An increased CLEC variance theoretically could increase the size of the Z-test denominator without affecting the numerator, thus reducing the resulting Z-test statistic and reducing the chances of identifying out-of-parity situations. 17 Modified Z-test : Z = Same as Z-test. Where: Difference = Same as Z-test. Standard deviation of the difference = Square root of (Variance of Pacific x (1/Pacific sample size + 1/CLEC sample size)). 18 A sample size of thirty is a standard textbook "rule-of-thumb" sample size cutoff for parametric statistical testing such that distributional assumptions can be anticipated to be met for most situations. 19 A "measure" defines how performance will be measured for a specific OSS function, such as ordering, across several service types, such as residential telephone service, business telephone service, DSL service, etc. A "submeasure" applies the specified "measure" methods to individual service types, for example, either residential telephone service, or business telephone service, or DSL service, etc 20 The statistical test produces a test value. The test value can then be "looked up" in a table to determine statistical significance. In most cases a normal approximation or a "t" distribution table is used to determine the Z or t statistic that must be exceeded for a performance failure finding. 21 The "Modified t-test" is a variant of the Modified Z-test used for sampling distributions of small sample mean, as discussed later in this Decision. 22 The term "exact methods" is defined as performing all possible permutations. 23 A "null hypothesis" proposes that there are no differences between the true means. 24 As the critical alpha level is increased (e.g., from 0.05 to 0.15), beta decreases. 25 To avoid confusion with the work going on in the Performance Measurement segment of this proceeding, what is essentially the "performance measurement and incentive" plan will be referred to as the "performance remedies" plan. 26 It also holds the possibility of manipulation. 27 Brownie, Cavell, Boos, D., and Hughes-Oliver, J. Modifying the t and ANOVA F Test When Treatment Is Expected to Increase Variability Relative 2 Controls, 46 Biometrics at 259-266 (1990). 28 The critical value of the Z-statistic corresponds to a critical alpha value. The rejection region encompasses the critical Z-statistic and larger Z-statistic values, which correspond to critical alpha and smaller alpha values. 29 Distribution-free tests based on medians or ranks; that is, tests not dependent on assumptions about distributions, such as normality. 30 "Well-behaved" refers to distributions where a resulting distribution of sample means is not deviant enough from a normal distribution to cause inaccuracies - discussed later in this decision. 31 Standard Shewart Quality Control chart. R. Mason, R. Gunst, and J. Hess, Statistical Design And Analysis Of Experiments With Applications To Engineering And Science at 65 (1946). 32 Or cumulative values. 33 Instead, the number of observations is only five within some specified time period. 
34 As stated, if there is no sample of observations, but instead the population of CLEC values and/or ILEC values, the issues of errors and distributions are not really relevant.
35 CLECs' 1999 Opening Brief at 33.
36 Permutation testing involves direct estimation of probabilities from the actual data distribution, rather than inferences drawn from normal distribution "look-up tables."
37 The ACR questions are reproduced in the attached Appendix A.
38 Pacific, the CLECs, Verizon CA, and ORA.