In response to the ACR's initial question of why the standard Z-test should not be used in the model, Pacific advocated retaining the Modified Z-test for three reasons. First, the standard Z-test yields inaccurate Type I error rates under the conditions typical of performance remedies plans, i.e., data that are not normally distributed and relatively few large samples. Second, the Modified Z-test is easier to compute. Third, the Modified Z-test is sensitive to differences in the CLECs' variances. (Pacific Opening Comments on the ACR at 2 and 5.)
The CLECs urged using the Modified Z-test, yet agreed that the standard Z-test could be used. (CLECs' Opening Comments on the ACR at 3.) Verizon CA endorsed use of the standard Z-test, with modifications. It maintained that parties should be able to calculate and evaluate both the standard and Modified Z-tests during the evaluation or pilot test period. (Verizon CA Opening Comments on the ACR at 2.) ORA argued that since the underlying series or performance measures are not normally distributed, the true probabilities are unknown and the Z-test is of little value. It opposed using formal statistical tests for performance incentives. (ORA Opening Comments on the ACR at 5.)
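For reference, the two statistics at issue are sketched below. The comments do not reproduce the formulas themselves, so these conventional definitions, common in ILEC/CLEC parity testing, are an assumption. The standard Z-test estimates each carrier's variance separately, while the Modified Z-test uses only the ILEC variance, on the theory that under parity both samples come from the ILEC's process:

```latex
% Conventional definitions (assumed, not quoted from the record):
Z_{std} = \frac{\bar{x}_{ILEC} - \bar{x}_{CLEC}}
               {\sqrt{s^2_{ILEC}/n_{ILEC} + s^2_{CLEC}/n_{CLEC}}}
\qquad
Z_{mod} = \frac{\bar{x}_{ILEC} - \bar{x}_{CLEC}}
               {\sqrt{s^2_{ILEC}\left(1/n_{ILEC} + 1/n_{CLEC}\right)}}
```

Because the modified statistic scales the difference in means by the ILEC variance alone, a CLEC sample whose variance differs from the ILEC's alters the statistic's behavior under the null, which may account for the sensitivity to variance differences that Pacific cited as its third reason.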
In response to the ACR proposal to use benchmarks without statistical tests, Pacific asserted that benchmarks without statistical tests require an ILEC to meet an unreasonably higher standard of performance for small sample sizes than for large ones. Pacific stated that statistical tests for benchmark measures make it possible to achieve a uniform Type I error rate across all measures under conditions of parity and compliance. Pacific segued from this question into an introduction of its white paper concept of converting all benchmarks to "standards" against which all the CLECs' results could be statistically tested. (Pacific ACR Opening Comments at 6 and 9.)
The CLECs remarked that the ACR's desire "to see more parity measures turned into benchmarks [ACR at 27]" was "troubling and difficult to understand." (CLECs' ACR Opening Comments at 8.) The CLECs continued to support a limited benchmark approach with no associated statistical component (except for the use of a table for small sample sizes). They maintained that, unlike the parity standard, which requires the use of statistics to compare distributions, a benchmark standard requires no comparison beyond the benchmark itself. The CLECs urged the enforcement of the benchmark standards adopted in D.99-08-020. (CLECs' ACR Reply Comments at 5.)
Verizon CA supported using benchmark measures without any statistical tests during the pilot period for all previously designated benchmark measures. Verizon CA agreed that the ACR's simple approach could be used during the pilot. Nevertheless, Verizon CA proposed examining other alternatives, such as tables for small sample sizes and the use of statistical tests with benchmarks. (Verizon CA ACR Opening Comments at 12.) ORA argued that benchmarks should be based on historical, not future, data and should be limited to those measures for which historical data are available for at least 20 observations. (ORA ACR Opening Comments at 6.) ORA asserted that benchmarks should be defined as the historical mean of the series plus one standard deviation.
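ORA's proposed definition can be written compactly; the numerical example is hypothetical, since ORA's comments are summarized rather than quoted here. For a measure where larger values indicate worse service:

```latex
% ORA's proposed benchmark:
B = \bar{x}_{hist} + s_{hist}
% e.g., a hypothetical historical mean installation interval of 5.0 days
% with a standard deviation of 1.2 days would yield a benchmark of 6.2 days
```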
Pacific urged, and the CLECs agreed to, the use of special tables for percentage-based benchmarks with small sample sizes. While allowing that the ACR's simple benchmark approach could be used, Verizon CA advocated alternatively examining the use of tables for small sample sizes, and it endorsed Pacific's adjusted table of percentages for benchmarks. As noted, ORA opposed the use of any formal statistical tests for performance measures.
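The adopted table itself is not reproduced here. As a sketch only, the calculation below illustrates the binomial logic such small-sample tables typically follow: for a given percentage benchmark and sample size, it finds how many misses remain statistically consistent with compliance. The function name and the 5-percent error setting are illustrative assumptions, not the parties' adopted values.

```python
# Hypothetical sketch of how a small-sample benchmark table can be derived.
from math import comb

def max_allowed_misses(n: int, benchmark: float, alpha: float = 0.05) -> int:
    """Largest number of misses in n cases that cannot be distinguished,
    at Type I error rate alpha, from meeting a percentage benchmark."""
    p_miss = 1.0 - benchmark          # allowed miss rate, e.g. 0.10 for a 90% benchmark
    allowed = 0
    for k in range(n + 1):
        # P(X >= k misses) when the true miss rate equals the benchmark rate
        tail = sum(comb(n, j) * p_miss**j * (1 - p_miss)**(n - j)
                   for j in range(k, n + 1))
        if tail > alpha:
            allowed = k               # k misses are still consistent with compliance
    return allowed

# e.g., a 90% benchmark with only 10 cases: how many misses before failure?
print(max_allowed_misses(10, 0.90))   # prints 3 under these assumptions
```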
In response to question 3, Pacific agreed that samples of thirty are adequate for average-based parity submeasures. It did not agree that a sample size of thirty is appropriate for benchmark measures that are interpreted as absolute standards, or for percentage-based measures for which the benchmark is near zero (0) or 100 percent. (Pacific ACR Opening Comments at 12.) Pacific initially sought a minimum sample size of 30 occurrences, the standard "rule of thumb" for parametric statistical testing. As a compromise, Pacific was willing to lower the minimum to 20, with the caveat that the impact of small sample sizes be evaluated at the end of the six-month trial test period. It also accepted benchmark treatment for a specific list of rare submeasures, i.e., rare parity measures would essentially become benchmark measures.
Pacific did not agree to set the sample size at whatever number of cases is available after three months when a CLEC lacks thirty cases. Noting that neither the CLECs nor the ILECs had proposed that sample sizes of less than five (5) be considered for assessing remedies, Pacific did not want to set the minimum sample size at one (1) case. Pacific argued that aggregating over months introduces additional complexity and accounting expense into the measurement reporting process, and that a simpler rule would examine results over a single month. Pacific concluded that "while it may be possible to program these aggregation rules, they will make it difficult for the CLECs to monitor Pacific's performance and difficult for Pacific to manage its business." (Id. at 13.)
The CLECs disagreed with using a minimum sample size as large as thirty (30). They argued that many CLECs would have fewer than 30 observations in a month for some measures. They also noted Pacific's report that, from July through November 1999, approximately 100 CLECs had reportable data on 18,555 instances of parity submeasures, and that 62 percent of those instances had sample sizes of fewer than thirty cases. The CLECs further argued that the majority of all submeasures would have sample sizes of less than thirty (30). (CLECs' ACR Opening Comments at 11.) Consequently, a majority of submeasures would not be subject to incentive payments. The CLECs suggested a minimum sample size of 5 for parity submeasures. (CLECs' ACR Reply Comments at 8.)
The CLECs advocated using permutation testing for small sample sizes. (CLECs' ACR Opening Comments at 10.) They also disagreed with aggregating sample sizes over three months, or over any period, because the ILECs could perform poorly for more than a month without correction. The CLECs insisted that the only reason to favor a minimum sample size of thirty (30) for measured variables is that it might make the normal distribution an acceptable approximation to the distribution of the Z-test. Regarding minimum sample sizes for benchmark measures, the CLECs continued to advocate use of the table as the cleanest, easiest means of maintaining consistency with the adopted benchmarks. (Id. at 11.)
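The record does not describe the CLECs' permutation procedure in detail. The sketch below illustrates the general technique they invoked: the observed ILEC-CLEC difference in means is compared against the differences produced by repeatedly relabeling the pooled observations, which requires no normality assumption. All data values are hypothetical.

```python
# Minimal permutation-test sketch for a small parity submeasure (assumed
# form; the adopted procedure may differ in its details).
import random

def permutation_p_value(ilec, clec, n_perm=10000, seed=0):
    """One-sided p-value for H0: parity, against the CLEC mean being worse."""
    rng = random.Random(seed)
    observed = sum(clec)/len(clec) - sum(ilec)/len(ilec)
    pooled = ilec + clec
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                 # relabel under the parity null
        perm_clec = pooled[:len(clec)]
        perm_ilec = pooled[len(clec):]
        diff = sum(perm_clec)/len(perm_clec) - sum(perm_ilec)/len(perm_ilec)
        if diff >= observed:
            count += 1
    return count / n_perm

# e.g., eight ILEC and five CLEC installation intervals (days), hypothetical:
print(permutation_p_value([3, 4, 5, 4, 6, 3, 5, 4], [6, 7, 5, 8, 6]))
```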
Verizon CA stated that aggregating small sample sizes over three months raises some potentially difficult and complex implementation issues. It advocated the standard Z-test with unequal variances, employing exact distributions. For parity measures, Verizon CA favored using exact distributions for sample sizes of less than fifty (50). It also supported the Pacific-CLEC tables for benchmark measures with small sample sizes. Verizon CA disagreed that 30 observations for parity measures are appropriate with the Modified Z-test, maintaining that neither the standard nor the Modified Z-test should be used with fewer than fifty observations. (Verizon CA ACR Opening Comments at 15.) ORA commented that the minimum sample size is not a "trivial issue" to be arbitrarily set at thirty. It recommended a minimum sample size of 20 based on the formula N = 1/α, where N is the sample size and α = 0.05. (ORA ACR Opening Comments at 8.)
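ORA's rule works out as follows:

```latex
N = \frac{1}{\alpha} = \frac{1}{0.05} = 20
```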
Pacific argued against the use of the 10-percent alpha limit and instead proposed a 5-percent maximum Type I error rate. The company asserted that a 10-percent alpha limit is unreasonably large and will yield an unfair proportion of Type I errors. It maintained that 5 percent represents a fair compromise between erroneously detecting discrimination where none exists (Type I error) and failing to detect discrimination where it exists (Type II error). (Pacific ACR Opening Comments at 14-15.) Pacific focused on its desire to mitigate the effects of random variation. It commented that forgiveness rules help mitigate random variation, but are complex and expensive to administer.
The CLECs continued to recommend an alpha value of 15 percent, contending that it is a reasonable approximation of an alpha value that balances Type I and Type II errors. The CLECs asserted that they cannot ignore the impact of a large Type II error rate. They also stated that any risk adjustment, such as a forgiveness plan, must reflect the alpha chosen by the Commission. The CLECs argued that an alpha value that more easily detects discriminatory behavior, combined with a valid mitigation plan, can achieve the goals of a high-powered test while minimizing payments under parity conditions. (CLECs' ACR Opening Comments at 13-14 and Reply at 10.) They agreed that there is no statistical reason why a 10-percent alpha cannot be used. In addition, they recommended that the Z-test for all parity submeasures be calculated throughout the six-month pilot test period at the five, ten, and fifteen percent levels to determine how many submeasures pass or fail depending on the critical value chosen. (CLECs' ACR Opening Comments at 13.)
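No party's power calculations appear in the record; the sketch below merely illustrates the trade-off being argued, using an assumed true disparity of half a standard deviation and twenty observations per group. Raising alpha from 5 to 15 percent increases the chance of detecting a true disparity (lower Type II error) at the cost of more false findings of "no parity" under parity conditions (higher Type I error).

```python
# Illustrative power calculation for a one-sided two-sample Z-test.
# The effect size and group size are assumptions, not record values.
from statistics import NormalDist

def power(alpha: float, effect: float, n: int) -> float:
    """Power to detect a true mean shift of `effect` standard deviations
    with n observations per group."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    shift = effect * (n / 2) ** 0.5     # noncentrality for equal group sizes
    return 1 - NormalDist().cdf(z_crit - shift)

for alpha in (0.05, 0.10, 0.15):
    print(f"alpha={alpha:.2f}  power={power(alpha, effect=0.5, n=20):.2f}")
```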
Verizon CA commented that a 5-percent alpha remains a more balanced and reasonable choice. It asserted that a 10-percent critical value leads to a greater number of instances in which a finding of "no parity" will follow from application of the test when, in fact, parity service is present. However, Verizon CA concurred with the CLECs that the results should be examined at all three proposed levels: five, ten, and fifteen percent. (Verizon CA ACR Opening Comments at 17 and Reply Comments at 4.)
ORA stated that an alpha level of 10 percent is simply too large and argued that the more standard alpha level of 5 percent should be used. ORA stated that using a larger-than-normal alpha level increases the probability of incorrectly declaring the ILEC out of parity. ORA urged the Commission to reject multiple alpha values as an attempt at data mining. (ORA ACR Opening Comments at 13.)
ORA also noted that the proposed remedies plan has no provision to prevent service deterioration, thus posing an unacceptable risk to ratepayers. It asserted that service levels can be maintained only if standards are based on prior historical data rather than on future data, and that performance measures used in the test period should be limited to those for which historical data are available for at least twenty (20) observations. One of the two major goals ORA identified for the Performance Remedies Plan is to maintain service levels at least at historical levels for all ratepayers. Its other goal is to ensure that customers of both the CLECs and the ILECs receive "statistically equal" service. Finally, ORA insisted that a benchmark should also be based on historical, not present or future, data. (ORA ACR Opening Comments at 6.)