Comments on Draft Decision

The draft decision of ALJ Jacqueline A. Reed in this matter was mailed to parties in accordance with Pub. Util. Code § 311(g)(1) and Rule 77.7 of the Rules of Practice and Procedure. Comments were filed on _______________, and reply comments were filed on __________________.

Findings of Fact

1. The cornerstone of any performance incentive structure is how parity is defined, since it is on those occasions when an ILEC is out of parity that incentive payments will be made.

2. This Commission's definition of parity incorporates the objectives of the TA96 and the FCC.

3. It will be helpful to rely on statistical testing and benchmarks to infer whether or not parity has been achieved.

4. In late fall 1999, the existing ILEC models and the CLECs' model were distinct and incompatible.

5. The parties revealed considerable misunderstanding and confusion about the respective assumptions and calculations of the two sets of models.

6. The outcomes of the two models were highly uncertain because both approaches were trying simultaneously to design and implement the total model (both the performance assessment model elements and the incentive plan elements) without the benefit of an implementation and data calibration structure.

7. It is unlikely that either model could be implemented as designed.

8. During the February 1999 technical workshop, the proposed plans produced dramatically different payments from one another due to different input assumptions.

9. There is a need to have one common interim model framework of analyses for review and discussion in order to implement the performance remedies plan.

10. To achieve a common model framework, the performance assessment model elements and the incentive plan elements need to be separated.

11. Since the task of accurately assessing the state of competitive conditions must be self-executing, the decision model must be able to automatically identify performance result levels that reveal competition barriers and that will trigger incentive payments.

12. There are two fundamental categories of performance measures that must be assessed to determine the existence of competitive conditions: "parity" and "benchmark" measures.

13. In identifying parity or non-parity, accurate remedies-plan decision-making involves more than accurately calculating average ILEC and CLEC performance and identifying non-parity if ILEC service to CLEC customers is significantly worse than ILEC service to ILEC customers.

14. Given that there is variability in ILEC performance in providing retail services to its own customers, a measurement showing inferior service to CLEC customers could be due to this variability, to actual discrimination, or to both.

15. Statistical testing allows estimation of decision accuracy, or in other words, calculation of the decision error probabilities.

16. These probabilities can then assist decision-making by quantifying the different error probabilities and comparing them to standards of confidence that the Commission wishes to apply.

17. Using measures of performance averages and variability, statistical analysis provides estimates of: (1) the probability that a result of a certain magnitude would be detected when it exists (test power and corresponding error beta) and (2) the probability that the result is due to random variation when in fact there are no differences (confidence level and corresponding error alpha).
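
For illustration, the two probabilities in Finding 17 can be computed directly for a one-tailed test on the difference of means. In this minimal sketch, the alpha level and the standardized effect size are hypothetical values chosen for illustration, not figures from the record:

```python
# Hypothetical illustration: decision error probabilities for a one-tailed test.
from scipy.stats import norm

alpha = 0.10                  # Type I error: declaring non-parity when parity exists
z_crit = norm.ppf(1 - alpha)  # one-tailed critical value, about 1.28

effect = 1.0                  # hypothetical true difference, in standard-error units

power = 1 - norm.cdf(z_crit - effect)  # probability of detecting the difference
beta = 1 - power                       # Type II error: missing a real difference
print(f"alpha = {alpha:.2f}, power = {power:.3f}, beta = {beta:.3f}")
```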

18. Benchmarks have been constructed as tolerance limits.

19. The statistical-accuracy issues for benchmark measures are not the same as those for parity measures.

20. None of the presented models for parity assessment are acceptable in their entirety.

21. Three types of measurements have been developed for monitoring ILEC performance: averages, percentages, and rates.

22. Each measurement type requires a different statistical test or a variant of the same test.

23. All parties have agreed that a one-tailed statistical test should be used.

24. In response to the CLECs' concerns that ILEC discrimination could increase the CLEC variance, and thus make it more difficult to detect any discrimination, all parties agreed to use a Modified Z-test instead of the standard Z-test.
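
A common form of the Modified Z statistic is sketched below. Its defining feature, consistent with Finding 24, is that only the ILEC sample variance enters the denominator, so an ILEC-caused increase in CLEC variance cannot mask discrimination. The function name, data layout, and time-delay sign convention are illustrative assumptions:

```python
import numpy as np

def modified_z(ilec: np.ndarray, clec: np.ndarray) -> float:
    """Modified Z statistic: only the ILEC sample variance is used."""
    n_i, n_c = len(ilec), len(clec)
    se = np.sqrt(ilec.var(ddof=1) * (1.0 / n_i + 1.0 / n_c))
    # For time-delay measures, a negative value means service to CLEC
    # customers is slower (worse) than the ILEC's service to itself.
    return (ilec.mean() - clec.mean()) / se
```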

25. According to the statistical literature, the requirement of normally distributed data for use of any Z-test may be only partially correct.

26. The Central Limit Theorem states that for large samples the sampling distribution of the mean approaches normality, so non-normality in the underlying data does not materially affect the test.

27. The permutation test has the potential for being a more accurate test that can handle small samples.
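
A permutation test of this kind can be sketched as follows, under the parity null hypothesis that ILEC and CLEC observations are exchangeable. The resample count and the one-tailed orientation toward longer CLEC delays are illustrative choices:

```python
import numpy as np

def permutation_pvalue(ilec, clec, n_resamples=10_000, seed=0):
    """One-tailed permutation test for the difference of mean delays."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([ilec, clec])
    observed = np.mean(clec) - np.mean(ilec)   # positive = CLEC delays are worse
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        resampled = np.mean(pooled[len(ilec):]) - np.mean(pooled[:len(ilec)])
        if resampled >= observed:              # at least as extreme as observed
            hits += 1
    return hits / n_resamples
```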

28. The Z-test often relies on the resulting sampling distributions being normal.

29. The few data examples available to us show the expected divergence for small samples but not the expected convergence for larger samples, contrary to the theoretical expectation that the two tests should produce the same results at large sample sizes.

30. The results of the few available data examples raise doubts that the record is sufficiently developed to allow the Commission to confidently select the permutation test as a superior test for average-based measures.

31. In the interim, the Z-test is the most developed and accepted alternative to permutation testing.

32. The advantage of exact tests for the Commission's statistical model is two-fold: (1) calculations are made directly from the raw data, and (2) the exact tests have the potential to produce more accurate results for small samples.

33. Unlike for average-based permutation applications, outliers cannot affect the result of the Fisher's Exact test, as the data consist only of "cell counts."

34. Additionally, unlike for average-based permutation applications, the results from the percentage-based Modified Z-test and the results from the Fisher's Exact Test converge towards equality as theoretically expected.

35. The Fisher's Exact Test requires computationally burdensome calculations that unnecessarily drain computer resources, with no gain in accuracy for large samples.

36. The Fisher's Exact Test is appropriate up to a limit of 1000 CLEC performance "hits" or "misses," and the Modified Z-test for proportions is appropriate for performance results above this limit.
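
Findings 33 through 36 together imply a decision rule along the following lines: Fisher's Exact test for modest counts, with the Modified Z-test for proportions as the large-sample fallback. The 1,000-count cutoff follows Finding 36; the function name, table orientation, and ILEC-variance-only standard error are illustrative assumptions:

```python
from scipy.stats import fisher_exact, norm

def percent_parity_pvalue(ilec_miss, ilec_n, clec_miss, clec_n):
    """One-tailed p-value: is the CLEC 'miss' rate significantly higher?"""
    if clec_miss <= 1000:
        table = [[ilec_miss, ilec_n - ilec_miss],
                 [clec_miss, clec_n - clec_miss]]
        _, p = fisher_exact(table, alternative="less")  # odds ratio < 1: CLEC worse
        return p
    # Modified Z-test for proportions: only the ILEC proportion's variance is used.
    p_i, p_c = ilec_miss / ilec_n, clec_miss / clec_n
    se = (p_i * (1 - p_i) * (1 / ilec_n + 1 / clec_n)) ** 0.5
    z = (p_i - p_c) / se          # negative when the CLEC miss rate is higher
    return norm.cdf(z)
```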

37. Like the percentage-based Fisher's Exact test applications, and unlike for average-based permutation applications, the results from the rate-based Modified Z-test and the results from the binomial exact test converge towards equality as theoretically expected.
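
A binomial exact test for a rate-based measure can be sketched as follows, treating the ILEC's observed rate as the parity null hypothesis for the CLEC's count. The measure and all counts are hypothetical:

```python
from scipy.stats import binomtest

ilec_events, ilec_base = 40, 10_000   # e.g., trouble reports per ILEC retail line
clec_events, clec_base = 12, 1_500    # the same measure for one CLEC

p_null = ilec_events / ilec_base      # ILEC rate as the parity null hypothesis
result = binomtest(clec_events, clec_base, p_null, alternative="greater")
print(f"one-tailed p-value = {result.pvalue:.4f}")
```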

38. Balancing alpha and beta to be equal only ensures that the most accurate decision is made; it does not address the relative consequences of those decisions.

39. The record is relatively silent on the actual beta values that various critical alpha levels might produce.

40. The record is relatively silent on the appropriate test power or beta error level.

41. The record contains no information on what the performance-level deltas should be, because no party has submitted a proposal containing a comprehensive set of specific deltas.

42. A fixed alpha is not an adequate long-term solution.

43. Test power is very low for the small samples that represent the majority of the performance measure results.

44. Fixed alphas that provide better test power for small samples result in unnecessarily high test power for large samples.

45. A larger alpha level of 0.10, instead of the 0.05 level, enhances decision accuracy and avoids uncorrectable decision-making errors while still addressing correctable errors in the next phase of this proceeding.

46. An alpha level smaller than 0.15 is reasonable because of concerns about the effect on large-sample results.

47. An 80% confidence level (0.20 alpha) in the model for conditional failure identifications is warranted because of the high beta error still remaining when using the 0.10 alpha level.
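
The trade-off underlying Findings 45 through 47 can be made concrete: for a fixed effect size, raising the alpha level lowers beta. The effect size below is a made-up value in standard-error units, not a figure from the record:

```python
from scipy.stats import norm

effect = 1.0   # hypothetical standardized true difference
for alpha in (0.05, 0.10, 0.20):
    beta = norm.cdf(norm.ppf(1 - alpha) - effect)   # Type II error probability
    print(f"alpha = {alpha:.2f} -> beta = {beta:.3f}")
```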

48. Both record efforts to establish "material" thresholds have merit.

49. The "material difference" standard has merit and the potential to improve the decision model we specify.

50. Minimum sample size requirements vary depending upon the type of statistical test used.

51. Harmful ILEC performance in small, new, or innovative market niches, or harmful ILEC performance toward smaller CLECs, could be masked by relying on assessments of larger market samples or larger CLEC samples when the results for CLECs are aggregated.

52. It is important to examine performance at the smaller market and smaller CLEC levels.

53. There are unresolved issues regarding minimum sample size and sample aggregation rules, and the rules for incentive payments are integrated with the aggregation rules.

54. Minimum sample size rules result in some data being discarded.

55. Our small sample aggregation rules avoid discarding any data and increase sample sizes for the very smallest samples with minimal impact on the actual results.
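
A hedged sketch of an aggregation rule of this kind follows; the one-to-four-case threshold mirrors the rule adopted in the ordering paragraphs, while the data layout and function name are illustrative:

```python
def aggregate_small_samples(samples: dict, threshold: int = 4) -> dict:
    """Pool all samples of `threshold` or fewer cases; keep larger ones as-is."""
    pooled, kept = [], {}
    for name, observations in samples.items():
        if len(observations) <= threshold:
            pooled.extend(observations)       # no data are discarded
        else:
            kept[name] = observations
    if pooled:
        kept["pooled_small_samples"] = pooled
    return kept
```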

56. The previously proposed sample size rules are complicated and fall short of our goal of simplicity.

57. The fundamental problem with small sample sizes for parity measures is that they fail to satisfy the normality assumptions for the Modified Z-test.

58. Statistical texts indicate that the t-distribution is more appropriate for tests between two sample means, especially for small samples.

59. Using the t-distribution table would adjust for decreasing sample size.
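
The adjustment is visible in a short sketch: at a fixed one-tailed alpha, the t critical value grows as the degrees of freedom shrink, while the normal (Z) critical value stays constant. The alpha level and degrees of freedom shown are illustrative:

```python
from scipy.stats import norm, t

alpha = 0.10
print(f"normal critical value: {norm.ppf(1 - alpha):.3f}")
for df in (5, 10, 30, 100):
    print(f"t critical value (df = {df:3d}): {t.ppf(1 - alpha, df):.3f}")
```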

60. Percentage and rate-based measures are assessed using exact tests, which do not depend on inferences or assumptions about underlying distributions.

61. A log transformation (1) brings the distributions much closer to normality, and (2) provides a reasonable interpretation of skewed data.

62. ILEC distribution normality is improved when log transformations are used.

63. Log transformations also change the effect of outliers.

64. Log transformation improves normality for large samples.

65. Log transformations provide a more appropriate Modified Z-test application than an application using data that is not transformed.
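
Findings 61 through 65 can be illustrated with simulated skewed delay data; the lognormal parameters below are hypothetical stand-ins for actual performance results:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
delays = rng.lognormal(mean=1.0, sigma=0.9, size=500)   # simulated delay data

print(f"skewness before log transform: {skew(delays):.2f}")
print(f"skewness after  log transform: {skew(np.log(delays)):.2f}")
```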

66. Although the ILECs and the CLECs agree to use a benchmark adjustment table, they disagree on two aspects of such tables: the sample sizes to which the tables will be applied and the sample sizes from which they will be derived.

67. A fixed derivation sample size results in varying levels of increased implied performance relative to the benchmark limit.

68. The appropriate application and derivation sample sizes vary with the benchmark level.

69. When the adjustment tables are used, the benchmarks are substantially lowered.
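
One plausible way an adjustment-table entry could be derived is sketched below: for each application sample size, allow the largest number of misses consistent with a chosen risk of falsely flagging a process performing exactly at the benchmark. The 20% risk level and the construction itself are illustrative assumptions, not the method of Appendix K:

```python
# Illustrative derivation of allowed misses for a 90% benchmark; the
# tolerated false-failure risk of 20% is an assumed value, not Appendix K's.
from scipy.stats import binom

benchmark = 0.90
risk = 0.20
for n in (5, 10, 20, 50):
    for m in range(n + 1):
        # misses follow Binomial(n, 1 - benchmark) for a compliant process
        if 1 - binom.cdf(m, n, 1 - benchmark) <= risk:
            implied = (n - m) / n   # implied performance after the adjustment
            print(f"n = {n:2d}: allow {m} misses (implied {implied:.0%})")
            break
```

The declining implied performance levels in the output echo Finding 69's observation that the adjustment tables substantially lower the benchmarks.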

70. The application and derivation sample sizes recommended by staff in Appendix K are more appropriate than the parties' proposals.

71. Benchmarks are absolute performance limits that define a "meaningful opportunity to compete."

72. Benchmarks already allow for random variation: no benchmark requires all services to be completed within a certain time period, and no benchmark sets an upper limit on any one service's outcome.

73. Performance measures that are correlated because they are redundant should be treated so that multiple payments are not made for the same failure.

74. No party wishes to implement a self-executing statistical correlation component to reduce payment for discrimination.

75. Parties presented correlation analysis only as an abstract concept; no implementable plans were described or proposed.

76. Allowing retroactive adjustments would nullify the self-executing nature of the performance remedies plan.

77. Reading "negative" values to represent negative outcomes is intuitively understandable, whereas the reverse is not.

78. The present fully implementable model is an interim one that will generate incentive payments once we have added the incentive components in the next phase of this proceeding.

Conclusions of Law

1. Parity means that the ILEC is providing services to the CLECs in substantially the same period of time and manner (including quality) as it provides them to itself.

2. This Commission endeavors to ensure that the CLECs have OSS access that is at least equal to the ILECs' own access.

3. One interim performance remedies plan model and set of explicit assumptions would allow common quantitative analyses to be performed and estimates to be developed.

4. A single model approach would allow the Commission to make informed and fair policy decisions about the performance remedies plan.

5. A single model approach focuses on the goal of parity service by the ILECs, economic incentives paid by the ILECs, and/or a change in ILECs' operations support to the CLECs.

6. A single interim model and a single set of explicit assumptions should allow calibration of economic outcomes both before and after a six-month pilot test period using actual empirical data.

7. The interim pilot test period will assist the Commission in determining the appropriate levels of long-term economic incentives.

8. Long-term incentive impacts can be calibrated in relation to one model, one common set of assumptions, and actual test period empirical data.

9. Statistical testing should be used to balance finding and preventing actual barriers against identifying barriers where none exist, thereby enabling greater decision quality and attainment of legislative goals.

10. A new "hybrid" of elements from each of the different models presented in this proceeding constitutes the most appropriate performance remedies statistical model.

11. Consistent with academic texts and with the FCC's view of the appropriate statistical application regarding the requirements of the Act, a one-tailed test is appropriate for situations where there is only interest in outcomes in one direction, in this case where the CLEC performance results are worse than the ILEC results.

12. The selection of the appropriate test for small samples should be based on the relative accuracy of the different tests.

13. It is reasonable for our sample aggregation rules to act as an interim solution and a "floor" for sample sizes.

14. Evidence in this proceeding is compelling that normality cannot be assumed for small samples, since measures of time delay are commonly skewed: the distribution is "bunched up" at shorter delays and tapers off slowly at longer delays.

15. Until the Commission can determine which test is the more appropriate treatment of the data, including underlying issues such as "production output" versus "larger process population sampling" and more specific issues regarding outlier treatment, it is not reasonable to either approve or order use of the permutation test.

16. There is a need to better understand what the appropriate sample sizes are for using the permutation test versus the Modified Z or t-test.

17. Since there are unresolved questions surrounding the potential of the permutation test, the active interested parties in this proceeding should collaboratively conduct or fund a research inquiry to answer these unresolved questions.

18. In the case of the percentage-based performance results data, the Fisher's Exact test is appropriate.

19. The Fisher's Exact test should be used for percentage-based performance results because it provides accurate decision error probabilities, is consistent with theoretical assumptions, solves the Z-test application problems, and generates no objections from the parties.

20. The binomial exact test should be used for rate-based performance results because it provides accurate decision error probabilities, is consistent with theoretical expectations, solves the Z-test application problems, is preferred by most parties, and generates no objections from any party.

21. The question of relative risk is more appropriately addressed in this proceeding's next phase, which will establish the "consequences" for the performance decisions made in the present phase.

22. To remedy the lack of critical record information, it is reasonable to direct the ILECs to calculate both alpha and beta values whenever a statistical test is applied.

23. As a general policy statement, it is reasonable to assume that a Type II error is at least as important as a Type I error. Apparent discrepancies can be adjusted in the incentive payment phase.

24. It is reasonable to expect that the problems of insufficient test power for small samples (large beta) and excessive test power for large samples can be better resolved through even approximate alpha/beta balancing techniques.

25. A fixed alpha critical value in the model should only be used as an interim decision-criterion solution.

26. The 90% confidence level (0.10 alpha, or 10% significance level) should be adopted in the statistical model to control the Type I error and to reduce the Type II error to more acceptable levels for the preponderance of the performance results.

27. The 80% confidence level (0.20 alpha) should be adopted in the statistical model for conditional failure identifications because of the low power of these tests.

28. The parties should be directed to devise and propose specific conditional failure identifications in the next phase of this proceeding.

29. One goal of the performance remedies plan is to assess each CLEC's performance results for each submeasure.

30. The smaller market and smaller CLEC levels may be critical for entry and innovation, which in turn are critical to a healthy competitive telecommunications infrastructure.

31. Consistent with the academic justification of the Modified Z-test, the test statistic should be compared to the t-distribution.

32. The small sample aggregation rules we have designed should be easy to understand, and their results easy to reproduce.

33. To assess performance subject to the performance remedies plan, statistical analysis and decision rules should be applied to all data, including sample sizes as small as one case, after our small sample aggregation rules are applied.

34. How payments will be triggered or allocated under the aggregation rules should be addressed in the upcoming incentives phase.

35. All percentage and rate-based data at the submeasure level for each CLEC should be analyzed for parity regardless of small sample sizes since exact tests are accurate for all sample sizes.

36. Staff's analyses of several ILEC and CLEC distributions demonstrate that even in cases where the log transformation dramatically changes results from the non-transformed data, the transformed results are reasonable and appropriate treatments of the performance data.

37. Log transformations of the data should not be ordered on a permanent basis until the record is adequately developed in subsequent phases of this proceeding.

38. More exact tests should be used in addressing small sample size issues, if subsequent research shows them to be appropriate.

39. The log transformation is reasonable and appropriate, and is necessary at least as an interim solution for application of the Modified Z-test to small to moderately large samples.

40. Log transformations should be utilized for all average-based performance measures as specified in Appendix J.

41. The meaning of outliers should be discussed in the incentives phase of this proceeding.

42. Because of the legitimacy of the benchmark small sample problem, and since the CLECs have agreed to some adjustments, a benchmark small sample adjustment table should be ordered as part of the decision model.

43. It is appropriate to set different application sample sizes for different benchmark percentage levels.

44. The implied performance level should address what is analogous to a Type I error without disproportionately increasing what is analogous to a Type II error.

45. The ILECs should use the small sample adjustment tables presented in Appendix K.

46. If any benchmark is inconsistent with the performance definition of "a meaningful opportunity to compete," it should be adjusted directly rather than through a new statistical overlay, with all the complexities and ambiguities such an overlay would create.

47. Benchmarks should be treated as tolerance limits; however, the issue may be re-examined in the incentive payment phase.

48. A review and revision of the benchmarks should not be ordered at this time because it could be more cumbersome than using adjustment tables with the current benchmarks, and establishing benchmarks is the subject of a different proceeding.

49. Since parties recognize that a statistical correlation alone cannot distinguish between failure redundancy and multiple instances of independent discrimination, we should not order any statistical correlation component to our self-executing performance remedies plan model.

50. Any party seeking to have a correlation plan considered in the next phase of this proceeding should describe the plan down to the level of detail that will allow implementation. Parties should provide numerical examples so there is no misunderstanding about the necessary specificity of the plan.

51. The parties should present proposals by the end of the trial period that would put into effect the monitoring and analysis of certain performance data for trends over time.

52. The same performance remedies model should be applied to both Pacific and Verizon CA in the interest of fairness.

53. Since some "calibration" with actual data will be helpful in assessing our decision model and its effects on the overall plan, a calibration period should be ordered to occur simultaneously with the incentive payment setting phase of this proceeding before the trial period begins.

54. Allowing retroactive payment alteration will make the already difficult decision model development task more cumbersome.

55. Incentive payment amounts should not be altered retroactively.

56. Following a six-month trial period, to be specified in the incentive payment phase of this proceeding, the performance of the remedies plan model should be reviewed and any component determined to need changing should be adjusted.

57. A fully implementable interim model should be utilized while gaining the experience necessary for future development of a permanent model.

58. This decision should become effective immediately so that the calibration process can begin and the incentive payment phase may proceed.

INTERIM ORDER

IT IS ORDERED that:

1. A performance remedies plan decision model, which identifies performance failures and non-failures, as specified in Appendix C incorporated by reference herein, shall be adopted for Pacific Bell (Pacific) and Verizon California Inc. (Verizon CA).

2. The performance remedies plan, comprised of the decision model adopted herein and an incentive payment component that will be determined in the next phase of this proceeding, shall be implemented for a trial period of six months.

3. Pacific and Verizon CA shall use the Modified t-test for average-based parity performance measures.

4. Log transformations shall be utilized for all average-based performance measures as specified in Appendix J.

5. Pacific, Verizon CA and the active interested competitive local exchange carriers (CLECs) in Rulemaking 97-10-016/Investigation 97-10-017 shall collectively conduct or fund a research inquiry into whether the permutation test or the Modified t-test is the more appropriate treatment of the data, including but not limited to underlying issues such as "production output" versus "larger process population sampling" and more specific issues regarding outlier treatment. The inquiry shall adopt a collaborative research approach so that all interested parties can collectively influence the research proposal.

6. The Fisher's Exact test shall be used for all percentage-based parity results except for those that cannot be computed because of large numbers. Results where the CLEC numerator exceeds 1000 shall be calculated with the Modified Z-test for proportions.

7. The binomial exact test shall be used for all rate-based tests.

8. The performance remedies plan model shall be constructed so that negative Z and t-values represent potential discrimination.

9. Pacific and Verizon CA shall calculate and report both Type I (alpha) and Type II (beta) error values whenever a statistical test is applied.

10. The parties shall collaboratively develop and implement an alpha/beta balancing procedure for the statistical model adopted herein and detailed in Appendix G no later than the end of the trial period, unless the parties reach agreement and jointly move to implement the components sooner.

11. If the parties are unable to agree on an alpha/beta balancing decision component for the model by the end of the trial period, the parties shall submit their individual models for Commission review and decision as directed by the assigned Commissioner and/or assigned Administrative Law Judge.

12. Until an alpha/beta balanced criterion is established, fixed alpha critical values shall be adopted for the interim.

13. A 90% confidence level (0.10 alpha, or 10% significance level) shall be adopted as the interim fixed critical value in the statistical model for failure identifications.

14. An 80% confidence level (0.20 alpha) shall be adopted in the statistical model as the fixed critical value to identify conditional failures. This value may be considered an interim value if alpha/beta balancing is established as one of the conditional failures, and if an overall alpha/beta balancing methodology is subsequently adopted for all failure identifications.

15. The parties shall devise and propose specific conditional failure identifications in the next phase of this proceeding.

16. Except for rare submeasures identified in Appendix H, Attachment 1, the following small sample aggregation rules shall be used for average-based parity performance measures: (1) For each submeasure, all samples with one to four cases shall be aggregated with each other; and (2) statistical analyses and decision rules shall be applied to determine performance subject to the performance remedies plan for all samples after the aggregation in step (1), regardless of sample size.

17. Rare submeasures identified in Appendix H, Attachment 1, shall be analyzed without aggregation and regardless of sample size.

18. How payments will be triggered or allocated under the aggregation rules shall be addressed in the upcoming incentives phase.

19. All percentage and rate-based data at the submeasure level for each CLEC shall be analyzed for parity without aggregation and regardless of sample size.

20. Pacific and Verizon CA shall use the small sample adjustment tables presented in Appendix K.

21. Benchmarks shall be treated as tolerance limits; however, the issue may be re-examined in the incentive payment phase.

22. Pacific, Verizon CA and any interested parties shall present proposals by the end of the trial period that would put into effect the monitoring and analysis of certain performance data for trends over time.

23. The same performance remedies model shall be applied to Pacific and Verizon CA.

24. A calibration period shall occur simultaneously with the incentive payment setting phase of this proceeding before the trial period begins.

25. Following a six-month trial period, to be specified in the incentive payment phase of this proceeding, we shall review the performance of the remedies plan model and adjust any component that we determine needs changing.

This order is effective today.

Dated , at San Francisco, California.

Appendix A

Assigned Commissioner's Ruling Questions

Tests for Determining Compliance with Parity

Minimum Sample Sizes

Alpha Levels/Critical Values

A ten percent Type I (alpha) error level for parity tests is proposed. Explain why this standard textbook statistical proposal cannot serve as an alpha level/critical value rule, at least for the duration of the six-month trial pilot test period. Again, keep in mind that the penalty phase of the plan can calibrate the size of the payments as a function of the critical values.

(END OF APPENDIX A)

See CPUC Formal File for the Complete Set of Appendices.
