We develop early stopping rules for growing regression tree estimators. The fully data-driven stopping rule is based on monitoring the global residual norm. The best-first and breadth-first search algorithms, together with linear interpolation, give rise to generalized projection or regularization flows. A general theory of early stopping is established. Oracle inequalities for the early-stopped regression tree are derived without any smoothness assumption on the regression function; the analysis covers the original CART splitting rule yet applies much more broadly. The remainder terms are of smaller order than the best achievable rates for Lipschitz functions in dimension d≥2. On real and synthetic data, the early-stopped regression tree estimators attain the statistical performance of cost-complexity pruning while significantly reducing computational costs.
We propose:
- a generalized projection flow as a unifying framework for early-stopped smoothing estimators
- oracle inequalities for the early-stopped CART estimator
- a ready-to-use algorithm for practice that employs a nearest-neighbor estimate of the noise level as the stopping threshold
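The stopping rule above can be sketched in a few lines. This is a hedged illustration, not the authors' implementation: it assumes the monitored quantity is the mean squared training residual, uses the classical 1-nearest-neighbor noise estimate (E[(y_i − y_nn(i))²] ≈ 2σ²) as the threshold, and grows the tree best-first by refitting with an increasing leaf budget (a simplification of incremental growth). The helper names `nn_noise_estimate` and `early_stopped_tree` are ours.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import NearestNeighbors

def nn_noise_estimate(X, y):
    """1-NN estimate of the noise level: 0.5 * mean squared gap to the nearest neighbor."""
    nn = NearestNeighbors(n_neighbors=2).fit(X)
    _, idx = nn.kneighbors(X)              # idx[:, 0] is the point itself
    return 0.5 * np.mean((y - y[idx[:, 1]]) ** 2)

def early_stopped_tree(X, y, max_leaves=256):
    """Grow a CART tree best-first; stop once the residual norm falls below the threshold."""
    sigma2 = nn_noise_estimate(X, y)       # data-driven stopping threshold
    for k in range(2, max_leaves + 1):
        # max_leaf_nodes triggers best-first growth in scikit-learn's CART
        tree = DecisionTreeRegressor(max_leaf_nodes=k, random_state=0).fit(X, y)
        resid = np.mean((y - tree.predict(X)) ** 2)
        if resid <= sigma2:                # discrepancy-type rule: residual ≤ noise level
            return tree
    return tree
```

Refitting from scratch at each leaf budget is wasteful but keeps the sketch short; an efficient version would grow the tree incrementally and update the residual norm per split.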
Hey there! I am a PhD student in statistics at HU Berlin. Feel free to contact me at any time via miftachr@hu-berlin.de. My research interests are explainability, nonparametric statistics, and inference.