Nonparametric Regression Methods for Longitudinal Data Analysis: Mixed-Effects Modeling Approaches

Preface

Nonparametric regression methods for longitudinal data analysis have been a popular statistical research topic since the late 1990s. The needs of longitudinal data analysis in biomedical research and other scientific areas, together with the recognition of the limitations of parametric models in practical data analysis, have driven the development of more innovative nonparametric regression methods. Because of the flexibility in the form of the regression function, nonparametric modeling approaches can play an important role in exploring longitudinal data, just as they have done for independent cross-sectional data analysis. Mixed-effects models are powerful tools for longitudinal data analysis. Linear mixed-effects models, nonlinear mixed-effects models and generalized linear mixed-effects models have been well developed for longitudinal data, in particular for modeling the correlations and the within-subject and between-subject variations of longitudinal data. The purpose of this book is to survey the nonparametric regression techniques for longitudinal data analysis that are widely scattered throughout the literature and, more importantly, to systematically investigate the incorporation of mixed-effects modeling techniques into various nonparametric regression models.

The focus of this book is on modeling ideas and inference methodologies, although we also present some theoretical results to justify the proposed methods. Data analysis examples from biomedical research are used to illustrate the methodologies throughout the book. We regard the application of statistical modeling techniques to practical scientific problems as important. In this book, we mainly concentrate on the major nonparametric regression and smoothing methods, namely local polynomial, regression spline, smoothing spline and penalized spline methods.

Chapter 1 provides a brief overview of the book and, in particular, presents data examples from biomedical research studies that have motivated the use of nonparametric regression approaches. Chapters 2 and 3 review mixed-effects models and nonparametric regression methods, the two important building blocks of the proposed modeling techniques. Chapters 4-7 present the core contents of this book, with each chapter covering one of the four major nonparametric regression methods: local polynomial, regression spline, smoothing spline and penalized spline. Chapters 8 and 9 extend the modeling techniques of Chapters 4-7 to semiparametric and time-varying coefficient models for longitudinal data analysis. The last chapter, Chapter 10, covers the modeling and analysis of discrete longitudinal data.

Most of the contents of this book should be comprehensible to readers with some basic statistical training. Advanced mathematics and technical skills are not necessary for understanding the key modeling ideas or for applying the analysis methods to practical data. The materials in Chapters 1-7 can be used in a lower- or intermediate-level graduate course in statistics or biostatistics. Chapters 8-10 can be used in a higher-level graduate course or as reference material for those who intend to do research in this area.

We have tried our best to acknowledge the work of the many investigators who have contributed to the development of models and methodologies for nonparametric regression analysis of longitudinal data. However, an exhaustive review of the vast literature in this active research field is beyond the scope of this project, and we regret any oversights or omissions of particular authors or publications.

We would like to express our sincere thanks to Ms. Jeanne Holden-Wiltse for helping us polish and edit the manuscript. We are grateful to Ms. Susanne Steitz and Mr. Steve Quigley at John Wiley & Sons, Inc., who made great efforts in coordinating the editing, review, and finally the publishing of this book. We would like to thank our colleagues, collaborators and friends, Zongwu Cai, Raymond Carroll, Jianqing Fan, Kai-Tai Fang, Hua Liang, James S. Marron, Yanqing Sun, Yuedong Wang, and Chunming Zhang, for their fruitful collaborations and valuable inspiration. Thanks also go to Ollivier Hyrien, Hua Liang, Sally Thurston, and Naisyin Wang for their review of and comments on some chapters of the book. We thank our families and loved ones, who provided strong support and encouragement during the writing of this book. We are grateful to our teachers and academic mentors, Fred W. Huffer, Jinhuai Zhang, Jianqing Fan, Kai-Tai Fang and James S. Marron, for guiding us to the beauty of statistical research. J.-T. Zhang would also like to acknowledge Professors Zhidong Bai, Louis H. Y. Chen, Kwok Pui Choi and Anthony Y. C. Kuk for their support and encouragement.

Wu’s research was partially supported by grants from the National Institute of Allergy and Infectious Diseases, National Institutes of Health (NIH). Zhang’s research was partially supported by National University of Singapore Academic Research Grant R-155-000-038-112. The book was written with partial support from the Department of Biostatistics and Computational Biology, University of Rochester, where the second author was a Visiting Professor.

Hulin Wu
Department of Biostatistics and Computational Biology
University of Rochester
Rochester, NY, USA

and

Jin-Ting Zhang
Department of Statistics and Applied Probability
National University of Singapore
Singapore

Contents

1 Introduction

     1.1 Motivating Longitudinal Data Examples
          1.1.1 Progesterone Data
          1.1.2 ACTG 388 Data
          1.1.3 MACS Data

     1.2 Mixed-Effects Modeling: From Parametric to Nonparametric
          1.2.1 Parametric Mixed-Effects Models
          1.2.2 Nonparametric Regression and Smoothing
          1.2.3 Nonparametric Mixed-Effects Models

     1.3 Scope of the Book
          1.3.1 Building Blocks of the NPME Models
          1.3.2 Fundamental Development of the NPME Models
          1.3.3 Further Extensions of the NPME Models

     1.4 Implementation of Methodologies

     1.5 Options for Reading This Book

     1.6 Bibliographical Notes

2 Parametric Mixed-Effects Models

     2.1 Introduction

     2.2 Linear Mixed-Effects Model
          2.2.1 Model Specification
          2.2.2 Estimation of Fixed and Random Effects
          2.2.3 Bayesian Interpretation
          2.2.4 Estimation of Variance Components
          2.2.5 The EM Algorithms

     2.3 Nonlinear Mixed-Effects Model
          2.3.1 Model Specification
          2.3.2 Two-Stage Method
          2.3.3 First-Order Linearization Method
          2.3.4 Conditional First-Order Linearization Method

     2.4 Generalized Mixed-Effects Model
          2.4.1 Generalized Linear Mixed-Effects Model
          2.4.2 Examples of GLME Model
          2.4.3 Generalized Nonlinear Mixed-Effects Model

     2.5 Summary and Bibliographical Notes

     2.6 Appendix: Proofs

3 Nonparametric Regression Smoothers

     3.1 Introduction

     3.2 Local Polynomial Kernel Smoother
          3.2.1 General Degree LPK Smoother
          3.2.2 Local Constant and Linear Smoothers
          3.2.3 Kernel Function
          3.2.4 Bandwidth Selection
          3.2.5 An Illustrative Example

     3.3 Regression Splines
          3.3.1 Truncated Power Basis
          3.3.2 Regression Spline Smoother
          3.3.3 Selection of Number and Location of Knots
          3.3.4 General Basis-Based Smoother

     3.4 Smoothing Splines
          3.4.1 Cubic Smoothing Splines
          3.4.2 General Degree Smoothing Splines
          3.4.3 Connection between a Smoothing Spline and an LME Model
          3.4.4 Connection between a Smoothing Spline and a State-Space Model
          3.4.5 Choice of Smoothing Parameters

     3.5 Penalized Splines
          3.5.1 Penalized Spline Smoother
          3.5.2 Connection between a Penalized Spline and an LME Model
          3.5.3 Choice of the Knots and Smoothing Parameter Selection
          3.5.4 Extension

     3.6 Linear Smoother

     3.7 Methods for Smoothing Parameter Selection
          3.7.1 Goodness of Fit
          3.7.2 Model Complexity
          3.7.3 Cross-Validation
          3.7.4 Generalized Cross-Validation
          3.7.5 Generalized Maximum Likelihood
          3.7.6 Akaike Information Criterion
          3.7.7 Bayesian Information Criterion

     3.8 Summary and Bibliographical Notes

4 Local Polynomial Methods

     4.1 Introduction

     4.2 Nonparametric Population Mean Model
          4.2.1 Naive Local Polynomial Kernel Method
          4.2.2 Local Polynomial Kernel GEE Method
          4.2.3 Fan-Zhang’s Two-Step Method

     4.3 Nonparametric Mixed-Effects Model

     4.4 Local Polynomial Mixed-Effects Modeling
          4.4.1 Local Polynomial Approximation
          4.4.2 Local Likelihood Approach
          4.4.3 Local Marginal Likelihood Estimation
          4.4.4 Local Joint Likelihood Estimation
          4.4.5 Component Estimation
          4.4.6 A Special Case: Local Constant Mixed-Effects Model

     4.5 Choosing Good Bandwidths
          4.5.1 Leave-One-Subject-Out Cross-Validation
          4.5.2 Leave-One-Point-Out Cross-Validation
          4.5.3 Bandwidth Selection Strategies

     4.6 LPME Backfitting Algorithm

     4.7 Asymptotic Properties of the LPME Estimators

     4.8 Finite Sample Properties of the LPME Estimators
          4.8.1 Comparison of the LPME Estimators in Section 4.5.3
          4.8.2 Comparison of Different Smoothing Methods
          4.8.3 Comparisons of BCHB-Based versus Backfitting-Based LPME Estimators

     4.9 Application to the Progesterone Data

     4.10 Summary and Bibliographical Notes

     4.11 Appendix: Proofs
          4.11.1 Conditions
          4.11.2 Proofs

5 Regression Spline Methods

     5.1 Introduction

     5.2 Naive Regression Splines
          5.2.1 The NRS Smoother
          5.2.2 Variability Band Construction
          5.2.3 Choice of the Bases
          5.2.4 Knot Locating Methods
          5.2.5 Selection of the Number of Basis Functions
          5.2.6 Example and Model Checking
          5.2.7 Comparing GCV against SCV

     5.3 Generalized Regression Splines
          5.3.1 The GRS Smoother
          5.3.2 Variability Band Construction
          5.3.3 Selection of the Number of Basis Functions
          5.3.4 Estimating the Covariance Structure

     5.4 Mixed-Effects Regression Splines
          5.4.1 Fits and Smoother Matrices
          5.4.2 Variability Band Construction
          5.4.3 No-Effect Test
          5.4.4 Choice of the Bases
          5.4.5 Choice of the Number of Basis Functions
          5.4.6 Example and Model Checking

     5.5 Comparing MERS against NRS
          5.5.1 Comparison via the ACTG 388 Data
          5.5.2 Comparison via Simulations

     5.6 Summary and Bibliographical Notes

     5.7 Appendix: Proofs

6 Smoothing Spline Methods

     6.1 Introduction

     6.2 Naive Smoothing Splines
          6.2.1 The NSS Estimator
          6.2.2 Cubic NSS Estimator
          6.2.3 Cubic NSS Estimator for Panel Data
          6.2.4 Variability Band Construction
          6.2.5 Choice of the Smoothing Parameter
          6.2.6 NSS Fit as BLUP of an LME Model
          6.2.7 Model Checking

     6.3 Generalized Smoothing Splines
          6.3.1 Constructing a Cubic GSS Estimator
          6.3.2 Variability Band Construction
          6.3.3 Choice of the Smoothing Parameter
          6.3.4 Covariance Matrix Estimation
          6.3.5 GSS Fit as BLUP of an LME Model

     6.4 Extended Smoothing Splines
          6.4.1 Subject-Specific Curve Fitting
          6.4.2 The ESS Estimators
          6.4.3 ESS Fits as BLUPs of an LME Model
          6.4.4 Reduction of the Number of Fixed-Effects Parameters

     6.5 Mixed-Effects Smoothing Splines
          6.5.1 The Cubic MESS Estimators
          6.5.2 Bayesian Interpretation
          6.5.3 Variance Components Estimation
          6.5.4 Fits and Smoother Matrices
          6.5.5 Variability Band Construction
          6.5.6 Choice of the Smoothing Parameters
          6.5.7 Application to the Conceptive Progesterone Data

     6.6 General Degree Smoothing Splines
          6.6.1 General Degree NSS
          6.6.2 General Degree GSS
          6.6.3 General Degree ESS
          6.6.4 General Degree MESS
          6.6.5 Choice of the Bases

     6.7 Summary and Bibliographical Notes

     6.8 Appendix: Proofs

7 Penalized Spline Methods

     7.1 Introduction

     7.2 Naive P-Splines
          7.2.1 The NPS Smoother
          7.2.2 NPS Fits and Smoother Matrix
          7.2.3 Variability Band Construction
          7.2.4 Degrees of Freedom
          7.2.5 Smoothing Parameter Selection
          7.2.6 Choice of the Number of Knots
          7.2.7 NPS Fit as BLUP of an LME Model

     7.3 Generalized P-Splines
          7.3.1 Constructing the GPS Smoother
          7.3.2 Degrees of Freedom
          7.3.3 Variability Band Construction
          7.3.4 Smoothing Parameter Selection
          7.3.5 Choice of the Number of Knots
          7.3.6 GPS Fit as BLUP of an LME Model
          7.3.7 Estimating the Covariance Structure

     7.4 Extended P-Splines
          7.4.1 Subject-Specific Curve Fitting
          7.4.2 Challenges for Computing the EPS Smoothers
          7.4.3 EPS Fits as BLUPs of an LME Model

     7.5 Mixed-Effects P-Splines
          7.5.1 The MEPS Smoothers
          7.5.2 Bayesian Interpretation
          7.5.3 Variance Components Estimation
          7.5.4 Fits and Smoother Matrices
          7.5.5 Variability Band Construction
          7.5.6 Choice of the Smoothing Parameters
          7.5.7 Choosing the Numbers of Knots

     7.6 Summary and Bibliographical Notes

     7.7 Appendix: Proofs

8 Semiparametric Models

     8.1 Introduction

     8.2 Semiparametric Population Mean Model
          8.2.1 Model Specification
          8.2.2 Local Polynomial Method
          8.2.3 Regression Spline Method
          8.2.4 Penalized Spline Method
          8.2.5 Smoothing Spline Method
          8.2.6 Methods Involving No Smoothing
          8.2.7 MACS Data

     8.3 Semiparametric Mixed-Effects Model
          8.3.1 Model Specification
          8.3.2 Local Polynomial Method
          8.3.3 Regression Spline Method
          8.3.4 Penalized Spline Method
          8.3.5 Smoothing Spline Method
          8.3.6 ACTG 388 Data Revisited
          8.3.7 MACS Data Revisited

     8.4 Semiparametric Nonlinear Mixed-Effects Model
          8.4.1 Model Specification
          8.4.2 Wu and Zhang’s Approach
          8.4.3 Ke and Wang’s Approach
          8.4.4 Generalizations of Ke and Wang’s Approach

     8.5 Summary and Bibliographical Notes

9 Time-Varying Coefficient Models

     9.1 Introduction

     9.2 Time-Varying Coefficient NPM Model
          9.2.1 Local Polynomial Kernel Method
          9.2.2 Regression Spline Method
          9.2.3 Penalized Spline Method
          9.2.4 Smoothing Spline Method
          9.2.5 Smoothing Parameter Selection
          9.2.6 Backfitting Algorithm
          9.2.7 Two-Step Method
          9.2.8 TVC-NPM Models with Time-Independent Covariates
          9.2.9 MACS Data
          9.2.10 Progesterone Data

     9.3 Time-Varying Coefficient SPM Model

     9.4 Time-Varying Coefficient NPME Model
          9.4.1 Local Polynomial Method
          9.4.2 Regression Spline Method
          9.4.3 Penalized Spline Method
          9.4.4 Smoothing Spline Method
          9.4.5 Backfitting Algorithms
          9.4.6 MACS Data Revisited
          9.4.7 Progesterone Data Revisited

     9.5 Time-Varying Coefficient SPME Model
          9.5.1 Backfitting Algorithm
          9.5.2 Regression Spline Method

     9.6 Summary and Bibliographical Notes

10 Discrete Longitudinal Data

     10.1 Introduction

     10.2 Generalized NPM Model

     10.3 Generalized SPM Model

     10.4 Generalized NPME Model
          10.4.1 Penalized Local Polynomial Estimation
          10.4.2 Bandwidth Selection
          10.4.3 Implementation
          10.4.4 Asymptotic Theory
          10.4.5 Application to an AIDS Clinical Study

     10.5 Generalized TVC-NPME Model

     10.6 Generalized SAME Model

     10.7 Summary and Bibliographical Notes

     10.8 Appendix: Proofs