Book Introduction

STATISTICS FOR HIGH-DIMENSIONAL DATA: METHODS, THEORY AND APPLICATIONS | PDF | EPUB | MOBI | Kindle e-book editions | Baidu cloud download

STATISTICS FOR HIGH-DIMENSIONAL DATA: METHODS, THEORY AND APPLICATIONS
  • Authors: Peter Bühlmann, Sara van de Geer
  • Publisher: Springer
  • ISBN: 7519211677
  • Publication year: 2016
  • Listed page count: 556 pages
  • File size: 83 MB
  • File page count: 574 pages
  • Subject terms:

PDF Download


Click here for the online PDF e-book download of this title [recommended: cloud extraction, quick and convenient]. Direct download of the PDF edition, usable on both mobile and PC.
Torrent download [fast over BT]. Friendly reminder: please use the BT client FDM (Free Download Manager); see the software download page. Direct-link download [convenient but slow]  [Read this book online]  [Get the unzip code online]

Download Notes

STATISTICS FOR HIGH-DIMENSIONAL DATA: METHODS, THEORY AND APPLICATIONS, PDF e-book download

The downloaded file is a RAR archive. Use decompression software to extract the PDF.

We recommend downloading with the BT client Free Download Manager (FDM), which is free, ad-free, and cross-platform. All resources on this site are packaged as BT torrents, so a dedicated BT client such as BitComet, qBittorrent, or uTorrent is required. Xunlei (Thunder) is currently not recommended because this title is not a popular resource on that network; once it becomes popular, Xunlei will work as well.

(The file page count should be greater than the listed page count, except for multi-volume e-books.)

Note: all archives on this site require an unzip code. Click to download the archive extraction tool.

Table of Contents

1 Introduction
1.1 The framework
1.2 The possibilities and challenges
1.3 About the book
1.3.1 Organization of the book
1.4 Some examples
1.4.1 Prediction and biomarker discovery in genomics

2 Lasso for linear models
2.1 Organization of the chapter
2.2 Introduction and preliminaries
2.2.1 The Lasso estimator
2.3 Orthonormal design
2.4 Prediction
2.4.1 Practical aspects about the Lasso for prediction
2.4.2 Some results from asymptotic theory
2.5 Variable screening and ||β̂ - β0||q-norms
2.5.1 Tuning parameter selection for variable screening
2.5.2 Motif regression for DNA binding sites
2.6 Variable selection
2.6.1 Neighborhood stability and irrepresentable condition
2.7 Key properties and corresponding assumptions: a summary
2.8 The adaptive Lasso: a two-stage procedure
2.8.1 An illustration: simulated data and motif regression
2.8.2 Orthonormal design
2.8.3 The adaptive Lasso: variable selection under weak conditions
2.8.4 Computation
2.8.5 Multi-step adaptive Lasso
2.8.6 Non-convex penalty functions
2.9 Thresholding the Lasso
2.10 The relaxed Lasso
2.11 Degrees of freedom of the Lasso
2.12 Path-following algorithms
2.12.1 Coordinatewise optimization and shooting algorithms
2.13 Elastic net: an extension
Problems
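
To illustrate Sections 2.2.1 and 2.12.1, here is a minimal NumPy sketch of the Lasso fitted by cyclic coordinate descent (the shooting algorithm); the objective scaling, the penalty level lam, and all function names are illustrative choices, not taken from the book. Under orthonormal design (Section 2.3) each inner update reduces to a single soft-thresholding step.

```python
# Illustrative sketch only: the Lasso via cyclic coordinate descent
# ("shooting"); scaling and penalty level are our choices, not the book's.
import numpy as np

def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) ||y - X b||_2^2 + lam * ||b||_1 by coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n                 # (1/n) x_j' x_j
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ beta + X[:, j] * beta[j]    # partial residual without x_j
            z_j = X[:, j] @ r_j / n
            beta[j] = soft_threshold(z_j, lam) / col_sq[j]
    return beta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 100, 200                                   # high-dimensional: p > n
    X = rng.standard_normal((n, p))
    beta0 = np.zeros(p)
    beta0[:5] = [2.0, -1.5, 1.0, 0.8, -0.6]
    y = X @ beta0 + 0.5 * rng.standard_normal(n)
    print("selected variables:", np.flatnonzero(lasso_cd(X, y, lam=0.1)))
```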

3 Generalized linear models and the Lasso
3.1 Organization of the chapter
3.2 Introduction and preliminaries
3.2.1 The Lasso estimator: penalizing the negative log-likelihood
3.3 Important examples of generalized linear models
3.3.1 Binary response variable and logistic regression
3.3.2 Poisson regression
3.3.3 Multi-category response variable and multinomial distribution
Problems
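
To illustrate Section 3.3.1, here is a minimal sketch of an l1-penalized logistic regression; scikit-learn is used only as a convenient stand-in solver, and the regularization setting C=0.5 is an arbitrary illustrative choice, not taken from the book.

```python
# Illustrative sketch only: l1-penalized logistic regression, using
# scikit-learn as a stand-in solver (C=0.5 is an arbitrary choice).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n, p = 120, 300                                       # p >> n
X = rng.standard_normal((n, p))
logits = 2.0 * X[:, 0] - 1.5 * X[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

# In scikit-learn, C is the inverse regularization strength:
# smaller C means a heavier l1-penalty on the coefficients.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
clf.fit(X, y)
print("nonzero coefficients:", np.count_nonzero(clf.coef_))
```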

4 The group Lasso
4.1 Organization of the chapter
4.2 Introduction and preliminaries
4.2.1 The group Lasso penalty
4.3 Factor variables as covariates
4.3.1 Prediction of splice sites in DNA sequences
4.4 Properties of the group Lasso for generalized linear models
4.5 The generalized group Lasso penalty
4.5.1 Groupwise prediction penalty and parametrization invariance
4.6 The adaptive group Lasso
4.7 Algorithms for the group Lasso
4.7.1 Block coordinate descent
4.7.2 Block coordinate gradient descent
Problems
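
To illustrate Sections 4.2.1 and 4.7.1, here is a minimal sketch of block coordinate descent for the group Lasso, assuming each group's design has been orthonormalized so that every block update is a closed-form group soft-thresholding step; the penalty level and simulated data are illustrative choices, not taken from the book.

```python
# Illustrative sketch only: block coordinate descent for the group Lasso,
# assuming X_g' X_g / n = I within every group (all settings are ours).
import numpy as np

def group_lasso_bcd(X, y, groups, lam, n_iter=100):
    """Minimize (1/(2n))||y - X b||^2 + lam * sum_g sqrt(|g|) ||b_g||_2."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for g in groups:
            r_g = y - X @ beta + X[:, g] @ beta[g]   # residual without group g
            z_g = X[:, g].T @ r_g / n                # closed form since X_g'X_g/n = I
            norm_z = np.linalg.norm(z_g)
            thresh = lam * np.sqrt(len(g))
            if norm_z <= thresh:
                beta[g] = 0.0                        # whole group is shrunk to zero
            else:
                beta[g] = (1.0 - thresh / norm_z) * z_g
    return beta

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    n, size, n_groups = 150, 4, 30                   # 30 groups of 4 covariates each
    groups = [list(range(k * size, (k + 1) * size)) for k in range(n_groups)]
    X = rng.standard_normal((n, size * n_groups))
    for g in groups:                                 # orthonormalize within each group
        Q, _ = np.linalg.qr(X[:, g])
        X[:, g] = np.sqrt(n) * Q
    beta0 = np.zeros(size * n_groups)
    beta0[groups[0]] = [1.5, -1.0, 0.8, 0.5]
    y = X @ beta0 + 0.5 * rng.standard_normal(n)
    beta_hat = group_lasso_bcd(X, y, groups, lam=0.1)
    print("active groups:", [k for k, g in enumerate(groups) if np.any(beta_hat[g] != 0)])
```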

5 Additive models and many smooth univariate functions
5.1 Organization of the chapter
5.2 Introduction and preliminaries
5.2.1 Penalized maximum likelihood for additive models
5.3 The sparsity-smoothness penalty
5.3.1 Orthogonal basis and diagonal smoothing matrices
5.3.2 Natural cubic splines and Sobolev spaces
5.3.3 Computation
5.4 A sparsity-smoothness penalty of group Lasso type
5.4.1 Computational algorithm
5.4.2 Alternative approaches
5.5 Numerical examples
5.5.1 Simulated example
5.5.2 Motif regression
5.6 Prediction and variable selection
5.7 Generalized additive models
5.8 Linear model with varying coefficients
5.8.1 Properties for prediction
5.8.2 Multivariate linear model
5.9 Multitask learning
Problems

6 Theory for the Lasso
6.1 Organization of this chapter
6.2 Least squares and the Lasso
6.2.1 Introduction
6.2.2 The result assuming the truth is linear
6.2.3 Linear approximation of the truth
6.2.4 A further refinement: handling smallish coefficients
6.3 The setup for general convex loss
6.4 The margin condition
6.5 Generalized linear model without penalty
6.6 Consistency of the Lasso for general loss
6.7 An oracle inequality
6.8 The lq-error for 1 ≤ q ≤ 2
6.8.1 Application to least squares assuming the truth is linear
6.8.2 Application to general loss and a sparse approximation of the truth
6.9 The weighted Lasso
6.10 The adaptively weighted Lasso
6.11 Concave penalties
6.11.1 Sparsity oracle inequalities for least squares with lr-penalty
6.11.2 Proofs for this section (Section 6.11)
6.12 Compatibility and (random) matrices
6.13 On the compatibility condition
6.13.1 Direct bounds for the compatibility constant
6.13.2 Bounds using ||βS||1² ≤ s||βS||2²
6.13.3 Sets N containing S
6.13.4 Restricted isometry
6.13.5 Sparse eigenvalues
6.13.6 Further coherence notions
6.13.7 An overview of the various eigenvalue flavored constants
Problems

7 Variable selection with the Lasso
7.1 Introduction
7.2 Some results from literature
7.3 Organization of this chapter
7.4 The beta-min condition
7.5 The irrepresentable condition in the noiseless case
7.5.1 Definition of the irrepresentable condition
7.5.2 The KKT conditions
7.5.3 Necessity and sufficiency for variable selection
7.5.4 The irrepresentable condition implies the compatibility condition
7.5.5 The irrepresentable condition and restricted regression
7.5.6 Selecting a superset of the true active set
7.5.7 The weighted irrepresentable condition
7.5.8 The weighted irrepresentable condition and restricted regression
7.5.9 The weighted Lasso with “ideal” weights
7.6 Definition of the adaptive and thresholded Lasso
7.6.1 Definition of adaptive Lasso
7.6.2 Definition of the thresholded Lasso
7.6.3 Order symbols
7.7 A recollection of the results obtained in Chapter 6
7.8 The adaptive Lasso and thresholding: invoking sparse eigenvalues
7.8.1 The conditions on the tuning parameters
7.8.2 The results
7.8.3 Comparison with the Lasso
7.8.4 Comparison between adaptive and thresholded Lasso
7.8.5 Bounds for the number of false negatives
7.8.6 Imposing beta-min conditions
7.9 The adaptive Lasso without invoking sparse eigenvalues
7.9.1 The condition on the tuning parameter
7.9.2 The results
7.10 Some concluding remarks
7.11 Technical complements for the noiseless case without sparse eigenvalues
7.11.1 Prediction error for the noiseless (weighted) Lasso
7.11.2 The number of false positives of the noiseless (weighted) Lasso
7.11.3 Thresholding the noiseless initial estimator
7.11.4 The noiseless adaptive Lasso
7.12 Technical complements for the noisy case without sparse eigenvalues
7.13 Selection with concave penalties
Problems
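
To illustrate Sections 2.8 and 7.6.1, here is a minimal sketch of the two-stage adaptive Lasso, computed with a standard Lasso solver by rescaling columns: a weighted l1-penalty with weights 1/|beta_init_j| is equivalent to an ordinary Lasso on the columns X_j*|beta_init_j|, with the fitted coefficients rescaled back. The solver and both penalty levels are illustrative choices, not taken from the book.

```python
# Illustrative sketch only: two-stage adaptive Lasso via column rescaling;
# solver and penalty levels are our choices, not the book's.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
n, p = 100, 200
X = rng.standard_normal((n, p))
beta0 = np.zeros(p)
beta0[:4] = [2.0, -1.5, 1.0, 0.7]
y = X @ beta0 + 0.5 * rng.standard_normal(n)

# Stage 1: ordinary Lasso as the initial estimator.
init = Lasso(alpha=0.1).fit(X, y)
active = np.flatnonzero(init.coef_)        # zero initial coefficients get infinite weight

# Stage 2: weighted Lasso, computed as an ordinary Lasso on rescaled columns
# X_j * |beta_init_j|, then mapped back to the original scale.
scale = np.abs(init.coef_[active])
second = Lasso(alpha=0.05).fit(X[:, active] * scale, y)

beta_adapt = np.zeros(p)
beta_adapt[active] = second.coef_ * scale
print("adaptive Lasso support:", np.flatnonzero(beta_adapt))
```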

8 Theory for l1/l2-penalty procedures
8.1 Introduction
8.2 Organization and notation of this chapter
8.3 Regression with group structure
8.3.1 The loss function and penalty
8.3.2 The empirical process
8.3.3 The group Lasso compatibility condition
8.3.4 A group Lasso sparsity oracle inequality
8.3.5 Extensions
8.4 High-dimensional additive model
8.4.1 The loss function and penalty
8.4.2 The empirical process
8.4.3 The smoothed Lasso compatibility condition
8.4.4 A smoothed group Lasso sparsity oracle inequality
8.4.5 On the choice of the penalty
8.5 Linear model with time-varying coefficients
8.5.1 The loss function and penalty
8.5.2 The empirical process
8.5.3 The compatibility condition for the time-varying coefficients model
8.5.4 A sparsity oracle inequality for the time-varying coefficients model
8.6 Multivariate linear model and multitask learning
8.6.1 The loss function and penalty
8.6.2 The empirical process
8.6.3 The multitask compatibility condition
8.6.4 A multitask sparsity oracle inequality
8.7 The approximation condition for the smoothed group Lasso
8.7.1 Sobolev smoothness
8.7.2 Diagonalized smoothness
Problems

9 Non-convex loss functions and l1-regularization
9.1 Organization of the chapter
9.2 Finite mixture of regressions model
9.2.1 Finite mixture of Gaussian regressions model
9.2.2 l1-penalized maximum likelihood estimator
9.2.3 Properties of the l1-penalized maximum likelihood estimator
9.2.4 Selection of the tuning parameters
9.2.5 Adaptive l1-penalization
9.2.6 Riboflavin production with Bacillus subtilis
9.2.7 Simulated example
9.2.8 Numerical optimization
9.2.9 GEM algorithm for optimization
9.2.10 Proof of Proposition 9.2
9.3 Linear mixed effects models
9.3.1 The model and l1-penalized estimation
9.3.2 The Lasso in linear mixed effects models
9.3.3 Estimation of the random effects coefficients
9.3.4 Selection of the regularization parameter
9.3.5 Properties of the Lasso in linear mixed effects models
9.3.6 Adaptive l1-penalized maximum likelihood estimator
9.3.7 Computational algorithm
9.3.8 Numerical results
9.4 Theory for l1-penalization with non-convex negative log-likelihood
9.4.1 The setting and notation
9.4.2 Oracle inequality for the Lasso for non-convex loss functions
9.4.3 Theory for finite mixture of regressions models
9.4.4 Theory for linear mixed effects models
9.5 Proofs for Section 9.4
9.5.1 Proof of Lemma 9.1
9.5.2 Proof of Lemma 9.2
9.5.3 Proof of Theorem 9.1
9.5.4 Proof of Lemma 9.3
Problems

10 Stable solutions
10.1 Organization of the chapter
10.2 Introduction, stability and subsampling
10.2.1 Stability paths for linear models
10.3 Stability selection
10.3.1 Choice of regularization and error control
10.4 Numerical results
10.5 Extensions
10.5.1 Randomized Lasso
10.6 Improvements from a theoretical perspective
10.7 Proofs
10.7.1 Sample splitting
10.7.2 Proof of Theorem 10.1
Problems
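
To illustrate Sections 10.2 and 10.3, here is a minimal sketch of the subsampling idea behind stability selection: refit the Lasso on many random half-samples and keep the variables whose selection frequency exceeds a threshold. The number of subsamples, the penalty level, and the threshold are illustrative choices; the error-control calibration of Section 10.3.1 is not shown.

```python
# Illustrative sketch only: selection frequencies over random half-samples;
# penalty level, number of subsamples and threshold are our choices.
import numpy as np
from sklearn.linear_model import Lasso

def stability_selection(X, y, alpha=0.1, n_subsamples=100, threshold=0.6, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    freq = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)     # random half-sample
        coef = Lasso(alpha=alpha).fit(X[idx], y[idx]).coef_
        freq += (coef != 0)                                  # count selections
    freq /= n_subsamples
    return np.flatnonzero(freq >= threshold), freq

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    n, p = 120, 200
    X = rng.standard_normal((n, p))
    beta0 = np.zeros(p)
    beta0[:3] = [2.0, -1.5, 1.0]
    y = X @ beta0 + 0.5 * rng.standard_normal(n)
    stable_set, _ = stability_selection(X, y)
    print("stable set:", stable_set)
```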

11 P-values for linear models and beyond
11.1 Organization of the chapter
11.2 Introduction, sample splitting and high-dimensional variable selection
11.3 Multi sample splitting and familywise error control
11.3.1 Aggregation over multiple p-values
11.3.2 Control of familywise error
11.4 Multi sample splitting and false discovery rate
11.4.1 Control of false discovery rate
11.5 Numerical results
11.5.1 Simulations and familywise error control
11.5.2 Familywise error control for motif regression in computational biology
11.5.3 Simulations and false discovery rate control
11.6 Consistent variable selection
11.6.1 Single sample split method
11.6.2 Multi sample split method
11.7 Extensions
11.7.1 Other models
11.7.2 Control of expected false positive selections
11.8 Proofs
11.8.1 Proof of Proposition 11.1
11.8.2 Proof of Theorem 11.1
11.8.3 Proof of Theorem 11.2
11.8.4 Proof of Proposition 11.2
11.8.5 Proof of Lemma 11.3
Problems
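
To illustrate Section 11.6.1, here is a minimal sketch of the single sample-split method: one half of the data is used only to select variables with the Lasso, and the other half gives classical OLS p-values for the selected variables, Bonferroni-adjusted by the size of the selected set. The penalty level and the use of scikit-learn and statsmodels are illustrative choices; the multi sample splitting and aggregation of Section 11.3 are not shown.

```python
# Illustrative sketch only: single sample-split p-values; penalty level and
# the use of scikit-learn / statsmodels are our choices, not the book's.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)
n, p = 200, 300
X = rng.standard_normal((n, p))
beta0 = np.zeros(p)
beta0[:3] = [1.5, -1.0, 0.8]
y = X @ beta0 + rng.standard_normal(n)

# First half: variable selection only.
half = n // 2
S_hat = np.flatnonzero(Lasso(alpha=0.15).fit(X[:half], y[:half]).coef_)

# Second half: classical OLS p-values for the selected variables.
ols = sm.OLS(y[half:], sm.add_constant(X[half:, S_hat])).fit()
raw_p = ols.pvalues[1:]                       # drop the intercept's p-value

p_adj = np.ones(p)                            # unselected variables get p-value 1
p_adj[S_hat] = np.minimum(raw_p * len(S_hat), 1.0)   # Bonferroni adjustment by |S_hat|
print("adjusted p-values < 0.05:", np.flatnonzero(p_adj < 0.05))
```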

12 Boosting and greedy algorithms
12.1 Organization of the chapter
12.2 Introduction and preliminaries
12.2.1 Ensemble methods: multiple prediction and aggregation
12.2.2 AdaBoost
12.3 Gradient boosting: a functional gradient descent algorithm
12.3.1 The generic FGD algorithm
12.4 Some loss functions and boosting algorithms
12.4.1 Regression
12.4.2 Binary classification
12.4.3 Poisson regression
12.4.4 Two important boosting algorithms
12.4.5 Other data structures and models
12.5 Choosing the base procedure
12.5.1 Componentwise linear least squares for generalized linear models
12.5.2 Componentwise smoothing spline for additive models
12.5.3 Trees
12.5.4 The low-variance principle
12.5.5 Initialization of boosting
12.6 L2Boosting
12.6.1 Nonparametric curve estimation: some basic insights about boosting
12.6.2 L2Boosting for high-dimensional linear models
12.7 Forward selection and orthogonal matching pursuit
12.7.1 Linear models and squared error loss
12.8 Proofs
12.8.1 Proof of Theorem 12.1
12.8.2 Proof of Theorem 12.2
12.8.3 Proof of Theorem 12.3
Problems
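
To illustrate Sections 12.5.1 and 12.6.2, here is a minimal sketch of L2Boosting with componentwise linear least squares: at each step the single covariate that best fits the current residuals is updated by a small step. The step size nu and the number of iterations are illustrative tuning choices, not taken from the book.

```python
# Illustrative sketch only: L2Boosting with componentwise linear least
# squares; step size and number of iterations are our tuning choices.
import numpy as np

def l2boost(X, y, nu=0.1, n_steps=300):
    n, p = X.shape
    beta = np.zeros(p)
    fitted = np.full(n, y.mean())             # start from the constant fit
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_steps):
        u = y - fitted                        # current residuals
        gamma = X.T @ u / col_sq              # componentwise least squares coefficients
        j = int(np.argmax(gamma ** 2 * col_sq))   # largest residual sum-of-squares reduction
        beta[j] += nu * gamma[j]              # small step on the best covariate
        fitted += nu * gamma[j] * X[:, j]
    return beta

if __name__ == "__main__":
    rng = np.random.default_rng(6)
    n, p = 100, 500
    X = rng.standard_normal((n, p))
    beta0 = np.zeros(p)
    beta0[:4] = [2.0, -1.5, 1.0, 0.5]
    y = X @ beta0 + 0.5 * rng.standard_normal(n)
    print("covariates ever updated:", np.flatnonzero(l2boost(X, y)))
```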

13 Graphical modeling
13.1 Organization of the chapter
13.2 Preliminaries about graphical models
13.3 Undirected graphical models
13.3.1 Markov properties for undirected graphs
13.4 Gaussian graphical models
13.4.1 Penalized estimation for covariance matrix and edge set
13.4.2 Nodewise regression
13.4.3 Covariance estimation based on undirected graph
13.5 Ising model for binary random variables
13.6 Faithfulness assumption
13.6.1 Failure of faithfulness
13.6.2 Faithfulness and Gaussian graphical models
13.7 The PC-algorithm: an iterative estimation method
13.7.1 Population version of the PC-algorithm
13.7.2 Sample version for the PC-algorithm
13.8 Consistency for high-dimensional data
13.8.1 An illustration
13.8.2 Theoretical analysis of the PC-algorithm
13.9 Back to linear models
13.9.1 Partial faithfulness
13.9.2 The PC-simple algorithm
13.9.3 Numerical results
13.9.4 Asymptotic results in high dimensions
13.9.5 Correlation screening (sure independence screening)
13.9.6 Proofs
Problems
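
To illustrate Section 13.4.2, here is a minimal sketch of nodewise Lasso regression for estimating the edge set of a Gaussian graphical model: each variable is Lasso-regressed on all others, and an edge is kept only when both endpoint regressions select it (the "and" rule). The penalty level and the simulated chain structure are illustrative choices, not taken from the book.

```python
# Illustrative sketch only: nodewise Lasso regressions with the "and" rule;
# penalty level and simulated chain structure are our choices.
import numpy as np
from sklearn.linear_model import Lasso

def nodewise_edges(X, alpha=0.1):
    n, p = X.shape
    selected = np.zeros((p, p), dtype=bool)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        coef = Lasso(alpha=alpha).fit(X[:, others], X[:, j]).coef_
        selected[j, others] = coef != 0                  # neighbours proposed by node j
    return [(j, k) for j in range(p) for k in range(j + 1, p)
            if selected[j, k] and selected[k, j]]        # keep mutually selected pairs

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    n, p = 400, 8
    X = np.zeros((n, p))                                 # simple chain dependence structure
    X[:, 0] = rng.standard_normal(n)
    for j in range(1, p):
        X[:, j] = 0.6 * X[:, j - 1] + rng.standard_normal(n)
    print("estimated edges:", nodewise_edges(X, alpha=0.05))
```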

14 Probability and moment inequalities
14.1 Organization of this chapter
14.2 Some simple results for a single random variable
14.2.1 Sub-exponential random variables
14.2.2 Sub-Gaussian random variables
14.2.3 Jensen’s inequality for partly concave functions
14.3 Bernstein’s inequality
14.4 Hoeffding’s inequality
14.5 The maximum of p averages
14.5.1 Using Bernstein’s inequality
14.5.2 Using Hoeffding’s inequality
14.5.3 Having sub-Gaussian random variables
14.6 Concentration inequalities
14.6.1 Bousquet’s inequality
14.6.2 Massart’s inequality
14.6.3 Sub-Gaussian random variables
14.7 Symmetrization and contraction
14.8 Concentration inequalities for Lipschitz loss functions
14.9 Concentration for squared error loss with random design
14.9.1 The inner product of noise and linear functions
14.9.2 Squared linear functions
14.9.3 Squared error loss
14.10 Assuming only lower order moments
14.10.1 Nemirovski moment inequality
14.10.2 A uniform inequality for quadratic forms
14.11 Using entropy for concentration in the sub-Gaussian case
14.12 Some entropy results
14.12.1 Entropy of finite-dimensional spaces and general convex hulls
14.12.2 Sets with restrictions on the coefficients
14.12.3 Convex hulls of small sets: entropy with log-term
14.12.4 Convex hulls of small sets: entropy without log-term
14.12.5 Further refinements
14.12.6 An example: functions with (m-1)-th derivative of bounded variation
14.12.7 Proofs for this section (Section 14.12)
Problems

Author Index
Index
References
