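Tidy Modeling with R (Chinese title: R語言簡潔建模; reprint edition)
List price: 118 yuan
- Author: Max Kuhn (US)
- Publication date: 2023/3/1
- ISBN: 9787576605907
- Publisher: Southeast University Press (東南大學出版社)
- Chinese Library Classification: TP312.8R
- Pages: 363
- Paper: offset paper
- Edition: 1
- Format: 16mo (16開)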
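tidymodels is a collection of R packages for modeling and machine learning. Whether you are just starting out or have years of modeling experience, this practical book shows data analysts, business analysts, and data scientists how the tidymodels framework offers a consistent, flexible approach to their work.
RStudio engineers Max Kuhn and Julia Silge demonstrate how to build models by focusing on a dialect of R called the tidyverse. Software that adopts tidyverse principles shares both a high-level design philosophy and low-level grammar and data structures, so learning one part of the ecosystem makes the next part easier to master. You will see why the tidymodels framework is so widely used.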
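This book covers how to:
·Learn the steps needed to build a model from beginning to end
·Understand how to use different modeling and feature engineering approaches fluently
·Examine how to avoid common modeling pitfalls, such as overfitting
·Learn practical methods for preparing data for modeling
·Tune models for the best performance
·Use good statistical practice to compare, evaluate, and choose among models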
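Max Kuhn is director of Nonclinical Statistics at Pfizer Global R&D in Groton, Connecticut. He has nearly 20 years of experience applying predictive models in the pharmaceutical and diagnostic industries and is the author of many R packages.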
Preface
Part I. Introduction
1. Software for Modeling
Fundamentals for Modeling Software
Types of Models
Descriptive Models
Inferential Models
Predictive Models
Connections Between Types of Models
Some Terminology
How Does Modeling Fit into the Data Analysis Process?
Chapter Summary
2. A Tidyverse Primer
Tidyverse Principles
Design for Humans
Reuse Existing Data Structures
Design for the Pipe and Functional Programming
Examples of Tidyverse Syntax
Chapter Summary
3. A Review of R Modeling Fundamentals
An Example
What Does the R Formula Do?
Why Tidiness Is Important for Modeling
Combining Base R Models and the Tidyverse
The tidymodels Metapackage
Chapter Summary
Part II. Modeling Basics
4. The Ames Housing Data
Exploring Features of Homes in Ames
Chapter Summary
5. Spending Our Data
Common Methods for Splitting Data
What About a Validation Set?
Multilevel Data
Other Considerations for a Data Budget
Chapter Summary
6. Fitting Models with parsnip
Create a Model
Use the Model Results
Make Predictions
parsnip-Extension Packages
Creating Model Specifications
Chapter Summary
7. A Model Workflow
Where Does the Model Begin and End?
Workflow Basics
Adding Raw Variables to the workflow()
How Does a workflow() Use the Formula?
Tree-Based Models
Special Formulas and Inline Functions
Creating Multiple Workflows at Once
Evaluating the Test Set
Chapter Summary
8. Feature Engineering with Recipes
A Simple recipe() for the Ames Housing Data
Using Recipes
How Data Are Used by the recipe()
Examples of Steps
Encoding Qualitative Data in a Numeric Format
Interaction Terms
Spline Functions
Feature Extraction
Row Sampling Steps
General Transformations
Natural Language Processing
Skipping Steps for New Data
Tidy a recipe()
Column Roles
Chapter Summary
9. Judging Model Effectiveness
Performance Metrics and Inference
Regression Metrics
Binary Classification Metrics
Multiclass Classification Metrics
Chapter Summary
Part III. Tools for Creating Effective Models
10. Resampling for Evaluating Performance
The Resubstitution Approach
Resampling Methods
Cross-Validation
Repeated Cross-Validation
Leave-One-Out Cross-Validation
Monte Carlo Cross-Validation
Validation Sets
Bootstrapping
Rolling Forecasting Origin Resampling
Estimating Performance
Parallel Processing
Saving the Resampled Objects
Chapter Summary
11. Comparing Models with Resampling
Creating Multiple Models with Workflow Sets
Comparing Resampled Performance Statistics
Simple Hypothesis Testing Methods
Bayesian Methods
A Random Intercept Model
The Effect of the Amount of Resampling
Chapter Summary
12. Model Tuning and the Dangers of Overfitting
Model Parameters
Tuning Parameters for Different Types of Models
What Do We Optimize?
The Consequences of Poor Parameter Estimates
Two General Strategies for Optimization
Tuning Parameters in tidymodels
Chapter Summary
13. Grid Search
Regular and Nonregular Grids
Regular Grids
Nonregular Grids
Evaluating the Grid
Finalizing the Model
Tools for Creating Tuning Specifications
Tools for Efficient Grid Search
Submodel Optimization
Parallel Processing
Benchmarking Boosted Trees
Access to Global Variables
Racing Methods
Chapter Summary
14. Iterative Search
A Support Vector Machine Model
Bayesian Optimization
A Gaussian Process Model
Acquisition Functions
The tune_bayes() Function
Simulated Annealing
Simulated An