Brett Lantz has more than 10 years of experience applying innovative data methods to understanding human behavior. A sociologist by training, he first became captivated by machine learning while studying a large database of teenagers' social networking profiles. Since then, he has worked on interdisciplinary studies of topics such as mobile phones, medical billing data, and philanthropic activity.
Preface
Chapter 1: Introducing Machine Learning
The origins of machine learning
Uses and abuses of machine learning
Machine learning successes
The limits of machine learning
Machine learning ethics
How machines learn
Data storage
Abstraction
Generalization
Evaluation
Machine learning in practice
Types of input data
Types of machine learning algorithms
Matching input data to algorithms
Machine learning with R
Installing R packages
Loading and unloading R packages
Summary
Chapter 2: Managing and Understanding Data
R data structures
Vectors
Factors
Lists
Data frames
Matrices and arrays
Managing data with R
Saving, loading, and removing R data structures
Importing and saving data from CSV files
Exploring and understanding data
Exploring the structure of data
Exploring numeric variables
Measuring the central tendency - mean and median
Measuring spread - quartiles and the five-number summary
Visualizing numeric variables - boxplots
Visualizing numeric variables - histograms
Understanding numeric data - uniform and normal distributions
Measuring spread - variance and standard deviation
Exploring categorical variables
Measuring the central tendency - the mode
Exploring relationships between variables
Visualizing relationships - scatterplots
Examining relationships - two-way cross-tabulations
Summary
Chapter 3: Lazy Learning - Classification Using Nearest Neighbors
Understanding nearest neighbor classification
The k-NN algorithm
Measuring similarity with distance
Choosing an appropriate k
Preparing data for use with k-NN
Why is the k-NN algorithm lazy?
Example - diagnosing breast cancer with the k-NN algorithm
Step 1 - collecting data
Step 2 - exploring and preparing the data
Transformation - normalizing numeric data
Data preparation - creating training and test datasets
Step 3 - training a model on the data
Step 4 - evaluating model performance
Step 5 - improving model performance
Transformation - z-score standardization
Testing alternative values of k
Summary
Chapter 4: Probabilistic Learning - Classification Using Naive Bayes
Understanding Naive Bayes
Basic concepts of Bayesian methods
Understanding probability
Understanding joint probability
Computing conditional probability with Bayes' theorem
The Naive Bayes algorithm
Classification with Naive Bayes
The Laplace estimator
Using numeric features with Naive Bayes
Example - filtering mobile phone spam with the Naive Bayes algorithm
Step 1 - collecting data
Step 2 - exploring and preparing the data
Data preparation - cleaning and standardizing text data
Data preparation - splitting text documents into words
Data preparation - creating training and test datasets
Visualizing text data - word clouds
Data preparation - creating indicator features for frequent words
Step 3 - training a model on the data
Step 4 - evaluating model performance
Step 5 - improving model performance
Summary
Chapter 5: Divide and Conquer - Classification Using Decision Trees and Rules
Chapter 6: Forecasting Numeric Data - Regression Methods
Chapter 7: Black Box Methods - Neural Networks and Support Vector Machines
Chapter 8: Finding Patterns - Market Basket Analysis Using Association Rules
Chapter 9: Finding Groups of Data - Clustering with k-means
Chapter 10: Evaluating Model Performance
Chapter 11: Improving Model Performance
Chapter 12: Specialized Machine Learning Topics
Index