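As more and more companies turn to Hadoop to store and process their valuable data, the potential risk of a system breach is growing exponentially. Hadoop Security (reprint edition, in English) shows Hadoop administrators and security architects how to protect Hadoop data against unauthorized access, and also how to limit an attacker's ability to damage or tamper with data during a security breach.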
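Authors Ben Spivey and Joey Echeverria provide in-depth information about Hadoop's security features and organize them according to common computer security concepts. You also get real-world examples that demonstrate how to apply these concepts to your own use cases.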
Foreword
Preface
1. Introduction
Security Overview
Confidentiality
Integrity
Availability
Authentication, Authorization, and Accounting
Hadoop Security: A Brief History
Hadoop Components and Ecosystem
Apache HDFS
Apache YARN
Apache MapReduce
Apache Hive
Cloudera Impala
Apache Sentry (Incubating)
Apache HBase
Apache Accumulo
Apache Solr
Apache Oozie
Apache ZooKeeper
Apache Flume
Apache Sqoop
Cloudera Hue
Summary
Part I. Security Architecture
2. Securing Distributed Systems
Threat Categories
Unauthorized Access/Masquerade
Insider Threat
Denial of Service
Threats to Data
Threat and Risk Assessment
User Assessment
Environment Assessment
Vulnerabilities
Defense in Depth
Summary
3. System Architecture
Operating Environment
Network Security
Network Segmentation
Network Firewalls
Intrusion Detection and Prevention
Hadoop Roles and Separation Strategies
Master Nodes
Worker Nodes
Management Nodes
Edge Nodes
Operating System Security
Remote Access Controls
Host Firewalls
SELinux
Summary
4. Kerberos
Why Kerberos?
Kerberos Overview
Kerberos Workflow: A Simple Example
Kerberos Trusts
MIT Kerberos
Server Configuration
Client Configuration
Summary
Part II. Authentication, Authorization, and Accounting
5. Identity and Authentication
Identity
Mapping Kerberos Principals to Usernames
Hadoop User to Group Mapping
Provisioning of Hadoop Users
Authentication
Kerberos
Username and Password Authentication
Tokens
Impersonation
Configuration
Summary
6. Authorization
HDFS Authorization
HDFS Extended ACLs
Service-Level Authorization
MapReduce and YARN Authorization
MapReduce (MR1)
YARN (MR2)
ZooKeeper ACLs
Oozie Authorization
HBase and Accumulo Authorization
System, Namespace, and Table-Level Authorization
Column- and Cell-Level Authorization
Summary
7. Apache Sentry (Incubating)
Sentry Concepts
The Sentry Service
Sentry Service Configuration
Hive Authorization
Hive Sentry Configuration
Impala Authorization
Impala Sentry Configuration
Solr Authorization
Solr Sentry Configuration
Sentry Privilege Models
SQL Privilege Model
Solr Privilege Model
Sentry Policy Administration
SQL Commands
SQL Policy File
Solr Policy File
Policy File Verification and Validation
Migrating From Policy Files
Summary
8. Accounting
HDFS Audit Logs
MapReduce Audit Logs
YARN Audit Logs
Hive Audit Logs
Cloudera Impala Audit Logs
HBase Audit Logs
Accumulo Audit Logs
Sentry Audit Logs
Log Aggregation
Summary
Part III. Data Security
9. Data Protection
Encryption Algorithms
Encrypting Data at Rest
Encryption and Key Management
HDFS Data-at-Rest Encryption
MapReduce2 Intermediate Data Encryption
Impala Disk Spill Encryption
Full Disk Encryption
Filesystem Encryption
Important Data Security Consideration for Hadoop
Encrypting Data in Transit
Transport Layer Security
Hadoop Data-in-Transit Encryption
Data Destruction and Deletion
Summary
10. Securing Data Ingest
Integrity of Ingested Data
Data Ingest Confidentiality
Flume Encryption
Sqoop Encryption
Ingest Workflows
Enterprise Architecture
Summary
11. Data Extraction and Client Access Security
Hadoop Command-Line Interface
Securing Applications
HBase
HBase Shell
HBase REST Gateway
HBase Thrift Gateway
Accumulo
Accumulo Shell
Accumulo Proxy Server
Oozie
Sqoop
SQL Access
Impala
Hive
WebHDFS/HttpFS
Summary
12. Cloudera Hue
Hue HTTPS
Hue Authentication
SPNEGO Backend
SAML Backend
LDAP Backend
Hue Authorization
Hue SSL Client Configurations
Summary
Part IV. Putting It All Together
13. Case Studies
Case Study: Hadoop Data Warehouse
Environment Setup
User Experience
Summary
Case Study: Interactive HBase Web Application
Design and Architecture
Security Requirements
Cluster Configuration
Implementation Notes
Summary
Afterword
Index