HDP Security Overview_NV

HDP with Advanced Security
Comprehensive Security for Enterprise Hadoop
Hortonworks. We do Hadoop.
© Hortonworks Inc. 2014
Agenda
• Our approach across security pillars
• Component Deep Dive
• Questions
Security needs are changing
• YARN unlocks the data lake
• Multi-tenant: Multiple applications for data access
• Changing and complex compliance environment
• ETL of non-sensitive data can yield sensitive data
5 areas of security focus
• Administration: Central management & consistent security
• Authentication: Authenticate users and systems
• Authorization: Provision access to data
• Audit: Maintain a record of data access
• Data Protection: Protect data at rest and in motion
Fall 2013: Largely siloed deployments with single-workload clusters
Summer 2014: 65% of clusters host multiple workloads
Security in Hadoop with HDP + Argus (XA Secure)
Centralized Security Administration
Authentication: Who am I/prove it?
• HDP 2.1: Kerberos in native Apache Hadoop; HTTP/REST API secured with Apache Knox Gateway
• Argus: As-is, works with current authentication methods
Authorization: Restrict access to explicit data
• HDP 2.1: HDFS Permissions, HDFS ACL, Hive ATZ-NG
• Argus: HDFS, Hive and HBase; fine grain access control; RBAC
Audit: Understand who did what
• HDP 2.1: Audit logs in HDFS & MR
• Argus: Centralized audit reporting; policy and access history
Data Protection: Encrypt data at rest & in motion
• HDP 2.1: Wire encryption in Hadoop; open source initiatives; partner solutions
• Argus: Future integration
Map to Nevada Energy Requirements
Questions and the HDP security component for each:
End User Security
• LDAP Integration: Kerberos, Argus (XA)
• Group level access: Argus (XA)
• Multiple levels of access: Argus (XA)
• Multiple Environments: Argus (XA)
Developer Security
• Access control for creating tables: Argus (XA)
• Limits on creating schemas and folders: Argus (XA)
Security Features: HDP w/ Advanced Security
Authentication
• Kerberos Support: ✔
• Perimeter Security – For services and REST API: ✔
Authorizations
• Fine grained access control: HDFS, HBase and Hive
• Role based access control: ✔
• Column level: ✔
• Permission Support: Create, Drop, Index, lock, user
Auditing
• Resource access auditing: Extensive Auditing
• Policy auditing: ✔
Security Features: HDP w/ Advanced Security
Data Protection
• Wire Encryption: ✔
• Volume Encryption: ✔
• File/Column Encryption: Partners
Reporting
• Global view of policies and audit data: ✔
Manage
• Global policy manager, Web UI: ✔
• Delegated administration: ✔
• User/Group mapping: ✔
Authentication w/ Kerberos
Kerberos Primer
Components: Client (with the client's Kerberos ticket cache), KDC, NameNode (NN), DataNode (DN)
1. kinit - Login and get Ticket Granting Ticket (TGT)
2. Client stores TGT in ticket cache
3. Get NameNode Service Ticket (NN-ST)
4. Client stores NN-ST in ticket cache
5. Read/write file given NN-ST and file name; returns block locations, block IDs and Block Access Tokens if access permitted
6. Read/write block given Block Access Token and block ID
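The six steps of the primer can be traced in a toy simulation. This is a minimal sketch, not real Kerberos: tickets are plain strings rather than encrypted structures, and the class names (ToyKDC, ToyNameNode, ToyDataNode), the sample user and the HMAC-signed token format are all assumptions made for illustration. The only point is the order of exchanges between the client, KDC, NN and DN.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass, field


@dataclass
class TicketCache:
    """Client-side cache holding the TGT and any service tickets."""
    tickets: dict = field(default_factory=dict)


class ToyKDC:
    """Stands in for the KDC; real Kerberos encrypts tickets, this only labels them."""

    def __init__(self, passwords):
        self._passwords = passwords

    def login(self, user, password):
        # Step 1: kinit - authenticate and return a TGT.
        if self._passwords.get(user) != password:
            raise PermissionError("login failed")
        return f"TGT:{user}"

    def service_ticket(self, tgt, service):
        # Step 3: exchange the TGT for a service ticket (here, the NN-ST).
        if not tgt.startswith("TGT:"):
            raise PermissionError("invalid TGT")
        return f"ST:{service}:{tgt[4:]}"


def sign(key, block_id, user):
    """Toy Block Access Token: an HMAC over the block ID and user."""
    return hmac.new(key, f"{block_id}:{user}".encode(), hashlib.sha256).hexdigest()


class ToyNameNode:
    def __init__(self, key, files, readers):
        self._key, self._files, self._readers = key, files, readers

    def open(self, nn_st, path):
        # Step 5: check the NN-ST, then return block IDs and Block Access Tokens.
        prefix, service, user = nn_st.split(":", 2)
        if prefix != "ST" or service != "nn" or user not in self._readers.get(path, set()):
            raise PermissionError("access denied")
        return [(b, sign(self._key, b, user)) for b in self._files[path]]


class ToyDataNode:
    def __init__(self, key, blocks):
        self._key, self._blocks = key, blocks

    def read_block(self, block_id, token, user):
        # Step 6: verify the Block Access Token before serving the block.
        if not hmac.compare_digest(token, sign(self._key, block_id, user)):
            raise PermissionError("bad block access token")
        return self._blocks[block_id]


def read_file(user, password, path, kdc, nn, dn, cache):
    cache.tickets["tgt"] = kdc.login(user, password)                      # steps 1-2
    cache.tickets["nn"] = kdc.service_ticket(cache.tickets["tgt"], "nn")  # steps 3-4
    located = nn.open(cache.tickets["nn"], path)                          # step 5
    return b"".join(dn.read_block(b, t, user) for b, t in located)        # step 6


def demo():
    key = secrets.token_bytes(32)  # assumed secret shared by the toy NN and DN
    kdc = ToyKDC({"smith": "s3cret"})
    nn = ToyNameNode(key, {"/data/a.txt": ["blk_1", "blk_2"]}, {"/data/a.txt": {"smith"}})
    dn = ToyDataNode(key, {"blk_1": b"hello ", "blk_2": b"world"})
    return read_file("smith", "s3cret", "/data/a.txt", kdc, nn, dn, TicketCache())
```

Note that the DataNode never sees the NN-ST: it trusts the Block Access Token alone, which is why the client can read blocks directly after a single call to the NameNode.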
Kerberos Summary
• Provides Strong Authentication
• Establishes identity for users, services and hosts
• Prevents impersonation by unauthorized accounts
• Supports token delegation model
• Works with existing directory services
• Basis for Authorization
User Management
• Most customers use LDAP for user info
– LDAP guarantees that user information is consistent across the cluster
– An easy way to manage users & groups
– The standard user to group mapping comes from the OS on the NameNode
• Kerberos provides authentication
– PAM can automatically log user into Kerberos
Kerberos + Active Directory
• AD / LDAP (User Store): use existing directory tools to manage users
• KDC: use Kerberos tools to manage host + service principals
• Cross Realm Trust between AD / LDAP and the KDC
• Users: smith@EXAMPLE.COM
• Hosts: host1@HADOOP.EXAMPLE.COM
• Services: hdfs/host1@HADOOP.EXAMPLE.COM
Diagram labels: User Store, Client, Authentication, Hadoop Cluster
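The principal names on this slide follow the standard Kerberos form primary[/instance]@REALM. As a small illustration, the sketch below splits a principal into those parts and checks which realm issued it. The function names and the default cluster realm are assumptions taken from the slide's examples, not part of any Hadoop API.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str
    instance: Optional[str]
    realm: str


def parse_principal(name: str) -> Principal:
    """Split 'primary[/instance]@REALM' into its parts."""
    local, sep, realm = name.rpartition("@")
    if not sep or not local or not realm:
        raise ValueError(f"not a full principal name: {name!r}")
    primary, _, instance = local.partition("/")
    return Principal(primary, instance or None, realm)


def is_cluster_principal(p: Principal, cluster_realm: str = "HADOOP.EXAMPLE.COM") -> bool:
    """True for principals issued by the cluster KDC rather than the user directory."""
    return p.realm == cluster_realm
```

With the slide's examples, parse_principal("hdfs/host1@HADOOP.EXAMPLE.COM") yields primary hdfs, instance host1 and the cluster realm, while smith@EXAMPLE.COM has no instance and belongs to the directory realm that the cluster trusts through the cross-realm trust.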
Knox Gateway Overview
Perimeter REST API Security
What does Perimeter Security really mean?
• Knox Gateway controls all Hadoop REST API access through firewall
• Firewall only allows connections through specific ports from Knox host
• Firewall required at perimeter (today)
• Hadoop cluster mostly unaffected
Diagram path: User -> Firewall -> Gateway -> Firewall -> Hadoop Services, with REST API calls on both sides of the gateway
Why Knox?
Enhanced Security
• Protect network details
• Partial SSL for non-SSL services
• WebApp vulnerability filter
Centralized Control
• Central REST API auditing
• Service-level authorization
• Alternative to SSH “edge node”
Simplified Access
• Kerberos encapsulation
• Extends API reach
• Single access point
• Multi-cluster support
• Single SSL certificate
Enterprise Integration
• LDAP integration
• Active Directory integration
• SSO integration
• Apache Shiro extensibility
• Custom extensibility
Current Hadoop Client Model
• FileSystem and MapReduce Java APIs
• HDFS, Pig, Hive and Oozie clients (that wrap the Java APIs)
• Typical use of APIs is via “Edge Node” that is “inside” cluster
• Users SSH to Edge Node and execute API commands from shell
Diagram: User -> SSH -> Edge Node -> Hadoop
Hadoop REST APIs
Service: API
• WebHDFS: Supports HDFS user operations including reading files, writing to files, making directories, changing permissions and renaming.
• WebHCat: Job control for MapReduce, Pig and Hive jobs, and HCatalog DDL commands.
• Hive: Hive REST API operations, JDBC/ODBC over HTTP
• HBase: HBase REST API operations
• Oozie: Job submission and management, and Oozie administration.
• Useful for connecting to Hadoop from outside the cluster
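To make the REST access path concrete, here is a minimal sketch that builds WebHDFS URLs, either directly against a NameNode or through a Knox gateway. The host names are hypothetical. The Knox port (8443), gateway path ("gateway") and topology name ("default") are assumed defaults that vary per deployment, as is the NameNode HTTP port in the usage note.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(base: str, hdfs_path: str, op: str, **params: str) -> str:
    """Build a WebHDFS URL of the form <base>/webhdfs/v1/<path>?op=<OP>&..."""
    query = urlencode({"op": op, **params})
    return f"{base.rstrip('/')}/webhdfs/v1/{quote(hdfs_path.lstrip('/'))}?{query}"


def knox_base(host: str, port: int = 8443, gateway_path: str = "gateway",
              topology: str = "default") -> str:
    """Base URL for services exposed through a Knox gateway (defaults are assumptions)."""
    return f"https://{host}:{port}/{gateway_path}/{topology}"
```

For example, webhdfs_url("http://namenode.example.com:50070", "/tmp/data", "LISTSTATUS") addresses the NameNode directly, while webhdfs_url(knox_base("knox.example.com"), "/tmp/data", "LISTSTATUS") sends the same operation through the gateway, so the client only needs to reach one host and port.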
Hadoop REST API Security: Drill-Down
Diagram: a REST client connects over HTTP through a firewall to a load balancer (LB) in the DMZ, which fronts Knox Gateway (GW) instances. The gateways use the enterprise identity provider (LDAP/AD) and forward HTTP requests through a second firewall to Hadoop Cluster 1 and Hadoop Cluster 2. Each cluster shows masters (NN, RM), services (WebHCat, Oozie, HBase, HS2) and slaves (DN, NM). Edge Node/Hadoop CLIs reach Hadoop Cluster 1 over RPC.
Authorization and Auditing
Authorization and Audit
Authorization: Fine grain access control
• HDFS – Folder, File
• Hive – Database, Table, Column
• HBase – Table, Column Family, Column
Flexibility in defining policies
Audit: Extensive user access auditing in HDFS, Hive and HBase
• IP Address
• Resource type/ resource
• Timestamp
• Access granted or denied
Control access into system
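The relationship between a fine-grained policy and the audit fields listed above can be sketched in a few lines. This is an illustrative model only, not the Argus/XA policy format or API: the Policy and AuditRecord classes, the "*" wildcard and the group-based matching rule are assumptions chosen to mirror the resource levels and audit fields on this slide.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Policy:
    resource_type: str            # "hdfs", "hive" or "hbase"
    resource: Tuple[str, ...]     # e.g. ("sales", "orders", "*") for db, table, column
    groups: FrozenSet[str]
    permissions: FrozenSet[str]


@dataclass(frozen=True)
class AuditRecord:
    user: str
    ip_address: str
    resource_type: str
    resource: Tuple[str, ...]
    timestamp: datetime
    granted: bool


def matches(pattern: Tuple[str, ...], resource: Tuple[str, ...]) -> bool:
    """A policy level matches when it equals the requested level or is the '*' wildcard."""
    return len(pattern) == len(resource) and all(
        p in ("*", r) for p, r in zip(pattern, resource))


def check_access(policies, audit_log, user, groups, ip_address,
                 resource_type, resource, permission) -> bool:
    """Grant access if any policy covers the request; always write an audit record."""
    granted = any(
        p.resource_type == resource_type
        and matches(p.resource, resource)
        and permission in p.permissions
        and bool(set(groups) & p.groups)
        for p in policies)
    audit_log.append(AuditRecord(user, ip_address, resource_type, resource,
                                 datetime.now(timezone.utc), granted))
    return granted
```

Every decision is recorded whether access is granted or denied, which is what makes the audit trail useful for reviewing both policy coverage and attempted access.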
Central Security Administration
HDP Advanced Security
• Delivers a ‘single pane of glass’ for
the security administrator
• Centralizes administration of
security policy
• Ensures consistent coverage across
the entire Hadoop stack
Setup Authorization Policies
• File level access control, flexible definition
• Control permissions
Monitor through Auditing
Authorization and Auditing w/ XA
Diagram components:
• Enterprise Users and Legacy Tools
• XA Administration Portal, XA Policy Server, XA Audit Server, Integration API
• Stores shown: RDBMS, HDFS
• Hadoop Components with an XA Plugin: Hive Server2, HBase, Hadoop distributed file system (HDFS)
• Hadoop Components with an XA Plugin*: Knox, Storm, Falcon
• YARN : Data Operating System
* - Future Integration
Data Protection
HDP allows you to apply data protection policy at
three different layers across the Hadoop stack
Layer: What? / How?
• Storage: Encrypt data while it is at rest / Partners, OS level encrypt, Custom Code
• Transmission: Encrypt data as it moves / Supported in HDP 2.1
• Upon Access: Apply restrictions when accessed / Partners, Open Source Initiatives
Points of Communication
Client to Hadoop Cluster
1. WebHDFS
2. DataTransferProtocol
3. RPC
4. JDBC/ODBC
Nodes to Nodes
2. DataTransfer
3. RPC
4. M/R Shuffle
Data Transmission Protection in HDP 2.1
• WebHDFS
– Provides read/write access to HDFS
– Optionally enable HTTPS
– Authenticated using SPNEGO (Kerberos for HTTP) filter
– SSL based wire encryption
• RPC
– Communications between NNs, DNs, etc. and Clients
– SASL based wire encryption
– DTP encryption with SASL
• JDBC/ODBC
– SSL based wire encryption
– SASL based encryption also available
• Shuffle
– Mapper to Reducer over HTTP(S) with SSL
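As a rough guide to where each of these channels is switched on, the sketch below maps them to configuration properties and renders the entries for one config file. The property names come from upstream Apache Hadoop and Hive and are assumed to apply to HDP 2.1. The chosen values and the file each property belongs in should be verified against the HDP documentation before use, and SSL keystore settings are not shown.

```python
from xml.sax.saxutils import escape

# Channel -> (config file, property, value). Names and values are assumptions to verify.
WIRE_ENCRYPTION = {
    "WebHDFS": ("hdfs-site.xml", "dfs.http.policy", "HTTPS_ONLY"),
    "RPC": ("core-site.xml", "hadoop.rpc.protection", "privacy"),
    "DataTransferProtocol": ("hdfs-site.xml", "dfs.encrypt.data.transfer", "true"),
    "JDBC/ODBC (SSL)": ("hive-site.xml", "hive.server2.use.SSL", "true"),
    "JDBC/ODBC (SASL)": ("hive-site.xml", "hive.server2.thrift.sasl.qop", "auth-conf"),
    "Shuffle": ("mapred-site.xml", "mapreduce.shuffle.ssl.enabled", "true"),
}


def render_properties(config_file: str) -> str:
    """Render the properties destined for one config file as Hadoop <property> XML."""
    rows = [
        "  <property>\n"
        f"    <name>{escape(name)}</name>\n"
        f"    <value>{escape(value)}</value>\n"
        "  </property>"
        for file, name, value in WIRE_ENCRYPTION.values()
        if file == config_file
    ]
    return "<configuration>\n" + "\n".join(rows) + "\n</configuration>"
```

Calling render_properties("hdfs-site.xml") produces the two HDFS entries (WebHDFS over HTTPS and DataTransferProtocol encryption) in the usual configuration file layout.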
Data Storage Protection
• Encrypt at the physical file system level (e.g. dm-crypt)
• Encrypt via custom HDFS “compression” codec
• Encrypt at Application level (including security service/device)
Diagram labels: ETL, ENCRYPT, Security Service (Partner), HDFS, DECRYPT, App; sample values ABC, DEF and 1a3d
Current Open Source Initiatives
• HDFS Encryption
– Transparent encryption of data at rest in HDFS via encryption zones; being worked on in the community
– Dependency on Key Management Server and Keyshell
• Key Management Server
• Key Provider API
• Hive Column Level Encryption
• HBase Column Level Encryption
– Transparent Column Encryption, needs more testing/validation
• Command line Key Operations
Resources
Security Page
Hortonworks Security Investment Plans
Investment themes
HDP + XA
Comprehensive Security for Enterprise Hadoop
Goals:
Comprehensive Security
Meet all security requirements across Authentication,
Authorization, Audit & Data Protection for all HDP
components
Central Administration
Provide one location for administering security policies and
audit reporting for entire platform
Consistent Integration
Integrate with other security & identity management systems,
for compliance with IT policies
…all IN Hadoop
Previous Phases
✔ Kerberos Authentication
✔ HDFS, Hive & HBase authorization
✔ Wire Encryption for data in motion
✔ Knox for perimeter security
✔ Basic Audit in HDFS & MR
✔ SQL Style Hive Authorization
✔ ACLs for HDFS
XA Secure Phase
• Centralized Security Admin for HDFS, Hive &
HBase
• Centralized Audit Reporting
• Delegated Policy Administration
Future Phases
• Encryption in HDFS, Hive & HBase
• Centralized security administration of entire
Hadoop platform
• Centralized auditing of entire platform
• Expand Authentication & SSO integration choices
• Tag based global policies (e.g. Policy for PII)