Databases: Why? DATABASE DESIGN I - 1DL300 hour

Databases: Why?
DATABASE DESIGN I - 1DL300
•
An introductury course on database systems
http://www.it.uu.se/edu/course/homepage/dbastekn/vt12
•
You are using a database every day
hour
•
Database Management Systems (DBMS) provide …
Erik Zeitler
… efficient, reliable, convenient, and safe multi-user storage of and access to
massive amounts of persistent data.
Uppsala Database Laboratory
Department of Information Technology, Uppsala University,
Uppsala, Sweden
Erik Zeitler - UDBL - IT - UU
2012-01-16
Extremely prevalent
– Web sites, banking, telecom, sensors, retail, science and engineering, …
Spring 2012
1
Erik Zeitler - UDBL - IT - UU
2012-01-16
Databases: What?
•
Massive
•
•
Persistent
Safe
•
Multi-user
•
Convenient
– Terabytes
Databases: How?
per day
•
•
– Physical data independence
– High-level declarative query language
•
Reliable
•
values
Data definition language (DDL)
– Set up schema
– 1000’s of queries and/or updates per second
•
Data manipulation language (DML)/Query Language (QL)
– Query and modify
– 99.99999%
Erik Zeitler - UDBL - IT - UU
Schema vs data
types
variables
– Concurrency control
Efficient
Data model
– Set of records, graph, XML
– Hardware software power users
•
2
2012-01-16
3
Erik Zeitler - UDBL - IT - UU
2012-01-16
4
Key DB people
•
1DL300 in a nutshell
13 LECTURES
DBMS implementer
– Builds system (works at Oracle, IBM, Microsoft, Uppsala University)
•
• Course intro
– Overview of DB technology
– DB terminology
DB designer
– Establish schema
• ER-modeling
• Relational model and relational
algebra
• ER-to-relational mapping and
Normalization
• SQL
• Transactions, Concurrency control
• Recovery techniques
• Intro to storage and index structures
1DL400
•
DB application developer
– Programs that operate on DB
•
1DL300
DB Administrator (DBA)
– Operations & Maintenance
– Tuning & Optimization
Erik Zeitler - UDBL - IT - UU
2012-01-16
5
Erik Zeitler - UDBL - IT - UU
3 ASSIGNMENTS
To be done in pairs:
1. Normalization
2. DB Design and ER modeling
3. SQL in RDBMS
Queries and views
JDBC API access to RDBMS
1 FINAL EXAM
• Skrivsalen Mon March 5
• No books allowed
2012-01-16
6
Friendly reminders from the Student Office
Personnel
1. Not admitted?
Lecturers
Assistants
• Erik Zeitler (main teacher)
• Mikael Lax
1. Swedish students: Go to www.antagning.se 1DL300 sen anmälan
2. Master students: Go to your study counsellor
3. Exchange students: Ulrika Jaresund
2. Make sure you are registered (registration deadline January 29)
• phone 471 7345
• room 1306
– room 1320
• Silvia Stefanova
1.
2.
3.
4.
• Lars Melander
– room 1319
• phone 471 1051
• room 1316
• Anne Peters
• Minpeng Zhu
– room 2005
Studentportalen (all)
Registered before, want to register again? Student Office it-kansli@it.uu.se (re-register)
Master students: Study counsellor
Exchange students: Ulrika Jaresund
3. Want to quit the course?
• phone 471 3155
• room 1310
– Inform the Student Office! (it-kansli@it.uu.se)
•
If less than 3 weeks have passed since the course started, the course registration will be
removed. After 3 weeks a "course intermission" will be reported to UPPDOK instead.
4. Don’t forget to sign up for the final exam
firstname.lastname@it.uu.se
Erik Zeitler - UDBL - IT - UU
2012-01-16
7
Erik Zeitler - UDBL - IT - UU
2012-01-16
8
The database market (Computer Sweden May 24, 2002)
Introduction to Database Terminology
Elmasri/Navathe chs 1-2
Padron-McCarthy/Risch ch 1
Erik Zeitler
Department of Information Technology
Uppsala University, Uppsala, Sweden
Erik Zeitler - UDBL - IT - UU
2012-01-16
9
Erik Zeitler - UDBL - IT - UU
2012-01-16
10
Outline of a database system
DBMS deployment plans (Gartner 2008)
Erik Zeitler - UDBL - IT - UU
2012-01-16
11
Erik Zeitler - UDBL - IT - UU
2012-01-16
12
Database?
Database management system?
• A database (DB) is a more or less well-organized collection of related data.
•
A database management system (DBMS) is one (or several) program that
provides functionality for users to develop, use, and maintain a database.
•
Thus, a DBMS is a general software system for defining, populating (constructing),
manipulating and sharing databases for different types of applications.
•
Also supports protection (system and security) and maintenance to evolve the
system.
• The information in a database . . .
– represents information within some sub-area of “reality”
(i.e. objects, characteristics and relationships between objects)
– is logically connected
– has been organized for a specific group of users and applications
Erik Zeitler - UDBL - IT - UU
2012-01-16
13
Erik Zeitler - UDBL - IT - UU
Database system?
•
2012-01-16
14
Database vs files
A database system consists of . . .
– the physical database (instance)
– a database management system
– one or several database languages
(means for communicating with the database)
– one or several application program(s)
•
A database system makes a simple and efficient manipulation of large data sets
possible.
•
The term DB can refer to both the content and to the system (the answer to this
ambiguity is governed by the context).
Erik Zeitler - UDBL - IT - UU
2012-01-16
• DB in comparison to conventional file management:
–
–
–
–
–
–
15
data model – data abstraction
meta-data – in catalog
program-data and program-operation independence
multiple views of data
sharing data – multiuser transactions
high-level language for managing data in the database
Erik Zeitler - UDBL - IT - UU
2012-01-16
16
Advantages of using a database approach
•
•
•
•
•
•
•
•
•
•
•
Data model?
Efficient search and access of large data sets
Controlling redundancy and inconsistency
Access control
Persistent storage
Indexes and query processing
Backup and recovery
Multiple user interfaces
Complex relationships
Integrity constraints
Active behaviour
Enforcing standards, reducing application development time, flexibility
to evolve system, up-to-date info
Erik Zeitler - UDBL - IT - UU
2012-01-16
• Every DB has a data model
– “hides” the physical representation of data
• A data model is a formalism that defines
– a notation for describing data on an abstract level
– a set of operations to manipulate data represented using the data model
• Data models are used for data abstraction
– Enabling definition and manipulation of data on an abstract level.
17
Erik Zeitler - UDBL - IT - UU
Data models - examples
•
•
18
Meta-data, i.e. “data about data”
Examples of representational (implementation) data models within the database
field are:
–
–
–
–
–
2012-01-16
•
Hierarchical (IMS)
Network (IDMS)
Relational (ORACLE, DB2, SQL Server, InterBase, Mimer)
Object-oriented (ObjectStore, Objectivity, Versant, Poet)
Object-relational (Informix, Odapter, DB2)
Information about which information that exists and about how/where data is
stored
–
–
–
–
–
names and data types of data items
names and sizes of files
storage details of each file
mapping information among schemas
constraints
cf. variables
in a program
Conceptual data model
•
– ER-model - Entity-Relationship model
– (not an implementation model since there are no operations defined for the notation)
Erik Zeitler - UDBL - IT - UU
2012-01-16
19
Meta-data is stored in a system catalog (alt. term data dictionary).
Erik Zeitler - UDBL - IT - UU
2012-01-16
20
Schema and instance
•
Data independence
To be able to separate data in the database and its description the terms
• Reduces the connection between:
– database instance and
– database schema are used.
– the actual organization of data and
– how the users/application programs process data (or “sees” data.)
• The schema is created when a database is defined. A database schema is not
changed frequently.
• Why?
• The data in the database constitute an instance. Every change of data creates a new
instance of the database.
Erik Zeitler - UDBL - IT - UU
2012-01-16
21
– Data should be able to change without requiring a corresponding alteration of the
application programs.
– Different applications/users need different “views” of the same data.
Erik Zeitler - UDBL - IT - UU
2012-01-16
22
Three-schema architecture
Data independence - how?
By introducing a multi-level architecture where each level
represents one abstraction level
End
Endusers
users
•
The three-schema architecture
– Introduced in 1971
– a.k.a. ANSI/SPARC architecture for databases
– CODASYL Data Base Task Group.
Conceptual level
•
It consists of 3 levels:
– Internal level
– Conceptual level
– External level
•
Internal level
Each level introduces one abstraction layer and has a schema that describes how
representations should be mapped to the next lower abstraction level.
Erik Zeitler - UDBL - IT - UU
2012-01-16
view 2
External level
view 1
logical
physical
…
…
…
view n
Conceptual schema
Internal schema
Database
Databaseinstance
instance
23
Erik Zeitler - UDBL - IT - UU
2012-01-16
24
Internal, conceptual and external schemas
Logical database design
•
1. Make an ER diagram
2. Translate that ER diagram to a relational DB schema
ename
EMPLOYEE
WORKS_IN
dname
– Abstraction level: files, index files etc.
– Is usually defined through the data definition language (DDL) of the DBMS.
salary
1
N
MANAGES
sname
1
1
DEPARTMENT
SUPPLIER
1
N
dno
CARRIES
SUPPLIES
N
iname
M
ITEM
ino
date
•
• Schemas for the entity types in the example above
EMPLOYEE(ENAME, SALARY, DEPT)
saddr
DEPARTMENT(DNO, DNAME, MGR)
SUPPLIER(SNAME, SADDR)
ITEM(INO, INAME, DNO)
price
ORDER(ONO, DATE, CUST)
CUSTOMER(CNAME, CADDR, BALANCE)
N
ORDER
INCLUDE
quantity
N
PLACED_BY
CUSTOMER
cname
balance
caddr
1
Erik Zeitler - UDBL - IT - UU
Conceptual schema: an abstract description of the physical database.
– Constitute one, for all users, common basic model of the logical content of the database.
– This abstraction level corresponds to “the real world”: object, characteristics, relationships between
objects etc.
– The schema is created in the DDL according to a specific data model.
•
M
ono
Internal schema: describes storage structures and access paths for the physical
database.
• Schemas for relationship types (M:N)
SUPPLIES(SNAME, INO, PRICE)
INCLUDE(ONO, INO, QUANTITY)
External schema (or views): a (restricted) view over the conceptual schema
– A typical DB has several users with varying needs, demands, access privileges etc. and external
schemas describes different views of the conceptual database with respect to what the different
user groups would like to/are allowed to se.
– Some DBMS’s have a specific language for view definitions (else the DDL is used).
2012-01-16
25
Erik Zeitler - UDBL - IT - UU
2012-01-16
26
An example database (Elmasri/Navathe Figure 1.2)
Views - example (Elmasri/Navathe fig 1.4)
Erik Zeitler - UDBL - IT - UU
2012-01-16
27
Erik Zeitler - UDBL - IT - UU
2012-01-16
28
Possible data independence in the
three-schema architecture
Database languages
1. Logical data independence
A generic term for a class of languages used for defining, communicating with or
manipulating a database.
• In conventional programming languages, declarations and program sentences is
implemented in one language.
• A database language includes several different languages.
– The possibility to change the conceptual schema without influencing the external
schemas (views).
• e.g. add another field to a conceptual schema.
2. Physical data independence
–
–
–
–
– The possibility to change the internal schema without influencing the conceptual
schema..
• the effects of a physical reorganization of the database, such as adding an access path, is
eliminated.
view1
view2
…
…
In the DDL the database administrator define internal + conceptual schema
•
DB users and application programs use DML to
→ database is designed. Subsequent schema modifications are also made in DDL.
viewn
– retrieve, add, remove, or alter the data in the database
– the term Query Language is usually used as synonym to DML.
Internal schema
2012-01-16
29
Erik Zeitler - UDBL - IT - UU
2012-01-16
30
Physical two-tier client-server architecture
Classification criteria for DBMSs
•
•
…
Conceptual schema
Erik Zeitler - UDBL - IT - UU
Storage Definition Language (SDL) - internal schema
Data Definition Language (DDL) - conceptual schema
View Definition Language (VDL) - external schema
Data Manipulation Language (DML)
Type of data model
– hierarchical, network, relational, object-oriented, object-relational
•
Centralized vs. distributed DBMSs
– Homogeneous vs. heterogeneous DDBMSs
– Multidatabase systems
•
Single-user vs. multi-user systems
•
General-purpose vs. special-purpose DBMSs
– specific applications such as airline reservation and phone directory systems.
•
Cost
Erik Zeitler - UDBL - IT - UU
2012-01-16
31
Erik Zeitler - UDBL - IT - UU
2012-01-16
32
Logical three-tier client/server architecture
DBMS
architecture
Erik Zeitler - UDBL - IT - UU
2012-01-16
33
Erik Zeitler - UDBL - IT - UU
2012-01-16
34