Architecting a Corporate Metadata Repository at the U.S. Bureau of Census
Gail Wright
CMR Program Manager
Technical Director
Oracle Corporation
gail.wright@oracle.com
Agenda
• Why a CMR?
• What to include in a CMR?
• Architecting a CMR
• Leveraging a CMR
Why a Corporate Metadata Repository (CMR)?
Metadata Technology Continuum
[Figure: a continuum running from low to high integration, low to high share/reuse, few to many open standards, and low to high interoperability. Stages shown, in order:]
• Buried, tool-based, inaccessible metadata; data dictionaries
• Defined application models
• Autonomous repositories
• Integrated vertical/inter-dept metadata (example: EMR)
• Integrated corporate enterprise metadata (example: CMR)
• Integrated global enterprise metadata (example: FedStats)
BOC Current Business Process
Does not include an Integrated Metadata Business Process
[Figure: programs shown are Census 2000, American Community Survey, Demographic Surveys, Econ Census, and Econ Surveys. Tools and systems shown by step:]
• Design: internally developed systems, customized commercial systems, CASES, GIDS, variety of programming languages, individual tool of choice
• Collect: internally developed systems, CATI, CAPI, Mail, PAPI, OCS, ICM, CADE, CSAQ, OCR, TDE, PFIRS, individual tool of choice
• Process and Share: SAS, DEVSURV, DADS/AFF, CENSAS, FERRET, COBOL, FORTRAN, DECForms, Econ DW, StEPS, Internet, CD-ROM, ISS (future)
What are the problems with the current Business Process?
• Difficult to:
  - meet customer demands for quick turnaround of surveys and customized products
  - re-use and share metadata within the BOC
  - maintain consistent standards
  - compile and format metadata needed by dissemination systems
  - share metadata with external agencies, participate in Virtual Statistical Agencies, etc.
  - meet new metadata requirements like FGDC's CSDGM content standard
  - perform time series or cross dataset comparisons
• Metadata integrity and quality can be compromised
BOC Goal: An Integrated Metadata Process
[Figure: four stages (Census and Survey Design, Data Collection, Data Processing, Data Dissemination) all resting on the Corporate Metadata Repository. Metadata from the 1998 Annual Survey is copied forward for reuse by the 1999 Annual Survey.]
What to include in a Corporate Metadata Repository?
What is Metadata?
• “Data about data”
• Information about “raw” data that gives it meaning or context, or enhances understanding
• Data about the content, quality, condition, and other characteristics of data
• Every informational asset that's not data:
  - Requirements, Data Models, Business Models, Screen Layouts
  - Data mappings and transformations
  - Hierarchies, aggregation rules, formulas
  - Rules for comparison of data sets and historical meaning
  - Security access controls, operational schedules, code, ...
What is a Repository?
[Figure: five levels; each level keeps everything in the level before it and adds more]
Level               Adds
Data Dictionary     Name, Definition, Format
Data Directory      Source, Destination, Legacy
Data Registry       Owner, Authority, Standard
Data Encyclopedia   Application, System, Model
Data Repository     Everything else
Factors for determining CMR content
• Strategic to BOC Enterprise
• Opportunity for sharing and reuse of:
  - Metadata
  - Meta-Model
  - Generic vs. Application specific
CMR Meta-Models
• Data Element Registry (ISO/IEC 11179 Standard): Data Elements, Value Domains, Valid Values, Data Element Concepts,…
• Data Set Registry (Supports FGDC CSDGM Geospatial Metadata Standard): A Data Set is a collection of Data Elements.
• Product Registry (Supports FGDC CSDGM Geospatial Metadata Standard & Dublin Core): A Data Product may be a file/document, website/URL, or physical object.
• Data Store (OMG CWM Standard): Metadata for the physical data store. (Supports Relational, Multi-Dimensional, and Flat File stores)
• Business Rule Registry
• Survey Registry: Surveys, Survey Instances, Universes, Frames, Sample, Questionnaires, Questions,…
• Classification Schemes (ISO/IEC 11179 Standard): Taxonomies, Keywords
• Frameworks: Security Framework, Configuration Mgmt Framework, Workflow Framework
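
A minimal sketch of how the Data Element Registry and Data Set Registry concepts above might be represented in code. The class and field names are illustrative assumptions, not the CMR's actual meta-model; only the concepts (Data Element, Value Domain, Valid Values, Data Element Concept, Data Set as a collection of Data Elements) come from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ValueDomain:
    """Set of valid values for a data element (hypothetical, simplified)."""
    name: str
    valid_values: Dict[int, str]  # code -> meaning

    def is_valid(self, code: int) -> bool:
        return code in self.valid_values


@dataclass
class DataElement:
    """A data element tied to a concept and a value domain."""
    name: str
    definition: str
    concept: str  # the Data Element Concept this element represents
    value_domain: ValueDomain


@dataclass
class DataSet:
    """A Data Set is a collection of Data Elements."""
    name: str
    elements: List[DataElement] = field(default_factory=list)

    def element_names(self) -> List[str]:
        return [e.name for e in self.elements]


# Example values are illustrative only.
sex_domain = ValueDomain("Sex codes", {0: "male", 1: "female"})
sex = DataElement("SEX", "Person Gender", "Sex of person", sex_domain)
pums = DataSet("1990 Public Use Microdata Sample (PUMS)", [sex])
```

Keeping the value domain separate from the element lets several elements share one domain, which is one way to get the reuse the slide is after.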
Basic CMR Meta-Model Relationships
[Diagram relating these entities: Survey, Survey Instance, Questionnaire, Question, Product, Data Set, Data Store, Data Element]
Definitions
• Administered Component
  - An object requiring naming, identification, configuration, security, and, optionally, registration
  - Has one or more designations (names)
  - Has one or more definitions
• Classified Component
  - An object that may be classified as a part of a classification scheme
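
The two definitions above can be sketched as base classes. This is an illustration under assumptions: the field names are hypothetical, and modeling Classified Component as a specialization of Administered Component is a guess for the sake of the example, not something the slides state.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class AdministeredComponent:
    """Object that is named, identified, and optionally registered."""
    identifier: str
    designations: List[str]   # one or more names
    definitions: List[str]    # one or more definitions
    registered: bool = False  # registration is optional

    def __post_init__(self) -> None:
        # Enforce the "one or more" rules from the definition.
        if not self.designations:
            raise ValueError("at least one designation (name) is required")
        if not self.definitions:
            raise ValueError("at least one definition is required")


@dataclass
class ClassifiedComponent(AdministeredComponent):
    """Object that may be placed in a classification scheme (assumed subclass)."""
    classifications: Set[str] = field(default_factory=set)

    def classify(self, scheme: str, node: str) -> None:
        self.classifications.add(f"{scheme}/{node}")


# Illustrative values only.
component = ClassifiedComponent(
    "DE-001", ["SEX", "Person Gender"], ["The gender of a person."]
)
component.classify("Census Bureau Information", "Demographic")
```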
CMR Meta-Model High Level
[Diagram: Administered Component and Classified Component shown together with the Basic CMR Meta-Model Relationships entities: Survey, Survey Instance, Questionnaire, Question, Product, Data Set, Data Store, Data Element]
Generating a Census Bureau Taxonomy
[Figure: two tree views; "+" and "-" are the tree node markers from the slide]
+ Census Bureau Information
  + Demographic
    + Census
      - 1990 Census
      + 2000 Census
        - Questionnaires
        - Products
        + Datasets
          - Public Use Microdata Sample
          - 100% Edited Detail File
          + Sample Edited Detail File
            - Data Elements
            - Related Information
    - Survey
  - Economic
  - Geographic

+ Data Elements
  + Basic Demographic
    - Relationship
    + Sex
      - Alternative Designations
      - Alternative Definitions
      - Data Element Concept
      - Conceptual Domain
      - Value Domain
      - Related Data Elements
      - Related Information
    - Age
    - Race
    - Marital Status
  - Occupation/Employment
  - Housing
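
A rough sketch of how a taxonomy tree like this one could be generated from classification paths held in a repository. The function names and the path data are assumptions for illustration; the hierarchy is one plausible reading of the slide, and here "+" simply marks a node that has children.

```python
def build_tree(paths):
    """Merge classification paths into a nested dict, keeping first-seen order."""
    root = {}
    for path in paths:
        node = root
        for label in path:
            node = node.setdefault(label, {})
    return root


def render(tree, depth=0):
    """Return indented lines; '+' for nodes with children, '-' for leaves."""
    lines = []
    for label, children in tree.items():
        marker = "+" if children else "-"
        lines.append("  " * depth + f"{marker} {label}")
        lines.extend(render(children, depth + 1))
    return lines


# Hypothetical classification paths, as they might be stored per component.
paths = [
    ("Census Bureau Information", "Demographic", "Census", "1990 Census"),
    ("Census Bureau Information", "Demographic", "Census", "2000 Census"),
    ("Census Bureau Information", "Demographic", "Survey"),
    ("Census Bureau Information", "Economic"),
    ("Census Bureau Information", "Geographic"),
]
taxonomy_lines = render(build_tree(paths))
print("\n".join(taxonomy_lines))
```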
Architecting a CMR
CMR Component Based Architecture
• Flexible, functional, open, standards-based, component-based architecture
• Reuse components
• Swap components
• Minimize change impacts
[Figure: components shown are Browser User Interface; COTS Integrated Admin Tools, Browsing Tools, and Load/Unload Products; Metadata Interchange with External Systems; Object Layer; Metadata Repository Physical Storage Layer; Security Framework]
Proposed Technical/Software Architecture
Four Ways an Application Can Use CMR Metadata, ordered from tightly coupled with the CMR (1) to loosely coupled with the CMR (4):
1. Application written against CMR - uses it directly for metadata access and maintenance.
2. Application uses same CMR core physical model - can replicate metadata from CMR.
3. Application communicates with CMR through an API to exchange metadata.
4. Application communicates with CMR using a standard XML-based metadata interchange.
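
To make the loosely coupled option (4) concrete, here is a small sketch of an XML round trip for one data element, using Python's standard `xml.etree.ElementTree`. The tag and attribute names (`DataElement`, `Definition`, `ValueDomain`, `ValidValue`) are invented for illustration and are not the CMR's actual interchange format.

```python
import xml.etree.ElementTree as ET


def export_element(name, definition, valid_values):
    """Serialize one data element to an XML string (hypothetical schema)."""
    root = ET.Element("DataElement", attrib={"name": name})
    ET.SubElement(root, "Definition").text = definition
    domain = ET.SubElement(root, "ValueDomain")
    for code, meaning in valid_values.items():
        value = ET.SubElement(domain, "ValidValue", attrib={"code": str(code)})
        value.text = meaning
    return ET.tostring(root, encoding="unicode")


def import_element(xml_text):
    """Parse the XML back into a plain dict; codes come back as strings."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.get("name"),
        "definition": root.findtext("Definition"),
        "valid_values": {v.get("code"): v.text for v in root.find("ValueDomain")},
    }


xml_text = export_element("SEX", "Person Gender", {0: "male", 1: "female"})
round_trip = import_element(xml_text)
```

The receiving application needs only the agreed XML layout, not the repository's physical model, which is what makes this the loosest of the four couplings.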
[Figure: CMR Tools (Web-enabled Administration Tools, Web-enabled Browsing Tools, Open Java API, Open XML Interchange, Integrated Portal Web Site Builder) built on the Corporate Metadata Repository and its CMR Core Meta-Models]
CMR Extensibility
[Figure: the core stack (Web-enabled Administration Tools, Web-enabled Browsing Tools, Open Java API, Open XML Interchange, Integrated Portal Web Site Builder, Corporate Metadata Repository, CMR Core Meta-Models) with CMR Extended Tools, API, and Interchange and a CMR Extended Meta-Model added alongside]
S/W Requirements
• Scalable
• Provides for open API and Interchange
• Implements Standards
  - ISO/IEC 11179
  - FGDC CSDGM
  - Dublin Core
• COTS preferred, if meets requirements
• High productivity development tools
• Self-documenting, easy to maintain app
CMR S/W for Deployment & Development
Software                                     Used for
Oracle8i EE V8.1.6                           CMR Physical Repository
interMedia                                   Structured and Full-text Metadata
WebDB V2.2 (upgrading to Oracle 9i Portal)   CMR Web Portal
OAS V4.0.8.1 (upgrading to iAS)              CMR Web Server
Rational Rose 2000                           CMR UML Modeling
Designer6i                                   CMR Server Modeling. CMR Web Application Generation plus some PL/SQL coding.
JDeveloper V3.1                              CMR Java API and XML application development (BC4Js & JSPs)
Oracle XDK & MS Notepad                      CMR XML generation, parsing, processing, & upload/download from database tables
Designer Generated CMR Tools
[Figure: development flow and three-tier deployment.
Development: Functional Requirements and Use Cases; UML Object Model (Logical Models); Server Model (Physical Models); Web Modules.
Client Tier Deployment: Web Browser (HTML), connected over HTTP.
Middle Tier Deployment: OAS Environment w/ PL/SQL Cartridge & HTTP Listeners, connected over Net8.
Server Tier Deployment: Application Code (PL/SQL generating HTML & JS), TAPI (PL/SQL), View Layer, CMR Repository.
Legend: Created/Generated using Oracle Designer; Hand coded; Created/Generated using Rational Rose.]
JDeveloper Generated Java API
[Figure: development flow and three-tier deployment.
Development: Functional Requirements and Use Cases; UML Object Model (Logical Models, Rational Rose generated); Server Model (Physical Models, Oracle Designer maintained).
Client Tier Deployment: BOC Java Applet or Application; DER XML Application; connected over HTTP.
Middle Tier Deployment: OAS; BOC Java Server Pages; CMR Open API.
Server Tier Deployment: Java Object Layer (BC4J) over JDBC; CMR View Layer; CMR Repository.
Legend: Designer Generated; JDeveloper Generated.]
Leveraging a CMR
Metadata for Dissemination
Data:
ID  WGT  SEX  AGE  MARITAL
1   5    0    45   2
2   7    1    5    0
3   2    1    90   5
4   2    0    0    0
5   7    1    23   1
6   3    0    37   4
7   4    0    14   0
8   2    0    75   2
...
Metadata:
Survey/Census: 1990 Decennial Census
Source: Bureau of the Census
Dataset: 1990 Public Use Microdata Sample (PUMS)
Description: The PUMS dataset has basic demographic information about persons and housing in the U.S. This information comes from the 1990 Decennial Census long form which is randomly sent to 1 in every 7 households. This dataset is for public use and does not compromise the confidentiality of individuals.
Data Elements:
ID - Record Identifier - A unique id for a record. Each record identifies 1 or more persons having the same demographic characteristics. (See WGT)
WGT - Person Weight - A weight given to a record to represent the 1 or more persons with the same demographic characteristics. Valid values: 1..9
SEX - Person Gender - Valid values (0: male, 1: female)
AGE - Person Age in Years - Valid values (0-90). Persons over 90 years of age are top-coded with an age of 90 for confidentiality reasons.
MARITAL - Person Marital Status - Valid values (0: not applicable, 1: single, 2: married, 3: separated, 4: divorced, 5: widowed). Universe: Persons over 15 years of age. Those 15 and under are given a value of 0.
For more information: Related Datasets and Publications, Sampling Errors and Techniques, etc.
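
The point of the slide, that raw codes are only usable alongside their metadata, can be shown with a short sketch. The labels, valid values, and ranges come from the data element descriptions above; the dictionary layout, the `describe` function, and the sample record are assumptions made for illustration.

```python
# Metadata taken from the data element descriptions; the structure is hypothetical.
METADATA = {
    "WGT": {"label": "Person Weight", "range": (1, 9)},
    "SEX": {"label": "Person Gender", "codes": {0: "male", 1: "female"}},
    "AGE": {"label": "Person Age in Years", "range": (0, 90)},
    "MARITAL": {
        "label": "Person Marital Status",
        "codes": {0: "not applicable", 1: "single", 2: "married",
                  3: "separated", 4: "divorced", 5: "widowed"},
    },
}


def describe(record):
    """Validate a raw record against the metadata and return a readable version."""
    readable = {}
    for name, value in record.items():
        meta = METADATA.get(name)
        if meta is None:
            readable[name] = value  # no metadata (e.g. ID): pass through
        elif "codes" in meta:
            if value not in meta["codes"]:
                raise ValueError(f"{name}: {value} is not a valid value")
            readable[meta["label"]] = meta["codes"][value]
        else:
            low, high = meta["range"]
            if not low <= value <= high:
                raise ValueError(f"{name}: {value} outside {low}..{high}")
            readable[meta["label"]] = value
    return readable


# Sample record with illustrative values.
decoded = describe({"ID": 1, "WGT": 5, "SEX": 0, "AGE": 45, "MARITAL": 2})
```

Because the rules live in data rather than in code, adding a data element means adding a metadata entry, not changing `describe`.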
CMR Support for American FactFinder
[Figure: AFF Metadata Providers supply metadata on Data Elements, Data Sets, and Data Products. File types shown: XML CMR File, ASCII AFF File. Systems shown: CMR, AFF.]
AFF Metadata-Driven Architecture
• Add metadata and data for a new dataset -> AFF can automatically search and query the new dataset
• Geography trees, datasets, subjects, report topics, etc. are all generated at runtime by accessing the metadata
• Business metadata is linked to technical metadata such that user selections are used to generate SQL statements to query the data
[Figure: AFF Application Code makes runtime calls to the CMR/AFF Business & Technical Metadata, which produces the AFF Metadata-Driven, Dynamic Application]
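
A minimal sketch of the last bullet: business metadata (the names a user picks) is linked to technical metadata (tables and columns), and the link is used to generate SQL. Everything here is hypothetical, including the table name `pums_1990`, the mapping layout, and the sample rows; it is not AFF's actual design. It runs against an in-memory SQLite database from the Python standard library.

```python
import sqlite3

# Hypothetical link between business names and physical names.
TECHNICAL = {
    "dataset": {"1990 PUMS": "pums_1990"},
    "column": {
        "Person Gender": "sex",
        "Person Age in Years": "age",
        "Person Weight": "wgt",
    },
}


def build_query(dataset, group_by):
    """Turn user selections into SQL; unknown names raise KeyError."""
    table = TECHNICAL["dataset"][dataset]
    column = TECHNICAL["column"][group_by]
    weight = TECHNICAL["column"]["Person Weight"]
    # Identifiers come only from the metadata map, never from raw user text.
    return (f"SELECT {column}, SUM({weight}) FROM {table} "
            f"GROUP BY {column} ORDER BY {column}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pums_1990 (id INTEGER, wgt INTEGER, sex INTEGER, age INTEGER)")
conn.executemany(
    "INSERT INTO pums_1990 VALUES (?, ?, ?, ?)",
    [(1, 5, 0, 45), (2, 7, 1, 5), (3, 2, 1, 90)],  # illustrative rows
)
sql = build_query("1990 PUMS", "Person Gender")
rows = conn.execute(sql).fetchall()  # weighted person counts by gender code
```

Registering a new dataset then means adding entries to the metadata map; the query-building code stays unchanged, which is the behavior the first bullet describes.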
CMR Support for Econ 2002 Census
[Figure: inputs shown are Econ Metadata Providers and 450 Econ Questionnaires. File types shown: Econ File, FGDC File, ACSD File, XML CMR File, XML Survey File, ASCII AFF File. Systems shown: CMR, AFF, GIDS, EMR.]
Activating the CMR
• Data Element Registry (ISO/IEC 11179 Standard): Data Elements, Value Domains, Valid Values, Data Element Concepts,…
• Data Set Registry (Supports FGDC CSDGM Geospatial Metadata Standard): A Data Set is a collection of Data Elements.
• Product Registry (Supports FGDC CSDGM Geospatial Metadata Standard & Dublin Core): A Data Product may be a file/document, website/URL, or physical object.
• Data Store (OMG CWM Standard): Metadata for the physical data store. (Supports Relational, Multi-Dimensional, and Flat File stores)
• Business Rule Registry
• Survey Registry: Surveys, Survey Instances, Universes, Frames, Sample, Questionnaires, Questions,…
• Classification Schemes (ISO/IEC 11179 Standard): Taxonomies, Keywords
• Frameworks: Security Framework, Configuration Mgmt Framework, Workflow Framework
Activities driven by the metadata: Data Quality Inspection, Product Generation, Data Set Query Generation, Taxonomy Tree Generation, Survey Instrument Generation
Metadata: A core enabling component of any Information technology
• Enterprise Information Portal
• Knowledge Mgmt & Business Intelligence
• Digital Libraries
• e-Business
• ERP
• Data Integration
• Application/Tool Integration
• Legacy Migration
• Data Warehousing & Decision Support
• Data Query and Search
Leveraging the CMR Data Element Registry
Government Vision
Data Element Registry
Agency   Data
BOC      Demographic, Economic, Geographic Data
BLS      Economic Data
USGS     Geographic Data
HUD      Housing Data
EPA      Environmental Data
FAA      Air Safety Data
NASA     Aircraft Data
CDC      Health Data
NCI      Health Data
HCFA     Health Data
Integration Layer:
• Global Standardized Data Elements
• Agency Standardized Data Elements
• Non-Standardized Data Elements
DER Integration Technology
[Figure: the DER and Metadata Repository supports an Extract, Transform, Quality Check, Load pipeline.
Sources: Legacy Data Exports, OLTP DB, Source Flat Files (FF), External Data Sources.
Targets: Staging DB, Data Warehouse, Data Marts, Multi-Dimensional Cubes.
Uses: Legacy Migration, DW and Analytics, Web Deployment, Information Portals, E-Commerce Apps.]
Questions