Where does the DBMS store the definitions of data elements and their relationships?

Technical Forces Driving the Adoption of Web Services

Douglas K. Barry, David Dick, in Web Services, Service-Oriented Architectures, and Cloud Computing (Second Edition), 2013

Adopting Standard Data Element Definitions

In the early 1980s, many large organizations were running custom software, and there was very little use of packaged software. At the time, it was believed that if all the custom software used the same data element definitions, there would be opportunities to exchange data internally more easily, reduce development time, and possibly reduce maintenance costs. These opportunities are shown as driving forces in Figure 5.3. Restraining forces related to cost offset these driving forces: Figure 5.3 shows the costs of developing the standard definitions and the costs of changing existing systems.


Figure 5.3. Force field analysis for adopting standard data element definitions.

There are additional restraining forces in this figure. In some cases, there were valid reasons that two different systems used different definitions for the same data element. At the time, there had been little progress in developing a standard set of data element definitions that could be shared by various organizations. Therefore, the cost of developing a standard set for a single organization was quite high because it involved starting with a clean sheet of paper. Even if efforts to use standard data element definitions had been successful, the first merger or acquisition would likely cause a problem, because the systems used by the other organization would probably have different data element definitions. Finally, as the use of packaged software increased, the definitions used in those products would most likely be incompatible. With enough mergers or acquisitions and use of packaged software, you would be back at the starting point with incompatible data element definitions.

Times have changed since the early 1980s and so have attitudes toward standard data element definitions. Some industries can see advantages in having standard definitions so that data can easily be interchanged among organizations. Another advantage to standard data element definitions is that they lessen the integration efforts involved in mergers and acquisitions. The term data element definition has more or less been replaced by semantic vocabulary. Chapter 3 discussed the opportunity and importance of standardized semantic vocabularies. A sampling of such vocabularies by industry can be found on page 179.


URL: https://www.sciencedirect.com/science/article/pii/B9780123983572000051

Metadata and Data Standards

David Loshin, in The Practitioner's Guide to Data Quality Improvement, 2011

Publisher Summary

This chapter discusses the use of data standards relying on common metadata definitions as a way to formalize data element metadata and, as a byproduct, information structure and meaning. Looking at the relationship between the need for standard data exchange and the various uses of data concepts helps collate organizational metadata and document data quality assertions in conjunction with data element definitions. Data standards and metadata management provide a basis for harmonizing business rules, and consequently data quality rules, from multiple sources of data across the organization. Some general processes for data standards and metadata management are discussed in the chapter; evaluating the specific needs within an organization can lead to the definition of concrete information and functional requirements. In turn, the data quality team uses these requirements to identify candidate metadata management tools that support the types of activities described in the chapter. Solid metadata management supports more than just data quality; a focused effort on standardizing the definitions, semantics, and structure of critical data elements can be leveraged when isolating data quality expectations for downstream information consumers. These processes also highlight the data elements that are likely candidates for continuous inspection and monitoring as part of the data quality service level agreement.


URL: https://www.sciencedirect.com/science/article/pii/B9780123737175000105

Locating Metadata

Jack E. Olson, in Database Archiving, 2009

8.1.4 Tables

Rows that have the same definition are grouped into tables; this is the relational context. In IMS, all segments using the same segment layout are referred to as a segment type, and the collection of all segment instances of the same segment type is the same idea as a table.

There are two characteristics of tables that often get confused with characteristics of data elements. One is uniqueness. A data element definition does not have this characteristic; a data element may acquire uniqueness when it is used in a table, so it is a table characteristic. For example, part_number might be unique in the Parts table but not in the Purchase Order table. The other such attribute is nullability: a data element may be permitted to assume the null value in one table but not when used in another.
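To make the distinction concrete, here is a minimal sketch in Python using sqlite3 that declares the same data element, part_number, with different uniqueness and nullability constraints in two tables. The table layouts are invented for illustration and are not taken from Figure 8.4.

import sqlite3

conn = sqlite3.connect(":memory:")

# In the Parts table, part_number is unique and may not be null.
conn.execute("""
    CREATE TABLE parts (
        part_number TEXT NOT NULL UNIQUE,
        description TEXT
    )
""")

# In the Purchase Order detail table, the same data element is neither
# unique (many order lines may reference one part) nor required.
conn.execute("""
    CREATE TABLE purchase_order_detail (
        po_number   TEXT NOT NULL,
        part_number TEXT,          -- same element, different constraints
        quantity    INTEGER
    )
""")

conn.execute("INSERT INTO parts VALUES ('P-100', 'Widget')")
conn.execute("INSERT INTO purchase_order_detail VALUES ('PO-1', 'P-100', 5)")
conn.execute("INSERT INTO purchase_order_detail VALUES ('PO-1', 'P-100', 2)")  # duplicate part_number is allowed here
conn.execute("INSERT INTO purchase_order_detail VALUES ('PO-2', NULL, 1)")     # null part_number is allowed here, not in parts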

Examples of data row definitions and the tables they are stored in for a purchase order business object are shown in Figure 8.4.


Figure 8.4. Data row metadata examples.

Another note on terminology: many data-modeling products use the word entity to refer to what I call a table. Even when tables are connected in an ERD, you still cannot identify specific business objects from the real world. It makes sense for the word entity to be associated with business objects and the word table to be associated with the fragments of data that constitute a portion of a business object. That is what you see in the real world, and for the archivist, that is the picture you want to portray.


URL: https://www.sciencedirect.com/science/article/pii/B978012374720400008X

Enterprise Information Management

Alexander Borek, ... Philip Woodall, in Total Information Risk Management, 2014

F Metadata management

Metadata is data about data. Metadata management is about proposing, reviewing, agreeing to, endorsing, facilitating the observance of, rewarding compliance with, and managing metadata policies. Policies consist of a concept, a context, and a process. There are different types and layers of metadata:

Business definition metadata includes concepts, business terms, definitions, and the semantics of data.

Reference metadata includes conceptual domains, value domains, reference tables, and mappings.

Data element metadata includes critical data elements, data element definitions, data formats, and aliases/synonyms.

Information architecture metadata includes entity models, relational tables, and master object directory.

Data governance metadata includes information usage, information quality, information quality service-level agreements (SLAs), and access controls.

Services metadata includes service directory, service users, and interfaces.

Business metadata includes business policies, information policies, and business rules.

Metadata policies need to ensure that definitions are analyzed, identified, documented, and harmonized, and that shared repositories are put into place.

Another task area is providing naming standards for data (e.g., “PersonLastName”, “OrderMonthlyTotalAmount”), which can improve syntactical and semantic consistency, reduce lexical complexity, and help employ a controlled vocabulary. Furthermore, data model standards have to be designed for data elements (e.g., data element names, types, representations, formats, and entity modeling standards), which can include master data domains and standard attributes. Part of the process should also be to discover existing metadata (i.e., data values, business rules, and object relationships) and make it explicit. Data standards need to be harmonized into one coherent policy document that everyone in the organization has to comply with. A core aim should be to educate the rest of the organization that this policy document exists and how to comply with it. Moreover, data and business rules are defined with due consideration of the business policy and context.
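As an illustration of such a naming standard, the short Python sketch below checks candidate data element names against a PascalCase pattern and a small controlled vocabulary of approved terms. The vocabulary and the specific rule are assumptions made for this example, not a standard prescribed by the authors.

import re

# Hypothetical controlled vocabulary of approved name components.
APPROVED_TERMS = {"Person", "Order", "Last", "First", "Name", "Monthly", "Total", "Amount"}

PASCAL_CASE = re.compile(r"^([A-Z][a-z0-9]+)+$")

def check_element_name(name: str) -> list[str]:
    """Return a list of naming-standard violations for a data element name."""
    problems = []
    if not PASCAL_CASE.match(name):
        problems.append("name is not PascalCase without spaces or underscores")
        return problems
    # Split the PascalCase name into its component terms and validate each one.
    terms = re.findall(r"[A-Z][a-z0-9]+", name)
    for term in terms:
        if term not in APPROVED_TERMS:
            problems.append(f"term '{term}' is not in the controlled vocabulary")
    return problems

print(check_element_name("PersonLastName"))            # [] -- conforms to the standard
print(check_element_name("OrderMonthly TotalAmount"))  # the embedded space violates the pattern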

Metadata can support information integration across the enterprise, standardize the use of information, simplify data management, and make master data management consistent. It can, therefore, enable and improve effective business intelligence. Difficulties in metadata management arise because data policies are often defined but not enforced or complied with in practice.

ATTENTION

Many organizations neglect the importance of metadata. An enterprise-wide approach to metadata management is essential; otherwise, all other EIM efforts may be jeopardized.


URL: https://www.sciencedirect.com/science/article/pii/B978012405547600002X

Master Data Management and Data Quality

David Loshin, in The Practitioner's Guide to Data Quality Improvement, 2011

19.5 MDM: A High-Level Component Approach

For the purposes of data quality management, we can consider a component model view of a master data environment and then consider the architectural implementation spectrum. Figure 19.1 provides an overview of the component model, which essentially comprises a master data repository, master data services, and the associated governance services necessary to support the master environment.


Figure 19.1. The MDM component model.

Essentially, we can divide this component model into three pieces:

1. The master data repository,

2. The collection of master data services, and

3. The data governance processes and services.

19.5.1 The Master Data Repository

The master data repository is the framework in which master data objects and entities are represented. Conceptually, this collection of components is used to manage multiple aspects of what eventually is managed as master data:

Reference data, consisting of the enumerated data domains and associated mappings used by multiple business applications

Metadata, including the data element definitions, semantics, and structure definitions, for the shared data models for persistent storage and for data exchange

Master data models, for the representation of the different data entities managed as master data

Business rules, which implement the business policies associated with the master data

Hierarchies and relationships, used to establish connections between master data entities (such as organizational chart, financial chart of accounts, or household relationships)

The entities themselves

All of these are accessed and managed via a set of services, subject to operational data governance processes. The data is managed using one of a variety of MDM architectures, as discussed in section 19.7.

19.5.2 Master Data Services

Managing the master data environment depends on a collection of services that enable the management of and access to the master data. For example, a master data environment could be supported using these types of services:

Integration and consolidation, including data intake process management, master index/registry management, consolidation rules management, survivorship rules management, and source data and lineage management

Data publication and data access, including publish, subscribe, data life-cycle services (create, read, update, retire, archive); connectors to existing data assets; connectors to existing applications; data transformations; and associated data access web services

Data quality and cleansing, including parsing and standardization, data enrichment, data correction, identity resolution and matching, and unmerging of incorrectly consolidated records

Access control, including management of a participant registry, participant life-cycle management (create, read, update, retire), authentication, authorization, role management, and role-based access control

Metadata management, including master data model management, master data exchange model management, master metadata management (data element definitions, semantics, data types), reference data management, data quality rule management, business rules management, hierarchy management, relationship management, and aliasing and mapping rules management

Although this provides a high-level view of the types of services necessary for MDM, actual implementations may describe provided services with a more precise level of granularity.
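As one concrete example of the data quality and cleansing services listed above, the following Python sketch shows identity resolution in its most naive form: source records are standardized and then matched on simple criteria. Real MDM matching services use much richer parsing, probabilistic scoring, and survivorship rules; the field names and records here are assumptions for illustration only.

def standardize(record: dict) -> dict:
    """Very naive standardization: collapse whitespace and lowercase the key fields."""
    return {
        "name": " ".join(record.get("name", "").lower().split()),
        "email": record.get("email", "").strip().lower(),
    }

def match(a: dict, b: dict) -> bool:
    """Match two source records if the standardized email or name agrees."""
    sa, sb = standardize(a), standardize(b)
    if sa["email"] and sa["email"] == sb["email"]:
        return True
    return bool(sa["name"]) and sa["name"] == sb["name"]

crm_record = {"name": "  John  DOE ", "email": "[email protected]"}
billing_record = {"name": "John Doe", "email": "[email protected]"}
print(match(crm_record, billing_record))  # True: candidates for consolidation into one master record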

19.5.3 Operational Data Governance

In support of the data governance techniques described in chapter 7 and the data inspection and monitoring described in chapter 13, a master data management environment must also provide services for operational data governance, including:

Incident reporting and incident tracking, as discussed in chapters 13 and 17;

Notification/alert management, so that when an issue is identified the right data steward is notified;

Data browsing, which allows the data steward to scan through and review records as part of the issues evaluation and remediation process;

Data profiling and assessment, as described in chapter 11;

History/log management, which allows the data stewards to review both the automatic and manual modifications to master data;

Privacy policy management, to review the privacy settings associated with accessing master data;

Stewardship role management, for documenting which data stewards are responsible for which master data sets; and

Governance policy management, for overseeing the processes of documenting enterprise data governance policies as they relate to master data, and their implementation.


URL: https://www.sciencedirect.com/science/article/pii/B9780123737175000191

MDM Components and the Maturity Model

David Loshin, in Master Data Management, 2009

3.3.2 Consolidated Metadata Management

A by-product of the process for identifying and clarifying data element names, definitions, and other relevant attribution is the discovery and documentation of enterprise-wide business metadata. Aside from collecting standard technical details regarding the numerous data elements that are potentially available, there is a need to determine business uses of each data element; which data element definitions refer to the same concept; the applications that refer to manifestations of that concept; how each data element and associated concepts are created, read, modified, or retired by different applications; the data quality characteristics; inspection and monitoring locations within the business process flow; and how all the uses are tied together.

Because the use of the data elements and their underlying concepts drives how the business application operates using master data, the enterprise metadata repository effectively becomes the control center driving and managing the business applications. Therefore, a critical component of an MDM environment is an enterprise business metadata management system to facilitate the desired level of control. At an even grander level, the metadata management framework supports the definition of the master data objects themselves: which data objects are managed within the MDM environment, which application data sources contribute to their consolidation and resolution, the frequency of and processes used for consolidation—everything necessary to understand the complete picture of the distributed use of master data objects across the enterprise.
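A minimal sketch of what one entry in such a business metadata repository might record is shown below in Python. The fields simply mirror the items listed above (definition, shared concept, referring applications and their create/read/update usage, quality characteristics, monitoring points); the structure itself is an assumption for illustration, not the author's design.

from dataclasses import dataclass, field

@dataclass
class DataElementMetadata:
    name: str
    business_definition: str
    shared_concept: str                                           # the underlying concept this definition refers to
    referring_applications: dict = field(default_factory=dict)    # application -> CRUD usage, e.g. {"Billing": "R"}
    quality_characteristics: list = field(default_factory=list)   # e.g. ["not null", "valid ISO country code"]
    monitoring_points: list = field(default_factory=list)         # where inspection occurs in the business process flow

registry = {
    "CustomerLastName": DataElementMetadata(
        name="CustomerLastName",
        business_definition="Family name of the customer of record",
        shared_concept="Person name",
        referring_applications={"CRM": "CRUD", "Billing": "R"},
        quality_characteristics=["not null"],
    )
}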

It is worthwhile to note that advocating an enterprise-wide approach to metadata management does not necessarily mean purchasing an enterprise metadata management tool. Rather, the focus is on the procedures for sharing information, even if that is facilitated through less sophisticated means. The important part is reaching consensus on enterprise metadata.


URL: https://www.sciencedirect.com/science/article/pii/B9780123742254000035

Data-Driven Architecture for Big Data

Krish Krishnan, in Data Warehousing in the Age of Big Data, 2013

There are multiple types of probabilistic links, and depending on the data type and the relevance of the relationships, we can implement one linkage approach or a combination of approaches with metadata and master data.

Consider two texts: “long John is a better donut to eat” and “John Smith lives in Arizona.” If we run a metadata-based linkage between them, the common word that is found is “John,” and the two texts will be related even though there is no meaningful linkage or relationship between them. This represents a poor link, also called a weak link.

On the other hand, consider two other texts: “Blink University has released the latest winners list for Dean’s list, at deanslist.blinku.edu” and “Contact the Dean’s staff via deanslist.blinku.edu.” The email address becomes the linkage and can be used to join these two texts and additionally connect the record to a student or dean’s subject areas in the higher-education ERP platform. This represents a strong link. The presence of a strong linkage between Big Data and the data warehouse does not mean that a clearly defined business relationship exists between the environments; rather, it indicates that a join of some type is possible within a given context.

Consider a text or an email:

From: [email protected]

Subject: bill payment

Dear sir, we are very sorry to inform you that due to your poor customer service we are moving our business elsewhere.

Regards, John Doe

With the customer email address we can always link and process the data with the structured data in the data warehouse. This link is static in nature, as it relies on the email address the customer keeps on record. This link is also called a static link. Static links can become a maintenance nightmare if a customer changes his or her information multiple times in a period of time. This is worse if the change is made from an application that is not connected to the current platform. It is easy to process and create static linkages using master data sets.
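The static link described above can be sketched in a few lines of Python: an email address is pulled out of the unstructured text with a regular expression and used as the join key into a structured customer table. The customer table and the email address below are invented for illustration.

import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

# Hypothetical structured customer data keyed by email address (master data).
customers = {
    "[email protected]": {"customer_id": 4711, "segment": "SMB"},
}

email_text = """From: [email protected]
Subject: bill payment
Dear sir, we are very sorry to inform you that due to your poor customer
service we are moving our business elsewhere."""

found = EMAIL.search(email_text)
if found and found.group(0).lower() in customers:
    # Strong, static link: the email address joins the complaint to the warehouse record.
    print("linked to customer", customers[found.group(0).lower()]["customer_id"])
else:
    # Without the address, only weak links (e.g., the shared token "John") would remain.
    print("no reliable link found")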

Another type of linkage that is more common in processing Big Data is called a dynamic link. A dynamic relationship is created on the fly in the Big Data environment by a query. When a query executes, it iterates through the unstructured data for one part of the linkage and then looks for the other part in the structured data. The linkage is complete when the relationship is not a weak probability. In probabilistic linking we use metadata and semantic data libraries to discover the links in Big Data and apply the master data set when we process the data in the staging area.

Though linkage processing is the best technique known today for processing textual and semi-structured data, its reliance upon quality metadata and master data along with external semantic libraries proves to be a challenge. This can be overcome over a period of time as the data is processed effectively through the system multiple times, increasing the quality and volume of content available for reference processing.

To effectively create the metadata-based integration, a checklist will help create the roadmap:

1. Definition:

Data element definitions

Data element business names

Data element abbreviations/acronyms

Data element types and sizes

Data element sources

Data-quality observations

2. Outline the objectives of the metadata strategy:

Goals of the integration

Interchange formats

Data-quality goals

Data scalability of processing

3. Define the scope of the metadata strategy:

Enterprise or departmental

4. Define ownership:

Who is the steward of the metadata?

Who is the program sponsor?

Who will sign off on the documents and tests?

5. Define stewardship:

Who owns the metadata processes and standards?

What are the constraints today on processing metadata?

6. Master repository:

A best-practice strategy is to adopt the concept of a master repository of metadata.

This approach should be documented, as well as the location and tool used to store the metadata. If the repository is to be replicated, then the extent of this should also be noted.

7. Metadata maintenance process:

Explain how the maintenance of metadata is achieved.

Describe the extent to which metadata maintenance is integrated into the warehouse development life cycle, including versioning of the metadata.

Identify who maintains the metadata (e.g., can users maintain it? Can users record comments or data-quality observations?).

8. User access to metadata:

How will users interact with and use the metadata?

Once the data is processed through the metadata stage, a second pass is normally required with the master data set and semantic library to cleanse the data that was just processed along with its applicable contexts and rules.


URL: https://www.sciencedirect.com/science/article/pii/B9780124058910000118

Designing for Archive Independence

Jack E. Olson, in Database Archiving, 2009

10.2 Independence from DBMS

The operational data will be stored in a DBMS or a file system. Most of these are very proprietary in the way they represent data structures, data relationships, and data elements.

The archive should be a DBMS-neutral storage place. It should support industry-standard JDBC processing, which can be achieved through a variety of data storage implementations.

10.2.1 Relational Data Sources

Even when RDBMS vendors claim to be using industry standards, as most of them do, they all have unique differences that make them nonstandard.

Relational systems also change on a regular basis. They add new data types, new output functions, and new relationship capabilities. They have always been downward compatible, thus preserving the ability to work with older data.

In moving data to the archive, it is important to get to a true industry-standard storage format. This can usually be done within any relational system by restricting data element definitions to industry-standard JDBC formats. You must resist the temptation to use structures that are not pervasive. For example, the timestamp construct of DB2 is unknown to most other relational DBMS systems.
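One way to enforce that restriction during archive extraction is to map every source column type to a small set of portable types before the data reaches the archive, and to fail loudly for anything that has no portable equivalent. The Python sketch below illustrates the idea; the specific mapping table is an assumption for this example, not taken from the book.

# Hypothetical mapping from vendor-specific column types to portable archive types.
PORTABLE_TYPES = {
    "DB2:TIMESTAMP": "VARCHAR",      # store as an ISO-8601 string instead of a vendor-specific construct
    "DB2:DECIMAL": "DECIMAL",
    "ORACLE:NUMBER": "DECIMAL",
    "ORACLE:VARCHAR2": "VARCHAR",
}

def archive_type(source_dbms: str, column_type: str) -> str:
    """Return the portable archive type for a source column, or raise if none is defined."""
    key = f"{source_dbms.upper()}:{column_type.upper()}"
    try:
        return PORTABLE_TYPES[key]
    except KeyError:
        raise ValueError(f"no portable archive type defined for {key}") from None

print(archive_type("db2", "timestamp"))  # VARCHAR (ISO-8601 text)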

Another area of difference in relational systems is the handling of large data objects (BLOB, CLOB, and others). Examine the implementations that you use and ensure that you can get to a standard implementation in the archive. LOB data structures are becoming a more common part of operational database applications. They must be carefully considered when you're designing an archive application.

10.2.2 Nonrelational Data Sources

If the operational systems use nonrelational DBMS stores, the need to change the data is even more critical. DBMS systems such as IMS, ADABAS, M204, and IDMS all have this requirement. The users want nothing more than to retire the applications that use them from service. They run on expensive mainframes. They require expert staff that is getting harder and harder to find.

If the archive data is stored in their operational formats, those systems must be kept indefinitely. If it is transformed to relational, they do not need to be. Most companies have plenty of experience in transforming data from these systems to a relational format.

10.2.3 Multiple Data Sources

Another reason to strive for DBMS independence in the archive is that data will most likely come from more than one DBMS type. This might be true in the beginning because you have parallel applications. If not, it might become true in the future when the applications are moved to other DBMS types.

Again, you do not want to have to deal with the archive when you make the decision to move an operational database to another DBMS type. If the archive is DBMS independent, it won't matter.

10.2.4 The Archive Data Store

The archive data store needs to be able to handle all the independence factors described in this section. That tends to rule out using any of the industry-standard DBMS systems for the archive store.


URL: https://www.sciencedirect.com/science/article/pii/B9780123747204000108

Technical Forces Driving the Adoption of SOA

Douglas K. Barry, David Dick, in Web Services, Service-Oriented Architectures, and Cloud Computing (Second Edition), 2013

Adopting Standard, Enterprise-Wide Software

One early integration technique was for an organization to adopt enterprise-wide software. This worked sometimes; when it did, however, it was usually successful only for a short period. The obvious appeal of adopting standard software is that everyone uses the same software. This means that the entire organization uses the same data definitions, semantics, and formats for exchanging data. Often, this worked best for organizations that were small and were putting a new set of systems in place. Nevertheless, standardizing on systems software often runs into problems, too. There are long-term restraining forces, such as mergers and acquisitions, that can come into play. Even a new, small organization can acquire another organization that uses an entirely different system, and integration problems begin. Figure 6.1 provides the force field analysis for adopting standard, enterprise-wide software.


Figure 6.1. Force field analysis for adopting standard, enterprise-wide software.

This approach has a mergers and acquisitions restraining force for a reason similar to the one seen in trying to establish standard data element definitions in Chapter 5: the other organization can easily use different software. It is also common in larger organizations that some departments have different software needs; it is rare that you can find “one size fits all” software. Another downside is that adopting a complete set of software systems from a single vendor makes your organization dependent on that single vendor. As soon as you move away from that vendor’s products, you might be back into common integration issues. For organizations that have existing systems, adopting standard software can mean a mass conversion to the new software. This is often problematic and should be seen as a restraining force. Finally, it is often the case that the product doesn’t provide all the functionality that is needed.

Note that none of the restraining forces in this figure are shown in gray. This means that they will not diminish over time and will remain restraining forces for the foreseeable future.

Of course, every example has a counterexample. There are some industries where mergers and acquisitions are commonplace. You will see organizations in those industries adopting common, industry-wide software packages so that it will be easier for one organization to be acquired or merged with another organization. So, mergers and acquisitions can also be a driving force. This is represented in Figure 6.1 with a dashed line. Although I have not seen any empirical data on it, my experience is that this is the exception rather than the rule. That is the reason for the dashed line, because it is likely to apply to only some industries.


URL: https://www.sciencedirect.com/science/article/pii/B9780123983572000063

Practical Data Stewardship

David Plotkin, in Data Stewardship, 2014

Using Working Groups

Between full Data Stewardship Council meetings and the interactive forum(s) lies a third option: forming working groups. These are committees formed by Business Data Stewards who are responsible for gathering feedback from interested parties to resolve an issue or settle a disagreement. Working groups are needed when a question requires widespread input from the business community. The steward schedules and runs the meetings; the participants are business users who are impacted by items such as:

A proposed change to a data element definition or derivation

Detection and correction of a perceived data quality problem, including revising data quality rules

Changes to usage and creation business rules

The organizing steward is accountable for getting the issue resolved and bringing back a consensus to the Data Stewardship Council to propose adoption and sign-off where appropriate.

In the Real Life

An example from the insurance world should help to illustrate how an interactive forum (called a discussion board at this company) and a set of working group meetings helped to resolve an issue around a key term: “close ratio.” The process for resolving this issue also closely followed the process flow illustrated in Figure 6.2.

The term “close ratio” had been defined, approved, and entered in the business glossary by the owning business function (Sales). However, during a project, the term came up in internal discussions, and business people were confused by the usage and name. They therefore decided to change the definition. Fortunately, one of the team members was aware of the DGPO and advised the project team that they did not have the authority to make the change. Data Governance was then engaged in the process (Identify in Figure 6.2).

The Sales Business Data Steward got a working group of the concerned participants together, who voiced their concerns and the confusion around what the term meant. The steward then created an issue on the discussion board, listing those concerns, and subscribed everyone in the working group to the issue. People individually stated what they thought the definition should be, as well as identifying variations on the term that could be considered as additional data terms. The steward then proposed the names and definitions on the discussion board (Rationalize). The assignment (to Sales) did not change. The discussion board entries fleshed out the full definition of the renamed term (“unique quote to close ratio”), the additional identified terms, and how the term was derived (Define/Derive). In the end, the revised definition was entered in the business glossary (Finalize and Document), and the Sales Business Data Steward approved it.

If you are interested in seeing the very complete definition and derivation associated with this business term, please see Appendix A.


URL: https://www.sciencedirect.com/science/article/pii/B9780124103894000064

How would you define the term database? How would you define the term database management system?

Database defined: A database is an organized collection of structured information, or data, typically stored electronically in a computer system. A database is usually controlled by a database management system (DBMS).

What are the main functions of a DBMS?

The DBMS manages the data; the database engine allows data to be accessed, locked and modified; and the database schema defines the database's logical structure. These three foundational elements help provide concurrency, security, data integrity and uniform data administration procedures.

Which DBMS function creates and manages the complex structures required for data storage?

One DBMS function is creating and managing the complex structures required for data storage, which relieves you of the difficult task of defining and programming the physical data characteristics.

What is a DBMS, and what are its functions?

A database management system (DBMS) is a collection of programs that manages the database structure and controls access to the data stored in the database.