Aspects of data modeling and query processing for complex multidimensional data

Torben Bach Pedersen

Aspects of data modeling and query processing for complex multidimensional data

Torben Bach Pedersen

Institut for Datalogi

Publikation: Ph.d.-afhandling

1406 Downloads (Pure)

Abstract

This thesis is about data modeling and query processing for complex multidimensional data. Multidimensional data has become the subject of much attention in both academia and industry in recent years, fueled by the popularity of data warehousing and On-Line Analytical Processing (OLAP) applications. One application area where complex multidimensional data is common is within medical informatics, an area that may benefit significantly from the functionality offered by data warehousing and OLAP. However, the special nature of clinical applications poses different and new requirements to data warehousing technologies, over those posed by conventional data warehouse applications. This thesis presents a number of exciting new research challenges posed by clinical applications, to be met by the database research community. These include the need for complex-data modeling features, advanced temporal support, advanced classification structures, continuously valued data, dimensionally reduced data, and the integration of complex data. OLAP systems typically employ multidimensional data models to structure their data. This thesis identifies eleven modeling requirements for multidimensional data models. These requirements are derived from a realistic assessment of complex data found in real-world applications. A survey of twelve multidimensional data models reveals shortcomings in meeting some of the requirements. Existing models do not support many-to-many relationships between facts and dimensions, do not have built-in mechanisms for handling change and time, lack support for imprecision, and are unable to insert data with varying granularities. Additionally, most of the models do not support irregular dimension hierarchies and aggregation semantics. This thesis defines an extended multidimensional data model and algebraic query language that address all eleven requirements. The model reuses the common multidimensional concepts of dimension hierarchies and granularities to capture imprecise data. For queries that cannot be answered precisely due to the imprecise data, techniques are proposed that take into account the imprecision in the grouping of the data, in the subsequent aggregate computation, and in the presentation of the imprecise result to the user. In addition, alternative queries unaffected by imprecision are offered. The presented data model and query evaluation techniques can be implemented using relational database technology. The approach is also capable of exploiting multidi-mensional query processing techniques like pre-aggregation. This yields a practical solution with low computational overhead. Pre-aggregation, the prior materialization of aggregate queries for later use, is an essential technique for ensuring adequate response time during data analysis. Full preaggregation, where all combinations of aggregates are materialized, is infeasible. Instead, modern OLAP systems adopt the practical pre-aggregation approach of materializing only select combinations of aggregates and then re-use these for efficiently computing other aggregates. However, this re-use of aggregates is contingent on the dimension hierarchies and the relationships between facts and dimensions satisfying stringent constraints. This severely limits the scope of the practical pre-aggregation approach. This thesis significantly extends the scope of practical pre-aggregation to cover a much wider range of realistic situations. Specifically, algorithms are given that transform "irregular" dimension hierarchies and fact-dimension relationships, which often occur in real-world OLAP applications, into well-behaved structures that, when used by existing OLAP systems, enable practical pre-aggregation. The algorithms have low computational complexity and may be applied incrementally to reduce the cost of updating OLAP structures. The transformations can be made transparently to the user. A prototype implementation of the techniques is reported. OLAP systems provide good performance and ease-of-use for queries that aggregate large amounts of data. However, the complex structures and relationships inherent in data in non-standard applications are not accommodated well by OLAP systems. In contrast, object database systems are built to handle such complexity, but do not support OLAP-type querying well. This thesis presents the concepts and techniques underlying a flexible, "multi-model" federated system that enables OLAP users to exploit simultaneously the features of OLAP and object systems. The system allows data to be handled using the most appropriate data model and technology: OLAP systems for dimensional data and object database systems for more complex, general data. As a vehicle for demonstrating the capabilities of the system, a prototypical OLAP language is defined and extended to naturally support queries that involve data in object databases. The language permits selection criteria that reference object data, queries that return combinations of OLAP and object data, and queries that group dimensional data according to object data. The system is designed to be aggregationsafe, in the sense that it exploits the aggregation semantics of the data to prevent incorrect or meaningless query results. These capabilities may also be integrated into existing languages. A prototype implementation of the system is reported

Originalsprog	Engelsk
Udgivelsessted	Aalborg
Udgiver	Aalborg Universitetsforlag
Status	Udgivet - 2000

Adgang til dokumentet

Pedersen-torbenForlagets udgivne version, 928 KB

AUB Link

Søg efter materialet i Aalborg Universitetsbiblioteks søgemaskine

Citationsformater

@misc{27e54ab0b4cb11da9c71000ea68e967b,

title = "Aspects of data modeling and query processing for complex multidimensional data",

abstract = "This thesis is about data modeling and query processing for complex multidimensional data. Multidimensional data has become the subject of much attention in both academia and industry in recent years, fueled by the popularity of data warehousing and On-Line Analytical Processing (OLAP) applications. One application area where complex multidimensional data is common is within medical informatics, an area that may benefit significantly from the functionality offered by data warehousing and OLAP. However, the special nature of clinical applications poses different and new requirements to data warehousing technologies, over those posed by conventional data warehouse applications. This thesis presents a number of exciting new research challenges posed by clinical applications, to be met by the database research community. These include the need for complex-data modeling features, advanced temporal support, advanced classification structures, continuously valued data, dimensionally reduced data, and the integration of complex data. OLAP systems typically employ multidimensional data models to structure their data. This thesis identifies eleven modeling requirements for multidimensional data models. These requirements are derived from a realistic assessment of complex data found in real-world applications. A survey of twelve multidimensional data models reveals shortcomings in meeting some of the requirements. Existing models do not support many-to-many relationships between facts and dimensions, do not have built-in mechanisms for handling change and time, lack support for imprecision, and are unable to insert data with varying granularities. Additionally, most of the models do not support irregular dimension hierarchies and aggregation semantics. This thesis defines an extended multidimensional data model and algebraic query language that address all eleven requirements. The model reuses the common multidimensional concepts of dimension hierarchies and granularities to capture imprecise data. For queries that cannot be answered precisely due to the imprecise data, techniques are proposed that take into account the imprecision in the grouping of the data, in the subsequent aggregate computation, and in the presentation of the imprecise result to the user. In addition, alternative queries unaffected by imprecision are offered. The presented data model and query evaluation techniques can be implemented using relational database technology. The approach is also capable of exploiting multidi-mensional query processing techniques like pre-aggregation. This yields a practical solution with low computational overhead. Pre-aggregation, the prior materialization of aggregate queries for later use, is an essential technique for ensuring adequate response time during data analysis. Full preaggregation, where all combinations of aggregates are materialized, is infeasible. Instead, modern OLAP systems adopt the practical pre-aggregation approach of materializing only select combinations of aggregates and then re-use these for efficiently computing other aggregates. However, this re-use of aggregates is contingent on the dimension hierarchies and the relationships between facts and dimensions satisfying stringent constraints. This severely limits the scope of the practical pre-aggregation approach. This thesis significantly extends the scope of practical pre-aggregation to cover a much wider range of realistic situations. Specifically, algorithms are given that transform {"}irregular{"} dimension hierarchies and fact-dimension relationships, which often occur in real-world OLAP applications, into well-behaved structures that, when used by existing OLAP systems, enable practical pre-aggregation. The algorithms have low computational complexity and may be applied incrementally to reduce the cost of updating OLAP structures. The transformations can be made transparently to the user. A prototype implementation of the techniques is reported. OLAP systems provide good performance and ease-of-use for queries that aggregate large amounts of data. However, the complex structures and relationships inherent in data in non-standard applications are not accommodated well by OLAP systems. In contrast, object database systems are built to handle such complexity, but do not support OLAP-type querying well. This thesis presents the concepts and techniques underlying a flexible, {"}multi-model{"} federated system that enables OLAP users to exploit simultaneously the features of OLAP and object systems. The system allows data to be handled using the most appropriate data model and technology: OLAP systems for dimensional data and object database systems for more complex, general data. As a vehicle for demonstrating the capabilities of the system, a prototypical OLAP language is defined and extended to naturally support queries that involve data in object databases. The language permits selection criteria that reference object data, queries that return combinations of OLAP and object data, and queries that group dimensional data according to object data. The system is designed to be aggregationsafe, in the sense that it exploits the aggregation semantics of the data to prevent incorrect or meaningless query results. These capabilities may also be integrated into existing languages. A prototype implementation of the system is reported",

author = "Pedersen, {Torben Bach}",

year = "2000",

language = "English",

series = "Publication : Department of Computer Science, Aalborg University",

number = "4",

publisher = "Aalborg Universitetsforlag",

}

TY - GEN

T1 - Aspects of data modeling and query processing for complex multidimensional data

AU - Pedersen, Torben Bach

PY - 2000

Y1 - 2000

N2 - This thesis is about data modeling and query processing for complex multidimensional data. Multidimensional data has become the subject of much attention in both academia and industry in recent years, fueled by the popularity of data warehousing and On-Line Analytical Processing (OLAP) applications. One application area where complex multidimensional data is common is within medical informatics, an area that may benefit significantly from the functionality offered by data warehousing and OLAP. However, the special nature of clinical applications poses different and new requirements to data warehousing technologies, over those posed by conventional data warehouse applications. This thesis presents a number of exciting new research challenges posed by clinical applications, to be met by the database research community. These include the need for complex-data modeling features, advanced temporal support, advanced classification structures, continuously valued data, dimensionally reduced data, and the integration of complex data. OLAP systems typically employ multidimensional data models to structure their data. This thesis identifies eleven modeling requirements for multidimensional data models. These requirements are derived from a realistic assessment of complex data found in real-world applications. A survey of twelve multidimensional data models reveals shortcomings in meeting some of the requirements. Existing models do not support many-to-many relationships between facts and dimensions, do not have built-in mechanisms for handling change and time, lack support for imprecision, and are unable to insert data with varying granularities. Additionally, most of the models do not support irregular dimension hierarchies and aggregation semantics. This thesis defines an extended multidimensional data model and algebraic query language that address all eleven requirements. The model reuses the common multidimensional concepts of dimension hierarchies and granularities to capture imprecise data. For queries that cannot be answered precisely due to the imprecise data, techniques are proposed that take into account the imprecision in the grouping of the data, in the subsequent aggregate computation, and in the presentation of the imprecise result to the user. In addition, alternative queries unaffected by imprecision are offered. The presented data model and query evaluation techniques can be implemented using relational database technology. The approach is also capable of exploiting multidi-mensional query processing techniques like pre-aggregation. This yields a practical solution with low computational overhead. Pre-aggregation, the prior materialization of aggregate queries for later use, is an essential technique for ensuring adequate response time during data analysis. Full preaggregation, where all combinations of aggregates are materialized, is infeasible. Instead, modern OLAP systems adopt the practical pre-aggregation approach of materializing only select combinations of aggregates and then re-use these for efficiently computing other aggregates. However, this re-use of aggregates is contingent on the dimension hierarchies and the relationships between facts and dimensions satisfying stringent constraints. This severely limits the scope of the practical pre-aggregation approach. This thesis significantly extends the scope of practical pre-aggregation to cover a much wider range of realistic situations. Specifically, algorithms are given that transform "irregular" dimension hierarchies and fact-dimension relationships, which often occur in real-world OLAP applications, into well-behaved structures that, when used by existing OLAP systems, enable practical pre-aggregation. The algorithms have low computational complexity and may be applied incrementally to reduce the cost of updating OLAP structures. The transformations can be made transparently to the user. A prototype implementation of the techniques is reported. OLAP systems provide good performance and ease-of-use for queries that aggregate large amounts of data. However, the complex structures and relationships inherent in data in non-standard applications are not accommodated well by OLAP systems. In contrast, object database systems are built to handle such complexity, but do not support OLAP-type querying well. This thesis presents the concepts and techniques underlying a flexible, "multi-model" federated system that enables OLAP users to exploit simultaneously the features of OLAP and object systems. The system allows data to be handled using the most appropriate data model and technology: OLAP systems for dimensional data and object database systems for more complex, general data. As a vehicle for demonstrating the capabilities of the system, a prototypical OLAP language is defined and extended to naturally support queries that involve data in object databases. The language permits selection criteria that reference object data, queries that return combinations of OLAP and object data, and queries that group dimensional data according to object data. The system is designed to be aggregationsafe, in the sense that it exploits the aggregation semantics of the data to prevent incorrect or meaningless query results. These capabilities may also be integrated into existing languages. A prototype implementation of the system is reported

AB - This thesis is about data modeling and query processing for complex multidimensional data. Multidimensional data has become the subject of much attention in both academia and industry in recent years, fueled by the popularity of data warehousing and On-Line Analytical Processing (OLAP) applications. One application area where complex multidimensional data is common is within medical informatics, an area that may benefit significantly from the functionality offered by data warehousing and OLAP. However, the special nature of clinical applications poses different and new requirements to data warehousing technologies, over those posed by conventional data warehouse applications. This thesis presents a number of exciting new research challenges posed by clinical applications, to be met by the database research community. These include the need for complex-data modeling features, advanced temporal support, advanced classification structures, continuously valued data, dimensionally reduced data, and the integration of complex data. OLAP systems typically employ multidimensional data models to structure their data. This thesis identifies eleven modeling requirements for multidimensional data models. These requirements are derived from a realistic assessment of complex data found in real-world applications. A survey of twelve multidimensional data models reveals shortcomings in meeting some of the requirements. Existing models do not support many-to-many relationships between facts and dimensions, do not have built-in mechanisms for handling change and time, lack support for imprecision, and are unable to insert data with varying granularities. Additionally, most of the models do not support irregular dimension hierarchies and aggregation semantics. This thesis defines an extended multidimensional data model and algebraic query language that address all eleven requirements. The model reuses the common multidimensional concepts of dimension hierarchies and granularities to capture imprecise data. For queries that cannot be answered precisely due to the imprecise data, techniques are proposed that take into account the imprecision in the grouping of the data, in the subsequent aggregate computation, and in the presentation of the imprecise result to the user. In addition, alternative queries unaffected by imprecision are offered. The presented data model and query evaluation techniques can be implemented using relational database technology. The approach is also capable of exploiting multidi-mensional query processing techniques like pre-aggregation. This yields a practical solution with low computational overhead. Pre-aggregation, the prior materialization of aggregate queries for later use, is an essential technique for ensuring adequate response time during data analysis. Full preaggregation, where all combinations of aggregates are materialized, is infeasible. Instead, modern OLAP systems adopt the practical pre-aggregation approach of materializing only select combinations of aggregates and then re-use these for efficiently computing other aggregates. However, this re-use of aggregates is contingent on the dimension hierarchies and the relationships between facts and dimensions satisfying stringent constraints. This severely limits the scope of the practical pre-aggregation approach. This thesis significantly extends the scope of practical pre-aggregation to cover a much wider range of realistic situations. Specifically, algorithms are given that transform "irregular" dimension hierarchies and fact-dimension relationships, which often occur in real-world OLAP applications, into well-behaved structures that, when used by existing OLAP systems, enable practical pre-aggregation. The algorithms have low computational complexity and may be applied incrementally to reduce the cost of updating OLAP structures. The transformations can be made transparently to the user. A prototype implementation of the techniques is reported. OLAP systems provide good performance and ease-of-use for queries that aggregate large amounts of data. However, the complex structures and relationships inherent in data in non-standard applications are not accommodated well by OLAP systems. In contrast, object database systems are built to handle such complexity, but do not support OLAP-type querying well. This thesis presents the concepts and techniques underlying a flexible, "multi-model" federated system that enables OLAP users to exploit simultaneously the features of OLAP and object systems. The system allows data to be handled using the most appropriate data model and technology: OLAP systems for dimensional data and object database systems for more complex, general data. As a vehicle for demonstrating the capabilities of the system, a prototypical OLAP language is defined and extended to naturally support queries that involve data in object databases. The language permits selection criteria that reference object data, queries that return combinations of OLAP and object data, and queries that group dimensional data according to object data. The system is designed to be aggregationsafe, in the sense that it exploits the aggregation semantics of the data to prevent incorrect or meaningless query results. These capabilities may also be integrated into existing languages. A prototype implementation of the system is reported

M3 - PhD thesis

T3 - Publication : Department of Computer Science, Aalborg University

PB - Aalborg Universitetsforlag

CY - Aalborg

ER -

Aspects of data modeling and query processing for complex multidimensional data

Abstract

Adgang til dokumentet

AUB Link

Fingeraftryk

Citationsformater