Tuesday, July 5, 2016

A DATA WAREHOUSE SOLUTION FOR E-GOVERNMENT

Data Model It includes conceptual, logical and physical data model. In the conceptual modeling phase, we need to capture and understand the user requirements so that the concepts revealed can be accommodated into the conceptual model. For the conceptual design of DW, we propose to use the multidimensional model rather than relational model as it provides us a more clear view of data structure - facts and dimensions, which is better to support analytical work. In the logical model design, the main issue is how to design the hierarchy levels of dimensions as the multi-levels of dimensions may cause a number of tables joins when do the query operations, such as roll up. In database, join is an expensive operator, especially, for large tables. Hierarchy dimensions design can be less hierarchy levels but with a bigger dimension table, e.g., many fields populated in one table or more hierarchy levels but with a smaller dimension table. Thus, we need to considerate the impact of multi-joins, which the dimension hierarchy levels bring, to the query speed when we do the logical design of data warehouse. In the physical modeling, tables are declared for different dimensions and facts. Star or snowflake data schema is generated for the dimensions and facts physically, indexes are built and data partitions are used for the large amount of data populated in the fact tables. The partition technologies will be our main concern in the physical design. Usually, the data in the warehouse can be vertically partitioned by measures or horizontally partitioned by dimensions. In our design, we propose to horizontal partition according to the location dimension, which is also indicated in our proposed architecture: the repository for each European region. Data Source 3XL storage system will be used for the eGovMon DW data source, The 3XL system can automatically generate a specialized schema for the data based on Web Ontology Language (OWL) descriptions of classes and their properties. It creates a table for each of the OWL classes. The data of instances is hold in the table of the class. In order to achieve high performance, the inserted data will be buffered in the main memory and only be flushed to the database when memory is needed or by committing. The experience from EIAO DW, in which 3store system was used, is that 90% - 99% of used time was spent on extracting data from the 3store  However, by the use of data buffer and bulk loading technologies, 3XL storage system can be expected to gain a better performance than 3store storage system. ETL The ETL is the way of extracting the data from different data sources, converting them into the uniform data format and loading into the data warehouse. In the proposed architecture, there are several number of Right-Time ETLs (RiTE) running in parallel to load the data from distributed 3XL data sources to the central data warehouse. RiTE is an ETL technique that can make the new inserted source data quickly available to data consumers, while still providing bulk-load insert speed. The trick is that catalyst, a middle-ware system, is used to achieve fast loading and concurrency control. eGovMon DW The data warehouse is the central data repository that stores the materialized view of source data. It uses a multidimensional model where the data is stored as facts and dimensions. As the data volume will be very huge in our data warehouse, data partition is necessary to achieve a better query performance. In this proposal, the data partition by the location dimension, e.g., the different European regions, will be applied to our data warehouse but logically it still functions as a whole data repository. Open source PostgreSQL is proposed to use as the DBMS of eGovMon data warehouse as it provides several useful features that are crucial to data warehouse, such as good extensibility, table partitioning and bit-mapped indexes etc. Besides, PostgreSQL performs very well for complex queries on large databases. OLAP A data warehouse stores and manages data. OLAP transforms data warehouse data into strategic information providing users multidimensional views for their analysis. Open source tools, like JasperAnalysis, Pentaho etc, can be chosen to fulfill the normal OLAP operations, such as slice and dice, pivot, filter, chart, drill-down, or roll-up a cube of data in real-time. In summary, we gave an overview and the general components' information of eGovMon DW architecture above. The technologies used and architecture will evolve with the project maturity.