
Data Analysis, Data Quality & Metadata Management (DAMD 2010)


By : Global Science & Technology Forum

Date : 2010

Location : Singapore / Singapore

PDF 100p
Description :

The proceedings present the latest academic research and business applications on data analysis, data quality & metadata management by organisations.

Keywords :

metadata, data mining, data integration, Business Process Redesign, Business network redesign, Business scope redefinition, metamodel, unified modelling language, datamining conference proceedings

Keywords inside documents :

metadata ,quality ,services ,health ,specifications ,system ,service ,management ,systems ,confidence ,object ,region ,approach ,climate ,snapshots ,event ,metamodel ,trust ,banks ,values


Proceedings from the Data Analysis, Data Quality & Metadata Management (DAMD 2010) conference.

 

Data that is correct, consistent, timely and coherent could be a key differentiator in these difficult economic times. Organizations recognize the critical importance of their data management capabilities to help achieve their goals for effectively managing information and acknowledge the quantum leap that needs to be made to improve them. Great efforts are made to develop high quality data sets and accompanying metadata, so that individual scientists and organizations can focus their valuable time on the analyses.

Abstracts of the papers included in the Data Analysis, Data Quality & Metadata Management (DAMD 2010) conference proceedings.

 

 

  1. Data Integration and Data mining Framework to Discover Health Impacts of Climate Change

Liwan H. Liyanage, School of Computing and Mathematics, University of Western Sydney, Locked Bag 1797, Penrith South DC, NSW 1797, Australia & Sushan H. Liyanage, Analytics and Modeling, Yes’ Optus, Building B, Level 3, 1 Lyonpark Road, Macquarie Park, NSW 2113, Australia

 

According to the “Healthy Planet, Places and People” report prepared by Research Australia (2008), the top ten health impacts of climate change are “Deaths during heatwaves – especially heart attacks and strokes; Asthma; Mosquito borne infectious diseases; Gastroenteritis and food poisoning; Mental health; Water-borne disease; Obesity, diabetes and cardiovascular disease; Indigenous health; Declining food yields, nutrition and health and Deteriorating global health”. Thus it becomes crucial that health and medical researchers understand how climate change and the changing environment influence these impacts in order to control, mitigate and combat them.

This paper gives a framework for integrating climate, pollution and health indicator variables, such as disease prevalence and occurrence data, to discover the impact of climate change on health. Further, it identifies the gaps in current data collection procedures and recommends how they can be improved to facilitate such an integrated approach more widely. Finally, the framework is demonstrated using asthma prevalence data and temperature-attributed deaths, illustrating the limitations of the current data collection methods.
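
As a loose illustration only (not taken from the paper), the kind of integration step such a framework implies could look like the following sketch, which joins hypothetical regional climate and health-indicator tables on shared keys before looking for simple associations. All column names and figures are invented.

```python
import pandas as pd

# Hypothetical monthly climate/pollution observations per region (all names and
# numbers below are invented for illustration).
climate = pd.DataFrame({
    "region": ["NSW", "NSW", "QLD"],
    "month": ["2008-01", "2008-02", "2008-01"],
    "mean_temp_c": [29.1, 27.4, 31.0],
    "pm10_ugm3": [21.0, 18.5, 24.3],
})

# Hypothetical health-indicator data, e.g. asthma prevalence per 1,000 people.
health = pd.DataFrame({
    "region": ["NSW", "NSW", "QLD"],
    "month": ["2008-01", "2008-02", "2008-01"],
    "asthma_per_1000": [10.2, 9.8, 11.5],
})

# Integrate the sources on shared keys, then look for simple associations.
merged = climate.merge(health, on=["region", "month"], how="inner")
print(merged[["mean_temp_c", "pm10_ugm3", "asthma_per_1000"]].corr())
```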

 

 

  2. A multilevel approach to interoperability in surveillance and reconnaissance

Barbara Essendorfer, Fraunhofer IOSB, Fraunhoferstr. 1, 76131 Karlsruhe, Germany & Christian Kerth, Fraunhofer IOSB, Fraunhoferstr. 1, 76131 Karlsruhe, Germany & Gerd Schneider, Fraunhofer IOSB, Fraunhoferstr. 1, 76131 Karlsruhe, Germany

 

In the domain of civil and military surveillance and reconnaissance, sensors and exploitation systems from different producers are used to achieve an overall picture of a critical situation. In today’s multinational cooperation on security and peacekeeping, it is essential to be able to share data produced by one national asset with other systems or even other nations. Therefore, interoperability has to be established between these various systems, since each of them currently deals with different metadata/data formats and interfaces.

Within the multinational intelligence and surveillance project MAJIIC (Multi-Sensor Aerospace-Ground Joint ISR Interoperability Coalition), various standards have been developed to enable data sharing. These range from common data representations (e.g. imagery or radar data), metadata models and communication protocols to Coalition Shared Data (CSD) servers. The CSD servers provide a decentralized storage facility in which the standardized information is persisted and made available to all participants through synchronization. Using standardized client interfaces, relevant data can be found and retrieved from the storage facility.

Through this standardization, many of the interoperability issues have been overcome at the data representation level, resulting in the ability to share the data. However, more work has to be done to be able to understand the data and translate it into information. The integration of the various systems into a single coherent approach needs to be continued at the process, semantic interpretation and pragmatic levels in order to achieve full interoperability. The usage and semantics of the metadata have to be defined, as well as user roles and responsibilities. Rules have to be established to enable the correct interpretation and validation of data.

The paper describes the exercise-based approach used in the project and reflects on the necessity of a multilevel approach to achieve interoperability.

 

 

  3. Proposed Semantics for the Sampled Values and Metadata of Scientific Values

Joseph Phillips, De Paul University, School of Computing and Digital Media, 243 S. Wabash Ave, Chicago, IL 60604, USA.

 

We propose semantics for the manipulation of sampled values and metadata for scientific data. Our approach uses domain knowledge from an ontology; computational knowledge about dimensions, units, coordinate systems, etc.; and cultural and linguistic norms. It recombines annotated values with specialized operators. This semantics performs several types of dynamic error checking and results in “self-documented computation”.
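
As a loose sketch of the general idea (not the authors' semantics), an annotated value can carry its unit and provenance and refuse to combine with incompatible values, so every result documents how it was computed. The class and field names below are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AnnotatedValue:
    """A sampled value carrying its metadata (unit and provenance)."""
    value: float
    unit: str          # e.g. "degC", "hPa" -- assumed flat unit labels
    source: str        # provenance note, so results stay "self-documented"

    def __add__(self, other: "AnnotatedValue") -> "AnnotatedValue":
        # Dynamic error checking: refuse to combine incompatible units.
        if self.unit != other.unit:
            raise ValueError(f"unit mismatch: {self.unit} vs {other.unit}")
        return AnnotatedValue(self.value + other.value, self.unit,
                              f"({self.source} + {other.source})")

# Usage: the result records its unit and how it was derived.
t1 = AnnotatedValue(21.5, "degC", "station_A")
t2 = AnnotatedValue(0.8, "degC", "bias_correction")
print(t1 + t2)
```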

 

 

  4. The need for ICT induced Organizational Transformation among the Public Sector Banks in Sri Lanka

Selvarajan, P., Faculty of Business Studies, Vavuniya Campus of the University of Jaffna, Sri Lanka

 

This paper focuses on the need for organizational transformation in the public sector banks in Sri Lanka. Bank of Ceylon and People’s Bank are the two major public commercial banks in Sri Lanka. The author investigated the need for ICT-induced organizational transformation in these banks as a case study. Further, the five levels of the ICT-induced organizational transformation process identified by Venkatraman (1990) have been compared with the Sri Lankan public sector banks, and possible ICT strategies are recommended for future implementation in order to reduce customer traffic and sustain the banks in the competitive market.

 

 

  5. Implementing and Evaluating Snapshots + Events Spatiotemporal Modelling Approach

Cristian Vidal, Universidad de Talca, Ingeniería Informática Empresarial & Akbar Ghobakhlou, Auckland University of Technology, Geoinformatics Research Centre & Sara Zandi, Auckland University of Technology, Geoinformatics Research Centre

 

Traditional approaches for modelling spatiotemporal information (snapshots, states and events) are not very efficient and are usually not capable of retrieving information based on a specific spatiotemporal query. The snapshots modelling technique is the oldest and simplest approach and does not support any spatiotemporal query. The most recent approach is the events modelling technique, which allows data retrieval using spatiotemporal queries; however, it requires deductive capabilities that are not usually present in traditional databases. One idea is to retain the advantages of each individual traditional approach while combining them to build an efficient spatiotemporal information system. A case study is presented for building a spatiotemporal information system that combines the snapshots and events approaches and overcomes some of the problems associated with each approach on its own. This work examines the hybrid (snapshots + events) spatiotemporal modelling technique with generic implementation details.
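
A minimal sketch of the hybrid idea, assuming a store of periodic snapshots plus timestamped change events (the data layout below is invented, not the paper's implementation): the state at time t is rebuilt from the nearest earlier snapshot and the events up to t.

```python
from bisect import bisect_right

# Hypothetical hybrid store: periodic full snapshots plus the events between them.
# (The "event = (time, object_id, new_value)" shape is an assumption.)
snapshots = {          # time -> full state of the modelled region
    0:  {"parcel_1": "forest", "parcel_2": "forest"},
    10: {"parcel_1": "farmland", "parcel_2": "forest"},
}
events = [             # time-ordered change events
    (3,  "parcel_1", "farmland"),
    (12, "parcel_2", "urban"),
]

def state_at(t):
    """Rebuild the state at time t: nearest earlier snapshot + later events up to t."""
    snap_times = sorted(snapshots)
    base_time = snap_times[bisect_right(snap_times, t) - 1]
    state = dict(snapshots[base_time])          # copy, don't mutate the snapshot
    for when, obj, value in events:
        if base_time < when <= t:
            state[obj] = value
    return state

print(state_at(5))   # snapshot at t=0 plus the t=3 event
print(state_at(12))  # snapshot at t=10 plus the t=12 event
```

The trade-off the hybrid approach targets is visible even in this toy: more frequent snapshots shorten the event replay at the cost of storage.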

 

 

  6. The Specifications of a Generic QoS Metamodel for Designing and Developing Good Quality Web Services

Wan Nurhayati WAN AB. RAHMAN, Faculty of Computer Science and IT, Universiti Putra Malaysia, 43400 UPM Serdang, Selangor, Malaysia & Farid MEZIANE, School of Computing, Science and Engineering, University of Salford, Greater Manchester, M5 4WT

 

Quality of Service (QoS) is crucial in the design of web services as it allows the development of good, efficient and usable web services. Unfortunately, current research on QoS for web services is concentrated on service users and implementation. Consequently, web services still suffer from a lack of quality. Our research highlights the importance of incorporating QoS early in the design and development of web services. More precisely, we advocate the introduction of QoS as early as the specification phase, to be performed at the same time as the specification of functional requirements. A web service is based on service-oriented architecture (SOA), and the basic web services standards include the Simple Object Access Protocol (SOAP), the Web Service Description Language (WSDL) and Universal Description, Discovery and Integration (UDDI). However, WSDL only describes the functional elements of a web service, yet QoS is significant for the web service description. Therefore, this paper proposes a lightweight extension to WSDL through our generic QoS metamodel to incorporate QoS specifications. The main purpose of the QoS metamodel is to guide service providers to identify functionalities that could be extended and to determine suitable QoS specifications that could be applied, together with their QoS dimensions. This paper begins by defining the QoS specifications required for the development of good quality web services and then explores the potential of the Unified Modeling Language (UML) as a technique and notation to specify QoS. To properly integrate QoS in the design, we propose extensions to the existing UML QoS profile. The paper concludes with an evaluation of the proposed framework and summarises its merits.
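
For illustration only, a QoS specification attached to a WSDL operation might be rendered roughly as below. The class names, dimensions and thresholds are assumptions and do not reproduce the paper's metamodel or UML profile.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class QoSDimension:
    name: str        # e.g. "responseTime"
    unit: str        # e.g. "ms"
    threshold: float # the level the provider commits to
    direction: str   # "max" (must stay below) or "min" (must stay above)

@dataclass
class QoSSpecification:
    operation: str                     # WSDL operation the specification extends
    dimensions: List[QoSDimension] = field(default_factory=list)

    def check(self, measured: dict) -> List[str]:
        """Return the names of dimensions whose measured values violate the spec."""
        violations = []
        for d in self.dimensions:
            value = measured.get(d.name)
            if value is None:
                continue
            ok = value <= d.threshold if d.direction == "max" else value >= d.threshold
            if not ok:
                violations.append(d.name)
        return violations

spec = QoSSpecification("getQuote", [
    QoSDimension("responseTime", "ms", 200, "max"),
    QoSDimension("availability", "%", 99.5, "min"),
])
print(spec.check({"responseTime": 340, "availability": 99.9}))  # ['responseTime']
```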

 

 

  7. Privacy-preserving data mining in peer-to-peer networks

Ibrar Hussain & Marai Irakleous & Mubashir Ashraf Siddiqi & Mohamad Saraee from School of Science & Computing, University of Salford, Manchester, UK.

 

In recent years, privacy-preserving data mining has been studied extensively, due to the wide increase of sensitive information on the internet. A number of algorithms and procedures have been designed, some of which are yet to be implemented, but a few of them are actually employed in the form of software systems to preserve the privacy of users and of the content in peer-to-peer networks. Privacy issues are becoming widely recognized when using peer-to-peer networks. In this paper, we provide a review of the privacy-preserving data mining techniques used to overcome privacy issues.

We discuss methods of sanitization, data distortion, data hiding, cryptography and the data mining algorithm KDEC. Further discussion covers data transfer using proxy techniques and the creation of social communities among peer-to-peer users to form trusted peers. These techniques have shown a lack of scalability and performance. We design a framework to perform a comparison study of the techniques above and present the results with some recommendations of how we think the issues could be resolved.
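
One of the surveyed technique families, data distortion, can be sketched as follows (an illustrative example, not code from the paper): zero-mean noise masks individual values before sharing, while aggregate statistics remain approximately usable.

```python
import random

def distort(values, scale=5.0, seed=0):
    """Data-distortion sketch: add zero-mean Gaussian noise to each value
    so individual records are masked but aggregates stay roughly intact."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, scale) for v in values]

ages = [23, 35, 41, 29, 52, 61, 44, 38]   # hypothetical sensitive attribute
shared = distort(ages)

# The receiving peer never sees the exact ages, but the mean survives
# approximately because the noise is zero-mean.
print(sum(ages) / len(ages), sum(shared) / len(shared))
```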

 

 

  8. An experiment and outlook on revised hierarchical agglomerative clustering method

Chuen-Min Huang & Yi-Hua Li from Department of Information Management, National Yunlin University of Science & Technology

 

While the effectiveness of the hierarchical agglomerative clustering (HAC) method is well recognized, its limitations in processing large data sets make it lose its superiority when efficiency is taken into serious consideration. In this study, we propose a revised hierarchical agglomerative clustering (RHAC) method based on the notion of K-way merging to reduce tree height and time complexity. Latent Semantic Analysis (LSA) is used to improve the precision ratio of clustering. Three major experiments, covering dimension reduction, average link distance and precision comparison, are conducted. Our study shows that the precision of RHAC is higher than 0.99 and the entropy is less than 0.03. The effect of utilizing LSA on the precision of clusters is also positive. We also find that the size of the data set does not influence RHAC efficiency at run time. The results show that the performance of RHAC is better than that of HAC.
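
For orientation, a plain average-link HAC run over toy document vectors is sketched below using SciPy; the K-way merging and LSA steps that distinguish RHAC are not reproduced here, and all data are made up.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy term-frequency vectors for six documents (invented data).
docs = np.array([
    [2, 0, 1, 0],
    [3, 0, 2, 0],
    [0, 2, 0, 1],
    [0, 3, 0, 2],
    [1, 1, 1, 1],
    [0, 2, 1, 2],
], dtype=float)

# Standard average-link HAC over cosine distances.
tree = linkage(docs, method="average", metric="cosine")
labels = fcluster(tree, t=2, criterion="maxclust")   # cut the tree into 2 clusters
print(labels)
```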

 

 

  9. A Semantic Enhanced Data Mining Framework for Web Personalization

Sanjeev Kumar Sharma, Research Scholar & Ugrasen Suman, Associate Professor, Devi Ahilya University, Takshashila Campus, Khandwa Road, Indore (M.P.), India

 

Web personalization is the process of designing a web interface according to specific user needs, taking advantage of user interaction and navigation habits to deliver need-based content and information. It integrates usage data with content, structure and user profile data and enhances the results of the personalization process. We therefore propose a standard data mining architectural framework that personalizes the contents of websites using either usage logs or web semantics. It integrates usage data with content semantics and helps to generate useful recommendations of Web pages after computation of semantically enhanced navigational patterns.
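
A toy sketch of the blending idea (not the paper's framework): usage-based co-visit counts and content-based semantic similarity are combined into one recommendation score. The weights and data below are assumptions.

```python
usage_covisits = {            # how often each page co-occurred in sessions with the current page
    "/pricing": 12, "/docs": 7, "/blog/ai": 3,
}
content_similarity = {        # semantic similarity of each page to the user's profile (0..1)
    "/pricing": 0.2, "/docs": 0.9, "/blog/ai": 0.8,
}

def recommend(top_n=2, w_usage=0.5, w_content=0.5):
    """Rank candidate pages by a blend of usage evidence and content semantics."""
    max_visits = max(usage_covisits.values())
    scores = {
        page: w_usage * (usage_covisits[page] / max_visits)
              + w_content * content_similarity[page]
        for page in usage_covisits
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend())   # pages ranked by the blended score
```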

 

 

  10. Computational Confidence for Decision Making in Health

Stephen Lean & Hans W. Guesgen & Kudakwashe Dube & Inga Hunter from School of Engineering and Advanced Technology, Massey University, Private Bag 11 222, Palmerston North, New Zealand

 

The New Zealand Health and Disability Sector must handle large volumes of information and complex information flows. Access to and effective use of this information is needed for efficiency gains in this sector, as a lack of appropriate information is costly, both in financial terms and in adverse outcomes for patients. However, this information must be used safely, especially in clinical decision making. Effective and safe use of information has become a key driver for this sector. For safe use, clinicians must have confidence in the veracity of the information they wish to use. This research aims to investigate which factors lead to confidence in health information, how these factors can be mapped and conceptualised into a model for confidence in health information, and how a prototype system can be produced that implements this model and provides both a computational measure of confidence in health information and a representation of that measure. The paper given here describes work in progress.

 

 

  11. A rule-based taxonomy of dirty data

Lin Li, Taoxin Peng & Jessie Kennedy from Edinburgh Napier University, 10 Colinton Road, Edinburgh, EH10 5DT, UK.

 

There is a growing awareness that high quality data is a key to today's business success and that dirty data within data sources is one of the sources of poor data quality. To ensure high quality data, enterprises need processes, methodologies and resources to monitor and analyze the quality of their data, and methodologies for preventing and/or detecting and repairing dirty data. Nevertheless, research shows that many enterprises do not pay adequate attention to the existence of dirty data and have not applied useful methodologies to ensure high quality data for their applications. One of the reasons is a lack of appreciation of the types and extent of dirty data. In practice, detecting and cleaning all the dirty data that exists in all data sources is quite expensive and unrealistic, so the cost of cleaning dirty data needs to be considered by most enterprises. This problem has not attracted enough attention from researchers. In this paper, a rule-based taxonomy of dirty data is developed. The proposed taxonomy not only provides a mechanism to deal with this problem but also includes more dirty data types than any existing taxonomy of this kind.
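
To make the notion of rule-based detection concrete, a few illustrative rules over a hypothetical record format are sketched below; the rule set is an invented example, not the taxonomy proposed in the paper.

```python
import re

# Illustrative checks for a few common dirty-data types (the rules and the
# record fields are assumptions for demonstration only).
RULES = {
    "missing value":      lambda r: any(v in ("", None) for v in r.values()),
    "out-of-range value": lambda r: not (0 <= r.get("age", 0) <= 120),
    "invalid format":     lambda r: not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+",
                                                     r.get("email", "")),
}

def classify(record):
    """Return the names of every dirty-data rule the record violates."""
    return [name for name, violated in RULES.items() if violated(record)]

print(classify({"name": "Ann", "age": 212, "email": "ann@example.com"}))
# ['out-of-range value']
print(classify({"name": "", "age": 34, "email": "bob-at-example"}))
# ['missing value', 'invalid format']
```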

 

 

  12. Data Quality Enhancement through Mobile Systems in Asset Management Organisations

Dr. Jing Gao & Prof. Andy Koronios from School of Computer and Information Science, University of South Australia

 

The majority of data quality problems occur during the data entry stage. With improved system interfaces and business logic, organisations are able to verify and enhance the quality of data input electronically. However, paper-based data entry (and later conversion to digital form) is still common in many industries (for example, the field force of utility companies). This paper investigates the issues of data input in an Australian utility company and develops a prototype system to demonstrate the use of mobile systems as a means to improve data quality.
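
A minimal sketch of entry-time validation of the kind a mobile client could perform before submission; the field names and the asset-numbering convention are assumptions, not the prototype described in the paper.

```python
# Hypothetical asset-numbering convention used only for this example.
VALID_ASSET_PREFIXES = ("TX-", "PL-")

def validate_entry(entry: dict) -> list:
    """Return a list of problems to show the field worker before submission."""
    problems = []
    if not entry.get("asset_id", "").startswith(VALID_ASSET_PREFIXES):
        problems.append("asset_id does not match the expected numbering scheme")
    if entry.get("reading") is None:
        problems.append("meter reading is required")
    elif not (0 <= entry["reading"] <= 10_000):
        problems.append("meter reading outside plausible range")
    return problems

print(validate_entry({"asset_id": "TX-0042", "reading": 930}))   # [] -> accept
print(validate_entry({"asset_id": "42", "reading": None}))        # two problems -> reject
```

Rejecting implausible input at the point of capture is the design choice the paper motivates: it is far cheaper than cleaning the same errors downstream.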

 

 

  13. Cyber Infrastructure and Data Quality for Environmental pollution control – in Oman

Sanad Al-Maskari, Dinesh Kumar Saini and Wail M Omar, Faculty of Computing and Information Technology, Sohar University, P.O. Box 44, P.C. 311, Sohar, Sultanate of Oman

 

The aim of this paper is to develop a highly innovative framework and set of services to enable streamlined access to a collection of real-time, near-real-time and static datasets acquired through pollution monitoring sensors and stations. The paper describes the pollution control management system and Web Portal that we are developing to enable the sharing and integration of high quality data and models for pollution control resource managers. Interactive and dynamic reporting services will be established to enable knowledge exchange between controlling agencies and authorities in the region. The data collected through sensors need to be cleaned before they are used in the alert system.
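
As an illustration of the cleaning step mentioned above (not the paper's system), implausible or missing sensor readings can be filtered out before alert thresholds are evaluated; all thresholds, field names and station names below are invented.

```python
RAW = [
    {"station": "SHR-01", "pm10": 38.0},
    {"station": "SHR-01", "pm10": -5.0},    # physically impossible -> drop
    {"station": "SHR-02", "pm10": None},    # missing reading       -> drop
    {"station": "SHR-02", "pm10": 162.0},
]

ALERT_LEVEL = 150.0   # hypothetical PM10 alert threshold, ug/m3

def clean(readings):
    """Keep only non-missing, physically plausible readings."""
    return [r for r in readings
            if r["pm10"] is not None and 0.0 <= r["pm10"] <= 1000.0]

for reading in clean(RAW):
    if reading["pm10"] >= ALERT_LEVEL:
        print(f"ALERT: {reading['station']} PM10 = {reading['pm10']}")
```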

 

 

Product / documentation details

Data Integration and Data mining Framework to Discover Health Impacts of Climate Change


Product Type : Academic Conferences

Author : Liwan H. Liyanage, School of Computing and Mathematics, Universi

PDF 7p

Languages : English

A multilevel approach to interoperability in surveillance and reconnaissance


Product Type : Academic Conferences

Author : Barbara Essendorfer, Fraunhofer IOSB, Fraunhoferstr. 1, 76131 Ka

PDF 8p

Languages : English

Proposed Semantics for the Sampled Values and Metadata of Scientific Values


Product Type : Academic Conferences

Author : Joseph Phillips, De Paul University, School of Computing and Dig

PDF 8p

Languages : English

The need for ICT induced Organizational Transformation among the Public Sector Banks in Sri L...


Product Type : Academic Conferences

Author : Selvarajan, P., Faculty of Business Studies, Vavuniya Campus of

PDF 7p

Languages : English

Implementing and Evaluating Snapshots + Events Spatiotemporal Modelling Approach


Product Type : Academic Conferences

Author : Cristian Vidal, Universidad de Talca, Ingeniería Informática Emp

PDF 7p

Languages : English

The Specifications of a Generic QoS Metamodel for Designing and Developing Good Quality Web S...


Product Type : Academic Conferences

Author : Wan Nurhayati WAN AB. RAHMAN, Faculty of Computer Science and IT

PDF 10p

Languages : English

Privacy- preserving data mining in peer to peer networks


Product Type : Academic Conferences

Author : Ibrar Hussain & Marai Irakleous & Mubashir Ashraf Siddiqi & Moha

PDF 8p

Languages : English

An experiment and outlook on revised hierarchical agglomerative clustering method


Product Type : Academic Conferences

Author : Chuen-Min Huang & Yi-Hua Li from departement of information mana

PDF 7p

Languages : English

A Semantic Enhanced Data Mining Framework for Web Personalization


Product Type : Academic Conferences

Author : Sanjeev kumar Sharma, Research Scholar & Ugrasen Suman, associat

PDF 9p

Languages : English

Computational Confidence for Decision Making in Health


Product Type : Academic Conferences

Author : Stephen Lean & Hans W. Guesgen & Kudakwashe Dube & Inga Hunter f

PDF 7p

Languages : English

A rule based taxonomy of dirty data


Product Type : Academic Conferences

Author : Lin Li & Taoxin Peng, Jessie Kennedy from Edinburgh Napler Unive

PDF 8p

Languages : English

Data Quality Enhancement through Mobile Systems in Asset Management Organisations


Product Type : Academic Conferences

Author : Dr. Jing Gao & Prof. Andy Koronios from School of Computer and I

PDF 6p

Languages : English

Cyber Infrastructure and Data Quality for Environmental pollution control – in Oman


Product Type : Academic Conferences

Author : Sanad Al-Maskari, Dinesh Kumar Saini and Wail M Omar, Faculty of

PDF 8p

Languages : English

Organizer : Global Science & Technology Forum

GSTF provides a global intellectual platform for top-notch academics and industry professionals to actively interact and share their groundbreaking research achievements. GSTF is dedicated to promoting research and development and offers an inter-disciplinary intellectual platform for leading scientists, researchers, academics and industry professionals across Asia Pacific to actively consult, network and collaborate with their counterparts across the globe.