By : Global Science & Technology Forum
Date : 2011
Location : Singapore / Singapore
The ADPC conference presented new ideas and systems technology for distribution and concurrency in computer science and technology.
parallel algorithms; path planning; GPGPU; robotics; CUDA; OpenMP; load balancing; clusters; auto configuration; B+-tree; XBOOLE; ternary vector; parallel; message passing interface; unate SAT problems; mapping heuristics; GPU; dynamic scheduling; distributed system; independent tasks; task redistribution; market-oriented computing; resource reservation; risk assessment; utility computing; MST
Distribution and concurrency are pervasive in computer science and technology, from the simple PC, already multi-core, all the way up to supercomputing grids, and also graphics cards, PC clusters, distributed databases, and supercomputers. Distributed and parallel computing is also driven by the increasing computational requirements of various applications.
Proceedings from The Advances in Distributed and Parallel Computing (ADPC 2010) conference.
Abstracts of papers included in The Advances in Distributed and Parallel Computing (ADPC 2010) conference proceedings.
Abstract : Many applications in distributed computing systems, such as IP telephony, teleconferencing, collaborative workspaces, interactive chat and multi-user games, involve dynamic peer groups. Group key agreement protocols are needed to secure communications in dynamic peer groups. In this paper, we propose a new group key agreement protocol for large-scale dynamic peer groups, composed of a basic protocol and a dynamic protocol. Our protocols are natural extensions of the one-round tripartite Diffie-Hellman key agreement protocol.
Abstract : Utilization of the idle capacity of computing nodes has attracted continuously growing research interest over the past decade, and the corporate world finds it even more promising. Among the various delivery models pursued by industry is the cluster computing model, which may be a trusted or an open-access system. This paradigm strives to provide many of the benefits of distributed computing, such as high availability, manageability, scalability and dependability. To keep these features persistent over long periods, the environment must be reliable and fault tolerant, enabling computing nodes to join and leave the cluster easily. Therefore, an efficient mechanism for configuring clusters to attain high availability, scalability and fault tolerance is required. This paper extends the Jingle-Mingle model for load balancing to a B+-tree-based architectural framework of self-organizing clusters that attains load balancing while also ensuring failure recovery and fault tolerance. We propose operations for the proposed self-organizing clusters that ensure a proper mix of underloaded and overloaded nodes and that update the clusters according to their churn ratio. The experimental results show that the average cost of network restructuring with our proposed framework is considerably lower. We also observe that although the number of nodes in a cluster varies, the cost of a load balancing operation is O(1).
Abstract : The flurry of activity in the area of regenerating codes over the past few years has led to different flavors of codes targeting diverse applications. Some regenerating codes minimize the repair bandwidth, i.e., the amount of data required to restore the system after a node failure. A failed node can be replaced by an exact replica or a functional equivalent. Codes that facilitate the former, referred to as exact regeneration codes, are practically favorable. However, the existing codes for this purpose involve complex computations over finite fields. In this paper, we propose ExR, a reduced-complexity scheme for exact regeneration of a failed node in a distributed storage system. The primary aim is to minimize the repair bandwidth, which in turn is useful in applications such as Internet-wide peer-to-peer backup systems and distributed mail servers. Additionally, the entire file can be reconstructed using computationally efficient XOR operations. The proposed scheme involves two parity nodes and achieves faster regeneration due to low processing and communication overheads.
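The XOR-repair idea behind such schemes can be sketched in a few lines. This is a generic single-parity illustration, not the actual ExR code (which uses two parity nodes and a specific layout); the block and node names are hypothetical.

```python
# Sketch of single-failure recovery with one XOR parity node.
# Illustrates the general XOR-regeneration idea only, not ExR itself.

def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(data_blocks):
    """Parity block = XOR of all data blocks."""
    parity = data_blocks[0]
    for blk in data_blocks[1:]:
        parity = xor_blocks(parity, blk)
    return parity

def regenerate(surviving_blocks, parity):
    """Rebuild the single missing data block from survivors + parity."""
    missing = parity
    for blk in surviving_blocks:
        missing = xor_blocks(missing, blk)
    return missing

data = [b"node-A-data", b"node-B-data", b"node-C-data"]
parity = make_parity(data)
# Suppose node B fails; recover its block from A, C and the parity node.
recovered = regenerate([data[0], data[2]], parity)
```

Because XOR is its own inverse, folding the survivors back into the parity leaves exactly the missing block, using only cheap bitwise operations.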
Abstract : Among the various classes of scheduling algorithms, duplication-based scheduling algorithms provide superior performance. General scheduling algorithms consist of a task prioritizing phase and a processor selection phase. In this paper, we propose a novel task prioritizing algorithm, called WPD, which is specialized for duplication-based scheduling algorithms. The primary goal of WPD is to minimize the schedule length of applications. The performance of WPD has been observed by applying it to the processor selection phase of a conventional scheduling algorithm. The comparison results show that WPD provides better performance than algorithms with a conventional task prioritizing phase.
Abstract : In this paper, a parallel programming algorithm for DNA sequence alignment on multi-core processor architectures is proposed. We discuss the issues involved in aligning genome sequences using a global sequence alignment algorithm and in parallelizing it using tiling and OpenMP on multi-core architectures. Sequence alignment is a fundamental instrument in bioinformatics. In recent years, numerous proposals have addressed the problem of accelerating this class of applications, due to the rapid growth of sequence databases in combination with the high computational demands imposed by the algorithms. In this paper we focus on the analysis of the global alignment algorithm, a widely used approach to multiple sequence alignment. We have parallelized the global algorithm on a multi-core architecture and have carefully analyzed the scalability of its different phases with both the number of cores used and the input size. Tiling is a well-known method for parallelization and for improving data locality. An efficient approach is proposed for selecting a tile size suited to the architecture on which the application is running, so that the total execution time is reduced. A fine-grain parallel processing approach is employed for good scalability, and the OpenMP library is used for enhanced portability. Our experimental results illustrate that by selecting an appropriate tile size for each architecture and input size, the total parallel execution time is significantly reduced. We have tested several examples and achieved significant speedup. The developed computing system is implemented on a PC-based Linux cluster with different multi-core architectures. The comparison shows that the OpenMP tasking model provides expressiveness, flexibility, and great potential for performance and scalability.
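The dynamic-programming recurrence at the heart of global alignment can be sketched sequentially. This is a minimal Needleman-Wunsch-style scoring sketch with assumed scoring parameters; the paper's contribution is tiling this matrix and filling tiles in parallel with OpenMP, which is not reproduced here.

```python
# Serial sketch of global sequence alignment scoring (the DP matrix
# that the paper tiles and parallelizes). Scoring values are
# illustrative assumptions, not taken from the paper.

def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    n, m = len(a), len(b)
    # dp[i][j] = best score aligning a[:i] with b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap          # align prefix of a against gaps
    for j in range(1, m + 1):
        dp[0][j] = j * gap          # align prefix of b against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + sub,  # substitute/match
                           dp[i - 1][j] + gap,      # gap in b
                           dp[i][j - 1] + gap)      # gap in a
    return dp[n][m]

score = global_alignment_score("GATTACA", "GCATGCA")
```

Each cell depends only on its left, upper, and diagonal neighbors, which is what makes tiled, wavefront-parallel evaluation possible.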
Abstract : Due to the fast development of Internet services and a huge amount of network traffic, reducing World Wide Web user-perceived latency is becoming an essential issue. Although web performance is improved by caching, the benefit of caches is limited. To further reduce retrieval latency, web prefetching is an attractive solution. In this work, we introduce a simple and transparent popularity-based prefetching algorithm which combines both the top-10 and next-n prefetching approaches. In addition to using access frequency as the criterion for prefetching, we also use the time of access of web documents to generate the top-10 list. This combination of access frequency and time of access is known as the GDSP approach, which has been used in cache management. Instead of generating a next-n list for all documents accessed by users, we log the next-n documents for the top-10 documents only, thus reducing complexity and overhead. The results obtained, in terms of hit rate and prefetching effectiveness, show the efficacy of our proposed algorithm compared to other approaches.
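A popularity score that mixes frequency with recency, in the spirit of the GDSP criterion described above, can be sketched as follows. The exponential decay factor and the scoring form are illustrative assumptions; the paper's exact GDSP weighting may differ.

```python
# Hedged sketch of a frequency + recency score used to build a
# "top 10" prefetch list. The decay weighting is an assumption,
# not the paper's exact GDSP formula.

def top_k(access_log, k=10, decay=0.9):
    """access_log: list of (time, doc) events, oldest first.
    Each access contributes decay**age to the doc's score, so
    documents that are both frequent AND recent rank highest."""
    if not access_log:
        return []
    now = access_log[-1][0]
    scores = {}
    for t, doc in access_log:
        scores[doc] = scores.get(doc, 0.0) + decay ** (now - t)
    ranked = sorted(scores, key=lambda d: scores[d], reverse=True)
    return ranked[:k]

log = [(0, "a.html"), (1, "b.html"), (2, "a.html"), (3, "c.html"),
       (4, "c.html"), (5, "c.html")]
prefetch_list = top_k(log, k=2)
```

Here `c.html` wins on both counts (three accesses, all recent), while `a.html` beats `b.html` on frequency despite both being old.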
Abstract : Switchbox routing is a class of problems arising in the detailed routing phase of VLSI physical design automation. A switchbox is a rectangular area whose boundary contains terminals; each terminal belongs to a specific net. A switchbox router connects all the terminals belonging to the same net and must complete the connection of each net. It has been proven that a switchbox routing problem containing multiple terminals and multiple nets is NP-complete. In this article, we present a constraint programming (CP) formulation and a mixed integer linear programming (MILP) formulation of switchbox routing problems, so that CP solvers and MILP solvers can be used to find solutions. Experimental results show that a speed-up of more than 23X can be achieved by using a parallel MILP solver with 16 threads. The execution time can be further reduced on a computer with more processor cores.
Abstract : In a multicore architecture, each package consists of a large number of processor cores. This increase in core count brings a new evolution in parallel computing. Besides enormous performance enhancement, the multicore package introduces many challenges and opportunities from the operating system scheduling point of view. A multiagent system is concerned with the development and analysis of optimization problems; its main objective is to invent methodologies that let developers build complex systems able to solve sophisticated problems that would be difficult for an individual agent. In this paper we combine the AMAS theory of multiagent systems with the operating system scheduler to develop a new process scheduling algorithm for multicore architectures. This multiagent-based scheduling algorithm promises to minimize the average waiting time of the processes in the centralized queue and also reduces the work of the scheduler. We modified and simulated the Linux 2.6.11 kernel process scheduler to incorporate the multiagent system concept. The comparison is made for different numbers of cores with multiple combinations of processes, and results are shown for average waiting time vs. number of cores in the centralized queue.
Abstract : This paper presents a new fully programmable SIMD architecture for image processing using cellular automata (CA). The main benefit is high processing speed coupled with low power consumption and small chip size. In many industrial applications, image processing is mostly carried out on standard PC hardware. Especially in an embedded environment, such systems are often not applicable because of their high power demands and physical dimensions, resulting in an extraordinary TCO. Instead, a smart camera solution featuring an in-camera ASIC could meet this challenge by performing not only frame grabbing, but also image (pre-)processing. Our programmable architecture, called ParCA, is able to accomplish simple image preprocessing tasks at approximately 390 fps for a one-megapixel image. By using more complex algorithms ParCA.
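A cellular-automata image-processing step of the kind such a SIMD array executes can be sketched sequentially: every pixel is updated from its 3x3 neighborhood by the same local rule. The binary-erosion rule and toroidal border handling below are illustrative assumptions, not ParCA's actual instruction set; on the hardware all cells would update simultaneously.

```python
# Sketch of one cellular-automata step over a binary image: each
# pixel's new value is a function of its 3x3 neighborhood (wrapped
# at the borders). The erosion rule here is an illustrative example.

def ca_step(img, rule):
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nb = [img[(r + dr) % rows][(c + dc) % cols]
                  for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = rule(nb)      # same rule applied to every cell
    return out

# Binary erosion: a pixel survives only if its whole 3x3 patch is set.
erode = lambda nb: 1 if all(nb) else 0

img = [[1] * 4 for _ in range(4)]
img[0][0] = 0                         # a single hole in the image
eroded = ca_step(img, erode)
```

The single zero pixel "eats" its 3x3 surroundings after one step, which is exactly the kind of uniform local update a CA array computes in one clocked pass.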
Abstract : Data replication among different sites is an important technique for improving application performance, data availability and reliability, and for minimizing the cost of data transmission. In this paper, we propose an architecture and algorithm for a heterogeneous persistence layer for synchronous replication based on multithreading. In this architecture we developed a multithreading-based persistence layer, with the objective of making the persistence layer more adaptive. In this heterogeneous persistence system the replication server does not depend on the main server; therefore, adding a new replication server is easier than ever, heterogeneous systems are easy to cope with, cost is minimized, and there is no down time.
Abstract : This paper aims at better possibilities for solving problems with exponential complexity. Our special focus is on combining the use of the four cores of a standard PC with better models in the application domain. As an example we selected the unate covering problem, which must be solved, among other places, in the process of circuit synthesis and in graph covering (domination) problems. We introduce this problem, explain the classical solution, and evaluate it using a benchmark set. Subsequently we study sources of parallelism in the application domain and explore improvements offered by the parallel utilization of the four available cores of a PC. Starting with a uniform division of the problem, we suggest improvements by means of an adaptive division and an intelligent master. Our experimental results confirm that the combination of improvements in the application domain and in the algorithmic domain leads to a remarkable speedup and an overall improvement factor of more than 35 million in comparison to the improved basic approach.
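The unate covering problem asks for a smallest set of columns covering all rows. A minimal greedy sketch conveys the structure; the paper's actual branch-and-bound solution and its adaptive parallel division are not reproduced here, and the column/row names are made up.

```python
# Greedy sketch of unate covering: repeatedly pick the column that
# covers the most still-uncovered rows. This is a heuristic, not the
# exact (branch-and-bound style) solution discussed in the paper.

def greedy_cover(rows, columns):
    """rows: iterable of row ids; columns: dict col -> set of covered rows.
    Returns the chosen columns in selection order."""
    uncovered = set(rows)
    chosen = []
    while uncovered:
        best = max(columns, key=lambda c: len(columns[c] & uncovered))
        if not columns[best] & uncovered:
            raise ValueError("rows cannot be covered")
        chosen.append(best)
        uncovered -= columns[best]
    return chosen

cols = {"c1": {1, 2, 3}, "c2": {3, 4}, "c3": {4, 5}, "c4": {1, 5}}
solution = greedy_cover({1, 2, 3, 4, 5}, cols)
```

Greedy picks `c1` (covers three rows) and then `c3` (covers the remaining two); exact solvers improve on this by branching and bounding over such partial covers.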
Abstract : The emerging cloud and grid computing environments have become the de facto platforms for distributed computation. These environments are dynamic in the sense that the arrival of user requests is unpredictable; due to this versatility, traditional scheduling algorithms fail to match user requests to the most feasible processing elements, because no prior knowledge about the requests is available until they arrive at the schedulers. The situation is worse in a large-scale system, in which a single scheduler is often a bottleneck whose queue is congested with a large number of user requests. One solution to this problem is to use multiple schedulers. However, scheduling complexity can rise dramatically, since a scheduler must consult the others and maintain a global state of all processing elements in order to generate a feasible plan. Sometimes the cost of synchronizing the schedulers and processing elements becomes the major cause of system degradation. In this research, we take a resource-competition approach in which the schedulers compete for resources without acknowledging other competitors, and each processing element accepts only the first request that arrives and rejects the others. We conduct thorough simulations and the results are positive.
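The first-request-wins rule can be sketched as a toy simulation. Everything here is a simplifying assumption: the scheduler and element names are hypothetical, and a shuffled request list stands in for unpredictable network arrival order; the paper's simulation is far more detailed.

```python
import random

# Toy sketch of resource competition: several schedulers each request
# every candidate processing element; an element accepts only the
# first request it sees and rejects the rest. No scheduler ever
# consults another or tracks global state.

def compete(schedulers, elements, rng):
    accepted = {}                        # element -> winning scheduler
    requests = [(s, e) for s in schedulers for e in elements]
    rng.shuffle(requests)                # stand-in for arrival latency
    for scheduler, element in requests:
        if element not in accepted:
            accepted[element] = scheduler   # first request wins
        # later requests for this element are implicitly rejected
    return accepted

rng = random.Random(7)
wins = compete(["s1", "s2", "s3"], ["pe1", "pe2"], rng)
```

The point of the approach is visible even in this sketch: every element ends up assigned without any synchronization between schedulers, at the cost of rejected requests.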
Abstract : The coming transformation of computing services into the cloud is a major change for industry. Cloud services offer on-demand computing as a utility that promises to reduce costs and improve service quality. However, there are concerns related to privacy, security, control and governance of the cloud. With the main goal of addressing these risks, this paper provides deep insights and practical guidelines for cloud architects, service providers and consumers for a smooth migration into the cloud.
Abstract : Grid, cluster and cloud computing provide the opportunity for resources to be traded as a commodity in an open marketplace. An options market for computing resources would allow users to reserve a resource for a fee, and then pay an additional fee later should they actually need to use it. However, a major issue is ensuring that users do not falsify their likely requirements with the objective of reducing costs while keeping their right to use the resource. This paper describes a computer simulation of a two-period model proposed by Wu, Zhang and Huberman in 2008, which they claimed promotes truth-telling among the population of resource buyers. Wu et al. provided a theoretical description and analysis of their model, but presented no empirical results. Our work, reported in this paper, allows for detailed exploration of less abstract, and more heterogeneous, instantiations of their model, which we explore via techniques similar to the replicator dynamics used in studies of evolutionary processes. A discussion of our empirical findings, and how they can be utilised in a commercial offering of the model, is included.
Company Description : Grid computing can be used to aggregate physical and logical resources for executing compute-intensive applications. However, due to the heterogeneous and dynamic nature of computational Grid systems, efficient resource discovery is a very challenging issue. Agents can be used to enhance resource discovery processes, since they are distributed in nature and autonomous and intelligent in behavior. Furthermore, agents can interact with neighboring agents based on local knowledge and move along predetermined routes autonomously on behalf of a user. Ontology can provide benefits in a resource discovery mechanism. These benefits include enhancement of interoperability, quality of information, accuracy and efficiency. In this paper, we propose a comprehensive solution for a decentralized resource discovery process by improving the ontological framework. This proposed work is based on extending the combined semantic and agent approaches. The aim of this research is to extend and develop the decentralized agent and semantic-based resource discovery process to improve the performance and maximize the utilization of resources in a computational Grid. Preliminary simulations carried out using the proposed approach show more tasks being accepted for complex jobs. Consequently, this approach could be useful in increasing the utilization of resources in a Grid Computing environment.
Abstract : Computational grids enable the aggregation and sharing of geographically distributed computational resources for solving various large-scale applications. Due to the widespread use of resources, systems are highly prone to errors and failures; hence fault tolerance plays a key role in the grid to avoid the problem of unreliability. The two main techniques for implementing fault tolerance in a grid environment are checkpointing and replication. In this paper, a survey of different replication techniques is presented. The goal of replication is to ensure that at least one replica is always able to continue and complete the computation should other resources fail. This paper describes the different job replication techniques that have been proposed to improve fault tolerance in the grid environment. This study will help researchers working on improving fault tolerance in the grid to analyze and compare different job replication techniques with their own work.
Abstract : There are two prevalent strategies for building a high-performance HTTP engine: threads and events. In this paper, we introduce a non-blocking HTTP engine model which is based on the staged event-driven architecture and benefits from both threads and events. In particular, as part of the real non-blocking mechanism, a state-machine-based asynchronous HTTP parsing algorithm is proposed, achieving zero data copying and single-traversal parsing. Moreover, optimization techniques for lock contention and memory allocation are implemented. Testing results demonstrate the engine's performance in concurrency and scalability.
Abstract : In this paper, we present a GPU-based sorting algorithm, GPUMemSort, which achieves high performance in sorting large-scale in-memory data by exploiting highly parallel GPU processors. It consists of two algorithms: an in-core algorithm, which sorts data in GPU global memory efficiently, and an out-of-core algorithm, which divides large-scale data into multiple chunks that fit in GPU global memory. GPUMemSort is implemented on NVIDIA's CUDA framework, and some critical optimization methods are also presented in detail. Tests of different algorithms have been run on multiple data sets. The experimental results show that our in-core sorting outperforms other comparison-based algorithms and that GPUMemSort is highly effective in sorting large-scale in-memory data.
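The out-of-core strategy can be sketched on the CPU: split the input into chunks that would fit device memory, sort each chunk (the in-core step, done on the GPU in the paper), then stream-merge the sorted chunks. The chunk size here is an arbitrary stand-in for GPU memory capacity, and this sketch claims nothing about GPUMemSort's actual kernels.

```python
import heapq

# CPU sketch of an out-of-core sort: chunk, sort each chunk
# (GPU's job in the paper), then k-way merge the sorted runs.

def chunked_sort(data, chunk_size):
    chunks = [sorted(data[i:i + chunk_size])
              for i in range(0, len(data), chunk_size)]
    # heapq.merge streams the sorted runs without concatenating them,
    # which is what makes the approach work when runs live out-of-core.
    return list(heapq.merge(*chunks))

out = chunked_sort([9, 1, 7, 3, 8, 2, 6, 4, 5], chunk_size=3)
```

The merge only ever needs one element per run in memory at a time, so the total working set is bounded by the chunk size rather than the input size.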
Abstract : Workflow has been widely used in office automation and e-commerce, but the gap between the business domain and workflow specifications has increased the difficulty of adopting business workflows. Workflow modeling tools were developed and used to reduce the complexity of developing workflow models. Multi-sectoral, cross-boundary collaboration has become essential in the modern enterprise, yet traditional workflow modeling tools support only a single user. Enabling multiple departments to participate, and professional modelers and business people to work together, is thus an important task in a workflow management system. In this paper, we introduce an open-source tool for workflow modeling, called Orchestra Designer. It solves the above problems by isolating domain tasks from the workflow specification and providing a service-oriented collaboration bus. Orchestra Designer also supports the design of cross-business workflows and generates BPEL documents.
Abstract : In robot systems, several computationally intensive tasks can be found, path planning being one of them. Especially in dynamically changing environments, it is difficult to meet real-time constraints with a serial processing approach. For systems employing standard computers, a viable option is to use a GPGPU as a coprocessor in order to offload tasks which can be efficiently parallelized. We present implementation results for selected parallel path planning algorithms, taking representatives of every class (graph-based, potential fields and iterative), on NVIDIA's CUDA platform, compared to implementations on a standard multicore system using OpenMP. There is no best algorithm for every use case, but one of the iterative methods, the marching pixels solution, appears to be a good choice.
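A grid wavefront planner illustrates the iterative class of algorithms evaluated above: a distance wave expands from the goal, then the path follows decreasing distance. This serial BFS sketch is only in the spirit of such methods; the GPU versions update all cells in parallel per iteration, and the grid below is a made-up example.

```python
from collections import deque

# Sketch of an iterative grid planner: BFS wave out of the goal
# builds a distance field, then the path descends that field.

def wavefront(grid, start, goal):
    """grid: 2D list, 0 = free, 1 = obstacle. Returns start->goal path."""
    rows, cols = len(grid), len(grid[0])
    dist = {goal: 0}
    q = deque([goal])
    while q:                              # expand the wave
        r, c = q.popleft()
        for nr, nc in ((r+1, c), (r-1, c), (r, c+1), (r, c-1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                q.append((nr, nc))
    if start not in dist:
        return None                       # goal unreachable
    path = [start]
    while path[-1] != goal:               # walk downhill in distance
        r, c = path[-1]
        path.append(min(((r+1, c), (r-1, c), (r, c+1), (r, c-1)),
                        key=lambda p: dist.get(p, float("inf"))))
    return path

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
route = wavefront(grid, (0, 0), (2, 0))
```

The per-cell updates in the wave expansion are independent within an iteration, which is exactly the property the GPU implementations exploit.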
Abstract : Optimally mapping diverse independent tasks onto machines in a distributed system is, in general, an NP-hard problem. Further, distributed systems augmented with GPUs are quite different from those without. In order to better utilize machines with both CPUs and GPUs, dynamic load balancing has been introduced, and it can significantly affect the scheduling performance of such distributed systems. In this paper, we propose four heuristics, adapt them to a set of common assumptions, add dynamic load balancing, and compare their results in simulation experiments. The performance of these heuristics, as well as the effect of task redistribution, is carefully studied. MCT with task redistribution outperforms the other algorithms. Task redistribution can significantly improve performance, especially when the speeds of the CPU and GPU are close to each other.
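The Minimum Completion Time (MCT) heuristic named above can be sketched in a few lines: each arriving task goes to the machine that would finish it earliest given that machine's current ready time. Task redistribution and the paper's specific CPU/GPU models are not included; the execution times below are made up.

```python
# Sketch of the MCT heuristic: assign each task to the machine with
# the earliest completion time (machine ready time + task exec time).
# Task redistribution, studied in the paper, is omitted here.

def mct_schedule(tasks, machines):
    """tasks: list of dicts mapping machine name -> execution time.
    Returns (assignment list, makespan)."""
    ready = {m: 0.0 for m in machines}
    assignment = []
    for exec_times in tasks:
        best = min(machines, key=lambda m: ready[m] + exec_times[m])
        ready[best] += exec_times[best]
        assignment.append(best)
    return assignment, max(ready.values())

# Hypothetical heterogeneous times: some tasks run faster on the GPU.
tasks = [{"cpu": 4, "gpu": 1},
         {"cpu": 2, "gpu": 5},
         {"cpu": 3, "gpu": 2}]
plan, makespan = mct_schedule(tasks, ["cpu", "gpu"])
```

Note that MCT is greedy and online: it never revisits earlier assignments, which is precisely the gap that task redistribution addresses.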
Abstract : We propose a fast data-parallel minimum spanning tree algorithm designed for general-purpose computation on graphics processing units (GPGPU). Our algorithm is a data-parallel version of Borůvka's minimum spanning tree algorithm. Its gist is a synchronization on the central processing unit after each of the parallel iterations that compute the components and their minimum-weight outgoing edges. Our implementation uses both BrookGPU and NVIDIA's CUDA as programming environments, and the performance of our algorithm was evaluated against state-of-the-art algorithms on different types of datasets. The experimental results show that our algorithm outperforms the other algorithms substantially in execution time, with up to a tenfold speedup.
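Borůvka's iteration structure can be shown in a serial sketch: each round, every component finds its cheapest outgoing edge, then all such edges are merged. The paper runs the per-component edge search in data-parallel on the GPU and synchronizes on the CPU between rounds; this version is purely sequential, and the example graph is made up.

```python
# Serial sketch of Boruvka's MST algorithm, mirroring the iteration
# structure the paper parallelizes (cheapest-edge search per round).

def boruvka_mst(n, edges):
    """n vertices 0..n-1, edges as (weight, u, v). Returns MST weight."""
    parent = list(range(n))

    def find(x):                      # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, components = 0, n
    while components > 1:
        cheapest = {}                 # component root -> lightest out-edge
        for w, u, v in edges:
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            for r in (ru, rv):
                if r not in cheapest or w < cheapest[r][0]:
                    cheapest[r] = (w, ru, rv)
        # merge every component along its cheapest outgoing edge
        for w, ru, rv in cheapest.values():
            if find(ru) != find(rv):
                parent[find(ru)] = find(rv)
                total += w
                components -= 1
    return total

mst_weight = boruvka_mst(4, [(1, 0, 1), (4, 0, 2), (3, 1, 2),
                             (2, 2, 3), (5, 1, 3)])
```

Each round at least halves the number of components, so only O(log n) CPU synchronization points are needed, which is what makes the algorithm attractive on GPUs.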
Abstract : With the increasing application requirements of pervasive computing, the dynamic behavior of software components and its compatibility analysis are two important issues in middleware dynamic adaptation. In this article, we first present an adaptive middleware architecture called ScudWare for a smart vehicle space. Then a semantic component model of this middleware is given in detail. Next, for the ScudWare middleware, we propose a formalization of semantic component dynamic behavior and a verification of component behavior compatibility based on the higher-order calculus. After that, a case study is given to evaluate our model and methods. Finally, we draw conclusions and outline future work.
Abstract : Query processing in a relational database involves two parts: query execution and query optimization. Usually query execution takes more time than query optimization, and the most expensive part of query execution is joins. Joins are statements that retrieve data from more than one table; a join is characterized by multiple tables in the FROM clause, with the relationship between the tables defined through a join condition in the WHERE clause. In very large and highly normalized databases, the frequency of join queries is high. To perform a join query efficiently, the join method the optimizer selects is vital. Nested loop join, hash join and sort-merge join are the primary methods available for joining tables. The aim of this paper is to introduce the traditional join techniques and to compare their costs. This paper also covers some recent advancements in join techniques. Join operations can be improved in two ways. One is controlling the way tuples are stored, i.e., the physical storage of data, such as partitioning the table, composite indexes, join indexes, and clustering of data tuples. The other is improving the technique used to join the relations; a small change to the technique can improve join performance dramatically. This paper surveys the existing blocking join techniques.
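Of the three primary methods named above, the hash join is the easiest to sketch: build a hash table on the smaller relation's join key, then probe it with each tuple of the larger one. The relations and column names below are made-up examples.

```python
# Sketch of an in-memory hash join between two lists of row dicts.

def hash_join(left, right, key_left, key_right):
    """Returns merged rows where left[key_left] == right[key_right]."""
    # Build phase: index the left (ideally smaller) relation by key.
    index = {}
    for row in left:
        index.setdefault(row[key_left], []).append(row)
    # Probe phase: a single pass over the right relation.
    out = []
    for row in right:
        for match in index.get(row[key_right], []):
            out.append({**match, **row})
    return out

emps = [{"dept": 1, "name": "ann"}, {"dept": 2, "name": "bob"}]
depts = [{"dept_id": 1, "dept_name": "hw"}, {"dept_id": 3, "dept_name": "sw"}]
joined = hash_join(emps, depts, "dept", "dept_id")
```

Unlike a nested loop join, which compares every pair of tuples, the hash join touches each relation once, at the cost of building the in-memory hash table (the build side must fit in memory, or be partitioned first).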
Abstract : Nowadays, great emphasis is being laid on harnessing locally available sources of energy like low water heads, biogas, and wind. This has led to a lot of research activity in the field of isolated power generation. The use of an induction machine as a capacitor self-excited induction generator (SEIG) is now favoured in such applications. In this paper a new and simple method is proposed to compute the minimum excitation capacitance required for an SEIG.
Abstract : Nowadays, general-purpose computing on GPUs (GPGPU) accelerates many industrial and scientific applications in the high-performance computing (HPC) domain. Recently, GPU vendors such as NVIDIA and AMD have promoted the use of GPUs in embedded systems, and the first vendors in the automotive sector will use the computation power of GPUs for their driver assistance systems. In these systems, energy constraints are omnipresent. This paper first discusses the theoretical background of energy-aware embedded system design including a GPGPU-capable graphics chip. To support these theoretical considerations, an energy and runtime evaluation of a low-power GPU/CPU system is then presented. We demonstrate that a profitable GPU integration, seen from an energy perspective, strongly depends on the structure and features of an application, such as high parallelizability and the utilization level of the graphics card. The evaluation of several real-world benchmarks shows that increasing the system's power consumption by integrating a GPU can lead to reduced overall energy consumption.
Abstract : For high accuracy, the Smith-Waterman local sequence alignment algorithm requires a very large amount of memory and computation, making implementations on common computing systems impractical. In this paper, we present swGPUCluster, an implementation of the Smith-Waterman algorithm on a cluster equipped with NVIDIA GPU graphics cards (a GPU cluster). Our test was performed on a cluster of two nodes: one node equipped with a dual-GPU NVIDIA GeForce GTX 295 and a Tesla C1060 card, and the other equipped with two dual-GPU NVIDIA GeForce GTX 295 cards. Results show that performance has increased significantly compared with the previous best implementations such as SWPS3 and CUDASW++. The performance of swGPUCluster increases with the length of the query sequences, from 37.328 GCUPS to 46.706 GCUPS. These results demonstrate the great computing power of graphics cards and their high applicability to bioinformatics problems.
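The core Smith-Waterman recurrence can be sketched serially with a linear gap penalty. swGPUCluster computes these cells massively in parallel across GPUs; this sketch shows only the recurrence, and the scoring parameters are illustrative assumptions rather than the paper's.

```python
# Serial sketch of Smith-Waterman local alignment scoring with a
# linear gap penalty, using a rolling row to keep memory at O(m).

def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    n, m = len(a), len(b)
    best = 0
    prev = [0] * (m + 1)
    for i in range(1, n + 1):
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a cell score never drops below zero,
            # so an alignment may start anywhere in either sequence.
            cur[j] = max(0,
                         prev[j - 1] + sub,   # match/mismatch
                         prev[j] + gap,       # gap in b
                         cur[j - 1] + gap)    # gap in a
            best = max(best, cur[j])
        prev = cur
    return best

score = smith_waterman("ACACACTA", "AGCACACA")
```

The zero floor is what distinguishes local from global alignment, and the anti-diagonal independence of the cells is what GPU implementations exploit.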
Organizer : Global Science & Technology Forum. GSTF provides a global intellectual platform for top-notch academics and industry professionals to actively interact and share their groundbreaking research achievements. GSTF is dedicated to promoting research and development and offers an inter-disciplinary intellectual platform for leading scientists, researchers, academics and industry professionals across Asia Pacific to actively consult, network and collaborate with their counterparts across the globe.