dispatched sequence of the kth vehicle. S: the total cost of one solution.
W. Peng and C.-Y. Zhou
Fig. 1. An example of VRP
4 Ant Colony for VRP

In the ant colony model, an individual ant simulates a vehicle, and its route is constructed by incrementally selecting customers until all customers have been visited. Customers that have already been visited by an ant, or that would violate its capacity constraints, are stored in the infeasible customer list (tabu).

Graph Representation

To apply the ant colony model, the underlying problem should be represented as a directed graph, G =
Solving Vehicle Routing Problem Using Ant Colony and Genetic Algorithm
Node Transition Rule

The node transition rule is probabilistic, determined by the pheromone intensity $\tau_{ij}$ and the visibility value $\eta_{ij}$ of the corresponding edge. In the proposed method, $\tau_{ij}$ is initialized to the same small positive constant for every edge, and is gradually updated at the end of each cycle according to the average quality of the solutions that involve this edge. The value of $\eta_{ij}$, on the other hand, is determined by a greedy heuristic, which encourages the ants to walk along the edge with minimal cost. We define the transition probability from node i to node j at time t as

$$p_{ij}^{k}(t) = \begin{cases} \dfrac{[\tau_{ij}(t)]^{\alpha}\,[\eta_{ij}]^{\beta}}{\sum_{s \in tabu_k} [\tau_{is}(t)]^{\alpha}\,[\eta_{is}]^{\beta}} & \text{if } j \in tabu_k \\[2ex] 0 & \text{otherwise} \end{cases} \qquad (9)$$
where $tabu_k$ is the set of nodes still accessible to the walking ant, and the other symbols have the same meaning as in Eq. (1). $tabu_k$ must satisfy Eqs. (5), (6) and (7).

Pheromone Updating Rule

The pheromone intensity on an edge is updated at the end of each cycle according to the average quality of the solutions that traverse this edge. We apply and modify Eqs. (2) and (3) to update the pheromone intensity:
$$\tau_{ij} \leftarrow \rho\,\tau_{ij} + \sum_{k=1}^{m} \Delta\tau_{ij}^{k} \qquad (10)$$

$$\Delta\tau_{ij}^{k} = \begin{cases} Q / S_k & \text{if the } k\text{th ant walks edge } (i,j) \\ 0 & \text{otherwise} \end{cases} \qquad (11)$$
where $S_k$ is the tour length of the kth ant in the current cycle.
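As a minimal sketch (our own illustration, not the authors' VC++ implementation; parameter values are placeholders), the transition rule (9) and the update rules (10)–(11) can be written as:

```python
import random

def select_next(i, feasible, tau, eta, alpha=1.0, beta=2.0):
    """Roulette-wheel choice of the next customer j, Eq. (9):
    p_ij is proportional to tau_ij^alpha * eta_ij^beta over the feasible nodes."""
    weights = [tau[i][j] ** alpha * eta[i][j] ** beta for j in feasible]
    r, acc = random.random() * sum(weights), 0.0
    for j, w in zip(feasible, weights):
        acc += w
        if acc >= r:
            return j
    return feasible[-1]

def update_pheromone(tau, routes, lengths, rho=0.9, Q=1.0):
    """End-of-cycle update, Eqs. (10)-(11): evaporate every trail by rho,
    then deposit Q / S_k on each edge walked by the k-th ant."""
    for row in tau:
        for j in range(len(row)):
            row[j] *= rho
    for route, S_k in zip(routes, lengths):
        for a, b in zip(route, route[1:]):
            tau[a][b] += Q / S_k
```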
5 Improved Ant Colony

When the pheromone intensity of one route becomes much higher than that of the other routes, the ant colony model converges prematurely and cannot find the optimal solution. To overcome this drawback, we improve the ant colony model by introducing a genetic algorithm, whose purpose is to strengthen the weak local exploitation capability. In each cycle, the best solutions produced by the ant colony are passed to the genetic algorithm. A solution is represented as an individual: a string of customers in which each segment is a route served by a single vehicle. For example, in a dispatching task with six customers, {0, 1, 2, 7, 3, 4, 8, 5, 6} can be a solution, where the numbers greater than six (i.e., greater than M) act as route separators. This solution is
interpreted as 0→1→2→0, 0→3→4→0, 0→5→6→0. After defining the chromosome, selection, crossover and mutation are applied to escape local optima. In the crossover, the initial position and the crossover length are generated randomly; the crossover can be described as follows. Given s1: P1|P2|P3 and s2: Q1|Q2|Q3, where P2 and Q2 are the crossover sections, Q2 is inserted into s1 before P2, yielding s3: P1|Q2|P2|P3. The duplicate numbers are then deleted from s3 to obtain one child individual; the other child is obtained by the same method with the roles of s1 and s2 exchanged.
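The separator-based representation and the insertion crossover described above can be sketched as follows (our own illustration with hypothetical helper names; the paper does not say which duplicate copy is deleted, so we keep the first occurrence of each gene):

```python
def decode(chromosome, M):
    # Split the string at separator genes (> M) into per-vehicle routes,
    # each starting and ending at the depot 0 (the depot is left implicit).
    routes, cur = [], [0]
    for g in chromosome:
        if g > M:
            routes.append(cur + [0])
            cur = [0]
        else:
            cur.append(g)
    routes.append(cur + [0])
    return routes

def crossover(s1, s2, start, length):
    # Insert the crossover section Q2 of s2 into s1 before position `start`,
    # then delete duplicated genes (keeping the first occurrence) to get a child.
    q2 = s2[start:start + length]
    s3 = s1[:start] + q2 + s1[start:]
    child, seen = [], set()
    for g in s3:
        if g not in seen:
            child.append(g)
            seen.add(g)
    return child
```

For instance, decode([1, 2, 7, 3, 4, 8, 5, 6], M=6) yields the three routes 0→1→2→0, 0→3→4→0 and 0→5→6→0 of the example above.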
6 Experimental Results

The proposed algorithm was implemented in VC++ and run on Windows XP. For comparison, we also implemented the algorithm of Ref. [10]. Table 1 shows the distances between customers; there are eight customers, and Qk = 8 T, Dk = 40 km, k = 1,…,K. Table 2 lists the requirement of each customer. The experiments were run independently five times, and the results of Ref. [10] and of our algorithm are shown in Table 3.

Table 1. Distances between customers
i\j    0     1     2     3     4     5     6     7     8
0      0     4     6     7.5   9     20    10    16    8
1      4     0     6.5   4     10    5     7.5   11    10
2      6     6.5   0     7.5   10    10    7.5   7.5   7.5
3      7.5   4     7.5   0     10    5     9     9     15
4      9     10    10    10    0     10    7.5   7.5   10
5      20    5     10    5     10    0     7     9     7.5
6      10    7.5   7.5   9     7.5   7     0     7     10
7      16    11    7.5   9     7.5   9     7     0     10
8      8     10    7.5   15    10    7.5   10    10    0
Table 2. Requirement of customer

Customer ID    1   2   3   4   5   6   7   8
Requirement    1   2   1   2   1   4   2   2
Table 3. Comparison between Ref. [10] and our method

Index       1      2      3      4      5      Average
Ref. [10]   72.5   76     67.5   72     73.5   72.3
Ours        67.5   67.5   67.5   67.5   67.5   67.5

Details of our results:
1: 0→6→7→4→0 / 0→1→3→5→8→2→0
2: 0→6→7→4→0 / 0→1→3→5→8→2→0
3: 0→1→3→5→8→2→0 / 0→6→7→4→0
4: 0→6→7→4→0 / 0→1→3→5→8→2→0
5: 0→6→7→4→0 / 0→1→3→5→8→2→0
Table 4. Coordinates and requirements of the other customers

ID   X coordinate   Y coordinate   Requirement
1    12.8           8.5            0.1
2    18.4           3.4            0.4
3    15.4           16.6           1.2
4    18.9           15.2           1.5
5    15.5           11.6           0.8
6    3.9            10.6           1.3
7    10.6           7.6            1.7
8    8.6            8.4            0.6
9    12.5           2.1            1.2
10   13.8           5.2            0.4
11   6.7            16.9           0.9
12   14.8           2.6            1.3
13   1.8            8.7            1.3
14   17.1           11.0           1.9
15   7.4            1.0            1.7
16   0.2            2.8            1.1
17   11.9           19.8           1.5
18   13.2           15.1           1.6
19   6.4            5.6            1.7
20   9.6            14.8           1.5
Fig. 2. Results of our method

Table 5. Details of our results

Index   Distance (km)   Details
1       109.627         0→5→14→2→12→9→10→7→0 / 0→1→8→19→15→16→13→6→0 / 0→4→0 / 0→18→20→11→17→3→0
2       110.187         0→6→13→16→15→19→8→1→0 / 0→4→0 / 0→18→3→17→11→20→0 / 0→5→14→2→12→9→10→7→0
3       109.139         0→18→0 / 0→5→14→2→12→9→10→1→7→0 / 0→8→19→15→16→13→6→0 / 0→4→3→17→11→20→0
4       109.627         0→18→20→11→17→3→0 / 0→6→13→16→15→19→8→1→0 / 0→4→0 / 0→5→14→2→12→9→10→7→0
5       107.84          0→4→3→17→11→20→0 / 0→8→19→15→16→13→6→0 / 0→5→14→2→12→9→10→7→1→0 / 0→18→0

Average distance: 109.284 km
We then apply the proposed algorithm to a real vehicle routing problem with parameters Qk = 8 T, Dk = 50 km, k = 1,…,5. The coordinates of the central depot are (14.5 km, 13 km), and the coordinates and requirements of the other customers are given in Table 4. Fig. 2 and Table 5 show our results.
7 Conclusion

We have devised a hybrid approach integrating ant colony and GA so that their respective intensifying and diversifying processes are both exploited. The experimental results show that this approach is feasible and successful. However, further research is needed on how to handle VRP variants with multiple central supply depots or deadline constraints within this framework.
References

1. Dorigo, M.: Optimization, learning and natural algorithms. Ph.D. Thesis, Italy (1992)
2. Huang, K., Liao, C.: Ant colony optimization combined with taboo search for the job shop scheduling problem. Computers and Operations Research 35, 1030–1046 (2008)
3. Jian, S., Jian, S., Lin, B.M.T., Hsiao, T.: Ant colony optimization for the cell assignment problem in PCS networks. Computers and Operations Research 33, 1731–1740 (2006)
4. McMullen, P.R.: An ant colony optimization approach to addressing a JIT sequencing problem with multiple objectives. Artificial Intelligence 15(3), 309–317 (2001)
5. Bell, J.E., McMullen, P.R.: Ant Colony Optimization Techniques for the Vehicle Routing Problem. Advanced Engineering Informatics 1(8), 41–48 (2004)
6. Tavakkoli-Moghaddam, R., Safaei, N., Gholipour, Y.: A hybrid simulated annealing for capacitated vehicle routing problems with the independent route length. Applied Mathematics and Computation 176, 445–454 (2006)
7. Prins, C.: A simple and effective evolutionary algorithm for the vehicle routing problem. Computers & Operations Research 31, 1985–2002 (2004)
8. Brandao, J., Mercer, A.: A Tabu Search Algorithm for the Multi-Trip Vehicle Routing and Scheduling Problem. European Journal of Operational Research 100, 180–191 (1997)
9. Doerner, K.F., Hartl, R.F., Kiechle, G., Lucka, M., Reimann, M.: Parallel Ant Systems for the Capacitated Vehicle Routing Problem. In: Gottlieb, J., Raidl, G.R. (eds.) EvoCOP 2004. LNCS, vol. 3004, pp. 72–83. Springer, Heidelberg (2004)
10. Liu, L., Zhu, J.: The Research of Optimizing Physical Distribution Routing Based on Genetic Algorithm. Computer Engineering and Application 27, 227–229 (2005)
A Research on the Association of Pavement Surface Damages Using Data Mining

Ching-Tsung Hung1, Jia-Ray Chang2, Jian-Da Chen3, Chien-Cheng Chou4, and Shih-Huang Chen5

1 Assistant Professor, Department of Transportation Technology and Supply Chain Management, Kainan University
2 Associate Professor, Department of Civil Engineering, Minghsin University of Science and Technology
3 Ph.D. Candidate, Department of Civil Engineering, National Central University
4 Assistant Professor, Department of Civil Engineering, National Central University
5 Assistant Professor, Department of Traffic Engineering and Management, Feng Chia University
Abstract. The association of pavement surface damages used to rely on the judgment of experts. However, with the accumulation of data in pavement maintenance databases and the improvement of Data Mining, more and more methods are available to explore the association of pavement surface damages. This research adopts Apriori algorithm to conduct association analysis on pavement surface damages. From the experience of experts, the association of road damages has been believed to be complicated. Through case studies, however, it has been found that pavement surface damages develop among longitudinal cracking, alligator cracking and potholes, and that the influence is unidirectional. In addition, with the help of association rules, it has been learned that, in preventative pavement maintenance, the top priority should be the repair of longitudinal cracking and alligator cracking, which can greatly reduce the occurrence of potholes and the risk of state compensation.
1 Introduction

In the past, pavement distress surveys were used only to determine the cause of a distress or how to repair the pavement. The relations among pavement distresses were not considered, and the maintenance strategy was determined by expert experience. It is difficult to extract such knowledge from experts, so the experience cannot be passed down. Meanwhile, the documents generated by pavement maintenance activities have given pavement databases a huge amount of data, and with the development of data mining technology, useful information can be extracted from them. This research attempts to use the method of association rules in Data Mining to

D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 31–38, 2008. © Springer-Verlag Berlin Heidelberg 2008
analyze the association of pavement surface damages and, based on the method of decision tree, determine what maintenance methods should be taken. Section 2 looks into the application of different Data Mining categories in pavement surface maintenance. Section 3 introduces Association Analysis. In section 4, based on the result of pavement surface survey, association rules are established. In the end, further discussions on the application of Data Mining in pavement engineering are provided.
2 Data Mining Application in Pavement Maintenance

Data Mining is the process of finding important hidden information in data, such as trends, patterns and relationships; that is, exploring the information or knowledge in the data. As a result, there are several different names for Data Mining, including Knowledge Discovery in Databases (KDD), Data Archaeology, Data Pattern Analysis and Functional Dependency Analysis. Many researchers view Data Mining as an important field combining database systems and machine learning technologies. However, Data Mining is not omnipotent. It does not monitor the development process of the data and then pinpoint the special cases in the database. Nor does adopting Data Mining remove the need to understand statistical principles, the background of the issue and what the data itself really means. We should not assume that the information obtained with Data Mining is all accurate and can be applied without any verification. In fact, Data Mining is used to help planning and analysis personnel find hypotheses, but it is not responsible for verifying such hypotheses, nor does it determine their real value. In some countries, there are cases where Data Mining has been used successfully in civil engineering, but they are not common. In 2000, Vanessa Amado [1] applied Data Mining to the pavement management data of MoDOT (the Missouri Department of Transportation). She used a large amount of pavement surface condition data collected between 1995 and 1999, comprising 28,231 records in 49 columns, to predict future PSR and thereby determine the remaining lifespan of the pavement surface. This pavement database contains pavement service data collected by automatic testing vehicles and structural data collected by structural testing equipment. The analysis process was carried out as follows; the first task of the study was preparing and exploring the data.
The first step in establishing the analysis model was converting the database files to Excel files, with the data type of each column related to its measurements. Then the software to be used was selected. Because IBM Intelligent Miner for Data provides association and other Data Mining functions, this study used it for the analysis. IBM Intelligent Miner for Data can handle large amounts of relevant data and is compatible with ASCII, Dbase, Oracle and Sybase formats. In addition, it can execute many kinds of Data Mining analysis, such as prediction, data pre-processing, regression, classification, clustering and association, and it also offers Decision Tree and Artificial Neural Network methods for exploring data. The analysis methods this software adopts include association, neural clustering and tree classification. Analysis results are provided regarding the pavement surface
characteristics of two groups: PSR (Pavement Serviceability Rating) > 24 and PSR < 24; that is, Data Mining separates the analysis data into these two pavement surface types.
(a) Association is used to reveal the numeric value of each attribute. For a cut-and-dried data set, such a method can identify the PSR of each specific pavement surface.
(b) Neural clustering is used to find the central locations of clusters with similar characteristics. This technique is used to analyze the PSR of a specific pavement surface and the similarity within each cluster. When a new pavement surface is assigned to a certain cluster, it means that this pavement surface is most similar to the center of that cluster.
(c) The model generated by tree classification is based on the known data. This technique separates pavement surfaces into 2 categories, "not good" (PSR < 24) or "good" (PSR > 24). The classification process has 3 parts: the training model, the testing model and the application model. The training model learns from the user how to divide the data. The testing model applies the model to testing data, for which the pavement surface level is already known, to test the accuracy of the model obtained in training. The application model is used to predict the future PSR of a pavement surface.
In 2004, Bayrak et al. [2] adopted a Neural Network to establish a prediction model for the flatness of cement concrete pavement surfaces. The study covered 83 pavement sections in 9 states. Seven variables, covering several kinds of data including traffic volume and road surface damage, were used, and a flatness prediction model based on a 7-10-10-10-1 network was established. The model had a coefficient of determination of 0.83 on the training data and 0.81 on the testing data, indicating good predictive ability.
In 2007, Khaled [3] adopted Data Mining to analyze the transportation project database of the State of Illinois. Using Association Analysis on 21 data groups with several kinds of characteristics, including general data, project-specific data, traffic control data and contract data, Khaled generated 9 rules. For example, one rule states that a new surface will be built if the final bid amount is less than $508,391 and the total traffic cost is less than $9,125 (this rule is 93% accurate in the database and is supported by at least 13 cases). Therefore, the use of Data Mining can effectively help pavement engineers make the right decisions for their projects. Information technology is indispensable to the collection and analysis of pavement surface data. Data, whether collected automatically or by people, can be used to determine M&R (maintenance & rehabilitation). However, the amount of data contained in a PMS database is very large, and there is therefore a need to further explore the data to obtain more unknown and precious knowledge to help make the right M&R decisions.
3 Association Analysis

There are several Data Mining techniques, and more and more techniques targeting different fields of application and different types of databases have been introduced. Each technique has its own characteristics and applications; the main and most
popular techniques include Characterization and Discrimination, Association Analysis, Classification and Prediction, Cluster Analysis, Outlier Analysis and Evolution Analysis. Han and Kamber [4] pointed out that Association Rules is the most mature and widely used technique in Data Mining. Association Rules was first proposed by Agrawal et al. [5] and was mainly used to find associations among database items. Brin et al. [6] pointed out that Association Rules was initially used to study Market Basket Data. By analyzing customers' purchasing behaviors, associations among products can be found, which serve as a reference for business owners when deciding how to shelve products, what to buy and how much inventory to hold. In this way the products become more competitive, so their sales turnover improves and profits increase. For example, a customer is very likely to purchase bread after he buys milk; therefore, milk products should be shelved next to bread products. Such information is called an "Association Rule" and is written as: milk → bread [minsup = 2%, minconf = 80%]. There are two important parameters in Association Rules, support and confidence, which are used to evaluate whether an association rule meets the expectations of the users. The most common algorithms to obtain Association Rules include Apriori, DHP, AprioriTid, AprioriHybrid, Boolean, FP-Tree, ICI and AIM. Apriori algorithm is the most representative among all Association Rules algorithms, and many related algorithms are based on, improved from or extended from it. The improved algorithms include AprioriTid, AprioriHybrid, Boolean, Partition, DIC, Column-Wise Apriori, Multiple-Level and so on.
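For a tiny, made-up basket database (the transactions below are invented purely for illustration), support and confidence can be computed directly from their definitions:

```python
transactions = [
    {"milk", "bread"},
    {"milk", "bread"},
    {"milk"},
    {"bread", "beer"},
    {"beer"},
]

def support(itemset):
    # Fraction of transactions containing every item of the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    # conf(lhs -> rhs) = support(lhs ∪ rhs) / support(lhs)
    return support(lhs | rhs) / support(lhs)
```

Here support({milk, bread}) is 2/5 = 0.4 and confidence(milk → bread) is 0.4 / 0.6 ≈ 0.67, so the rule milk → bread would pass a minsup of 2% but fail a minconf of 80%.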
Apriori algorithm includes the following steps:

Step 1. Use the (k−1)-frequent item sets (L(k−1)) to generate the candidate item sets (Ck).
Step 2. Scan database D and calculate the support of all candidate item sets. All candidate item sets whose support is larger than or equal to the minimum support are selected to become the frequent item sets Lk of length k.
Step 3. Repeat steps 1 and 2 until no new candidate item sets can be generated.

(a) The rules to join and prune candidate item sets:
(1) In step 1, two (k−1)-frequent item sets that share k−2 identical items are joined to form a k-item set.
(2) Check the k-item set from step 1 to see whether all of its (k−1)-item subsets are frequent. If so, keep this k-item set.

(b) Two bottlenecks of Apriori algorithm:
(1) A great number of candidate item sets are generated. 2-candidate item sets are generated by combining two 1-frequent items; if there are k items in the 1-frequent item set, (k−1) + (k−2) + … + 1 = k(k−1)/2 2-candidate item sets will be generated. If the 1-frequent item set has 1,000 items, 499,500 2-candidate item sets will be generated.
(2) The database must be scanned several times. Because there is a great number of candidate item sets, and each one requires scanning the whole database to obtain its support, efficiency is low. The goal of this study is to shorten the time needed to generate frequent item sets.
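The join–prune–count loop of Apriori can be sketched compactly as follows; this is an illustrative, unoptimized version, with `min_support` taken as an absolute occurrence count:

```python
from itertools import combinations

def apriori(transactions, min_support):
    def count(c):
        # Number of transactions containing candidate itemset c.
        return sum(c <= t for t in transactions)

    items = {i for t in transactions for i in t}
    freq = {frozenset([i]) for i in items if count(frozenset([i])) >= min_support}
    all_freq, k = set(freq), 2
    while freq:
        # Join step: merge (k-1)-frequent sets sharing k-2 items.
        cands = {a | b for a in freq for b in freq if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        cands = {c for c in cands
                 if all(frozenset(s) in freq for s in combinations(c, k - 1))}
        freq = {c for c in cands if count(c) >= min_support}
        all_freq |= freq
        k += 1
    return all_freq
```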
4 Case Study

4.1 The Application of Road Repairing Data

This study uses 92 groups of data obtained on Line 110A in 1999 to carry out the following association analysis of pavement surface damages. There are 18 different kinds of pavement distresses. The severity of the damage is divided into 3 categories: S (minor), M (medium) and H (severe). The range of the damaged area is also divided into 3 categories: a (minor), b (medium) and c (extensive). Apriori algorithm, applied to the database studied, first establishes the candidate item sets with only one item. Next, the database is scanned to count how many times each candidate item set appears; this count is its support. If we set the minimum support at 10, then the candidate item sets that appear 10 times or more become the large item set; in this example, {1sa}, {1ma}, {1mb}, {3sa}, {3sb}, {3mb} and {4sa} form the large item set L1. Candidate item sets with 2 items are then generated; counting how many times each of them appears in the database, we get {1mb, 3sb} as the only large item set of two items. Since no large item set with 3 items can be generated, the first stage of Apriori algorithm ends. The next step is finding association rules in the large item sets that have at least 2 items. In this example, the only such item set is {1mb, 3sb}, so there are 2 possible association rules:
(a) If alligator cracking (severity of damage: medium; range of damaged area: medium) is found, then longitudinal cracking (severity of damage: minor; range of damaged area: medium) is very likely to appear.
(b) If longitudinal cracking (severity of damage: minor; range of damaged area: medium) is found, then alligator cracking (severity of damage: medium; range of damaged area: medium) is very likely to appear.
Taking rule (a) as an example, its confidence is calculated as

$$\mathrm{Confidence}_{1mb \to 3sb} = \frac{\mathrm{Support}(1mb, 3sb)}{\mathrm{Support}(1mb)} = \frac{11}{32} = 0.34 \qquad (1)$$

Taking rule (b) as an example, its confidence is calculated as

$$\mathrm{Confidence}_{3sb \to 1mb} = \frac{\mathrm{Support}(1mb, 3sb)}{\mathrm{Support}(3sb)} = \frac{11}{17} = 0.65 \qquad (2)$$
If we set the minimum confidence at 0.5, then only the second rule's confidence exceeds this threshold. Therefore, based on Apriori algorithm, if longitudinal cracking (severity of damage: minor; range of damaged area: medium) is found, then alligator cracking (severity of damage: medium; range of damaged area: medium) is very likely to appear. Based on this result, it can be concluded that longitudinal cracking will affect a pavement surface's load-bearing ability, resulting in the appearance of alligator
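The two confidence values can be checked directly from the counts in Tables 1 and 2 (11 joint occurrences, 32 for 1mb, 17 for 3sb); a quick arithmetic sketch:

```python
support_1mb, support_3sb, support_pair = 32, 17, 11  # counts from Tables 1 and 2

conf_1mb_to_3sb = support_pair / support_1mb  # rule (a), Eq. (1)
conf_3sb_to_1mb = support_pair / support_3sb  # rule (b), Eq. (2)

print(round(conf_1mb_to_3sb, 2), round(conf_3sb_to_1mb, 2))  # 0.34 0.65
```

With a 0.5 confidence threshold, only rule (b) survives, matching the conclusion in the text.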
cracking. Alligator cracking has no obvious influence on the serviceability of a pavement surface, but it has a major negative impact on the structure of the pavement surface, because alligator cracking allows water invasion and will further develop into potholes and dents. Therefore, based on the result of Data Mining, this study concludes that, to prevent future structural damages, longitudinal cracking has to be prevented. Longitudinal cracking is caused when rolling compaction is not done properly, so on-site quality control should be enhanced to prevent longitudinal cracking from appearing.

Table 1. Frequency with which each 1-item candidate set appears

Damage type   Support     Damage type   Support     Damage type   Support
1sa           15          4sa           27          7sa           1
1sb           4           4sb           2           7sb           2
1ma           17          4ma           7           7sc           1
1mb           32          4mb           1           7ma           1
1mc           2           4ha           3           7mb           5
2sa           1           5sa           4           7mc           1
2sb           4           5sb           3           8sa           3
2mb           9           5ma           1           8ma           1
3sa           15          5mb           5           9sb           1
3sb           17          6sa           2           10mb          1
3ma           1           6ma           1           10mc          1
3mb           22          6mb           1           13sa          2
                                                    13mb          2

Table 2. Frequency with which each 2-item candidate set appears

Damage type   Support     Damage type   Support     Damage type   Support
1sa,1ma       0           1ma,3sa       2           1mb,4sa       3
1sa,1mb       0           1ma,3sb       1           1mb,3sb       0
1sa,3sa       2           1ma,3mb       6           1mb,3mb       0
1sa,3sb       1           1ma,4sa       5           1mb,4sa       7
1sa,3mb       3           1mb,3sa       2           3sb,3mb       0
1sa,4sa       8           1mb,3sb       11          3sb,4sa       1
1ma,1mb       0           1mb,3mb       8           3sm,4sa       5
4.2 Applications of Road Damage Data

The same method is used to find the relationships in another survey of pavement distresses, which covers five kinds of pavement distresses with items different from those in Section 4.1. With a minimum support of 10 occurrences and a minimum confidence of 0.5, Apriori algorithm produced one rule: when a pavement has alligator cracking, potholes will follow. This means that repairing alligator cracking will help reduce the incidence of potholes. Therefore, when it comes to choosing pavement surface maintenance methods, the preventative
maintenance methods used in other countries should be adopted. If repair work is done on alligator cracking the moment it appears, the incidence of potholes on road surfaces can be reduced greatly. As for repair materials, high-quality and durable ones should be used; some road maintenance departments have adopted the latest repair materials, and their records show that these materials are more durable. How to choose materials that both show results in the early stage of repair and remain durable requires further study.

4.3 Discussion

With the use of Apriori algorithm, we have a better understanding of the associations among different pavement surface damages, and based on these associations, preventative maintenance methods can be adopted to lengthen the lifespan of pavement surfaces. When studying how reliability changes the results in the second case, we find that the lower the reliability is, the larger the number of association rules, but the weaker their links. For instance, when the reliability is lowered to 10%, the rule that potholes will result in alligator cracking is concluded. The two damages appear together in 83.70% of the data; that is, there is a high percentage of both damages occurring at the same time. However, the reliability that potholes will result in alligator cracking is only 10%, which is not reasonable in practice. There is even an association rule that alligator cracking will lead to manhole distress, although these two damages are not related. Therefore, setting the reliability still relies on the judgment of experts to conclude better association rules. Nevertheless, compared with expert judgment alone, Apriori algorithm can, based on scientific theory, conclude more accurate association rules, and as the data in the database increase, more associations can be found.
Therefore, Data Mining can yield great results in concluding association rules of pavement surface damages.
5 Conclusion

With the development of information technology, more advanced testing equipment for collecting civil engineering data has been introduced in recent years. For the interpretation and processing of testing data, management decision-making systems based on soft computing or artificial intelligence have been established, providing reasonable and viable maintenance and repair decisions. An effective public construction management information system should have a complete database with large amounts of reliable, objective and appropriate data, so that it can assist with the planning of maintenance and the decision-making of budgets. With the development of information technology and the rapid growth of public construction, automatic data collection has become more and more common. As the capacity of databases continues to increase, new methods and techniques are needed to help engineers and policymakers discover the useful information and knowledge in the database. This study adopts Apriori algorithm to conduct association analysis of pavement surface damages, revealing that the presence of certain damages is a result of other damages.
Acknowledgements. This study is a partial result of the 2003 project "The Study of Data Collection in the Database of Pavement Surface Management System" (NSC92-2211-E-159-005) of the National Science Council (NSC), and of "The Study of the Rapid Re-construction and Repair Techniques of Public Roads" (MOTC-IOT-96EDB009) of the Institute of Transportation of the Ministry of Transportation and Communications in 2007. We would like to thank both the NSC and the Institute of Transportation for their financial support.
References

1. Amado, V.: Expanding the Use of Pavement Management Data. In: 2000 MTC Transportation Scholars Conference, Ames, Iowa (2000)
2. Sarimollaoglu, M., Dagtas, S., Iqbal, K., Bayrak, C.: A Text-Independent Speaker Identification System Using Probabilistic Neural Networks. In: Proceedings of the International Conference on Computing, Communication and Control Technologies CCCT 2004, Austin, Texas, USA, vol. 7, pp. 407–411 (2004)
3. Nassar, K.: Application of data-mining to state transportation agencies. ITcon 12, 139–149 (2007)
4. Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann, San Francisco (2000)
5. Agrawal, R., Imilienski, T., Swami, A.: Mining association rules between sets of items in large datasets. In: Buneman, P., Jajodia, S. (eds.) Proc. of the ACM SIGMOD Int'l Conf. on Management of Data, pp. 207–216. ACM Press, New York (1993)
6. Brin, S., Motwani, R., Ullman, J.D., Tsur, S.: Dynamic Itemset Counting and Implication Rules for Market Basket Analysis. In: Proceedings of 1997 ACM-SIGMOD (SIGMOD 1997), Tucson, AZ, pp. 255–264 (1997)
An Integrated Method for GML Application Schema Match

Chao Li1, Xiao Zeng2, and Zhang Xiong1

Computer Application Institute, School of Computer Science and Engineering, Beihang University, 37th Xueyuan Road, Haidian District, Beijing, China, 100083
{licc,xiongz}@buaa.edu.cn, zengxiao29@gmail.com
Abstract. GML has become a standard in the geographical information area for enhancing the interoperability of various GIS systems for data mining. In order to share geographic information based on GML, the problem of application schema match must be overcome first. This paper introduces an integrated multi-strategy approach to GML application schema match, combining existing schema match algorithms with GML 3.0 application schemas. First, the input GML application schemas are transformed into a GSTree, and linguistic-based and constraint-based match rules are applied; the similarity between two elements is calculated through the different rules separately and merged into an element-level similarity. Second, the element-level similarity is rectified by a structure-level match algorithm based on similarity flooding. Finally, the mapping table of GML application schema elements is obtained. The experimental results show that the approach can effectively discover the similarity of schema elements and improve the match results with a high degree of accuracy.
1 Introduction

With the development of technologies in multimedia, network communication, data mining and spatial information, WebGIS has become the main trend in building an open, interoperable and internationalized Geographical Information System (GIS). Commercial GIS manufacturers have released products in succession, such as MapInfo ProServer of MapInfo and GeoMedia Web Map of Intergraph. Since there is no common development standard among these companies, each built its own spatial data structure independently. These diverse data formats have to be transformed when realizing data sharing, but information is often missing or lost after transformation because of the lack of a standard description of the spatial objects. To overcome the problems in sharing multi-source heterogeneous spatial data, the OpenGIS Consortium (OGC) established an encoding standard, Geography Markup Language (GML) [1], which is used for the modeling, storage and transport of geographical information. As a communication medium between different GIS applications, GML defines a universal data format; different applications can communicate with each other using this data description method, and geographical information can thus be shared semantically among different areas as a basis for deeper spatial data mining.

D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 39–46, 2008. © Springer-Verlag Berlin Heidelberg 2008
C. Li, X. Zeng, and Z. Xiong
GML defines various geographical elements by using XML Schema. It provides some basic schemas as meta-schemas, from which users can choose the necessary elements to build their own application schemas. However, although the sources of spatial data are fairly broad and their structures are complicated, GML allows users to model without restriction. As a result, application models built by different users may differ in countless ways in namespace, data type, modeling structure and so forth, even when they define the same geographical element. Therefore, GML application schema match is essential for the sharing of GML-based geographical information [2]. This paper puts forward an integrated multi-strategy method for GML application schema match based on the GML 3.0 specification. It combines element-level and structure-level schema match, and includes linguistic-based and constraint-based match rules. It uses a similarity-flooding-based structure match method to rectify the similarity by taking into account the interaction between neighboring nodes of the GML schema trees.
2 Related Work and Techniques

2.1 Schema Match

The process of schema match can be summarized simply as follows: two schemas are input, a certain algorithm is used to match their elements, and the result, a mapping between the elements of the two schemas, is output. As shown in Fig. 1, match methods can be classified into schema-level, instance-level, element-level and structure-level match according to the kind of objects matched; schema match methods can also be classified into linguistic-based match and constraint-based match [3].
Fig. 1. Classification of schema match methods
A matcher can be built on one single match algorithm. However, every single algorithm has its own limitations. Recent research mainly focuses on building hybrid matchers based on several match rules, or on combining the match results of several matchers with weights. Furthermore, a large amount of auxiliary information, such as data dictionaries, knowledge libraries, user input and the reuse of previous match results, is utilized during real schema match.
An Integrated Method for GML Application Schema Match
2.2 GML Application Schema Match

GML 1.0 was released officially in April 2000 and described a variety of geographical elements by using Document Type Definitions (DTD). GML 2.0 was released in February 2001 and began to use XML Schema to define the geographical elements and their attributes. GML 2.0 contained only three meta-schemas and mainly focused on simple geographical elements. GML 3.0 was released in January 2003 and the number of its meta-schemas increased to 32. Beyond simple elements, GML 3.0 also describes geographical elements that are not 2D line-type elements, including complex non-linear 3D elements, 2D elements with topological structure, temporal elements, dynamic elements and layers. Support for complex geometrical entities, topology, spatial reference systems, metadata, temporal characteristics and dynamic elements was added in GML 3.0 [4]. So far, most methods for GML application schema match are based on GML 2.0 [2][4][5]. Since the modeling mechanism of GML 2.0 is comparatively simple, the main differences between models of the same geographical element lie in element naming and data type; thus most of the relevant match methods can only be classified as element-level match [2][5]. GML 3.0 provides a suite of much more abundant basic labels, public data types and mechanisms that allow users to build their own application schemas, so GML 3.0-based models of the same element may differ not only in element naming and data type but also in the organization of the element; element-level match methods alone therefore cannot meet the requirements of such complex matches. Reference [4] proposed a structure match method which first sets the similarity of sub-nodes to their linguistic similarity and then obtains the similarity of two nodes by comparing the similarities of their sub-nodes.
Although this method considers the mutual influence of the similarities of different elements to some extent, it only accounts for the influence that sub-nodes exert on nodes and overlooks the influence that nodes exert on sub-nodes. Given the complexity of GML 3.0 application schema match, it is necessary to consider element-level and structure-level influence factors comprehensively, and to pre-store a large amount of GIS auxiliary information and GML 3.0 meta-schema information in a database.
3 GML Application Schema Match

This paper advances a multi-strategy method for GML application schema match. Firstly, the input GML application schemas are transformed into GML schema trees, and the similarity of each element pair of the schema trees is calculated using the linguistic-based match rules and the constraint-based match rules separately. The element-level similarity is then obtained by a weighted combination of the two results. Secondly, the similarity is rectified using structure-level match based on similarity flooding. Finally, the mapping tables of the two inputs are generated.
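The overall pipeline can be sketched as follows. This is a minimal illustration with stand-in matchers (exact-name comparison for the linguistic part and a toy length check for the constraint part); `element_similarity` and `build_mapping` are hypothetical names, not from the paper.

```python
# Hypothetical sketch of the element-level combination and mapping-table step.
def element_similarity(e1, e2, weight=0.5):
    # Stand-ins for the linguistic and constraint matchers described later.
    ling = 1.0 if e1.lower() == e2.lower() else 0.0
    cons = 1.0 if len(e1) == len(e2) else 0.0
    return weight * ling + (1 - weight) * cons  # weighted combination

def build_mapping(elems_a, elems_b, threshold=0.7):
    """Keep, for each element of schema A, its best match in schema B
    whose combined similarity exceeds the threshold."""
    mapping = {}
    for a in elems_a:
        best = max(elems_b, key=lambda b: element_similarity(a, b))
        if element_similarity(a, best) >= threshold:
            mapping[a] = best
    return mapping

print(build_mapping(["Road", "River"], ["road", "Bridge"]))
```

In the real method, the two stand-in matchers would be replaced by the rule-based matchers of Sects. 3.2 and 3.3, and the similarity would be rectified by flooding before the mapping table is extracted.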
Fig. 2. GML application schema match
3.1 GML Application Schema Match

Since GML inherits the characteristics of XML, we choose a tree structure as the model for GML application schema match and call this tree a GSTree. The root node of a GSTree represents the root element of the GML application schema, and a leaf node represents an element of the GML application schema that contains no other object; the data type of a leaf node is therefore a basic GML data type. An edge between a father node and a son node in a GSTree represents the "Contain" relationship between the corresponding elements in the GML application schema. By using a GSTree, loops are removed and unlimited similarity flooding is thus avoided. Furthermore, every node in a GSTree has at most one father, which guarantees that there is only one route from the root node to any given leaf node. A typical GSTree is shown in Fig. 3.
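A minimal sketch of such a GSTree in Python (the class and method names are our own; only the structural properties described above — a single father per node, "Contain" edges, and a unique root-to-leaf route — come from the text):

```python
class GSTreeNode:
    """Node of a GSTree: at most one father; children model 'Contain'."""
    def __init__(self, name, data_type=None):
        self.name = name
        self.data_type = data_type  # basic GML type for leaf nodes
        self.parent = None
        self.children = []

    def add_child(self, child):
        child.parent = self  # every node keeps exactly one father
        self.children.append(child)
        return child

    def path_from_root(self):
        # Unique because each node has at most one father.
        node, path = self, []
        while node is not None:
            path.append(node.name)
            node = node.parent
        return list(reversed(path))

root = GSTreeNode("FeatureCollection")
member = root.add_child(GSTreeNode("FeatureMember"))
feature = member.add_child(GSTreeNode("Feature"))
print(feature.path_from_root())  # ['FeatureCollection', 'FeatureMember', 'Feature']
```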
Fig. 3. A typical GML schema tree
In this paper we define only a single sub-element named "Feature" for the "FeatureMember" element. In real applications there may be multiple sub-elements; the GSTree is then built by adding sub-nodes to the "FeatureMember" node according to the same rules.
3.2 Linguistic-Based Element Match

Based on element names, we introduce match rules from two aspects: semantic match and string match [4][5].

Definition 1: The dualistic bidirectional relational operator ≌ represents the match relation between two elements. The similarity value of element e1 and element e2 calculated with rule R is denoted λR(e1, e2).

Definition 2: If the input data cannot meet the restrictions required by rule R, then rule R is said to be invalidated.

Semantic Match Rules

Rule 1: In the same namespace, if elements e1 and e2 have the same name, λR1(e1, e2) = 1; otherwise, λR1(e1, e2) = 0.

Rule 2: In different namespaces, if elements e1 and e2 have the same name after proper-noun pretreatment, then λR2(e1, e2) = 1; otherwise, rule 2 is invalidated.

Rule 3: In different namespaces, if elements e1 and e2 have the same name after synonym pretreatment, then λR3(e1, e2) = S; otherwise, rule 3 is invalidated.

Rule 4: In different namespaces, if elements e1 and e2 have the same name after approximate-word pretreatment, then λR4(e1, e2) = H; otherwise, rule 4 is invalidated.

The rules above need to be supported by the data dictionary, proper noun library and synonym library of the geographical information area. S represents the similarity of synonyms, H the similarity of approximate words, and S > H.

String Match Rule

Rule 5: If str1 is the name string of element e1 and str2 is the name string of element e2, then λR5(e1, e2) = 1/F(str1, str2), where the function F is the edit distance between str1 and str2, implemented with the Levenshtein distance algorithm [7].
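Rule 5 can be sketched directly from the Levenshtein definition. Note that rule 5 is only combined with rules 3 and 4 (different names after pretreatment), so the edit distance is at least 1 and the reciprocal is well defined; the function names below are ours.

```python
def levenshtein(s1, s2):
    """Classic dynamic-programming edit distance F(str1, str2)."""
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, 1):
        cur = [i]
        for j, c2 in enumerate(s2, 1):
            cur.append(min(prev[j] + 1,        # deletion
                           cur[j - 1] + 1,     # insertion
                           prev[j - 1] + (c1 != c2)))  # substitution
        prev = cur
    return prev[-1]

def rule5_similarity(name1, name2):
    dist = levenshtein(name1, name2)
    # Identical names are already handled by the higher-priority rules,
    # so dist >= 1 in practice; the guard is only defensive.
    return 1.0 / dist if dist else 1.0

print(levenshtein("River", "Rivers"))  # 1
```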
Fig. 4. PRI and combination relations of the rules
Fig. 4 shows the PRI (priority) and combination relations of the five rules mentioned above. If a rule with high PRI takes effect, rules with lower PRI are not applied; if rule 3 or rule 4 takes effect, rule 5 is applied to implement combined
computing. For a combination of two rules, if one of them is invalid, the combination is invalid; otherwise the maximum of the two rules' results is taken as the similarity. For example, if rule 4 takes effect, the similarity value is set to Max(H, λR5(e1, e2)).

3.3 Constraint-Based Element Match

A schema usually contains restrictions that define the attributes of elements, such as key marks, data types, ranges, uniqueness, selectivity and so forth. Considering the characteristics of GML application schemas, we build a basic label library and a basic data-type library based on XML Schema and GML 3.0. The match rules are as follows:

Rule 6: In the same namespace, if the data types and label types of elements e1 and e2 are the same, then λR6(e1, e2) = 1; otherwise, λR6(e1, e2) = 0.

Rule 7: In different namespaces, if elements e1 and e2, whose data types are not empty, have the same data type, then λR7(e1, e2) = 1; otherwise, rule 7 is invalidated.

Rule 8: In different namespaces, if elements e1 and e2 have the same label type, then λR8(e1, e2) = 1; otherwise, λR8(e1, e2) = 0.

Of the three rules above, rule 6 has the highest PRI and rule 8 the lowest.

3.4 Weighted Combination of Similarity

Constraint-based element match is usually combined with other match methods, which helps to limit the number of candidate matches [3]. Let e1 be a node of input schema A, e2 a node of input schema B, L the similarity calculated by the linguistic-based match method, C the similarity calculated by the constraint-based match method, and ω a weight input by the user. Then:
λ(e1, e2) = ω · L + (1 − ω) · C    (1)
3.5 Structure-Level Match Based on Similarity Flooding

Element-level schema match only computes the similarity of the two input schemas' elements and neglects the mutual influence of similarity between those elements. In a GSTree there is an abundance of "contain" and "be contained" relations among element nodes. A change in the similarity of one pair of nodes may lead to changes in the similarity of its father-node pair and its son-node pair, so we use structure-level match to modify the result of element-level match. The idea of similarity flooding comes from the Similarity Flooding algorithm [6], based on which reference [8] proposed a general structure match method, Structure Match (SM), which includes a flooding mechanism based on directed graphs and a method of similarity modification. The difference between a GSTree and a general directed graph is that there is only one route between the root node and a leaf node, and the
similarity of each node on this route influences the similarities of the others. We improved the SM algorithm to make it applicable to the tree structure of the GSTree.

Definition 3: For a node e of a GSTree, P(e) represents its father node and C(e) represents its son node. For two GSTrees, the initial similarity of a node pair (e1, e2) is λ(e1, e2)0, the similarity calculated by the element-level match method. After k iterations, the similarity of (e1, e2) is given by the following expression:
λ(e1, e2)k = θ · λ(e1, e2)k−1 + θP · λ(P(e1), P(e2))k−1 + θC · λ(C(e1), C(e2))k−1    (2)
In Expression (2), θ, θP and θC are weights input by users, with θ + θP + θC = 1. The similarity of the node pair (e1, e2) is thus determined by its similarity in the previous iteration and the similarities of its father-node pair and son-node pair.
|λ(e1, e2)k − λ(e1, e2)k−1| < ε    (3)
The loop ends when Expression (3) is satisfied, where ε is a threshold input by users. Through this iterative calculation, the similarity of the node pair (e1, e2) spreads along the whole route. After similarity rectification there may be several results, among which the maximum value is chosen as the output. To improve match precision, a similarity threshold can be set so that only similarities above the threshold are accepted as final results.
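One possible implementation of the flooding loop in Expressions (2) and (3) is sketched below. The `parent` and `child` dictionaries map a node to its father and son node; pairs absent from the similarity table contribute 0. Function and variable names are ours, not from the paper.

```python
def flood(similarity, parent, child, theta=(0.6, 0.2, 0.2), eps=1e-3, max_iter=100):
    t, t_p, t_c = theta  # user weights, t + t_p + t_c = 1
    for _ in range(max_iter):
        updated, delta = {}, 0.0
        for (e1, e2), s in similarity.items():
            s_p = similarity.get((parent.get(e1), parent.get(e2)), 0.0)
            s_c = similarity.get((child.get(e1), child.get(e2)), 0.0)
            new = t * s + t_p * s_p + t_c * s_c  # Expression (2)
            delta = max(delta, abs(new - s))
            updated[(e1, e2)] = new
        similarity = updated
        if delta < eps:  # Expression (3): change fell below the threshold
            break
    return similarity

# Two chains A->B and X->Y: the similarity of (A, X) spreads to (B, Y).
sim = flood({("A", "X"): 1.0, ("B", "Y"): 0.0},
            parent={"B": "A", "Y": "X"}, child={"A": "B", "X": "Y"})
```

Note that with missing boundary pairs the raw values decay toward zero over many iterations; practical SM implementations normalise the similarities after each pass.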
4 Experiments

We implemented the above algorithm in a prototype system and carried out experiments on practical data sets. We chose six groups of experimental data; each group contained two GML application schemas describing city geographical information. As the evaluation criterion we chose the correct detection ratio of element-level match, i.e. the ratio of the number of correct matches detected by the system to the real number of correct matches.
Fig. 5. Correct match ratio with different similarity thresholds (A: threshold = 0.7; B: threshold = 0.8)
According to the experimental results, the average correct detection rate is 71.3% when the similarity threshold is 0.7 and only the element-level match method is used. When structure-level match is introduced to rectify the similarity, the average correct detection rate increases to 94.5%. With a similarity threshold of 0.8, the average correct detection rate is 67.3% with element-level match alone, rising to 92.6% after rectification. The experimental results indicate that the multi-strategy method for GML application schema match is an effective way to detect the element match relations of GML application schemas.
5 Conclusion

This article proposes a multi-strategy method for GML application schema match based on GML 3.0 application schemas. The method uses the GSTree as the schema match model, applies linguistic-based match and constraint-based match separately to calculate the similarity of each element pair, and then combines the two results with weights. Considering the mutual influence of similarity between neighboring nodes, we adopt a structure-level match method based on similarity flooding to rectify the similarity of element pairs. The experimental results indicate that this method can effectively detect the element match relations of different GML application schemas, achieves a high correct match ratio, and can be widely applied to the integration of geographical information and GML-based spatial data mining.
References

1. OpenGIS Consortium Inc.: Geographic Information–Geography Markup Language (GML) (2003)
2. Guan, J.H., Zhou, S.G., Chen, J.P.: Ontology Based GML Schema Matching for Spatial Information Integration. In: Proceedings of the Second International Conference on Machine Learning and Cybernetics, pp. 2240–2245 (2003)
3. Rahm, E., Bernstein, P.A.: A Survey of Approaches to Automatic Schema Matching. The VLDB J. 10(4), 334–350 (2001)
4. Guan, J.H., Yu, W., An, Y.: Geography Markup Language Schema Matching Algorithm. J. Wuhan Univ. 29(2), 169–174 (2004)
5. Zhang, Q., Sun, S., Yuan, P.P.: Fuzzy-set-based Schema Matching Algorithm for Geographic Information. J. Huazhong Univ. Sci. Technol. 34(7), 46–48 (2006)
6. Melnik, S., Garcia-Molina, H., Rahm, E.: Similarity Flooding: A Versatile Graph Matching Algorithm and Its Application to Schema Matching. In: Proceedings of the 18th International Conference on Data Engineering, pp. 117–128 (2002)
7. Zhou, J.T., Zhang, S.S., Wang, M.W.: Element Matching by Concatenating Linguistic-based Matchers and Constraint-based Matcher. In: Proceedings of the 17th IEEE International Conference on Tools with Artificial Intelligence, pp. 265–269 (2005)
8. Cheng, W., Zhou, L.X., Sun, Y.F.: A Multistrategy Generic Schema Matching Approach. Computer Science 31(11), 121–123 (2004)
Application of Classification Methods for Forecasting Mid-Term Power Load Patterns

Minghao Piao, Heon Gyu Lee, Jin Hyoung Park, and Keun Ho Ryu*

Database/Bioinformatics Laboratory, Chungbuk National University, Cheongju, Korea
{bluemhp,hglee,neozean,khryu}@dblab.chungbuk.ac.kr
Abstract. An automated methodology based on data mining techniques is presented for the prediction of customer load patterns in long-duration load profiles. The proposed approach consists of three stages: (i) data preprocessing: noise and outliers are removed and continuous attribute-valued features are transformed into discrete values; (ii) cluster analysis: k-means clustering is used to create load pattern classes and the representative load profile for each class; and (iii) classification: several supervised learning methods are evaluated in order to select a suitable prediction method. In the proposed methodology, power load measured by an AMR (automatic meter reading) system, as well as customer indexes, is used as input for clustering; the output of clustering is the set of representative load profiles (classes). To evaluate the forecasting of load patterns, several classification methods were applied to a set of high-voltage customers of the Korean power system, with the class labels derived from clustering and other features used as input to build classifiers. Finally, the results of our experiments are presented.
1 Introduction

Electrical customer load pattern prediction has been an important issue in the power industry. Load pattern prediction deals with the discovery of power load patterns from load demand data. It attempts to identify existing customer load patterns and to develop new load forecasting methods, employing techniques from statistical analysis [1], [2] and data mining [3], [4], [5]. In power systems, data mining is the most commonly used method for determining load profiles and extracting regularities from load data for load pattern forecasting. In particular, it promises to help in the detection of previously unseen load patterns by establishing sets of observed regularities in load demand data; these sets can be compared with the current load pattern for deviation analysis. Load pattern prediction using data mining is usually performed by building models on related information such as weather, temperature and previous load demand data. Such prediction is usually aimed at the short term [6, 7, 8, 9, 10, 11], since mid- and long-term predictions may not be reliable because their results contain high forecasting errors. However, mid- and long-term forecasting of load demand (load patterns over longer periods) [12] is very useful and interesting.
* Corresponding author.
D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 47–54, 2008. © Springer-Verlag Berlin Heidelberg 2008
Fig. 1. Load pattern prediction framework: input data (customer information, temperature, AMR load data) → preprocessing (discretization, removal of noise and outliers) → cluster analysis (k-means, generating load profiles and classes) → class label (cluster) assignment → building and evaluating classifiers (build the model with the training set, validate it with the testing set) → representative monthly load patterns for each customer and validation results
The main objective of our work is to forecast monthly load patterns, in terms of classification accuracy, from a dataset of daily power usage measured over 10 months and from customer information. The framework of our approach is shown in Fig. 1, and the main tasks are the following:

1. Cluster analysis is performed to detect load pattern classes and the load profile for each class.
2. A classification module is run on customer load profiles to build a classifier able to assign different customer load patterns to the existing classes.
3. The classifiers are evaluated in order to select a suitable classification method.
2 Data Collection and Preprocessing

A case study is considered concerning a database with load patterns and power usage from 1049 high-voltage consumers; this information was collected by KEPRI (Korea Electric Power Research Institute). The load patterns from AMR were collected during a period of ten months (from Jan. to Oct.) in 2007, with the instant power consumption of each consumer recorded at a cadence of 15 min. The commercial index (customer electricity use code), the maximum load demand and temperatures are also used. To compare the load patterns, we use load shape features [13], which capture relevant information about consumption behavior and from which the classifier is created. These features contain information about the daily load curve shape of each consumer for each month and are presented in Table 1. Lastly, since the extracted features contain continuous variables, entropy-based discretization is used, because the intervals are selected according to the information they contribute about the target variable. Following the decision-tree discretization of [14], all continuous variables are cut into a number of intervals. Let T partition the set D of examples into the subsets D1 and D2. Let there be k classes C1,...,Ck. Let
Table 1. Load curve shape features

L1: Load Factor (24h):             s1 = PatternAvg. for day / PatternMax. for day
L2: Night Impact (8h: 23pm~07am):  s2 = (1/3) · PatternAvg. for night / PatternAvg. for day
L3: Lunch Impact (3h: 12am~03pm):  s3 = (1/8) · PatternAvg. for lunch / PatternAvg. for day

Fig. 2. Data preprocessing for AMR data. Before preprocessing, the features are: Customer Electricity Use Code (nominal, 21 different values); Max Load Demand (continuous, min. 0.32 ~ max. 5544); Temperature (continuous, min. −15.34 ~ max. 35.23); AMR daily power usage at 15-min intervals (0, 15, …, 2345) for 1st Jan. through 30th Oct. (continuous, min. 0.32 ~ max. 5544); and the class, Cluster (nominal, {cluster1, …, cluster12}). After preprocessing, Max Load Demand, Temperature and the daily load factors L1, L2, L3 for each of the 304 days become nominal discrete values, while Customer Electricity Use Code and Cluster are unchanged.
Fig. 3. Sample of preprocessed input data. Each row contains CUD (customer electricity use code), MLD (max load demand), TEM (temperature), the discretized daily load factors L1, L2, L3 for days 1–304 (1st Jan. through 30th Oct.), and the assigned class label (e.g. cluster1, cluster3, cluster6).
P(Ci, Dj) be the proportion of examples in Dj that have class Ci. The class entropy of a subset Dj, j = 1, 2, is defined as

Ent(Dj) = − Σi=1..k P(Ci, Dj) log(P(Ci, Dj))    (7)
Suppose the subsets D1 and D2 are induced by partitioning a feature A at point T. Then the class information entropy of the partition, denoted E(A, T; D), is given by:

E(A, T; D) = (|D1| / |D|) · Ent(D1) + (|D2| / |D|) · Ent(D2)    (8)
A binary discretization for A is determined by selecting the cut point TA for which E(A, T; D) is minimal among all the candidate cut points. The same process can be applied recursively to D1 and D2 until some stopping criterion is reached. The Minimal Description Length Principle is used to stop partitioning: recursive partitioning within a set of values D stops if

Gain(A, T; D) < log2(N − 1) / N + δ(A, T; D) / N,    (9)

where N is the number of values in the set D,

Gain(A, T; D) = Ent(D) − E(A, T; D),

δ(A, T; D) = log2(3^k − 2) − [k · Ent(D) − k1 · Ent(D1) − k2 · Ent(D2)],

and ki is the number of class labels represented in the set Di. Fig. 2 shows the data preprocessing for the load demand data and Fig. 3 shows a sample of the input data. Discretized values are converted from intervals to integers and treated as nominal values, e.g. {20 < temperature < 30} = 9.
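The entropy-based cut-point search of Eqs. (7) and (8) can be sketched as follows (helper names are ours; the MDL stopping test of Eq. (9) would wrap the recursion):

```python
import math

def entropy(labels):
    """Class entropy Ent(D) of Eq. (7)."""
    n = len(labels)
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    return -sum((m / n) * math.log2(m / n) for m in counts.values())

def best_cut(values, labels):
    """Cut point T minimising the partition entropy E(A, T; D) of Eq. (8)."""
    pairs = sorted(zip(values, labels))
    best_t, best_e = None, float("inf")
    for i in range(1, len(pairs)):
        t = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint candidate cut
        left = [c for v, c in pairs if v <= t]
        right = [c for v, c in pairs if v > t]
        e = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if e < best_e:
            best_t, best_e = t, e
    return best_t, best_e
```

For example, `best_cut([1, 2, 8, 9], ["a", "a", "b", "b"])` selects the cut at 5.0, which separates the two classes perfectly and drives the partition entropy to zero.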
3 Generating Representative Load Profiles Using K-Means

We describe the clustering algorithm used to generate the load profiles and the class labels that are used in the classification process. The load pattern associated with a customer contains commercial index information, such as electricity use, together with load factors recorded every 15 minutes. The representative monthly load pattern (e.g. April, June, Sep., Oct. 2007) of the mth consumer is the following:

V(m) = Σi=1..k V(m)i,   V(m)i = {V0(m)i, …, Vt(m)i, …, VT(m)i},   k = 30    (1)
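The monthly vectors of Eq. (1) are then grouped with k-means. A self-contained toy version (naive first-k initialisation; our own code, not the production implementation) looks like this:

```python
def kmeans(points, k, iters=50):
    """Minimal k-means: assign points to the nearest centre, recompute
    centres as group means, repeat until stable."""
    centers = points[:k]  # naive initialisation: first k points
    def nearest(p, cs):
        return min(range(len(cs)),
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(p, cs[c])))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        new_centers = [
            tuple(sum(xs) / len(xs) for xs in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers, [nearest(p, centers) for p in points]

pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, labels = kmeans(pts, k=2)
```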
where t = 0, …, T with T = 2345, representing the 15-min intervals between the collected measurements. In the cluster analysis, k-means is used to group the load patterns and obtain the optimal clusters. Clustering in this step determines the number of classes used as input to the classification model. To evaluate the performance of the clustering algorithm, an adequacy measure (MIA: Mean Index Adequacy [15]) is applied. MIA is defined as the average of the distances between each input vector assigned to a cluster and its center. Considering the MIA, 12 clusters turn out to be a good choice.
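As a rough sketch, this adequacy measure can be computed as the mean distance of each load pattern to its assigned cluster centre (our simplified reading of the definition above; the exact MIA formula in [15] aggregates root-mean-square distances per cluster):

```python
import math

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_index_adequacy(vectors, assignments, centers):
    """Average distance between each input vector and its cluster centre."""
    dists = [euclid(v, centers[c]) for v, c in zip(vectors, assignments)]
    return sum(dists) / len(dists)
```

Running k-means for increasing k, plotting this value, and picking the elbow is how a choice such as k = 12 would typically be justified.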
4 Classification Methods for Forecasting Load Patterns

As shown in Fig. 3, the input data is the set of load curve shape features extracted from the load patterns, and it is high-dimensional. With this characteristic in mind, this section describes several classification methods for forecasting customer load patterns.

4.1 CMAR (Classification Based on Multiple Association Rules)
CMAR [16] generates rules using the FP-growth algorithm. In the pruning phase, CMAR selects only positively correlated rules. Only rules that are positively correlated are used for later classification. Also CMAR prunes rules based on database coverage. That is, CMAR removes one data object from the training dataset after it is covered by at least υ rules ( υ expresses the database coverage parameter). In the testing phase, for a new sample, CMAR collects the subset of rules matching the sample from the total set of rules. If all the rules have the same class, CMAR assigns this class to the new sample. If the rules are not consistent in the class label, CMAR divides the rules into groups according to the class label and yields the label of the “strongest” group. The “strength” of a group of rules is computed using weighted chi-square. 4.2 CPAR (Classification Based on Predictive Association Rules)
CPAR [17] is based on a rule generation algorithm for classification known as FOIL [18]. FOIL builds rules to distinguish positive examples from negative ones. FOIL repeatedly searches for the current best rule and removes all the positive examples covered by the rule until all the positive examples in the dataset are covered. For multi-class problems, FOIL is applied to each class: the examples of that class are used as positive examples and those of the other classes as negative ones. The rules for all classes are merged together to form the resulting rule set. 4.3 Support Vector Machine
An SVM is an algorithm for the classification of both linear and nonlinear data. It transforms the original data into a higher dimension, in which it can find a hyperplane separating the data using essential training examples called support vectors. In our model, each object is mapped to a point in a high-dimensional space whose dimensions correspond to features. The coordinates of the point are the
frequencies of the features in the corresponding dimensions. In the training step, the SVM learns the maximum-margin hyperplanes separating each class. In the testing step, it classifies a new object by mapping it to a point in the same high-dimensional space divided by the hyperplane learned during training. For our experiments, we used the sequential minimal optimization (SMO) algorithm [19]. 4.4 C4.5 (Decision Tree)
C4.5 is a decision tree generating algorithm based on the ID3 algorithm [20]. It contains several improvements especially needed for software implementation: 1) handling both continuous and discrete attributes, 2) handling training data with missing attribute values, 3) handling attributes with differing costs, and 4) pruning trees after creation.
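The maximum-margin idea behind the SVM of Sect. 4.3 can be illustrated with a toy linear SVM trained by sub-gradient descent on the hinge loss. This is a stand-in for illustration only (our own code, with made-up data); the paper itself uses the SMO algorithm [19].

```python
def train_linear_svm(X, y, lr=0.01, lam=0.01, epochs=200):
    """Toy linear SVM: hinge-loss sub-gradient descent, labels in {-1, +1}."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:   # inside the margin: take a hinge-loss step
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:            # correctly classified: only regularise
                w = [wj * (1 - lr * lam) for wj in w]
    return w, b

def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Two linearly separable toy classes.
X = [(-2, -1), (-3, -2), (2, 1), (3, 2)]
y = [-1, -1, 1, 1]
w, b = train_linear_svm(X, y)
```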
5 Experiments and Results

In this section we describe our experiments in building a customer load pattern prediction model and evaluate the performance of the classifiers. Accuracy was obtained using stratified 10-fold cross-validation. One criterion for evaluating a classifier is the accuracy of its classification results, and we want to assess how well each classifier can classify; for this purpose, the mean absolute error, root mean squared error and accuracy were used. Runtime cost is not considered here, since in real-world applications we care mostly about classifier accuracy. The parameters of CMAR were set as follows: the minimum support was set to 0.4%, the minimum confidence to 70%, and the database coverage threshold to 3.75 (critical threshold for a 5% significance level, assuming one degree of freedom). For the CPAR algorithm, the minimum gain was set to 0.7, the gain similarity ratio to 0.95 and the weight decay factor to 0.67; the best ten rules were used for prediction. For the SVM, a soft margin allowed errors during training, with the soft margin value set to 0.1. C4.5 parameters were kept at their default values, and we tested both the C4.5 tree method and the rule method.

Fig. 4. Comparison of classifier error rate

Fig. 5. Comparison of classifier accuracy (CMAR, CPAR, SVM, C4.5)
As shown in Fig. 4, the CPAR algorithm has the lowest error rate of all the classifiers; its error rate is almost half that of the decision tree for both mean absolute error and root mean squared error. In Fig. 5, CPAR and CMAR show the highest accuracy.
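The evaluation quantities used above are straightforward to compute; the sketch below also shows a simple round-robin-per-class fold assignment of the kind used in stratified 10-fold cross-validation (helper names are ours):

```python
import math

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def root_mean_squared_error(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def stratified_folds(labels, k=10):
    """Assign fold ids round-robin within each class, so every fold keeps
    roughly the original class proportions."""
    folds, seen = [], {}
    for c in labels:
        folds.append(seen.get(c, 0) % k)
        seen[c] = seen.get(c, 0) + 1
    return folds
```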
6 Conclusion

The purpose of this paper is to find useful features and an automated methodology to predict power load patterns. In this study, we applied k-means clustering to create load pattern classes and the representative load profile for each class. To compare the load patterns, we used load curve shape features such as load factor, night impact and lunch impact, as well as temperature and max load demand; these features contain information about the daily load curve shape of each consumer. For forecasting the load patterns, we applied several classification methods, namely CMAR, CPAR, SVM and C4.5, to the data set of high-voltage customers of the Korean power system. To evaluate the performance of the classifiers, the mean absolute error, root mean squared error and accuracy were used. In our experiments, the CPAR algorithm outperformed the other classifiers.
Acknowledgements

This work was supported by the Development of AMR System Interfacing Model on Internet GIS Environment project of the Korea Electric Power Research Institute (KEPRI) and by a Korea Science and Engineering Foundation (KOSEF) grant funded by the Korean government (MOST) (R01-2008-000-10926-0).
References

1. Perry, C.: Short-Term Load Forecasting Using Multiple Regression Analysis. In: Rural Electric Power Conference, pp. B3/1–B3/8 (1999)
2. Bruhns, A., Deurveilher, G., Roy, J.S.: A Non-linear Regression Model for Mid-term Load Forecasting and Improvements in Seasonality. In: 15th PSCC (2005)
3. Huang, S.J., Shih, K.: Short-term Load Forecasting via ARMA Model Identification Including Non-Gaussian Process Considerations. IEEE Trans. Power Systems 18(2), 673–679 (2003)
4. Chicco, G., Napoli, R., Postulache, P., Scutariu, M., Toader, C.: Customer Characterization Options for Improving the Tariff Offer. IEEE Trans. Power Systems 18, 381–387 (2003)
5. Pitt, B., Kirchen, D.: Applications of Data Mining Techniques to Load Profiling. In: IEEE PICA, pp. 131–136 (1999)
6. Hippert, H.S., Pedreire, C.E., Souza, R.C.: Neural Networks for Short-Term Load Forecasting: A Review and Evaluation. IEEE Transactions on Power Systems 16(1), 44–55 (2001)
7. Liu, K., Subbarayan, S., Shoults, R.R., Manry, M.T., Kwan, C., Lewis, F.L., Naccarino: Comparison of Very Short-Term Load Forecasting Techniques. IEEE Transactions on Power Systems 11(2), 877–882 (1996)
M. Piao et al.
8. Liu, Z.Y., Li, F.: Fuzzy-Rule Based Load Pattern Classifier for Short-Term Electrical Load Forecasting. In: IEEE International Conference on Engineering of Intelligent Systems 2006, pp. 1–6 (2006)
9. Filik, U.B., Kurban, M.: A New Approach for the Short-Term Load Forecasting with Autoregressive and Artificial Neural Network Models. Int. J. Comput. Intel. Res. 3(1), 66–71 (2007)
10. Amjady, N.: Short-term Hourly Load Forecasting Using Time-series Modeling with Peak Load Estimation Capability. IEEE Trans. Power Syst. 16, 498–505 (2001)
11. Chicco, G., Napoli, R., Piglione, F.: Load Pattern Clustering for Short-term Load Forecasting of Anomalous Days. In: IEEE Porto Power Tech Proceedings 2001, vol. 2 (2001)
12. Kandil, M.S., El-Debeiky, S.M., Hasanien, N.E.: Long-Term Load Forecasting for Fast Developing Utility Using a Knowledge-Based Expert System. IEEE Trans. Power Syst. 17(2), 491–496 (2002)
13. Figueiredo, V., Rodrigures, F., Vale, Z., Gouveia, J.B.: An Electric Energy Consumer Characterization Framework Based on Data Mining Techniques. IEEE Trans. Power Syst. 20(2), 596–602 (2005)
14. Dougherty, J., Kohavi, R., Sahami, M.: Supervised and Unsupervised Discretization of Continuous Features. In: Machine Learning: Proceedings of the 12th International Conference. Morgan Kaufmann, San Francisco (1995)
15. Tsekouras, G.J., Hatziargyriou, N.D., Dialynas, E.N.: Two-Stage Pattern Recognition of Load Curves for Classification of Electricity Customers. IEEE Trans. Power Syst. 22(3), 1120–1128 (2007)
16. Li, W., Han, Pei, J.: CMAR: Accurate and Efficient Classification Based on Multiple Class-Association Rules. In: Proc. ICDM 2001, pp. 369–376 (2001)
17. Yin, X., Han: CPAR: Classification Based on Predictive Association Rules. In: Proc. SIAM Int. Conf. on Data Mining (SDM 2003), San Francisco, pp. 331–333 (2003)
18. LUCS-KDD implementations of FOIL (First Order Inductive Learner), http://www.cxc.liv.ac.uk/~frans/KDD/Software/FOIL_PRM_CPAR/foil.html
19. Platt, J.C.: Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines. Microsoft Research Technical Report MSR-TR-98-14 (1998)
20. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo (1993)
Design of Fuzzy Entropy for Non Convex Membership Function

Sanghyuk Lee, Sangjin Kim, and Nam-Young Jang

School of Mechatronics, Changwon National University, #9 sarim-dong, Changwon, Gyeongnam 641-773, Korea
{leehyuk,aries756,optofiber}@changwon.ac.kr
Abstract. A fuzzy entropy is designed for non convex fuzzy membership functions using the well-known Hamming distance measure. The design procedure of fuzzy entropy for convex fuzzy membership functions is presented through the distance measure, and a characteristic analysis of non-convex functions is also illustrated. A proof of the proposed fuzzy entropy is discussed, and the entropy computation is illustrated.

Keywords: Fuzzy entropy, non-convex fuzzy membership function, distance measure.
1 Introduction

Characterization and quantification of fuzziness are important issues in data management; in particular, the management of uncertainty affects many system modeling and design problems. Results on fuzzy set entropy are well known from previous research [1-6]. Liu proposed axiomatic definitions of entropy, distance measure and similarity measure, and discussed the relations between these three concepts. Kosko viewed the relation between distance measure and fuzzy entropy. Bhandari and Pal gave a fuzzy information measure for the discrimination of a fuzzy set relative to some other fuzzy set. Pal and Pal analyzed the classical Shannon information entropy, and Ghosh applied this entropy to neural networks. However, all these results are based on convex fuzzy membership functions.

Uncertainty knowledge about a fuzzy set can be obtained by analyzing the fuzzy set itself; thus most studies of fuzzy sets emphasize the membership function. At this point we take an interest in non-convex fuzzy membership functions. To apply fuzzy entropy to non convex fuzzy membership functions, we first analyze the characteristics of the fuzzy sets involved. In previous work we designed a fuzzy entropy based on a distance measure [7]: the entropy value is proportional to the area of the difference between the fuzzy set membership function and a crisp set. However, the fuzzy membership functions considered there were restricted to the convex type. In this paper, we extend the fuzzy entropy for convex membership functions to non convex membership functions. To overcome the sharpening and complementarity properties of the fuzzy entropy definition, additional assumptions are required. To verify the

D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 55–60, 2008. © Springer-Verlag Berlin Heidelberg 2008
usefulness of the proposed fuzzy entropy for non convex membership functions, we also utilize the definition of fuzzy entropy. In the next chapter, the axiomatic definition of entropy and our previous fuzzy entropies for convex membership functions are introduced, together with a preliminary study of non convex membership functions. Fuzzy entropy for non convex membership functions is derived and proved in Chapter 3. Finally, conclusions follow in Chapter 4. Liu's notation is used throughout this paper [4].
2 Fuzzy Entropy

2.1 Preliminary Results

We introduce some preliminary results on the axiomatic definition of fuzzy entropy and related results. Definition 2.1 gives the axiomatic definition of fuzzy entropy.

Definition 2.1. (Liu, 1992) A real function e : F(X) → R⁺ or e : P(X) → R⁺ is called an entropy on F(X) or P(X) if e has the following properties:

(E1) e(D) = 0, ∀D ∈ P(X)
(E2) e([1/2]) = max_{A ∈ F(X)} e(A)
(E3) e(A*) ≤ e(A), for any sharpening A* of A
(E4) e(A) = e(A^c), ∀A ∈ F(X)

where [1/2] is the fuzzy set whose membership function takes the value 1/2 everywhere, R⁺ = [0, ∞), X is the universal set, F(X) is the class of all fuzzy sets of X, P(X) is the class of all crisp sets of X, and D^c is the complement of D.

Many fuzzy entropies satisfying Definition 2.1 can be formulated. We designed two in our previous literature [7]; they are stated here without proofs.

Fuzzy entropy 1. If the distance d satisfies d(A, B) = d(A^c, B^c) for A, B ∈ F(X), then

e(A) = 2d((A ∩ A_near), [1]) + 2d((A ∪ A_near), [0]) − 2    (1)

is a fuzzy entropy.

Fuzzy entropy 2. If the distance d satisfies d(A, B) = d(A^c, B^c) for A, B ∈ F(X), then

e(A) = 2d((A ∩ A_far), [0]) + 2d((A ∪ A_far), [1])    (2)

is also a fuzzy entropy.

The exact meaning of the fuzzy entropy of a fuzzy set A is the fuzziness of A with respect to a crisp set; we commonly take this crisp set to be A_near or A_far. In the above
fuzzy entropies, the well-known Hamming distance is commonly used as the distance measure between fuzzy sets A and B:

d(A, B) = (1/n) Σ_{i=1}^{n} |μ_A(x_i) − μ_B(x_i)|

where X = {x_1, x_2, …, x_n}, |k| is the absolute value of k, and μ_A(x) is the membership function of A ∈ F(X).

Basically, fuzzy entropy measures the area of the difference between two fuzzy membership functions. Fuzzy entropies (1) and (2) satisfy Definition 2.1; note that Definition 2.1 is not restricted to convex fuzzy membership functions. Next, we introduce non-convex fuzzy membership functions; their definition can be found in reference [8]. Non-convex fuzzy sets are not common, and the definition of non-convexity derives directly from that of convexity.

2.2 Non Convex Membership Function
As noted by Jang et al., the definition of convexity of a fuzzy set is not as strict as the common definition of convexity of a function [8]. Definition 2.2 states the definition of convexity.

Definition 2.2. [8] A fuzzy set A is convex if and only if, for any x_1, x_2 ∈ X and any λ ∈ [0, 1],

μ_A(λx_1 + (1 − λ)x_2) ≥ min{μ_A(x_1), μ_A(x_2)}    (3)

A fuzzy set is said to be non-convex if it is not convex. Non-convex membership functions fall naturally into three sub-classes [9]:

• Elementary non-convex membership functions
• Time related non-convex membership functions
• Consequent non-convex membership functions
First, a discrete fuzzy set is the typical expression of an elementary non-convex fuzzy membership function; continuous-domain non-convex fuzzy sets are less common. Next, time related non-convex membership functions arise, for example, in energy supply by time of day or year, or in mealtimes by time of day; such a fuzzy set is interesting because it is also sub-normal and never has a membership of zero. Finally, Mamdani fuzzy inferencing is a typical source of consequent non-convex sets: in a rule-based fuzzy system, the result of Mamdani inferencing is a non-convex fuzzy set even when the antecedent and consequent fuzzy sets are triangular and/or trapezoidal.

Jang et al. insisted that the definition of convexity of a fuzzy set is not as strict as the common definition of convexity of a function [8]. The mathematical definition of convexity of a function is

f(λx_1 + (1 − λ)x_2) ≥ λf(x_1) + (1 − λ)f(x_2)    (4)

which is a tighter condition than (3).
Fig. 1. Convex MF and Non-convex MF [8]
Fig. 1(a) shows two convex fuzzy sets; the left fuzzy set satisfies both (3) and (4), while the right one satisfies (3) only. Fig. 1(b), in contrast, is a non-convex fuzzy set. By the definition of Jang et al., the fuzzy sets in Fig. 1(a) are convex and their fuzzy entropies are well defined. However, if the two fuzzy sets are considered together as one fuzzy set, then it has to be considered as a non-convex fuzzy set; even so, by the computation of (1) we can obtain its fuzzy entropy value. Fig. 1(b) is a typical non-convex membership function; if the corresponding crisp set is taken as rectangular, we can likewise compute its fuzzy entropy.
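For a fuzzy set given by its membership values on a finite universe, entropy (1) with the normalized Hamming distance can be computed directly. A minimal sketch (the function name, and the pointwise choice A_near(x) = 1 where μ_A(x) ≥ 0.5 and 0 elsewhere, are our assumptions, not from the paper):

```python
def fuzzy_entropy(mu):
    """Entropy (1): e(A) = 2 d(A ∩ A_near, [1]) + 2 d(A ∪ A_near, [0]) - 2,
    with d the normalized Hamming distance over a discrete universe."""
    n = len(mu)
    near = [1.0 if m >= 0.5 else 0.0 for m in mu]   # nearest crisp set A_near
    inter = [min(a, b) for a, b in zip(mu, near)]   # A ∩ A_near (pointwise min)
    union = [max(a, b) for a, b in zip(mu, near)]   # A ∪ A_near (pointwise max)
    d_to_one = sum(1.0 - x for x in inter) / n      # d(A ∩ A_near, [1])
    d_to_zero = sum(union) / n                      # d(A ∪ A_near, [0])
    return 2 * d_to_one + 2 * d_to_zero - 2

# A crisp set has entropy 0 (E1); the set [1/2] attains the maximum 1 (E2).
# A non-convex membership such as [0.9, 0.2, 0.8] yields a non-convex A_near.
```

Note that the same code applies unchanged to convex and non-convex membership vectors, which is the point of Theorems 3.1 and 3.2 below.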
3 Fuzzy Entropy of Non Convex Membership Function

The temperature of drinking milk is a good example of a non convex fuzzy membership function: medium temperatures are not popular for drinking. Fig. 2 shows the preferred temperature of milk. We now focus on the non convex membership function. Conditions (E1) and (E2) are natural for non convex membership functions too; however, (E3) and (E4) are important in deciding the structure of the fuzzy entropy. For each fuzzy membership function, we assign a corresponding crisp set as follows: for every non convex fuzzy set A, we let the crisp set be A_near. Then the two fuzzy entropy measures (1) and (2) are applicable to non convex fuzzy membership functions, and we can show that both remain fuzzy entropies for non convex membership functions [6,7]. It is essential to assign the crisp set A_near of the fuzzy set A; note that the crisp set A_near of a non convex fuzzy set A is also non convex. The next two fuzzy entropy measures are presented as fuzzy entropies of non convex membership functions.

Theorem 3.1. If the distance d satisfies d(A, B) = d(A^c, B^c), then for convex or non convex A, B ∈ F(X),

e(A) = 2d((A ∩ A_near), [1]) + 2d((A ∪ A_near), [0]) − 2

is a fuzzy entropy.
Fig. 2. Preferred temperature of drinking milk
Fig. 3. Fuzzy set and crisp set
The proof is natural, because the fuzzy set A satisfies the same conditions as in our previous results [6,7]. For Fig. 3, the computation of d((A ∩ A_near), [1]) is performed twice, whereas d((A ∪ A_near), [0]) is performed once. Hence, our previous result can be extended to the non convex case.

Theorem 3.2. If the distance d satisfies d(A, B) = d(A^c, B^c), then for convex or non convex A, B ∈ F(X),

e(A) = 2d((A ∩ A_far), [0]) + 2d((A ∪ A_far), [1])

is also a fuzzy entropy.
The proof is similar to that of Theorem 3.1. In Theorem 3.2, the computation of d((A ∩ A_far), [0]) is likewise performed twice, whereas d((A ∪ A_far), [1]) is performed once. Convex fuzzy entropy computation is thus also applicable to non convex fuzzy sets; however, a proper assignment of the crisp set is required to formulate the fuzzy entropy measure.
4 Conclusions

A fuzzy entropy for non convex fuzzy membership functions has been designed. Non convex fuzzy membership functions were introduced and their properties discussed, and a characteristic analysis of non convex functions was also illustrated. Our fuzzy entropy measure for fuzzy sets is also applicable to non convex fuzzy membership functions; as discussed, it is essential to assign the corresponding crisp set, and we found that this corresponding crisp set is also non convex.

Acknowledgment. This work was supported by the 2nd BK21 Program, which is funded by the Korea Research Foundation (KRF).
References

1. Bhandari, D., Pal, N.R.: Some New Information Measure of Fuzzy Sets. Inform. Sci. 67, 209–228 (1993)
2. Ghosh, A.: Use of Fuzziness Measure in Layered Networks for Object Extraction: a Generalization. Fuzzy Sets and Systems 72, 331–348 (1995)
3. Kosko, B.: Neural Networks and Fuzzy Systems. Prentice-Hall, Englewood Cliffs (1992)
4. Xuecheng, L.: Entropy, Distance Measure and Similarity Measure of Fuzzy Sets and Their Relations. Fuzzy Sets and Systems 52, 305–318 (1992)
5. Pal, N.R., Pal, S.K.: Object-background Segmentation Using New Definitions of Entropy. IEEE Proc. 36, 284–295 (1989)
6. Lee, S.H., Kang, K.B., Kim, S.S.: Measure of Fuzziness with Fuzzy Entropy Function. Journal of Fuzzy Logic and Intelligent Systems 14(5), 642–647 (2004)
7. Lee, S.H., Cheon, S.P., Kim, J.: Measure of Certainty with Fuzzy Entropy Function. In: Huang, D.-S., Li, K., Irwin, G.W. (eds.) ICIC 2006. LNCS (LNAI), vol. 4114, pp. 134–139. Springer, Heidelberg (2006)
8. Jang, J.S.R., Sun, C.T., Mizutani, E.: Neuro-Fuzzy and Soft Computing. Prentice Hall, Upper Saddle River (1997)
9. Garibaldi, J.M., Musikasuwan, S., Ozen, T., John, R.I.: A Case Study to Illustrate the Use of Non-convex Membership Functions for Linguistic Terms. In: 2004 IEEE International Conference on Fuzzy Systems, vol. 3, pp. 1403–1408 (2004)
Higher-Accuracy for Identifying Frequent Items over Real-Time Packet Streams

Ling Wang, Yang Koo Lee, and Keun Ho Ryu∗

Database/Bioinformatics Laboratory, School of Electrical & Computer Engineering, Chungbuk National University, Chungbuk, Korea
{smile2867,leeyangkoo,khryu}@dblab.chungbuk.ac.kr
Abstract. In this paper, we classify synopsis data structures into two major types, equal synopses and unequal synopses. A Top-k query is usually processed over an equal synopsis, but it is very difficult to implement over an unequal synopsis because of the resulting inaccurate approximate answers. We therefore present a Dynamic Synopsis, built with the DSW (Dynamic Sub-Window) algorithm, to support the processing of Top-k aggregate queries over unequal synopses and to guarantee the accuracy of the approximate results. Our experimental results show that Dynamic Synopses give significant accuracy benefits for real-time traffic analysis over packet stream networks.

Keywords: sliding window, Top-k, frequent items, dynamic synopses.
1 Introduction

A data stream is a real-time, continuous, ordered sequence of items generated by sources such as sensor networks, Internet traffic flows, credit card transaction logs, or on-line financial tickers. In the last several years, it has been shown that the unique properties of data streams (virtually unbounded length, fast arrival rates, and a lack of system control over the order in which items arrive) generate many interesting research problems in algorithm analysis and data management. On-line data streams possess interesting computational characteristics, such as unknown or unbounded length, a possibly very fast arrival rate, the inability to backtrack over previously arrived items (only one sequential pass over the data is permitted), and a lack of system control over the order in which the data arrive [1].

The real-time analysis of network traffic has been one of the primary applications of data stream management systems, examples of which include Gigascope [2] and STREAM [3]. A problem of particular interest, motivated by traffic engineering, routing system analysis, customer billing, and the detection of anomalies such as denial-of-service attacks, concerns the statistical analysis of data streams with a focus on newly arrived data and frequently appearing packet types. For instance, an ISP may be interested in monitoring streams of IP packets originating from its clients and identifying those

∗ Corresponding author.
D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 61–68, 2008. © Springer-Verlag Berlin Heidelberg 2008
users who consume the most bandwidth during a given time interval. The objective of these types of queries is to return a list of the most frequent items (called Top-k queries or hot list queries) or the items that occur above a given frequency (called threshold queries). Usually, a Top-k query is processed over an equal synopsis, but it is very hard to implement over an unequal synopsis because of the resulting inaccurate approximate answers. Therefore, in this paper, we focus on periodically refreshed Top-k queries over sliding windows on Internet traffic streams. We present a Dynamic Synopsis to support the processing of Top-k aggregate queries over an unequal synopsis while guaranteeing the accuracy of the approximate results.
2 Related Work

How to maintain an efficient synopsis data structure is a very important question for computing statistics over a stream rapidly, and many types of synopsis data structures have been presented in recent years. The running synopsis, an unequal synopsis, is well suited to subtractable aggregates [4] such as SUM and COUNT. The paned-window synopsis, proposed by Jin Li et al. [5], is an extended version of the basic-window synopsis in which the sub-windows are called "panes." The paired-window synopsis, proposed by Sailesh Krishnamurthy et al. [6], improves on the paned-window synopsis by using paired windows, which chop a stream into pairs of possibly unequal sub-windows.

Our work focuses on periodically refreshed Top-k queries over sliding windows on Internet traffic streams [7]. Queries that return a list of frequently occurring items are important in the context of traffic engineering, routing system analysis, customer billing, and the detection of anomalies such as denial-of-service attacks, and there has been some recent work on answering Top-k queries over sliding windows [8]. Lukasz Golab et al. [7] proposed the FREQUENT algorithm, which identifies the frequently occurring items in sliding windows and estimates their frequencies. They answer frequent-item queries using small basic-window synopses (sub-windows), because there is no obvious rule for merging the partial information to obtain the final answer: they store a top-k synopsis in each basic window, maintain a list of the k most frequent items in each window, and finally output the identity and value of each global counter over the threshold δ.
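The merging idea just described can be sketched as follows (a simplified illustration of per-sub-window top-k sketches with global counters, not Golab et al.'s exact algorithm; the function name and the item-list input format are our assumptions):

```python
from collections import Counter

def frequent_items(sub_windows, k, delta):
    """Keep only a top-k sketch per sub-window, merge the sketches into
    global counters, and report items whose merged count reaches delta."""
    merged = Counter()
    for sw in sub_windows:
        top_k = Counter(sw).most_common(k)   # per-sub-window top-k sketch
        merged.update(dict(top_k))
    return {item: c for item, c in merged.items() if c >= delta}
```

Because each sub-window contributes only its k heaviest items, an item that is frequent overall but never among a sub-window's top k can be missed; this is the false-negative problem for unequal synopses discussed below.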
3 Sliding Window Model

The sliding window model causes old items to expire as new items arrive: a sliding window maintains, at all times, either the last N tuples seen or the tuples that arrived in the last t time units. This means that producing an approximate answer to a data stream query requires evaluating the query not over the entire past history of the data streams, but only over sliding windows of recent data from the streams. Imposing sliding windows on data streams is a natural method of approximation that has several attractive properties. It is well-defined and easily understood: the semantics of the approximation are clear, so that the users of the system can be confident that they understand what is sacrificed in producing the approximate answer. It is
deterministic, so there is no danger that unfortunate random choices will produce a bad approximation. Most importantly, it emphasizes recent data, which in the majority of real-world applications is more important and relevant than old data: if one is trying in real-time to make sense of network traffic patterns, phone call or transaction records or scientific sensor data, then general insights based on the recent past will be more informative and useful than insights based on stale data.
Fig. 1. Sliding Window Model
Internet traffic on a high-speed link arrives so fast that useful sliding windows may be too large to fit in main memory. In this case, the window must somehow be summarized, and an answer must be approximated on the basis of the available synopsis information. Figure 1 shows an example of the structure of the sliding window model, which contains two major parts: the Slide Manager and the Synopsis. The Slide Manager keeps track of time and determines when to end the next slice. The whole sliding window can be divided into sub-windows, with only a sketch of each sub-window stored in memory, and the query is re-evaluated when the most recent sub-window is full. The summary that contains all the sub-windows is called the Synopsis. Using an approach called partial aggregates, we can aggregate over each sub-window, and the results can be combined and reprocessed for each sliding window by the final aggregate operators.

3.1 Semantics of Sliding Windows

Usually, a non-partitioned window specification consists of three parameters: RANGE, SLIDE and WATTR. RANGE defines the size of the window, SLIDE defines the steps at which the window moves, and WATTR represents the windowing attribute. Both RANGE and SLIDE are specified in terms of windowing-attribute values, including the units. This definition of the window specification allows users to use any data attribute with a totally ordered domain as the windowing attribute, such as a timestamp attribute, a tuple sequence number, or other non-temporal attributes.
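The partial-aggregate approach can be sketched for a SUM aggregate over tuple-based windows (a simplified illustration; we assume panes of width gcd(RANGE, SLIDE), which evenly tile every window, and the function names are ours):

```python
from math import gcd

def pane_sums(stream, pane_size):
    """Partial aggregates: one SUM per pane (sub-window) of the stream."""
    return [sum(stream[i:i + pane_size]) for i in range(0, len(stream), pane_size)]

def window_sums(stream, rng, slide):
    """Final aggregates: SUM over each sliding window of RANGE rng tuples,
    advancing SLIDE tuples at a time, combined from the pane-level partials."""
    pane = gcd(rng, slide)                     # panes evenly tile every window
    partials = pane_sums(stream, pane)
    per_win, step = rng // pane, slide // pane
    return [sum(partials[i:i + per_win])
            for i in range(0, len(partials) - per_win + 1, step)]
```

Each input tuple is touched once when its pane is aggregated; a window result then combines only RANGE / gcd(RANGE, SLIDE) partials instead of RANGE raw tuples.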
4 Dynamic Synopsis

As noted above, we classify synopsis data structures into two major types, equal synopses and unequal synopses. A Top-k query is usually processed over an equal synopsis, but it is very difficult to implement over an unequal synopsis because of the resulting inaccurate approximate answers. We therefore present a
Dynamic Synopsis, built with the DSW (Dynamic Sub-Window) algorithm [9], to support the processing of Top-k aggregate queries over unequal synopses and guarantee the accuracy of the approximate results.

4.1 Equal Synopsis and Unequal Synopsis

Here, we classify synopsis data structures into two major types, the equal synopsis and the unequal synopsis. In an equal synopsis (e.g., the basic-window synopsis), all the sub-windows have the same size; otherwise, we call it an unequal synopsis (e.g., the paired-window synopsis). Using a synopsis can reduce both the space and the computation cost of evaluating sliding window queries, through sub-aggregation and shared computation. The equal synopsis is very easy to implement, but it is very hard to share the overlapping windows when solving multi-aggregate queries over streams, because it always leads to more slices. The unequal synopsis solves this problem well: it is very efficient for processing multi-aggregate queries by sharing the overlapping windows. For identifying frequent items over an unequal synopsis, however, the problem is that false negatives occur very often [9]. Most useful sliding windows can be considered as non-overlapping sliced windows. We now define overlapping and sliced windows, to show the difference between the equal synopsis and the unequal synopsis in an algebraic expression.

Definition 1 (Overlapping): An overlapping window W with range r and slide s (r > s) is denoted by W[r, s] and is defined, at time t, as the tuples in the interval:
Definition 2 (Sliced): A sliced window W that has m slices is denoted by W(s1, …, sm). We say that W has |W| slices and a period s = s1 + … + sm (the period s corresponds to the SLIDE attribute), and that each slice si has an edge ei = s1 + … + si. At time t, W consists of the tuples in the interval:
Intuition: An aggregate over an overlapping window W[r, s] can always be computed by a process that aggregates partial aggregates over a sliced window V(s1, …, sk, …, sn) with period s if and only if sk + … + sn = r mod s. These sliced windows can be paned or paired, defined as:

1. Equal synopsis: X(s1, s2) with s2 = r mod s and s1 = s2
2. Unequal synopsis: Y(s1, s2) with s2 = r mod s and s1 ≠ s2

This intuition is based on the following lemma.

Lemma 1: An aggregate over a window W[r, s] can be computed from partial aggregates of a window V(s1, …, sk, …, sn) with period s if and only if:

sk + … + sn = r mod s
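For a concrete query, the two slicings can be computed directly (a sketch; function names are ours: paired slices follow the rule s2 = r mod s above, while for the equal case we take panes of width gcd(r, s), which always satisfies the tiling requirement):

```python
from math import gcd

def paired_slices(r, s):
    """Unequal synopsis Y(s1, s2): each slide period s is chopped into the
    pair s1 = s - (r mod s) and s2 = r mod s."""
    s2 = r % s
    return [s - s2, s2]

def paned_slices(r, s):
    """Equal synopsis: equal panes of width gcd(r, s) tiling the period s."""
    p = gcd(r, s)
    return [p] * (s // p)
```

For the query with RANGE 400 and SLIDE 320 used later in the experiments, the paired slicing gives (240, 80) and the paned slicing gives four panes of 80; both have period 320 and satisfy Lemma 1, since 80 = 400 mod 320.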
4.2 Dynamic Synopsis

In this paper, we present a Dynamic Synopsis, built with the DSW (Dynamic Sub-Window) algorithm [9], to support the processing of Top-k aggregate queries over unequal synopses and guarantee the accuracy of the approximate results. DSW employs the unequal-synopsis approach and stores a Top-k sketch in each sub-window. In the following, we introduce this method in three phases: Redefinition, Initialization and Maintain.

The first phase, which we call the Redefinition phase, is shown in Figure 2. In this example, an unequal synopsis contains seven sub-windows S1, S2, S3, …, S7. In our method, we design a new window, called the Dynamic Sub-window, to redefine a long sub-window (such as S2, whose size is much larger than the others) into several new small sub-windows (the shaded area in Figure 2). Through this process, all the sub-windows are kept at a similar or equal size, reducing the large differences among the sub-windows.
Fig. 2. A real synopsis illustrating the Redefinition phase; the shaded area is the redefined region, called the Dynamic Sub-window. It divides a larger sub-window into several small ones whose size is similar to the others. The timestamp * can be assigned by a function in the synopsis to maintain the size of the Dynamic Sub-window.
The second phase, the Initialization phase, initializes the primary size of the Dynamic Sub-window. This size can always be maintained automatically from the attributes of the sliding window query. As noted above, a non-partitioned window specification consists of three parameters: RANGE, SLIDE and WATTR. In our case, the primary size of the Dynamic Sub-window is initialized from the RANGE and SLIDE values of the aggregate query: we let the basic size of the Dynamic Synopsis always equal the greatest common divisor of the query's RANGE and SLIDE. Once the basic size has been initialized, a significant problem is how to implement the Dynamic Sub-window in the synopsis. For this we design the third phase, the Maintain phase: a timestamp * is embedded into the synopsis by a function that controls when the Dynamic Sub-window ends and restarts.

4.3 Function Definition

In studying the implementation of the Dynamic Synopsis, we found that its sub-window size can be maintained by a function of three parameters.
In our method, we use the function θ = (N − 1)/M + 1 to place the timestamp * in the synopsis; the Dynamic Sub-window is controlled by this timestamp. The function is defined by three parameters: θ, the frequency count of the Top-1 item in each sub-window; N, the number of stream tuples that have arrived; and M, the total number of item types that have arrived in the synopsis. These three parameters change continuously in real time, and an analysis of their relationship on real data shows that N is always bounded by the two other parameters, θ and M, such that θ ≤ N ≤ (θ − 1) × M + 1.

Theorem 1: Given a maximum Top-1 frequency count θ in each sub-window and M frequency types that have arrived in the synopsis, the number N of arrived stream tuples satisfies

θ ≤ N ≤ (θ − 1) × M + 1

The value of N converges to θ × M as both θ and M increase. Proof: Suppose each Dynamic Sub-window contains M kinds of tuple types, each seen θ′ times (the saturating worst case), so the total number of tuples is N′ = θ′ × M. With M fixed, one more arrival raises the Top-1 count to θ = θ′ + 1 and the total to N = N′ + 1. Since θ′ = θ − 1 and N′ = N − 1, we get N − 1 = (θ − 1) × M, and therefore θ = (N − 1)/M + 1. This means that once θ would increase by one more, N has reached the saturation state.
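Theorem 1 and the timestamp function can be checked numerically. A sketch of one plausible reading of the Maintain phase (the function names are ours, not from the paper):

```python
def saturation_bound(theta, m):
    """Upper bound of Theorem 1: N <= (theta - 1) * M + 1.  When each of
    the M types has been seen theta - 1 times, the next arrival necessarily
    raises the Top-1 count to theta, so N saturates at (theta - 1) * M + 1."""
    return (theta - 1) * m + 1

def top1_at_saturation(n, m):
    """The paper's timestamp function theta = (N - 1) / M + 1, i.e. the
    Top-1 count implied by a saturated sub-window of N tuples and M types."""
    return (n - 1) // m + 1
```

For example, with M = 4 types a sub-window saturates for θ = 3 at N = 9 tuples; once N reaches this bound, the Dynamic Sub-window can be closed (timestamp *) and restarted.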
5 Experiment and Evaluation

We tested this method on tuple-based windows over TCP traffic data. The trace for workload (A) contains 1592 distinct source IP addresses, which can be treated as distinct item types, and we set workload (A) to contain N = 80000 tuples. Experiments were conducted with two values of Θ: Θ = 3 (initialized sub-window size of 80) and Θ = 9 (initialized sub-window size of 400). The size of the top-k list, k, is varied from one to ten.

5.1 Experimental Results

We examine queries with identical selection predicates and different periodic windows over a real data set, comparing two strategies (DS-window and paired-window) in terms of the percentage of identified frequent item types as Top-k increases. Table 1 shows the performance environment, and Figure 3 shows the percentage of IP addresses that were identified by our method. The general trend is that for k ≥ 2, at least 80% of the IP addresses are identified, whereas with the paired window, k ≥ 4 is needed before 80% of the IP addresses can be identified. As k increases, the space usage becomes very high; therefore, an algorithm that improves the identification rate of the frequent items not only reduces the space usage but also improves the accuracy rate. In the figure, we can see that the Dynamic Sub-window ensures that most of the frequent items can be identified from approximately k = 2.
Table 1. Performance environment (A) with initialized DS-window size equals to 80 tuples or 400 tuples Type Workload Distinct IP addresses Predicate
Name A M
Values (tuples) [0,80000] 1592
Θ
Window
DS-window Paired-window
3 (80 tuples) 9 (400 tuples) Initialized size: 80 RANGE: [0,400] SLIDE: [0,320] Initialized size: 400 RANGE: [0,2400] SLIDE: [0,2000]
Fig. 3. Accuracy of identified frequent items with initialized DS-window size equal to 80 tuples or 400 tuples in workload (A)
In this figure, we can see that although the size of RANGE is larger than before, the DS-window keeps high accuracy from k = 7, whereas the Paired-window only ensures that half of the frequent items can be identified.
6 Conclusions

In this paper, we classified synopsis data structures into two major types, the Equal Synopsis and the Unequal Synopsis, and gave an algebraic expression for them. We presented a Dynamic Synopsis to support the processing of top-k aggregate queries over unequal synopses while guaranteeing the accuracy of the approximate results. In future work, we intend to handle more complex synopses that contain many small sub-windows for multi-aggregate queries. We are also working on other aspects of processing streams, including formalization of window semantics, evaluation of window queries, and processing of disordered streams.

Acknowledgment. This research was supported by a grant (#07KLSGC02) from Cutting-edge Urban Development - Korean Land Spatialization Research Project funded by the Ministry of Construction & Transportation of the Korean government and a Korea Science and Engineering Foundation (KOSEF) grant funded by the Korea government (MOST) (R01-2008-000-10926-0).

L. Wang, Y.K. Lee, and K.H. Ryu
References
1. Golab, L., Ozsu, M.T.: Issues in data stream management. ACM SIGMOD Record 32(2), 5–14 (2003)
2. Cranor, C., Gao, Y., Johnson, T., Shkapenyuk, V., Spatscheck, O.: Gigascope: High performance network monitoring with an SQL interface. In: 2002 ACM SIGMOD international conference on Management of data, p. 623. ACM Press, New York (2002)
3. Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: Models and issues in data streams. In: 21st ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pp. 1–16. ACM Press, New York (2002)
4. Cohen, S.: User-defined aggregate functions: bridging theory and practice. In: 2006 ACM SIGMOD international conference on Management of data, pp. 49–60. ACM Press, Chicago (2006)
5. Li, J., Maier, D., Tufte, K., Papadimos, V., Tucker, P.A.: No pane, no gain: efficient evaluation of sliding-window aggregates over data streams. ACM SIGMOD Record 34(1), 39–44 (2005)
6. Krishnamurthy, S., Wu, C., Franklin, M.J.: On-the-fly sharing for streamed aggregation. In: 2006 ACM SIGMOD international conference on Management of data, pp. 623–634. ACM Press, Chicago (2006)
7. Toman, D.: On Construction of Holistic Synopses under the Duplicate Semantics of Streaming Queries. In: 14th International Symposium on Temporal Representation and Reasoning (TIME 2007), pp. 150–162. IEEE Press, Alicante (2007)
8. Kyriakos, M., Spiridon, B., Dimitris, P.: Continuous monitoring for top-k queries over sliding windows. In: 2006 ACM SIGMOD international conference on Management of data, pp. 635–646. ACM Press, New York (2006)
9. Wang, L., Lee, Y.K., Ryu, K.H.: Supporting Top-k Aggregate Queries over Unequal Synopsis on Internet Traffic Stream. In: Zhang, Y., Yu, G., Bertino, E., Xu, G. (eds.) APWeb 2008. LNCS, vol. 4976, pp. 590–600. Springer, Heidelberg (2008)
Privacy Preserving Sequential Pattern Mining in Data Stream

Qin-Hua Huang

Modern Education Technique Center, Shanghai University of Political Science and Law, 201701 Shanghai, China
[emailprotected]
Abstract. Privacy preserving data mining techniques have gained much attention in recent years. For data stream systems, wireless networks and mobile devices, the related stream data mining research is still in its early stage. In this paper, a data mining algorithm dealing with the privacy preserving problem in data streams is presented. Keywords: Privacy preserving, Data stream, Data mining.
1 Introduction

Network information comes from various kinds of sources, including web sites, news servers, BBS, etc. In a data stream application the server receives the queries requested by the clients, constructs synopsis structures, executes the queries on them, and finally returns the query results. Given the importance of stream data applications, data mining problems over data streams have been proposed and researched [1][2][3]. Meanwhile, research on privacy preserving data mining is still in its early stage [5][6], and it has not taken the data stream security problem into consideration. We consider the privacy preserving problem in pattern mining over data streams. The related topics are research on private search [8][9]. Ostrovsky first proposed the question of private search in a data stream: the search is carried out secretly by the data stream server while preserving the secrecy of the customer's query. Based on this work, John Bethencourt applied Paillier's homomorphic encryption system [7] and the Bloom filter technique to secret query problems. Our work continues in this direction: we put forward a method of mining sequential patterns in a data stream under the privacy preserving condition.
2 Problem Definition

Suppose a server processes a data stream I, and a customer wants to privately discover the sequential patterns of interest on the server. At the end of the computation the server must not learn any detailed information about the sequential patterns of interest, while the customer obtains them under the privacy preserving condition. The procedure is described in Figure 1. D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 69–75, 2008. © Springer-Verlag Berlin Heidelberg 2008
Fig. 1. Model of privacy preserving sequential pattern mining in data stream
3 Methodology

As a direct method for solving the problem of privacy preserving sequential pattern mining, the customer could download the stream data and execute knowledge discovery on the data locally. However, as mentioned previously, a data stream is continuous, unbounded and fast-arriving, and the bandwidth between the server and customers is limited, so this method cannot be applied in practice. Our tactic is to apply a preliminary rough discovery on the server and construct an LSP-tree synopsis [10]. During this procedure the synopsis is updated in real time on the server. The customer encrypts his query sequence and sends it to the server. When it receives the encrypted sequence, the server executes a second discovery on the LSP-tree to find, in a privacy preserving way, the patterns that meet the customer's requirements. These sequential patterns are sent back to the customer, who reconstructs the query answers from the returned data. The detailed algorithm can be described in three steps as follows.

3.1 Algorithm Outline

At the beginning of the algorithm the client generates an encryption key pair (Keypub, Keypriv). The client then encrypts the support threshold min_supp and the father sequence, and the encrypted sequences are sent to the server together with Keypub. After receiving the encrypted sequences, the server mines the support under the privacy preserving condition: it compares the two sequential models and accumulates the support count. When the server receives the message, it searches for the sequence in the LSP-tree. This goal is reached in two steps. First, the server encrypts each LSP-tree sequential model with the public key and compares it with the query model; the result is an encryption of 0 or 1. Second, the server accumulates the product of each sequential model's support and the comparison result.
Privacy Preserving Sequential Pattern Mining in Data Stream
71
Fig. 2. The process of privacy preserving sequential pattern mining in data stream
At the end of the algorithm the customer obtains the result: when the customer receives the returned data, it decrypts the data to get the support of the specified sequential pattern. The process outline is presented in Figure 2.

3.2 Customer Sequence Encryption

First the customer transforms its sequential pattern into a sequence of 1s and 0s in lexicographic order. Suppose the dictionary of all items is the set D = {i1, i2, …, i|D|} and the query sequence is Qseq = (e1, e2, …, em), where each e is an event of the sequence in time order, m is the number of events in the sequence, and |D| is the size of the dictionary. Set the sequence length to |D|·m. A bit of value 1 means the corresponding event appears in the sequence, and 0 means it does not. Arranging the events in lexicographic order, a 0/1 sequence is generated. The customer applies Paillier's homomorphic encryption algorithm to generate the key pair (Keypub, Keypriv) and encrypts the sequence with the public key Keypub. Let the encrypted sequence be SQ = E(q1), E(q2), …, E(qN), where N = |D|·m. Then the encrypted sequence SQ is sent to the server, along with the encryption public key. Note that the same plaintext can be encrypted into different ciphertexts with the same Keypub. Here is an example: suppose a customer needs to query the sequence (1, 0, 1). Applying homomorphic encryption to this sequence, we get the ciphertext sequence (E(1), E(0), E(1)), as described in Figure 3.
Fig. 3. The customer encrypts the sequential pattern
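The customer-side step can be sketched with a toy Paillier implementation. The primes, helper names and bit vector below are illustrative only; a real deployment would use cryptographically sized keys:

```python
import math, random

def keygen(p=101, q=113):                     # toy primes; real keys use >=1024-bit primes
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1
    L = lambda u: (u - 1) // n
    mu = pow(L(pow(g, lam, n * n)), -1, n)
    return (n, g), (lam, mu, n)

def encrypt(pub, m):
    n, g = pub
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:                # r must be a unit mod n
        r = random.randrange(1, n)
    return pow(g, m % n, n * n) * pow(r, n, n * n) % (n * n)

def decrypt(priv, c):
    lam, mu, n = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n

pub, priv = keygen()
# Customer-side step from Sect. 3.2: encode the query as 0/1 bits in
# lexicographic order, then encrypt each bit separately.
query_bits = [1, 0, 1]                        # the (1,0,1) example of Fig. 3
SQ = [encrypt(pub, b) for b in query_bits]
print([decrypt(priv, c) for c in SQ])         # -> [1, 0, 1]
```

Because the random nonce r differs per call, encrypting the same bit twice yields different ciphertexts, which is the property the text notes about re-encryption under the same Keypub.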
3.3 Server Compares Patterns

In the first step, the server processes each LSP-tree sequential pattern. When it receives the encrypted sequence and the public key, the server generates an encrypted sequence for each frequent sequential pattern in the LSP-tree; we call the encrypted LSP-tree the SLSP. The server processes each bit differently according to whether the pattern bit value is 1 or 0.

For the bits of value 1, the server applies Paillier's homomorphic property to calculate E(∏ q_i). For simplicity, suppose the first k values q_i correspond to pattern bits of value 1. The product of the q_i is computed as

    R_1 = E(q_1)^E(q_2)^…^E(q_k) = E(∏_{i=1..k} q_i).    (1)

Note that if any of these bits q_i has value 0, the result of Eq. (1) will be E(0).

For the bits of value 0, more steps are needed. First turn each bit value 0 into −1 and encrypt it, obtaining E(−1). Then multiply each E(−1) with the corresponding encrypted bit E(q_i):

    E(−1) · E(q_i) = E(q_i − 1) = E(−1) if q_i = 0, and E(0) if q_i = 1.    (2)

Applying the same operation as for the bits of value 1, we get

    R_0 = E(q_{k+1} − 1)^E(q_{k+2} − 1)^…^E(q_N − 1) = E(∏_{i=k+1..N} (q_i − 1)).    (3)

Obviously, if any of these bits q_i has value 1, the result of Eq. (3) will be E(0); the result is E(1) or E(−1) if and only if all the corresponding query bits are 0. In the following step the server calculates the result described in Eq. (4):
    U_j = R_0 · R_1 = E(∏_i f(x_i)),    (4)

where x denotes the bit value of the LSP-tree sequence and

    f(x) = q_i − 1 if x = 0, and f(x) = q_i if x = 1.

It is easily concluded that U_j = E(1) or E(−1) only when the value of x agrees with q_i at every position; in this case the two sequences are the same as each other. Otherwise U_j = E(0). Set

    F(x) = E(−1) if x = 0, and F(x) = 1 if x = 1.    (5)
Integrating the above steps, we get

    μ_j = (E(q_1)·F(x_1))^(E(q_2)·F(x_2))^…^(E(q_N)·F(x_N)).    (6)

Find the support of sequential pattern j in the LSP-tree, say supp_j, and encrypt it to E(supp_j). Second, calculate the support: applying Eq. (6), we get

    μ_j^E(supp_j) = E(supp_j) if sequence j is the same as the query sequence, and E(0) in other cases.    (7)

For each frequent sequential pattern in the LSP-tree, we calculate

    SUPP = ∏_j μ_j^E(supp_j)    (8)

and send SUPP to the customer. For our example query sequence in Figure 3, the detailed process is described in Figure 4.
Fig. 4. Detailed server processing example sequence
Suppose there are two sequential patterns in the LSP-tree, {1,0,1} and {1,1,0}, with the corresponding customer sequence bits denoted in different colors. R1 denotes the calculation of the LSP sequence bits of value 1 with the corresponding encrypted customer bits, while R0 denotes the calculation for the bits of value 0. Combining each sequential pattern's calculation results, we can conclude that only the support corresponding to the query sequence is preserved.
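Setting the encryption aside, the matching logic behind Eqs. (1)–(8) can be simulated in plaintext: the product of f(x_i) over all bit positions is ±1 exactly when the query and the pattern coincide, and 0 otherwise. The pattern supports below are made-up numbers:

```python
def match_indicator(query_bits, pattern_bits):
    """Plaintext analogue of Eqs. (1)-(6): multiply f at every position,
    where f = q_i - 1 on pattern bits 0 and f = q_i on pattern bits 1.
    The product is +/-1 iff the bit strings are identical, else 0."""
    prod = 1
    for q, x in zip(query_bits, pattern_bits):
        prod *= (q - 1) if x == 0 else q
    return prod

# LSP-tree patterns {1,0,1} and {1,1,0} as in Fig. 4; supports are invented.
patterns = {(1, 0, 1): 7, (1, 1, 0): 4}
query = (1, 0, 1)
supp = sum(s for p, s in patterns.items() if match_indicator(query, p) != 0)
print(supp)  # -> 7: only the support of the matching pattern survives
```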
3.4 Customer Decrypts Support

The customer decrypts SUPP to obtain D(SUPP). We have

    D(SUPP) = D(∏_j μ_j^E(supp_j))
            = Σ_j { supp_j, if the query sequence is the same as pattern j in the LSP-tree; 0, in other cases }.

Since the sequential patterns in the LSP-tree are unique, D(SUPP) is the support of the query sequence. For the case mentioned previously, the customer executes the calculation in Figure 5, and thus the correct query result is achieved.
Fig. 5. Customer decrypts the query result
4 Analysis of Privacy Preserving and Communication Cost

In the customer's encryption process, each encryption with the same key yields a different ciphertext, so the server cannot recover the plaintext of the customer sequence by comparing encryptions. In the server's calculation process, the server only needs to operate on the ciphertexts according to the bit values of the LSP-tree sequences; from this calculation the server cannot deduce the plaintext. From the customer's side, by the homomorphic property of the encryption, Eq. (8) can be rewritten as

    SUPP = ∏_j μ_j^E(supp_j) = E(0 + 0 + … + supp_j + … + 0),    (9)
where supp_j is the support of the query sequence. The customer decrypts SUPP: D(SUPP) = D(E(supp_j)) = supp_j. Thus the correct result is achieved.
The communication cost consists of sending |D| ciphertexts and returning one encrypted query result, which is acceptable for our algorithm.
5 Conclusion

In this paper we discussed the issues that need to be considered when designing a data mining technique for data streams under the privacy preserving condition, and we reviewed how these problems arise, including some related topics. We proposed a privacy preserving algorithm that uses our LSP-tree structure to mine stream data. Research on data stream mining under privacy preserving is still in its early stage. Fully addressing the issues discussed in this paper would accelerate the development of data mining applications in data stream systems. As more of these problems are solved and more efficient and user-friendly mining techniques are developed for end users, it is quite likely that data stream mining will soon play a key role in the business world.
References
1. Aggarwal, C., Han, J., Wang, J., Yu, P.S.: A Framework for Clustering Evolving Data Streams. In: Proc. 2003 Int. Conf. on Very Large Data Bases (VLDB 2003), Berlin, Germany (2003)
2. Giannella, C., Han, J., Pei, J., Yan, X., Yu, P.S.: Mining Frequent Patterns in Data Streams at Multiple Time Granularities. In: Kargupta, H., Joshi, A., Sivakumar, K., Yesha, Y. (eds.) Next Generation Data Mining, AAAI/MIT (2003)
3. Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: Models and Issues in Data Stream Systems. In: Proceedings of PODS (2002)
4. Wright, R., Yang, Z.: Privacy-preserving Bayesian Network Structure Computation on Distributed Heterogeneous Data. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 713–718 (2004)
5. Agrawal, R., Srikant, R.: Privacy-Preserving Data Mining. In: ACM SIGMOD, pp. 439–450 (2000)
6. Agrawal, R., Srikant, R.: Fast Algorithms for Mining Association Rules in Large Databases. In: VLDB, pp. 487–499 (1994)
7. Paillier, P.: Public-key Cryptosystems Based on Composite Degree Residuosity Classes. In: Stern, J. (ed.) EUROCRYPT 1999. LNCS, vol. 1592, pp. 223–233. Springer, Heidelberg (1999)
8. Ostrovsky, R., Skeith, W.: Private Searching on Streaming Data. In: Shoup, V. (ed.) CRYPTO 2005. LNCS, vol. 3621, pp. 223–240. Springer, Heidelberg (2005)
9. Bethencourt, J., Song, D., Waters, B.: New Constructions and Practical Applications for Private Stream Searching. In: Proc. 2006 IEEE Symp. Security and Privacy (S&P 2006), p. 6 (2006)
10. Huang, Q.H.: Privacy Preserving Data Mining and Knowledge Discovery, Shanghai University Thesis, 60–70 (2007)
A General k-Level Uncapacitated Facility Location Problem

Rongheng Li¹ and Huei-Chuen Huang²

¹ Dept of Mathematics, Hunan Normal University, Changsha 410081, P.R. China
[emailprotected]
² Dept of Industrial and Systems Engineering, National University of Singapore, 1 Engineering Drive 2, Singapore 117576
Abstract. In this paper a general k-level uncapacitated facility location problem (k-GLUFLP) is proposed. It is shown that the 2-level uncapacitated facility location problem with no fixed cost (2-GLUFLNP) is strongly NP-complete, and a heuristic algorithm with worst-case ratio of 3/2 is given for 2-GLUFLNP when the service costs are assumed to be in the metric space. We also present a randomized 3-approximation algorithm for the k-GLUFLP when k is a fixed integer. Keywords: Approximation algorithm, Facility location, Complexity, k-level.
1 Introduction

In the classical simple plant location or 1-level uncapacitated facility location problem, we have to select a set of facilities to set up and a set of clients for each facility to service so as to minimize the total cost of setting up the facilities and servicing the clients. In the last few years, a number of constant factor approximation algorithms have been proposed for this problem when the service cost is assumed to be in the metric space. The first approximation algorithm with a performance guarantee of 3.157 was given by Shmoys et al.[8]. Coupling a local search phase with LP rounding, Guha & Khuller[4] improved the factor to 2.408. Later Chudak & Shmoys[2] further strengthened the LP rounding approach to obtain a 1.736-approximation algorithm. Another interesting and elegant approach to obtaining a constant factor approximation is via the primal-dual algorithm proposed by Jain & Vazirani[6]. Jain et al.[5] gave a simple greedy algorithm with a performance guarantee of 1.61. This ratio was improved to 1.52 by Mahdian et al.[7], which is close to the lower bound of 1.463 proved by Guha and Khuller[4].

The classic k-level uncapacitated facility location problem is an extension of the 1-level problem and can be described formally as follows. A set of clients, D, is given, and there are k sets of facilities, F_l, where facilities on level l may be located, 1 ≤ l ≤ k. The sets F_l, 1 ≤ l ≤ k, are pairwise disjoint. Each client j ∈ D must be supplied by exactly one facility at each of the k levels. Aardal et al.[1] obtained a 3-approximation algorithm for the k-level uncapacitated facility location problem. Zhang[9] proposed a 1.77-approximation algorithm for the case of k = 2. In this paper we relax the assumption that the sets of facilities are prefixed as designated distribution levels and call the resulting problem a general k-level uncapacitated facility location problem (k-GLUFLP). It is clear that with this restriction removed, the solution provides a much more cost-effective distribution network in the supply chain. In Section 3, we study the general 2-level uncapacitated facility location problem with no fixed cost (2-GLUFLNP). We show that it is NP-complete when the service costs are assumed to be in the metric space, and a 3/2-approximation algorithm is proposed for it. We also show that a linear program (LP) relaxation of this problem has a tight integrality gap of 3/2. In Section 4 we present a randomized 3-approximation algorithm for the k-GLUFLP when k is a fixed integer.

D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 76–83, 2008. © Springer-Verlag Berlin Heidelberg 2008
2 Formulation for the k-GLUFLP

A k-GLUFLP can be described formally as follows. A set of clients, D, and a set of facilities, F, are given. Each facility i ∈ F may be set up on at most one of the k levels. The cost of setting up facility i on level l is f_i^l, i ∈ F, 1 ≤ l ≤ k. The cost of shipping between any two points i, j ∈ F ∪ D is equal to c_ij. Each client j ∈ D must be assigned to precisely one facility at each of the k levels. In the following, we refer to s = (i_1, i_2, …, i_k) as a feasible sequence of facilities, where the i_l ∈ F, l = 1, 2, …, k, are k different facilities. The set of all possible feasible sequences is denoted by S(k). Each client j ∈ D must be supplied by exactly one feasible sequence s = (i_1, i_2, …, i_k) ∈ S(k), and the total cost incurred by this assignment is equal to

    c_sj = c_{i_1 i_2} + c_{i_2 i_3} + … + c_{i_{k−1} i_k} + c_{i_k j}.

Let x_sj be equal to 1 if client j is assigned to the feasible sequence s, and 0 otherwise. Let y_i^l be 1 if facility i is set up at level l, and 0 otherwise. In the following, if s = (i_1, i_2, …, i_k) ∈ S(k), we use s_l to represent i_l and say that s uses facility s_l (or i_l) on level l (l = 1, 2, …, k). With the notation defined above, the k-GLUFLP can be formulated as (P1). In (P1), constraints (2) ensure that each client is supplied by exactly one feasible sequence; constraints (3) ensure that a facility is set up at level l if it is used to supply a client at level l; constraints (4) ensure that a facility is set up on at most one level. It is easy to see that the model considered by Aardal et al.[1] is a special case of the model proposed here, as we can simply make f_i^l carry a very high setup cost when facility i cannot be used at level l.
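The service cost c_sj of a feasible sequence can be sketched directly from this definition. The cost values below are hypothetical:

```python
def sequence_cost(c, s, j):
    """Cost of serving client j through the feasible sequence
    s = (i1, ..., ik): c[i1][i2] + ... + c[i_{k-1}][i_k] + c[i_k][j]."""
    return sum(c[a][b] for a, b in zip(s, s[1:])) + c[s[-1]][j]

# Hypothetical symmetric costs over facilities {0, 1, 2} and client 3.
c = {0: {1: 2, 2: 5, 3: 9}, 1: {0: 2, 2: 1, 3: 4}, 2: {0: 5, 1: 1, 3: 2}}
print(sequence_cost(c, (0, 1, 2), 3))  # -> 2 + 1 + 2 = 5
```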
(P1)  min Σ_{l=1..k} Σ_{i∈F} f_i^l y_i^l + Σ_{s∈S(k)} Σ_{j∈D} c_sj x_sj    (1)

s.t.  Σ_{s∈S(k)} x_sj = 1,  ∀ j ∈ D,    (2)
      Σ_{s: s_l = i} x_sj ≤ y_i^l,  ∀ i ∈ F, j ∈ D, l = 1, 2, …, k,    (3)
      Σ_{l=1..k} y_i^l ≤ 1,  ∀ i ∈ F,    (4)
      x_sj ∈ {0, 1},  y_i^l ∈ {0, 1}.    (5)
We consider a relaxed LP of (P1) obtained by removing its integrality requirements and constraints (4). Denote this LP by (P2). The dual problem (P3) of (P2) is given as follows:
(P3)  max Σ_{j∈D} v_j    (6)

s.t.  Σ_{j∈D} ω_ij^l ≤ f_i^l,  ∀ i ∈ F, l = 1, 2, …, k,    (7)
      v_j − Σ_{l=1..k} ω_{s_l j}^l ≤ c_sj,  ∀ s ∈ S(k), j ∈ D,    (8)
      ω_ij^l ≥ 0.    (9)
It is known that, for a fixed k, we can solve (P2) and its dual (P3) in polynomial time. Throughout this paper, we make the following assumptions on costs unless specially mentioned:

(a) f_i^l ≥ 0, ∀ i ∈ F, l = 1, 2, …, k;
(b) c_ij ≥ 0, ∀ i, j ∈ F ∪ D;
(c) c_ij = c_ji, ∀ i, j ∈ F ∪ D, i.e., the service costs are symmetric;
(d) c_ij ≤ c_ih + c_hj, ∀ i, j, h ∈ F ∪ D, i.e., the service costs satisfy the triangle inequality.
3 Computational Complexity and Algorithm for 2-GLUFLNP

In this section we consider the 2-GLUFLNP, i.e., f_i^l = 0, ∀ i ∈ F, l = 1, 2. Without loss of generality, we may assume that all facilities are set up. Thus the integer linear program becomes (P4). It is not difficult to see that solving this problem is equivalent to determining an optimal partition of the facility set F into 2 disjoint subsets, with the understanding that each facility of the l-th subset is set up as a facility on the l-th level. In the following, we will show that the 2-GLUFLNP with the metric space property is NP-complete. This implies that our problem is strictly harder than the classic k-level facility location problem under the assumption that P ≠ NP, because the classic k-level facility location problem is trivial if the fixed costs are all zero.
(P4)  min Σ_{l=1..2} Σ_{i∈F} f_i^l y_i^l + Σ_{s∈S(2)} Σ_{j∈D} c_sj x_sj    (10)

s.t.  Σ_{s∈S(2)} x_sj = 1,  ∀ j ∈ D,    (11)
      Σ_{s: s_l = i} x_sj ≤ 1,  ∀ i ∈ F, j ∈ D, l = 1, 2,    (12)
      Σ_{l=1..2} y_i^l = 1,  ∀ i ∈ F,    (13)
      x_sj ∈ {0, 1},  y_i^l ∈ {0, 1}.    (14)
Theorem 1. The 2-GLUFLNP with the metric space property is NP-complete.

Proof. We reduce the minimum dominating set problem to the 2-GLUFLNP. For a given undirected graph G = (V, E) with node set V and arc set E, a subset V′ of V is called a dominating set if, ∀ v ∈ V \ V′, ∃ v′ ∈ V′ such that (v, v′) ∈ E. In the rest of this paper, we use |S| to represent the cardinality of a set S. The minimum dominating set problem is: for a given undirected graph G = (V, E), obtain a dominating subset V′ of V with the smallest cardinality. This problem has been shown to be NP-complete[3]. Consider a given undirected graph G = (V, E); without loss of generality, assume that G is connected. We construct an instance of the 2-GLUFLNP from G with the set of facilities F = V and the set of clients D = 1 × V, where 1 × V = {(1, v) | v ∈ V}. We define the shipping cost between any two points in F ∪ D as follows:

    c(v_1, v_2) = 1 if (v_1, v_2) ∈ E, and 2 otherwise, ∀ v_1, v_2 ∈ V;
    c(v_1, (1, v)) = 1 if v_1 = v, and 2 otherwise, ∀ v_1, v ∈ V.

It is easy to check that the costs defined above satisfy the triangle inequality. We will show that the problem of finding a minimum dominating set of G is the same as the problem of solving the 2-GLUFLNP. Let C* and M represent the optimal value of the facility location problem and a minimum dominating set of G, respectively. We can show that C* = 2|F| + |M|. Hence finding an optimal assignment for the 2-GLUFLNP is equivalent to finding a minimum dominating set of G, and the 2-GLUFLNP is NP-complete. By the same approach used in obtaining the LP relaxation of (P1), we now consider a relaxed LP of (P4) obtained by removing its integrality requirements and constraints (13).
Denote this LP by (P5). It is not difficult to see that constraints (12) become redundant once constraints (13) are removed. It is easy to see that the optimal value of (P5) provides a lower bound for the 2-GLUFLNP, and an optimal solution x̄ of (P5) can be obtained by a greedy method: for any client j ∈ D, select a feasible sequence s(j) ∈ S(2) satisfying c_{s(j)j} = min_{s∈S(2)} {c_sj}, and set x̄_sj = 1 if s = s(j) and x̄_sj = 0 otherwise. Since x̄ is integral, we refer to any optimal solution of (P5) that is integral as an overlapped solution of the 2-GLUFLNP. For an overlapped solution x̄ of (P5), we say that j ∈ D is serviced by sequence s(j) if x̄_{s(j)j} = 1, and in the following we refer to Σ_{j∈D} c_{s(j)j} as the cost of the overlapped solution x̄. Hence (P5) can be expressed as follows:

(P5)  L = min Σ_{s∈S(2)} Σ_{j∈D} c_sj x_sj    (15)

s.t.  Σ_{s∈S(2)} x_sj = 1,  ∀ j ∈ D,    (16)
      x_sj ≥ 0.    (17)
For the 2-GLUFLNP, assuming that the service costs satisfy the properties of the metric space, we propose a 3/2-approximation algorithm consisting of the following three steps:

Step 1. First we solve (P5) to obtain an overlapped solution x̄ for the given 2-GLUFLNP. Let L be the objective function value of x̄. From the solution x̄ we construct an undirected graph G(x̄) = (F, Ē), called the servicing graph of the overlapped solution x̄, where Ē = {(i_1, i_2) | i_1, i_2 ∈ F, ∃ j ∈ D such that x̄_sj = 1, where s = (i_1, i_2) ∈ S(2) or s = (i_2, i_1) ∈ S(2)}. The length of an edge (i_1, i_2) ∈ Ē is defined as the shipping cost between the two facilities i_1 and i_2.

Step 2. In this step we construct an overlapped solution x′ whose servicing graph is contained in MSP(G(x̄)), a minimum spanning forest of G(x̄). For any client j ∈ D, suppose j is serviced by s(j) = (i_1, i_2). We set the values x′_sj, s ∈ S(2), according to the following two cases:

Case 1. (i_1, i_2) is an edge of MSP(G(x̄)). In this case we set x′_sj := x̄_sj for every s ∈ S(2); obviously the cost does not increase.

Case 2. (i_1, i_2) is not an edge of MSP(G(x̄)). In this case, adding the edge (i_1, i_2) to MSP(G(x̄)) produces a unique cycle C, because MSP(G(x̄)) is a spanning forest of G(x̄). Suppose the cycle is C = (i_1, i_2, i_3, …, i_{r−1}, i_r = i_1). Let s′ = (i_3, i_2). Then we set x′_sj := 1 if s = s′, and x′_sj := 0 otherwise. Since MSP(G(x̄)) is a minimum spanning forest of G(x̄), the edge (i_1, i_2) is the longest edge on the cycle C. Thus we have

    c_{s(j)j} = c_{i_1 i_2} + c_{i_2 j} ≥ c_{i_2 i_3} + c_{i_2 j} = c_{s′j}.

Cases 1 and 2 show that the cost of x′ is not more than the cost of x̄, and hence x′ is an overlapped solution with its servicing graph contained in MSP(G(x̄)).

Step 3. In this step we construct an integer solution (x, y) of the 2-GLUFLNP with cost at most 3/2 times the cost of x′. Since G(x′) = MSP(G(x̄)) is a spanning forest of G(x̄), it is a bipartite graph. Thus we can partition F into two disjoint subsets F_1 and F_2 such that all edges of G(x′) go between F_1 and F_2. Let S_1 = {(i_1, i_2) | i_1 ∈ F_1, i_2 ∈ F_2} and S_2 = {(i_1, i_2) | i_1 ∈ F_2, i_2 ∈ F_1}. Then the cost of the overlapped solution x′ is Σ_{s∈S_1} Σ_{j∈D} c_sj x′_sj + Σ_{s∈S_2} Σ_{j∈D} c_sj x′_sj. Without loss of generality, suppose

    Σ_{s∈S_1} Σ_{j∈D} c_sj x′_sj ≥ Σ_{s∈S_2} Σ_{j∈D} c_sj x′_sj.

Then we set y_i^1 := 1, y_i^2 := 0 if i ∈ F_1 and y_i^1 := 0, y_i^2 := 1 if i ∈ F_2; that is, we assign F_1 and F_2 to level 1 and level 2, respectively. For any j ∈ D, let s′(j) = (i_1(j), i_2(j)) be the sequence that services client j in the overlapped solution x′, i.e., x′_{s′(j)j} = 1 and x′_sj = 0 for all s ≠ s′(j). Let D(S_l) = {j | j ∈ D, s′(j) ∈ S_l}, l = 1, 2. We set x_sj according to the following cases:

Case 1. i_1(j) ∈ F_1, i_2(j) ∈ F_2, i.e., s′(j) ∈ S_1, j ∈ D(S_1). In this case we set x_sj := x′_sj; the service cost of client j does not increase.

Case 2. i_1(j) ∈ F_2, i_2(j) ∈ F_1, i.e., s′(j) ∈ S_2, j ∈ D(S_2). In this case we set x_{s̄(j)j} := 1 and x_sj := 0 for all s ≠ s̄(j), where s̄(j) = (i_2(j), i_1(j)) ∈ S_1. By the triangle inequality, the service cost of client j under the new assignment satisfies

    c_{s̄(j)j} = c_{i_2(j) i_1(j)} + c_{i_1(j) j} ≤ c_{i_2(j) i_1(j)} + c_{i_1(j) i_2(j)} + c_{i_2(j) j} ≤ 2 c_{s′(j)j}.

This means that the service cost of client j in the solution (x, y) is at most two times its service cost in the overlapped solution x′. Considering the total service cost in the solution (x, y), we have
    Σ_{s∈S(2)} Σ_{j∈D} c_sj x_sj
      = Σ_{s∈S_1} Σ_{j∈D(S_1)} c_sj x_sj + Σ_{s∈S_1} Σ_{j∈D(S_2)} c_sj x_sj
      ≤ Σ_{s∈S_1} Σ_{j∈D(S_1)} c_sj x′_sj + 2 Σ_{s∈S_2} Σ_{j∈D(S_2)} c_sj x′_sj
      ≤ (3/2) (Σ_{s∈S_1} Σ_{j∈D} c_sj x′_sj + Σ_{s∈S_2} Σ_{j∈D} c_sj x′_sj)
      ≤ (3/2) Σ_{s∈S(2)} Σ_{j∈D} c_sj x̄_sj = (3/2) L.
Hence we obtain the following theorem:

Theorem 2. For the 2-GLUFLNP, the steps described above yield a 3/2-approximation algorithm.

The following example shows that the analysis of our algorithm is tight, i.e., 3/2 is the best ratio one can obtain by using the overlapped cost as a lower bound. Consider the following general 2-level problem with no fixed cost: F = {a, b}, D = {1, 2}, c_ab = 1, c_1a = 0, c_2b = 0, c_1b = 1, c_2a = 1, c_12 = 1, where the costs are symmetric. It is easy to check that the triangle inequalities are satisfied and the optimal total cost is 3, whereas the overlapped solution has a total cost of 2. Hence, together with Theorem 2, we have:

Theorem 3. (P5), an LP relaxation of (P4), has an integrality gap of 3/2.
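The three steps can be sketched as follows, under two simplifying assumptions that are not in the paper: the overlapped solution is obtained by the greedy rule for (P5), and the servicing graph is assumed to already be a forest, so the minimum-spanning-forest computation of Step 2 is skipped. On the tight example above this reproduces the 3/2 gap:

```python
import itertools

def overlapped_solution(F, D, c):
    """Step 1 via the greedy rule for (P5): each client independently
    picks the cheapest ordered facility pair (i1, i2)."""
    return {j: min(itertools.permutations(F, 2),
                   key=lambda s: c[s[0]][s[1]] + c[s[1]][j]) for j in D}

def two_level_heuristic(F, D, c):
    """Sketch of Steps 2-3, assuming the servicing graph is a forest:
    2-color it and keep the cheaper of the two level assignments."""
    x = overlapped_solution(F, D, c)
    level = {}                                 # 2-color the forest by traversal
    for root in F:
        if root in level:
            continue
        level[root] = 1
        stack = [root]
        while stack:
            u = stack.pop()
            for v in F:
                if v not in level and any({u, v} == set(s) for s in x.values()):
                    level[v] = 3 - level[u]
                    stack.append(v)
    def cost(lv):
        total = 0
        for j, (i1, i2) in x.items():
            if lv[i1] == 1:                    # sequence already runs level 1 -> level 2
                total += c[i1][i2] + c[i2][j]
            else:                              # Case 2: flip to (i2, i1), pay <= 2x
                total += c[i2][i1] + c[i1][j]
        return total
    flipped = {i: 3 - l for i, l in level.items()}
    return min(cost(level), cost(flipped))

# Tight example from the text: F = {a, b}, D = {1, 2}.
c = {"a": {"b": 1, 1: 0, 2: 1}, "b": {"a": 1, 1: 1, 2: 0}}
ov = overlapped_solution(["a", "b"], [1, 2], c)
L = sum(c[s[0]][s[1]] + c[s[1]][j] for j, s in ov.items())
print(L, two_level_heuristic(["a", "b"], [1, 2], c))  # -> 2 3, the 3/2 gap
```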
4 An Algorithm for the k-GLUFLP

The following randomized algorithm was proposed by Aardal et al. [1] for the classic k-level uncapacitated facility location problem. We will show that it can be applied to (P1) to produce an integer feasible solution with expected cost not more than three times the optimal cost of (P2).

The Randomized Algorithm (A): We start by solving (P2) and its dual program (P3) to get their optimal solutions (x̄, ȳ) and (v̄, w̄), respectively. Let

c̄_j = Σ_{s∈S(k)} c_sj x̄_sj,   ∀j ∈ D.

Initially we set D̄ := D, x := x̄ and y := ȳ. For every j ∈ D̄, we define S(j) = {s ∈ S(k) | x_sj > 0} and F(j) = {i ∈ F | i belongs to at least one feasible sequence in S(j)}. In each iteration, we select a client j ∈ D̄ with the minimum value of v̄_j + c̄_j. Let j_t denote the client chosen in iteration t and refer to it as the center client of this iteration. We define

F_t = F(j_t);   S_t = S(j_t);   D_t = {j ∈ D̄ | F(j) ∩ F_t ≠ ∅}.

It is obvious that D_t is not empty because j_t ∈ D_t. Next we select a feasible sequence s_t ∈ S_t with probability x̄_{s_t j_t} and call s_t the selected feasible sequence. Round all variables y_il with i = s_tl to 1, and all variables y_il (i ∈ F_t, i ≠ s_tl) to 0. We assign every client in D_t to the selected feasible sequence s_t; that is, for j ∈ D_t we set x_{s_t j} = 1 and x_sj = 0 for s ∈ S(k) \ {s_t}. We then proceed to the next iteration by setting D̄ := D̄ \ D_t, and iterate this process until D̄ = ∅.
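The iterative clustering-and-rounding loop of algorithm (A) can be sketched in Python as follows. The data structures (dictionaries for the fractional solution x̄, the duals v̄, and the facility sets F(j)) are illustrative assumptions, not part of the original formulation, and the LP-solving step is taken as given.

```python
import random

def round_solution(D, S, F_of, x, v, cbar):
    """Sketch of algorithm (A): repeatedly pick the center client with the
    minimum v[j] + cbar[j], sample one feasible sequence from its fractional
    assignment, and assign every overlapping client to that sequence.

    Assumed (hypothetical) inputs:
      D      -- set of clients
      S      -- S[j]: list of feasible sequences s with x[s, j] > 0
      F_of   -- F_of[j]: set of facilities appearing in some sequence of S[j]
      x      -- fractional assignment x[s, j] from the LP optimum of (P2)
      v      -- dual values from (P3)
      cbar   -- fractional connection costs c_bar_j
    Returns an integral assignment: client -> chosen feasible sequence."""
    remaining = set(D)
    assignment = {}
    while remaining:
        # center client: minimal v_j + cbar_j among the remaining clients
        jt = min(remaining, key=lambda j: v[j] + cbar[j])
        # sample s_t from S(jt) with probability x[s, jt]
        seqs = list(S[jt])
        weights = [x[s, jt] for s in seqs]
        st = random.choices(seqs, weights=weights, k=1)[0]
        Ft = F_of[jt]
        # every client whose facility set overlaps F_t joins this sequence
        Dt = {j for j in remaining if F_of[j] & Ft}
        for j in Dt:
            assignment[j] = st
        remaining -= Dt
    return assignment
```

The rounding of the y variables is implicit here: opening exactly the facilities of each selected sequence reproduces the y-rounding described above.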
Lemma 4. When the above algorithm (A) terminates, we get a feasible integer solution of problem (P1).

Proof. First we show that (x, y) as defined by the algorithm remains feasible to (P2) at each iteration. This can be shown in the same way as used by Aardal et al. [1].
Next, it is easy to show that when the algorithm terminates, (x, y) satisfies constraints (4) and (5) of (P1).

Theorem 5. Algorithm (A) produces a feasible integer solution to (P1) with expected total cost not more than 3 times the optimal value of (P2).

Proof. Lemma 4 shows that algorithm (A) produces a feasible integer solution to (P1). As for the ratio of 3, it can be proved in a way similar to Aardal et al. [1].

The above algorithm can be derandomized by a greedy method (see Aardal et al. [1] for a more detailed proof). We know that (P2) can be solved in polynomial time because k is fixed; hence it is easy to see that our algorithm is polynomial when k is fixed.
5 Conclusion

In this paper, a heuristic algorithm with worst-case ratio 3/2 is given for the 2-GLUFLNP. We also present a randomized 3-approximation algorithm for the k-GLUFLP when k is a fixed integer. It remains an open problem to find an algorithm with worst-case ratio less than 3 for the k-GLUFLNP (k ≥ 3).
References
1. Aardal, K., Chudak, F.A., Shmoys, D.B.: A 3-Approximation Algorithm for the k-Level Uncapacitated Facility Location Problem. Inform. Process. Lett. 72, 161–167 (1999)
2. Chudak, F.A., Shmoys, D.B.: Improved Approximation Algorithms for the Uncapacitated Facility Location Problem. SIAM J. Comput. 33, 1–25 (2003)
3. Garey, M.R., Johnson, D.S. (eds.): Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman & Company, San Francisco (1979)
4. Guha, S., Khuller, S.: Greedy Strikes Back: Improved Facility Location Algorithms. J. Algorithms 31, 228–248 (1999)
5. Jain, K., Mahdian, M., Saberi, A.: A New Greedy Approach for Facility Location Problems. In: Reif, J. (ed.) Proceedings of the 34th ACM Symposium on Theory of Computing (STOC), pp. 731–740. Association for Computing Machinery (2002)
6. Jain, K., Vazirani, V.V.: Primal-Dual Approximation Algorithms for Metric Facility Location and k-Median Problems. In: Proceedings of the 40th Annual IEEE Symposium on Foundations of Computer Science, pp. 2–13 (1999)
7. Mahdian, M., Ye, Y., Zhang, J.W.: Improved Approximation Algorithms for Metric Facility Location Problems. In: Jansen, K., Leonardi, S., Vazirani, V.V. (eds.) APPROX 2002. LNCS, vol. 2462, pp. 229–242. Springer, Heidelberg (2002)
8. Shmoys, D.B., Tardos, É., Aardal, K.I.: Approximation Algorithms for Facility Location Problems. In: Leighton, F.T., Shor, P. (eds.) Proceedings of the 29th Annual ACM Symposium on Theory of Computing, pp. 265–274. ACM, New York (1997)
9. Zhang, J.W.: Approximating the Two-Level Facility Location Problem via a Quasi-Greedy Approach. Mathematical Programming 108, 159–176 (2006)
Fourier Series Chaotic Neural Networks

Yao-qun Xu and Shao-ping He
Institute of System Engineering, Harbin University of Commerce, 150028 Harbin, China
Abstract. Chaotic neural networks have been shown to be powerful tools for solving optimization problems. In order to escape local minima, a new chaotic neural network model called the Fourier series chaotic neural network is presented. The activation function of the new model is non-monotonous, composed of a sigmoid and trigonometric functions. First, the reversed-bifurcation figures and the maximal Lyapunov exponents of a single neural unit are given. Second, the new model is applied to several function optimizations. Finally, the 10-city traveling salesman problem is considered and the effects of the degree of non-monotonicity in the model on solving it are discussed. The simulation results show that the proposed model is more effective. Keywords: Chaotic neural network, Fourier series, Trigonometric function.
D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 84–91, 2008. © Springer-Verlag Berlin Heidelberg 2008

1 Introduction

Neural networks have been shown to be powerful tools for solving optimization problems, particularly NP-hard problems. The Hopfield network, proposed by Hopfield and Tank [1, 2], has been extensively applied to many fields over the past years. The Hopfield neural network converges to a stable equilibrium point due to its gradient descent dynamics; however, this causes severe local-minimum problems whenever it is applied to optimization problems. Several chaotic neural networks with non-monotonous activation functions have been shown to be more powerful than Chen's chaotic neural network (CSA) in solving optimization problems, especially in searching for global minima of continuous functions and in traveling salesman problems [3, 8]. Reference [4] has pointed out that a single neural unit can easily exhibit chaotic motion if its activation function is non-monotonous, and reference [5] has shown that an effective activation function may take various forms but should embody a non-monotonous nature. In this paper, a new chaotic neural network model is presented to improve the ability to escape local minima so that optimization problems can be solved more effectively. The chaotic mechanism of this new model is introduced by the self-feedback connection weight. The activation function of the new model is composed of a sigmoid and trigonometric functions, and is therefore non-monotonous; because trigonometric functions are basic functions, the model can solve optimization problems more effectively. Finally, the new model is applied to both function optimizations and combinatorial optimizations, and the effects of the degree of non-monotonicity in the model on
solving the 10-city TSP are discussed. The simulation results show that the new model is valid for solving optimization problems.
2 Fourier Series Chaotic Neural Network (FSCNN)

The model of Chen's chaotic neural network can be described as follows:

x_i(t) = f(y_i(t))    (1)

y_i(t+1) = k y_i(t) + α [Σ_{j=1, j≠i}^{n} w_ij x_j(t) + I_i] − z_i(t)(x_i(t) − I_0)    (2)

z_i(t+1) = (1 − β) z_i(t)    (3)

f(y_i(t)) = 1 / (1 + exp(−y_i(t)/ε_0))    (4)
where i is the index of neurons and n the number of neurons; x_i(t) is the output of neuron i, y_i(t) the internal state of neuron i, w_ij the connection weight from neuron j to neuron i, I_i the input bias of neuron i, α the positive scaling parameter for inputs, k the damping factor of the nerve membrane (0 ≤ k ≤ 1), z_i(t) the self-feedback connection weight, β the damping factor of z_i(t), I_0 a positive parameter, and ε_0 the steepness parameter of the activation function. In the proposed FSCNN, the activation function is instead defined as
f(u) = S_1(u) + S_2(u)    (5)

S_1(u) = 1 / (1 + exp(−u/ε_0))    (6)

S_2(u) = ω_1 cos(ε_1 u) + ω_2 sin(ε_2 u)    (7)
where ω_1, ω_2, ε_1, ε_2 are the parameters of the trigonometric function. The Fourier series chaotic neural network is described by (2), (3), (5), (6) and (7). In this model, the variable z_i(t) corresponds to the temperature in the usual stochastic
annealing process, and equation (3) is an exponential cooling schedule for the annealing. The chaotic mechanism is introduced by the self-feedback connection weight as the value of z_i(t) becomes smaller step by step. In this model, the parameters ω_1 and ω_2 determine the degree of non-monotonicity of the activation function. Seen from equations (5) and (6), when ω_1 and ω_2 lie between 0 and 1 the function is similar in form to the sigmoid alone, apart from its monotonicity, so ω_1 and ω_2 control a local non-monotonous phenomenon of the activation function. In other words, if ω_1 and ω_2 approach 1, the non-monotonicity of the activation function is very apparent; if they approach 0, it is very weak.
3 Research on the Single Neural Unit

In this section, we analyze the single neural unit of the Fourier series chaotic neural network. The single neural unit is described by (8)-(10) together with (5)-(7):

x(t) = f(y(t))    (8)

y(t+1) = k y(t) − z(t)(x(t) − I_0)    (9)

z(t+1) = (1 − β) z(t)    (10)

For a one-dimensional dynamical system x_{n+1} = F(x_n), it is useful to study the mean exponential rate of divergence of two initially close orbits using the formula

λ = lim_{n→∞} (1/n) Σ_{i=0}^{n−1} ln |dF(x)/dx|_{x=x_i}    (11)
This number, called the Lyapunov exponent λ, is useful for distinguishing among the various types of orbits, and it works for discrete as well as continuous systems. When λ < 0 the orbit is attracted to a stable fixed point or stable periodic orbit; when λ = 0 the orbit is a neutral fixed point (or an eventually fixed point); and when λ > 0 the orbit is unstable and chaotic. In order to make the neuron exhibit transient chaotic behavior, the parameters are set as follows:

ε_0 = 0.02, ε_1 = 2, ε_2 = 2, ω_1 = 1/3, ω_2 = 1/3, y(1) = 0.283, z(1) = 0.4, k = 1, I_0 = 0.65.

The state bifurcation figures (left) and the time evolution of the maximal Lyapunov exponent (right) are shown in Fig. 1-Fig. 4 for β = 0.004 and β = 0.002.
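With these parameter values, the single unit (8)-(10) can be iterated directly; the sketch below (our own illustration, not the authors' code) checks that the self-feedback weight z(t) decays exponentially, which is what makes the chaos transient. The clamp on the sigmoid argument is a purely numerical safeguard against floating-point overflow.

```python
import math

EPS0, EPS1, EPS2 = 0.02, 2.0, 2.0
W1, W2 = 1.0 / 3.0, 1.0 / 3.0
K, I0, BETA = 1.0, 0.65, 0.004

def f(u):
    a = max(min(u / EPS0, 50.0), -50.0)  # clamp to avoid exp overflow
    return 1.0 / (1.0 + math.exp(-a)) + W1 * math.cos(EPS1 * u) + W2 * math.sin(EPS2 * u)

y, z = 0.283, 0.4        # initial values y(1), z(1) from above
xs = []
for _ in range(1000):
    x = f(y)                       # Eq. (8)
    xs.append(x)
    y = K * y - z * (x - I0)       # Eq. (9)
    z = (1.0 - BETA) * z           # Eq. (10): exponential annealing of z
z_final = z
```

Since z(t+1) = (1 − β) z(t), after 1000 steps z ≈ 0.4 · 0.996^1000 ≈ 0.007, so the self-feedback term that drives the chaos has effectively vanished.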
Fig. 1. State bifurcation figure (x vs. iterations) for β = 0.004
Fig. 2. Time evolution of the maximal Lyapunov exponent for β = 0.004
Fig. 3. State bifurcation figure (x vs. iterations) for β = 0.002
Fig. 4. Time evolution of the maximal Lyapunov exponent for β = 0.002
Seen from the above state bifurcation figures, the neuron exhibits transient chaotic dynamics. The single neural unit first performs a global chaotic search and, with the decrease of z(t), the reversed bifurcation gradually converges to a stable equilibrium state. After the chaotic behavior disappears, the dynamics of the single neural unit are controlled by gradient descent. When the behavior of the single neural unit is similar to that of the Hopfield network, the network tends to converge to a stable equilibrium point. The simulated annealing parameter β affects the length of the reversed bifurcation.
4 Application to Continuous Function Optimization Problems

In this section, we apply the Fourier series chaotic neural network to search for the global minimum of the following function [7]:

f_2(x_1, x_2) = (x_1 − 0.7)²[(x_2 + 0.6)² + 0.1] + (x_2 − 0.5)²[(x_1 + 0.4)² + 0.15]    (12)
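As a quick sanity check of Eq. (12) (not part of the original paper), both squared factors vanish at (x_1, x_2) = (0.7, 0.5), so the global minimum value is exactly 0:

```python
# The benchmark objective of Eq. (12).
def f2(x1, x2):
    return ((x1 - 0.7) ** 2 * ((x2 + 0.6) ** 2 + 0.1)
            + (x2 - 0.5) ** 2 * ((x1 + 0.4) ** 2 + 0.15))

# Both squared factors vanish at (0.7, 0.5), so f2 attains its minimum 0 there.
print(f2(0.7, 0.5))   # -> 0.0
```

The small residual 1.3236e-016 reported below is therefore the solver's numerical error, not a property of the function.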
The parameters are set as follows: ε_0 = 2.5, ε_1 = 20, ε_2 = 10, k = 1, ω_1 = 0.1, ω_2 = 0.05, β = 0.05, α = 0.4, z_1(1) = z_2(1) = 0.3, y_1(1) = y_2(1) = 0.283, I_0 = 0.65. The time evolution of the energy function of FSCNN in solving this function is shown in Fig. 5.

Fig. 5. Time evolution figure of the energy function
The global minimum found in the simulation is 1.3236e-016, at the corresponding point (0.7, 0.5). This indicates that FSCNN performs well on function optimization problems. To further test its performance, the new model is next applied to the 10-city traveling salesman problem.
5 Application to TSP

A solution of a TSP with N cities is represented by an N×N permutation matrix, where each entry corresponds to the output of a neuron in a network with an N×N lattice structure. Let V_xi be the output of the neuron representing city x at visiting order i. A computational energy function, which minimizes the total tour length while simultaneously satisfying all constraints, takes the following form:

E = (A/2) Σ_{x=1}^{n} (Σ_{i=1}^{n} V_xi − 1)² + (B/2) Σ_{i=1}^{n} (Σ_{x=1}^{n} V_xi − 1)² + (D/2) Σ_{x=1}^{n} Σ_{y=1}^{n} Σ_{i=1}^{n} d_xy V_xi V_{y,i+1}    (13)
A and B (A = B) are the coupling parameters corresponding to the constraints, and D is the coupling parameter for the cost function of the tour length; d_xy is the distance between city x and city y.

5.1 Application to the 10-City TSP

This paper adopts the 10-city problem; the shortest tour length for the 10 cities is 2.6776. Models with different values of ω_1, ω_2 for the 10-city TSP are analyzed below. The parameters of the network are set as follows: ε_1 = 20, ε_2 = 10, ε_0 = 1/30,
k = 1, α = 0.6, z(1) = 0.1, I_0 = 0.2, A = 1.4, D = 1.5, Δt = 0.04. 2000 different initial conditions of y_ij are generated randomly in the region [0, 1] for each β. The results
are summarized in Table 1; the columns 'NL', 'NG', 'LR' and 'GR' give, respectively, the number of legal routes, the number of global optimal routes, the rate of legal routes, and the rate of global optimal routes. From Table 1, the following observations can be drawn from the numerical simulation tests.

First, the models with smaller ω_1, ω_2, such as (ω_1, ω_2) = (0.01, 0.01), (0.02, 0.01), (0.025, 0.01), (0.01, 0.025) and (0.02, 0.02), can all converge to the global minimum on the 10-city TSP. But it is not true that the smaller ω_1 and ω_2 are, the more powerful the ability to solve the 10-city problem: for example, ω_1 = 0.02, ω_2 = 0.02 converges to the global minimum in every run already at β = 0.001, while ω_1 = 0.01, ω_2 = 0.01 only almost converges to the global minimum there and reaches it in every run at β = 0.0008.

Second, with the decrease of ω_1 and ω_2, the value of 'NG' at β = 0.008 grows gradually from 1927 (ω_1 = 0.02, ω_2 = 0.02) to 1994 (ω_1 = 0.01, ω_2 = 0.005). In other words, as ω_1 and ω_2 decrease, the ability to reach the global optimal route becomes stronger.

Third, when ω_1 = 0 and ω_2 = 0, the network reduces to Chen's chaotic neural network. Comparing Chen's chaotic neural network with the Fourier series chaotic neural network in Table 1, we can conclude that the Fourier series chaotic neural network is more powerful for the TSP under the same parameters. However, as analyzed in the second observation, the ability to reach 'NG' with ω_1 = 0.01, ω_2 = 0.025 and with ω_1 = 0.025, ω_2 = 0.01 is weaker than that with ω_1 = 0.01, ω_2 = 0.01 at β = 0.008. So which model is needed depends on the concrete requirements; as a tradeoff, ω_1 = 0.01, ω_2 = 0.01 may be chosen. These parameters of the chaotic neural network are set without a priori knowledge, which is a difficulty in the application of chaotic neural networks. In this paper, the parameters of the Fourier series chaotic neural network are based on a great number of experiments. Since the results are not based on theoretical analysis, the relationship between ω_1, ω_2 and the other aspects of performance needs further study.

5.2 Application to the 30-City TSP

This paper adopts the following 30-city coordinates: (41,94), (37,84), (54,67), (25,62), (7,64), (2,99), (68,58), (71,44), (54,62), (83,69), (64,60), (18,54), (22,60), (83,46), (91,38), (25,38), (24,42), (58,69), (71,71), (74,78), (87,76), (18,40), (13,40), (82,7), (62,32), (58,35), (45,21), (41,26), (44,35), (4,50). The shortest tour length for the 30 cities is 423.7406.
Table 1. Results of 2000 different initial conditions for each value of β on the 10-city TSP

ω1      ω2      β        NL     NG     LR     GR
0       0       0.008    2000   1774   100%   88.7%
0       0       0.001    2000   1620   100%   81%
0       0       0.0008   2000   1563   100%   78.15%
0.001   0.001   0.008    2000   1983   100%   99.15%
0.001   0.001   0.001    2000   1994   100%   99.7%
0.001   0.001   0.0008   2000   1995   100%   99.75%
0.005   0.01    0.008    2000   1994   100%   99.7%
0.005   0.01    0.001    2000   1994   100%   99.7%
0.005   0.01    0.0008   2000   1998   100%   99.9%
0.01    0.01    0.008    2000   1998   100%   99.9%
0.01    0.01    0.001    2000   1998   100%   99.9%
0.01    0.01    0.0008   2000   2000   100%   100%
0.02    0.01    0.008    2000   1945   100%   97.25%
0.02    0.01    0.001    2000   1996   100%   99.8%
0.02    0.01    0.0008   2000   2000   100%   100%
0.025   0.01    0.008    2000   1931   100%   96.55%
0.025   0.01    0.001    2000   2000   100%   100%
0.025   0.01    0.0008   2000   2000   100%   100%
0.01    0.005   0.008    2000   1994   100%   99.7%
0.01    0.005   0.001    2000   1995   100%   99.75%
0.01    0.005   0.0008   2000   1995   100%   99.75%
0.01    0.02    0.008    2000   1928   100%   96.4%
0.01    0.02    0.001    2000   1998   100%   99.9%
0.01    0.02    0.0008   2000   1997   100%   99.85%
0.01    0.025   0.008    2000   1923   100%   96.15%
0.01    0.025   0.001    2000   2000   100%   100%
0.01    0.025   0.0008   2000   2000   100%   100%
0.02    0.02    0.008    2000   1927   100%   96.35%
0.02    0.02    0.001    2000   2000   100%   100%
0.02    0.02    0.0008   2000   2000   100%   100%
The parameters of the network are set as follows: ε_1 = 20, ε_2 = 20, ε_0 = 1/250, k = 1, α = 0.6, z(1) = 0.1, I_0 = 0.2, A = 1, D = 1, β = 0.001, Δt = 0.5, ω_1 = 0.01, ω_2 = 0.01. 100 different initial conditions of y_ij are generated randomly in the region [0, 1]. The results are summarized in Table 2; the columns 'NG' and 'GR' are as before, and 'ML' is the mean length of the legal routes. Seen from Table 2, the Fourier series chaotic neural network can be applied to the 30-city TSP and is more powerful than Chen's chaotic neural network in solving it under the same parameters.
Table 2. Results of 100 different initial conditions on the 30-city TSP

algorithm   NG   GR    ML
FSCNN       25   25%   429.37
CSA         17   17%   430.05
6 Conclusions

The presented chaotic neural network, FSCNN, is shown to be effective in solving optimization problems. In the application to the 10-city TSP, models with different ω_1, ω_2 are analyzed and compared, and a simple rule for choosing them is disclosed. However, several aspects of the model still need further study.

Acknowledgments. This work is supported by the Program for New Century Excellent Talents in Heilongjiang Provincial Universities (1153-NCET-008) and the Natural Science Foundation of Heilongjiang Province (F2007-15).
References
1. Hopfield, J., Tank, D.W.: Neural Computation of Decisions in Optimization Problems. Biol. Cybern. 52, 141–152 (1985)
2. Hopfield, J.: Neural Networks and Physical Systems with Emergent Collective Computational Abilities. Proc. Natl. Acad. Sci. 79, 2554–2558 (1982)
3. Xu, Y., Sun, M., Duan, G.: Wavelet Chaotic Neural Networks and Their Application to Optimization Problems. In: Adi, A., Stoutenburg, S., Tabet, S. (eds.) RuleML 2005. LNCS, vol. 3791, pp. 379–384. Springer, Heidelberg (2005)
4. Potapov, A., Ali, M.K.: Robust Chaos in Neural Networks. Phys. Lett. A 277(6), 310–322 (2000)
5. Shuai, J.W., Chen, Z.X., Liu, R.T.: Self-Evolution Neural Model. Phys. Lett. A 221(5), 311–316 (1996)
6. Chen, L., Aihara, K.: Chaotic Simulated Annealing by a Neural Network Model with Transient Chaos. Neural Networks 8(6), 915–930 (1995)
7. Wang, L.: Intelligence Optimization Algorithm and Its Application. Tsinghua University Press (2001)
8. Xu, Y., Sun, M.: Gauss Morlet Sigmoid Chaotic Neural Networks. In: Huang, D.-S., Li, K., Irwin, G.W. (eds.) ICIC 2006. LNCS, vol. 4113, pp. 115–125. Springer, Heidelberg (2006)
Numerical Simulation and Experimental Study of Liquid-Solid Two-Phase Flow in Nozzle of DIA Jet

Guihua Hu, Wenhua Zhu, Tao Yu, and Jin Yuan
CIMS & Robot Center of Shanghai University, Shanghai 200072, China
Abstract. The velocity of abrasive particles at the nozzle exit of a Direct Injection Abrasive (DIA) Jet is a key factor affecting the cutting capacity of the jet. The Computational Fluid Dynamics (CFD) software Fluent is applied to the numerical simulation of liquid-solid two-phase flow in hard alloy nozzles of different cylindrical section lengths under certain conditions. The optimum ratio of diameter to length, at which the particle velocities at the nozzle exit are largest, is obtained, and the velocity distribution of the liquid-solid two-phase flow in the optimum nozzle is analyzed. Cutting experiments on a variety of materials, carried out on a numerically controlled DIA Jet cutting machine tool, verify the results of the numerical simulation. Keywords: DIA Jet, numerical simulation, liquid-solid two-phase flow, the optimum ratio of diameter to length, cutting experiments.
D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 92–100, 2008. © Springer-Verlag Berlin Heidelberg 2008

1 Introduction

DIA Jet is a technology that has been under development for more than ten years. Its elementary principle is as follows: first, high-pressure generating equipment provides the medium water with a large amount of energy [1-3]; then abrasives are fed into the high-pressure water by delivery and mixing equipment. Compared with a pure Water Jet, the cutting effect of a DIA Jet is greatly improved; compared with an entrained Abrasive Water Jet (AWJ), the working pressure of a DIA Jet can be greatly reduced. Research shows that when an AWJ cuts material, the kinetic energy of the jet is one of the important factors affecting the energy distribution process, and this kinetic energy is most influenced by the impact velocity of the abrasive particles. Therefore, research on improving particle velocities at the nozzle exit is important for the cutting capability of AWJ, and a proper numerical simulation method is needed to ascertain the influence of the nozzle geometry on particle velocities at the nozzle exit; this is an important foundation for both theoretical research and practical engineering application of AWJ. CFD analysis is a viable approach because direct measurement of particle velocities and visualization of particle trajectories are very difficult at the ultra-high speeds and small dimensions involved. At present, most work on AWJ concerns the mechanism of entrained AWJ, while the mechanism of DIA Jet has received
little attention [4-6]. Even where it has, only the distribution of liquid- and solid-phase velocities in the nozzle and qualitative optimization of the nozzle's geometry have been studied; quantitative optimization of the nozzle's geometry has received little attention. This paper applies numerical simulation and experimental methods to DIA Jet and obtains the optimum ratio of diameter to length under certain conditions, which maximizes particle velocities at the nozzle exit, i.e. makes the cutting capacity of the abrasive particles strongest. This provides a theoretical basis for technological research and engineering applications of AWJ.
2 Theoretical Analysis of Liquid-Solid Two-Phase Flow in the Nozzle

The nozzle is the generating part of the DIA Jet; its function is to change the static pressure provided by the high-pressure generating equipment into the dynamic pressure of the AWJ. The structure of the nozzle is shown in Fig. 1, where D is the inlet diameter of the nozzle, θ is one-half of the convergence angle, L is the length of the nozzle, l is the length of the cylindrical section, and d is the diameter of the cylindrical section. The pressure loss along the nozzle is given by

ΔP = λ (l/d) (ρ/2) V²    (1)

Fig. 1. Structural sketch of the nozzle

The mass flow is written as

Q = ρ π d² V / 4    (2)

In the above equations, λ is the coefficient of friction resistance; in the range 3×10³ < Re < 10⁷ it is given very precisely by the Prandtl-Kármán formula 1/√λ = 2 lg(Re √λ) − 0.8, where Re is the Reynolds number, Re = V d / υ, υ is the kinematic viscosity coefficient, ρ is the density of water, and V is the average velocity of water in the cylindrical section. The Bernoulli equation is

P + (1/2) ρ w₁² = P₀ + (1/2) ρ w₂² + ΔP    (3)
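Equations (1)-(3) can be combined to estimate the exit velocity w₂ numerically. The sketch below (our own, with assumed water properties and the nozzle dimensions d = 1.3 mm, l = 13 mm at P = 30 MPa used later in this paper) neglects the inlet velocity w₁ and the gauge outlet pressure, and solves the implicit Prandtl-Kármán law by fixed-point iteration.

```python
import math

RHO = 1000.0   # water density, kg/m^3 (assumed)
NU = 1.0e-6    # kinematic viscosity of water, m^2/s (assumed)
P = 30e6       # working pressure, Pa
d = 1.3e-3     # cylindrical section diameter, m
l = 13e-3      # cylindrical section length, m

lam = 0.02     # initial guess for the friction factor lambda
w2 = 0.0
for _ in range(50):
    # Bernoulli with friction loss: P = (rho/2) * w2^2 * (1 + lam*l/d)
    w2 = math.sqrt(2.0 * P / (RHO * (1.0 + lam * l / d)))
    Re = w2 * d / NU
    # Prandtl-Karman smooth-pipe law: 1/sqrt(lam) = 2*lg(Re*sqrt(lam)) - 0.8
    lam = 1.0 / (2.0 * math.log10(Re * math.sqrt(lam)) - 0.8) ** 2
```

The iteration converges in a few steps; w₂ comes out a few percent below the frictionless value sqrt(2P/ρ) ≈ 245 m/s, illustrating that a longer cylindrical section lowers the exit water velocity.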
where w₁ is the average velocity of water at the nozzle inlet, w₂ the average velocity of water at the nozzle exit, P the static pressure at the nozzle inlet, and P₀ the air pressure. Combining (1), (2) and (3), for a given P, w₂ decreases as ΔP increases; therefore, the longer the cylindrical section is, the smaller the average water velocity at the nozzle exit. A relational expression between the velocity of the abrasive particles and the velocity of the water [7] is given by

du_p/dt = [3 ρ C_D / (4 ρ_p d_p)] (u − u_p)²    (4)

where u_p is the particle velocity, ρ_p the particle density, and C_D the drag coefficient; according to Newton's formula, C_D = 0.44 (Re_p > 1000), with Re_p = |u − u_p| d_p / υ, d_p the particle diameter, and u the average velocity of water in the cylindrical section of the nozzle. Integrating equation (4) yields

u_p(t) = u − 1 / [ (3 ρ C_D / (4 ρ_p d_p)) t + c₀ ]    (5)

where c₀ is the integration constant. According to equation (5), the particle velocity in the cylindrical section approaches the water velocity as time increases; therefore, the longer the cylindrical section is, the larger the particle velocities at the nozzle exit. In a word, for a given nozzle diameter and hydraulic pressure, there is an optimum length of the cylindrical section at which the particle velocities at the nozzle exit reach their maximum, which in turn gives the strongest cutting capacity.
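A small numerical check (ours, not from the paper) confirms that the closed form (5) solves Eq. (4) and that u_p approaches u with time; the densities, particle diameter and water velocity below are assumed round values, not measured data.

```python
RHO, RHO_P = 1000.0, 3950.0   # water / abrasive density, kg/m^3 (assumed)
D_P = 0.25e-3                 # particle diameter, m (assumed, roughly mesh #60)
CD = 0.44                     # Newton-regime drag coefficient
U = 230.0                     # water velocity in the cylindrical section, m/s (assumed)

a = 3.0 * RHO * CD / (4.0 * RHO_P * D_P)   # coefficient in Eqs. (4)-(5)

def up_closed(t, up0=0.0):
    """Closed-form solution (5) with c0 fixed by u_p(0) = up0."""
    c0 = 1.0 / (U - up0)
    return U - 1.0 / (a * t + c0)

# Forward-Euler integration of Eq. (4): du_p/dt = a * (u - u_p)^2
up, dt, steps = 0.0, 1e-7, 10000
for _ in range(steps):
    up += dt * a * (U - up) ** 2
err = abs(up - up_closed(steps * dt))
```

With these values the particle reaches within a few m/s of the water velocity after about a millisecond, and the Euler result agrees with the closed form to well under 1 m/s.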
3 Building the Model and Numerical Method for Liquid-Solid Two-Phase Flow in the Nozzle

In order to find the optimum length of the cylindrical section under certain conditions, Fluent is used to numerically simulate the liquid-solid two-phase flow, and the particle velocities at the nozzle exit are computed for nozzles of different cylindrical section lengths.

3.1 Physical Model of Liquid-Solid Two-Phase Flow in the Nozzle

According to the structure of the nozzle and the characteristics of the jet, the numerical simulation model can be built on half an axial section, with the following hypotheses: (1) water is a continuous medium; (2) water is an incompressible fluid;
(3) abrasive particles are treated as rigid, spherical, small particles of equal diameter, and there is no mass exchange between the liquid and solid phases; (4) there is no heat exchange between the two-phase flow and the outside, and the temperature does not change; (5) the liquid-solid two-phase flow is a steady turbulent flow.

3.2 Mathematical Model of Liquid-Solid Two-Phase Flow in the Nozzle
3.2.1 Mathematical Model of the Liquid Phase (Water)
No single turbulence model is universally accepted as superior for all classes of problems. The turbulence models commonly adopted in computation are zero-equation, one-equation and two-equation models. Among these, the standard k-ε model is the most widely used in engineering computations and gives accurate results at high Reynolds numbers, so it is taken as the turbulence model here [8]. This paper selects an axisymmetric physical model in cylindrical coordinates and uses the standard k-ε model for high Reynolds numbers to build a closed mathematical model. The generalized control equation [9] is given by

∂(ρuφ)/∂x + (1/r) ∂(rρvφ)/∂r = ∂/∂x (Γφ ∂φ/∂x) + (1/r) ∂/∂r (r Γφ ∂φ/∂r) + Sφ    (6)

where φ is the dependent variable, u and v are the axial and radial velocities of water, x and r the axial and radial coordinates, ρ the density of water, Γφ the generalized diffusion coefficient, and Sφ the source term.

3.2.2 Mathematical Model of the Solid Phase (Abrasive Particles)
In two-phase flow theory, the particle motion models ordinarily studied are the single-particle dynamics model, the quasi-fluid model of particles (also called the multi-fluid model), and the particle trajectory model (also called the Eulerian-Lagrangian mixed model). Currently, the trajectory model is the most widely applied in two-phase simulations. The volume fraction of abrasive particles in the liquid-solid two-phase flow is from 1% to 7%, so the abrasive particles are taken as a discrete phase, ignoring particle-particle interactions and the effect of the particles on the continuum. The Eulerian-Lagrangian model gives the equation of the discrete phase:

dU_pi/dt = [3 μ C_D Re_p / (4 ρ_p d_p²)] (U_ci − U_pi)    (7)

where U_ci and U_pi are the velocity components of water and abrasive particles, respectively, d_p is the diameter of the abrasive particles, ρ_p the density of the abrasive particles, and C_D = 0.44 the drag coefficient.
3.3 Mesh Division, Boundary Conditions and Numerical Method

3.3.1 Mesh Division
3.3.1.1 Computation Domain. Half of the symmetric structure of the nozzle is selected as the computation domain. As shown in Fig. 2, the computation domain of the liquid-solid two-phase flow in the nozzle is bounded by the nozzle wall, the axis, the inlet boundary and the outlet boundary.

Fig. 2. Computation domain and boundary conditions
3.3.1.2 Dividing the Mesh. Model building and mesh division are completed in the preprocessing software Gambit [10]. The mesh is refined from the center axis toward the wall and from the nozzle inlet toward the nozzle exit. For a slender pipe it is appropriate to use a quadrilateral mesh; a block-partition method is adopted to mesh the convergence section and the cylindrical section separately. Fig. 3 shows the mesh of the half nozzle for the case D = 4 mm, L = 21 mm, l = 17 mm and d = 1.3 mm.
Fig. 3. Mesh division sketch map of the half nozzle
3.3.2 Boundary Conditions [11]
3.3.2.1 Inlet Boundary Condition. The inlet is specified as a velocity inlet boundary. First, the mass flow of water at the nozzle exit is measured; then, according to Q = ρπd²V/4, the water velocity V entering the nozzle is calculated. This velocity is the velocity inlet boundary condition for both water and abrasive particles.

3.3.2.2 Outlet Boundary Condition. A pressure outlet boundary condition defines the static pressure of the flow outlet; since the static pressure at the nozzle exit is the air pressure, the outlet is set as a pressure outlet boundary.
3.3.2.3 Wall Boundary Conditions. Wall boundary conditions bound the fluid and solid regions. For the viscous fluid, the no-slip condition is enforced at the walls, and collisions of abrasive particles with the wall are treated as perfectly elastic.

3.3.2.4 Axis Boundary Condition. The axis boundary type is used for the centerline of the axisymmetric geometry; no boundary conditions need be defined on the axis.

3.3.3 Numerical Method
The control-volume integral method is adopted to discretize the control equations, with a second-order upwind scheme for the convection terms. The SIMPLE (Semi-Implicit Method for Pressure-Linked Equations) algorithm is used for the pressure-velocity coupling, and a wall function is used for the continuous phase near the wall.
4 Results and Analyses of the Numerical Simulation

The numerical simulation is carried out for 5 nozzles with different cylindrical section lengths, i.e. 17, 15, 13, 11 and 9 mm. According to 3.3.2.1, the calculated water and particle velocities at the nozzle inlet are input into Fluent, and the water and particle velocities at the nozzle exit are computed. Fig. 4 shows the relationship between water and particle velocities and the length of the cylindrical section: curve 1 shows the velocity distribution of water, curve 2 that of the particles. Curve 1 shows that the water velocity at the nozzle exit gradually decreases with increasing cylindrical section length. Curve 2 shows that the particle velocities at the nozzle exit are largest when the cylindrical section length reaches 13 mm; that is, a cylindrical section length of 13 mm is optimal, and hence a diameter-to-length ratio of 1/10 is optimal.
Fig. 4. The relationship between water and particle velocities and the length of the cylindrical section. Inlet nozzle diameter = 4 mm, convergence section length = 4 mm, cylindrical section diameter = 1.3 mm, working pressure = 30 MPa, mass concentration of abrasive particles 13%, particle diameter mesh #60.
Fig. 5. Velocity distribution of water in the nozzle with the 13 mm cylindrical section

Fig. 6. Velocity trajectories of abrasive particles in the nozzle with the 13 mm cylindrical section
Fig. 5 shows the velocity distribution of water in the nozzle with the 13 mm cylindrical section. Water accelerates rapidly in the convergence section (x ≤ 4 mm), while in the cylindrical section (x ≥ 4 mm) the water velocity is stable. Fig. 6 shows the velocity trajectories of abrasive particles in the same nozzle. The abrasive particles are easily accelerated in the convergence section, while their acceleration in the cylindrical section is quite small. Moreover, the particle acceleration is small at the beginning of the convergence section, but as the particles approach its end their velocities rise very rapidly. The particles continue to accelerate all the way along the cylindrical section, but the acceleration there is very small.
5 Experimental Research 5.1 Basis of Experiments and Equipment Under given conditions, the cutting depth, which is determined by the kinetic energy of the abrasive particles and hence by the particle velocity at the nozzle exit, represents the cutting capacity: the greater the particle velocity at the nozzle exit, the greater the cutting depth. The experimental equipment is the DIA Jet numerically controlled machine tool.
Numerical Simulation and Experimental Study
5.2 Design of Experimental Project The conditions are as follows: working pressure 30 MPa, abrasive mass concentration 13%, particle size mesh #60, cylindrical-section diameter 1.3 mm, nozzle inlet diameter 4 mm, convergence-section length 4 mm. Five hard-alloy nozzles with different cylindrical-section lengths were chosen, i.e., 17, 15, 13, 11 and 9 mm. Glass and Steel A3 were chosen as the cutting materials. The experimental process is as follows: glass and Steel A3 are each cut by the five nozzles above, at the same cutting speed for a given material, with four cuts per nozzle per material. 5.3 Results and Analyses of Experiments For each nozzle, this paper adopts the average of the four measured cutting-depth values for glass and Steel A3 respectively. Fig. 7 shows the relationship between the cutting depth of glass and Steel A3 and the cylindrical-section length of the nozzle. The curves marked "Steel A3" and "Glass" both indicate that the cutting depth is greatest when the cylindrical section is 13 mm long. Therefore a cylindrical-section length of 13 mm is optimal, and the experimental results verify the results of the numerical simulation.
Fig. 7. The relationship between the cutting depth of glass and Steel A3 and the cylindrical-section length of the nozzle
6 Conclusions This paper applies numerical simulation and experiment to study the liquid-solid two-phase flow in the hard-alloy nozzle of the DIA Jet. The results are as follows: (1) Qualitative analysis shows that, under given conditions, there is an optimal cylindrical-section length for which the particle velocity at the nozzle exit reaches its maximum, which in turn yields the strongest cutting capacity.
(2) For the given cutting parameters and nozzle geometry, numerical simulation of nozzles with different cylindrical-section lengths shows that a 13 mm cylindrical section is optimal, i.e., a diameter-to-length ratio of 1/10 is optimal, which confirms conclusion (1). The computed velocity distribution of the liquid-solid two-phase flow in the nozzle with the 13 mm cylindrical section provides a useful method for studying and improving the cutting capacity of the DIA Jet. (3) The numerical simulation results coincide with the experimental data; that is, the experiments verify the results of the numerical simulation. Acknowledgments. This work was supported by the technology project of the Shanghai Science and Technology Committee fund (No. 037252022) and the Shanghai Leading Academic Discipline (Project No. Y0102).
Shape Matching Based on Ant Colony Optimization Xiangbin Zhu College of Mathematics, Physics and Information Engineering, Zhejiang Normal University, Jinhua Zhejiang, 321004 [emailprotected]
Abstract. We propose a shape matching method for the fast retrieval of objects in 2D images. The algorithm is based on recent developments in ant colony optimization and skeleton matching. The method has been implemented and tested on image data, and the experimental results illustrate its characteristics. Finally, future research directions are discussed. Keywords: Shape Matching, Skeleton, ACO, Topology.
1 Introduction With the development of computer vision, CAD, the Internet and so on, images and 3D object models are used in many diverse applications, so there is an urgent need for object matching technology, that is, searching for similar shapes in a large database of designs or models. There are many shape matching methods, such as feature-based methods, graph-based methods and others. Some methods employ distributions of moments, normals, cords, color, material and texture [1], volume-surface ratio, aspect ratio, moment invariants and Fourier transform coefficients [2], shape signatures, shape distributions [3][4], and so on. In this paper, we present a new shape matching method using skeletons and ant colony optimization. The method first extracts the skeleton of the object image and converts it into a skeleton tree. Second, features of each skeleton tree are extracted from the skeletons. Based on these features, we perform shape matching with ant colony optimization. In the remainder of this paper we describe the novel shape matching method. The next section gives an overview of shape matching. Section 3 introduces the skeletonization method and ant colony optimization. Finally, Section 4 presents the experimental results and Section 5 summarizes our work.
2 Related Work Object matching research in the 1980s culminated in systems that could detect occluded, non-convex shapes from binary edge images [5]. A method for searching all
D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 101–108, 2008. © Springer-Verlag Berlin Heidelberg 2008
X. Zhu
image locations for matches is to extract features from the image that are at least partially invariant to the image formation process and to match only those features. Many approaches to object matching [6] represent the object by a set of features; they obtain excellent results for objects which are locally planar and have a distinctive texture [7]. In this section we discuss shape matching methods, which we divide into three broad categories: (1) feature-based methods, which use the spatial arrangement of extracted features such as edge elements or junctions; (2) graph-based methods; and (3) brightness-based methods, which make more direct use of pixel brightnesses. Feature-based approaches can be classified into several types. One type is based on global geometric features: these methods employ area, circularity, eccentricity, compactness, major-axis orientation, Euler number, and so on to measure the similarity of object images [8]. Another type is based on transform-domain features. Moment-based feature descriptors have evolved into a powerful tool for shape matching applications. Geometric moments have a low computational cost but are highly sensitive to noise, and reconstruction from them is extremely difficult. Although not invariant under rotation, Hu's invariants [9], derived from geometric moments, are invariant under linear transformations. Moments of orthogonal polynomial bases were proposed by Teague [10]; they have proven less sensitive to noise, are natively invariant to linear transformations and can be effectively used for image reconstruction. Moments of discrete orthogonal bases have been proposed by Mukundan [11]; they are fast to implement and present adequate noise tolerance and very accurate image reconstruction.
Silhouettes have been described and compared using Fourier descriptors and wavelet descriptors, e.g. [12]. However, there are many common objects where texture or colour cannot be used as a cue for matching.
3 Ant Colony Optimization for Skeleton Matching In this section, we introduce how to use ACO and skeletons for shape matching. The aim of skeletonization is to extract a region-based shape feature representing the general form of an object. The skeleton is a good shape descriptor because it can be utilized in the following ways: part matching, intuitiveness, visualization and articulation. The skeleton has the same topology as the original object: it is located on the medial axis of the object and expresses both topology information and shape information. The steps in the skeleton matching process are: skeletonization, computing a set of skeletal nodes, connecting the nodes into a graph, and graph matching. Graph matching is done by assigning to each non-terminal node a vector derived from the eigenvalues of the adjacency matrix of the subgraph rooted at that node [24]. 3.1 Topological Similarity The skeleton tree can be represented as a {0,1} adjacency matrix, with 1's indicating adjacent nodes in the tree. Given a skeleton tree T = (V, E), where V is the set of nodes,
n = |V| is the number of nodes and E is the set of edges. We define the adjacency matrix A as the n×n symmetric matrix whose (i,j)-th entry A_ij equals 1 if (i,j) ∈ E and 0 otherwise. Any skeleton subtree therefore defines a submatrix of the adjacency matrix. If, for a given skeleton subtree, we compute the eigenvalues of its corresponding submatrix, then the sum of the eigenvalues is invariant to any similarity transformation applied to the submatrix. This means that the eigenvalue sum is invariant to any consistent reordering of the subtrees. In terms of our largest subgraph isomorphism problem, finding the two skeleton subtrees whose eigenvalue sums are closest is an approximation to finding the largest isomorphic subtrees.
The topological signature vector (TSV) is an important quantity for skeleton similarity. It is defined as follows. For any node v ∈ V, let δ(v) be the degree of v, and let δ(T) be the maximum degree over all nodes in T. For each node u ∈ V, we define χ(u) to be a vector in R^(δ(T)−1), obtained through the following procedure. For any child v of u in T, construct the adjacency matrix A_v of the induced subtree rooted at v, and for A_v compute the quantity

λ_v = λ_1(A_v) + … + λ_δ(v)(A_v).

Construct χ(u) as the vector formed by the values {λ_v1, …, λ_vδ(u)}, sorted so that λ_v1 ≥ … ≥ λ_vδ(u). For skeletons T1 and T2, we use ||χ(u1) − χ(u2)|| as the topological similarity between node u1 in T1 and node u2 in T2, where ||·|| denotes the L2-norm. The distance function is thus dT(u1, u2) = ||χ(u1) − χ(u2)||.
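As a concrete illustration of the eigenvalue-sum construction above, the following Python sketch computes χ(u) for a rooted tree stored as a child-adjacency dictionary. The helper names (`subtree_adjacency`, `tsv`) and the degree convention used for δ(v) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def subtree_adjacency(children, root):
    """Adjacency matrix A_v of the induced subtree rooted at `root`."""
    nodes, stack = [], [root]
    while stack:
        v = stack.pop()
        nodes.append(v)
        stack.extend(children.get(v, []))
    idx = {v: k for k, v in enumerate(nodes)}
    A = np.zeros((len(nodes), len(nodes)))
    for v in nodes:
        for c in children.get(v, []):
            A[idx[v], idx[c]] = A[idx[c], idx[v]] = 1.0
    return A

def tsv(children, u, dim):
    """chi(u): for each child v of u, sum the delta(v) largest eigenvalues
    of A_v, sort the sums in decreasing order, and zero-pad to `dim`."""
    sums = []
    for v in children.get(u, []):
        eig = np.sort(np.linalg.eigvalsh(subtree_adjacency(children, v)))[::-1]
        delta_v = len(children.get(v, [])) + 1   # child edges plus parent edge
        sums.append(float(eig[:delta_v].sum()))
    sums.sort(reverse=True)
    return np.array(sums + [0.0] * (dim - len(sums)))

# topological distance dT(u1, u2) = L2 norm of the TSV difference
children = {0: [1, 2], 1: [3, 4], 2: [], 3: [], 4: []}
dT = np.linalg.norm(tsv(children, 0, 3) - tsv(children, 1, 3))
```

Because the eigenvalue sums are sorted before assembling χ(u), the vector is insensitive to the order in which a node's children are stored, matching the reordering invariance noted above.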
3.2 Shape Similarity In this paper, shape similarity concerns the sets of joint points that directly connect two endpoints or crossing points; these sets of joint points form the edges of the skeleton. We employ moment invariants to measure shape similarity. The invariant-feature approach appears to be the most promising: its basic idea is to describe objects by a set of features which are not sensitive to particular deformations and which provide enough discriminative power to distinguish objects from different classes. An edge of the skeleton can be expressed by a 1D function f(r) of the variable r. The mean m of f(r) is defined as

m = Σ_{i=1}^{l} r_i f(r_i)    (1)

and the n-th order moment about the mean is

μ_n(r) = Σ_{i=1}^{l} (r_i − m)^n f(r_i)    (2)
Employing the moment alone is not enough to measure shape similarity, because in the above form the moment only expresses local shape similarity; a global feature must also enter the measure. The ratio of the length of one edge to the total length of the skeleton is a good global feature. Thus, the match distance for shape similarity is defined by

dS(g, h) = |μ_g²(r) − μ_h²(r)| + |S_g − S_h|    (3)

where g is an edge of skeleton T1 and h is an edge of skeleton T2; S_g is the ratio of g's length to the total length of T1, and S_h is the ratio of h's length to the total length of T2.
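The moment formulas and Eq. (3) translate directly into code. The sketch below assumes an edge is sampled as value pairs (r_i, f(r_i)); the function names are illustrative.

```python
import numpy as np

def edge_moment(f, r, n=2):
    """n-th order moment of the 1-D edge function f(r) about its mean:
    m = sum_i r_i f(r_i),  mu_n = sum_i (r_i - m)^n f(r_i)  (Eqs. (1)-(2))."""
    r, f = np.asarray(r, float), np.asarray(f, float)
    m = float(np.sum(r * f))
    return float(np.sum((r - m) ** n * f))

def shape_distance(fg, rg, len_g, total_g, fh, rh, len_h, total_h):
    """Match distance of Eq. (3): second-moment difference plus the
    difference of the edges' relative lengths S_g, S_h."""
    Sg, Sh = len_g / total_g, len_h / total_h
    return abs(edge_moment(fg, rg) - edge_moment(fh, rh)) + abs(Sg - Sh)

# identical edges with identical relative lengths have distance 0
f, r = [1.0, 2.0, 1.0], [0.0, 1.0, 2.0]
d0 = shape_distance(f, r, 1.0, 2.0, f, r, 1.0, 2.0)
```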
3.3 Skeleton Similarity Assume there are two skeleton trees T1 = {u_i | i = 1…m} and T2 = {v_j | j = 1…n}, where u_i and v_j denote the nodes and m and n are the numbers of nodes. For each u_i and v_j, compute their matching distance:

d(i, j) = ||χ(u_i) − χ(v_j)|| + |Σ_{g=1}^{ω(i)} μ_g²(r) − Σ_{h=1}^{ω(j)} μ_h²(r)| + |Σ_{g=1}^{ω(i)} S_g − Σ_{h=1}^{ω(j)} S_h|    (4)

where ω(i) is the number of edges adjacent to node i and ω(j) is the number of edges adjacent to node j.
Then we get a distance matrix D = {d(i,j)}_{m×n}. Let M = {m_ij}_{m×n} be a {0,1} mapping matrix: if m_ij = 1, then u_i and v_j are matched. Let D_i be the i-th row vector of D and M_i the i-th row vector of M. The matching distance of T1 and T2 is defined as

D(T1, T2) = Σ_i D_i × M_i    (5)

The task is then to find the best match, i.e. the match with the smallest D(T1, T2) among all possible matches. Furthermore, each node of T1 may have at most one matching node in T2, and vice versa. Thus, the objective function and constraints are defined as follows:

min Σ_i D_i × M_i
s.t. 0 ≤ Σ_{j=1}^{n} m_ij ≤ 1,  i = 1…m
     0 ≤ Σ_{i=1}^{m} m_ij ≤ 1,  j = 1…n
     m_ij ∈ {0,1},  i = 1…m, j = 1…n    (6)
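For small skeletons the program (6) can be solved exactly by enumeration. The brute-force helper below is only a sketch (the paper itself turns to ACO in Section 3.4), and it assumes that every node of the smaller skeleton must be matched; the function name is illustrative.

```python
from itertools import permutations

def best_match(D):
    """Exhaustively solve the small 0/1 matching of Eq. (6): at most one
    match per node, minimizing the summed matching distance d(i, j).
    D is an m x n list of lists; every node of the smaller side is matched."""
    m, n = len(D), len(D[0])
    best_cost, best_pairs = float("inf"), []
    if m <= n:
        for cols in permutations(range(n), m):
            cost = sum(D[i][cols[i]] for i in range(m))
            if cost < best_cost:
                best_cost, best_pairs = cost, [(i, cols[i]) for i in range(m)]
    else:
        for rows in permutations(range(m), n):
            cost = sum(D[rows[j]][j] for j in range(n))
            if cost < best_cost:
                best_cost, best_pairs = cost, [(rows[j], j) for j in range(n)]
    return best_cost, best_pairs

D = [[1.0, 5.0],
     [5.0, 1.0],
     [9.0, 9.0]]
cost, pairs = best_match(D)   # cost 2.0: node 0 -> 0, node 1 -> 1
```

Enumeration is factorial in the number of nodes, which is exactly why a meta-heuristic such as ACO becomes attractive for larger skeletons.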
This is a {0,1} linear program with constraints, and there are many methods to solve it. 3.4 Ant Colony Optimization Ant Colony Optimization (ACO) [13] is a multi-agent meta-heuristic for combinatorial optimization and other problems. It is inspired by the ability of real ants to find the shortest path between their nest and a food source. The key to this ability lies in the fact that ants leave a pheromone trail behind while walking; other ants can smell this pheromone and follow it. When a colony of ants is presented with two possible paths, each ant initially chooses one randomly, so about 50% go over each path. The ants using the shortest path, however, will be back faster, so immediately after their return there will be more pheromone on the shortest path, influencing other ants to follow it. After some time, this results in the whole colony following the shortest path. The {0,1} linear program can be solved by ACO much like the TSP. An amount of pheromone τ(i,j) is associated with the connection between two nodes i and j. Each ant is placed on a random start node in one skeleton and builds a match relationship with a node in the other skeleton, until all nodes have a match relationship. The probability that ant k at node i chooses node j in the other skeleton next is given by Equation (7):
p_ij^k(t) = [τ_ij(t)]^α · [η_ij(t)]^β / Σ_{s∈tabu_k} [τ_is(t)]^α · [η_is(t)]^β  if j ∈ tabu_k,  and 0 otherwise    (7)
In this equation, τ_ij(t) is the pheromone between i and j, and η_ij(t) is a simple heuristic guiding the ant; its value is the inverse of the cost of the connection between i and j. The preference of ant k at node i for node j is thus defined partly by the pheromone between i and j and partly by the heuristic favourability of j after i. The parameter α defines the relative importance of the pheromone information, and β defines the relative importance of the heuristic information. tabu_k is the set of nodes in the other skeleton that have not yet been visited by ant k at node i. Once all ants have built a tour, the pheromone is updated according to these equations:
τ_ij(t+n) = (1 − ρ) · τ_ij(t) + Δτ_ij(t+n)    (8)

Δτ_ij(t+n) = Σ_{k=1}^{m} Δτ_ij^k(t+n)    (9)

In these equations, the speed of the pheromone decay is defined by ρ, the evaporation parameter. The amount of pheromone an ant k deposits on an edge is defined by Δτ_ij^k.
Δτ_ij(t+n) is the total amount of pheromone deposited by all ants on edge (i,j) in this tour, and each Δτ_ij^k can be calculated by the following equation:

Δτ_ij^k(t+n) = Q / L_k  if (i,j) ∈ tour of ant k,  and 0 otherwise    (10)

where Q is a constant and L_k is the tour length of the kth ant.
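The mechanics of Eqs. (7)-(10) can be sketched as one ACO cycle for the node-matching problem. The function names, the roulette-wheel selection, and the choice of η as an inverse cost are illustrative assumptions rather than the paper's exact implementation.

```python
import random

def construct_matching(tau, eta, alpha=1.0, beta=2.0):
    """Build one ant's matching: for each node i of skeleton T1, pick a
    still-unmatched node j of T2 with probability proportional to
    tau[i][j]^alpha * eta[i][j]^beta (Eq. (7))."""
    n = len(tau[0])
    free = set(range(n))            # complement of the tabu list
    matching = []
    for i in range(len(tau)):
        if not free:
            break
        weights = {j: (tau[i][j] ** alpha) * (eta[i][j] ** beta) for j in free}
        total = sum(weights.values())
        r, acc = random.uniform(0, total), 0.0
        for j, w in weights.items():   # roulette-wheel selection
            acc += w
            if acc >= r:
                matching.append((i, j))
                free.discard(j)
                break
    return matching

def update_pheromone(tau, solutions, rho=0.1, Q=1.0):
    """Evaporate and deposit pheromone (Eqs. (8)-(10)): each ant k adds
    Q / L_k on the edges of its matching, where L_k is its total cost."""
    for row in tau:
        for j in range(len(row)):
            row[j] *= (1.0 - rho)
    for matching, L in solutions:
        for i, j in matching:
            tau[i][j] += Q / L

random.seed(1)
tau = [[1.0, 1.0], [1.0, 1.0]]
eta = [[1.0, 0.1], [0.1, 1.0]]     # assumed heuristic: inverse of d(i, j)
matching = construct_matching(tau, eta)
```

In a full run, `construct_matching` would be called once per ant per cycle, `update_pheromone` once per cycle, and the best matching found would approximate the minimizer of Eq. (6).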
4 Experiments To demonstrate our approach to shape matching, we implemented it in VC++ and ran matching experiments.

[Table 1. Matching results for the 10 objects: pairwise matching distances]

To evaluate its performance under occlusion,
articulation of structures, and changes in viewing and imaging conditions, we constructed a database of tool images collected from the Internet and the Corel Draw database. Table 1 presents the results of the matching experiments for 10 objects: we computed the similarity between each pair of objects in the database. Each entry in Table 1 is the matching distance D(T1, T2) = Σ_i D_i × M_i; the smaller the value, the more similar the shapes.
5 Conclusions and Future Work Previous work on shape matching via shock graphs has been very successful for object matching. In this paper, we have introduced a specific matching algorithm that employs skeleton characteristics and ACO; the novelty of the algorithm lies in the skeleton characteristics it uses. Experiments with a variety of objects demonstrate that the approach is generic, robust in the presence of noise, and supports several important notions of similarity. Although the approach is developed for 2-D objects, it can be extended to a view-based strategy for generic 3-D object matching. Acknowledgments. This work is supported by the Key Science & Technology Project of Zhejiang province under Grant No. 2007C13052.
References 1. Paquet, E., Rioux, M.: Content-Based Access of VRML Libraries. In: Ip, H.H.-S., Smeulders, A.M.W. (eds.) MINAR 1998. LNCS, vol. 1464, pp. 20–32. Springer, Heidelberg (1998) 2. Zhang, C., Chen, T.: Efficient Feature Extraction for 2D/3D Objects in Mesh Representation. In: IEEE International Conference on Image Processing. IEEE Press, New York (2001) 3. Osada, R., Funkhouser, T., Chazelle, B., Dobkin, D.: Matching 3D Models with Shape Distributions. In: Shape Modeling International, Genova, Italy (2001) 4. Osada, R., Funkhouser, T., Chazelle, B., Dobkin, D.: Shape Distributions. ACM Transactions on Graphics 21(4), 807–832 (2002) 5. Grimson, W.E.L.: Object Matching by Computer: the Role of Geometric Constraints. MIT Press, Cambridge (1990) 6. Cyr, C.M., Kimia, B.B.: 3d Object Matching Using Shape Similiarity-based Aspect Graph. In: Proceedings of the Eighth International Conference On Computer Vision (ICCV 2001), pp. 254–261. IEEE Press, New York (2001) 7. Schmid, C., Mohr, R.: Local Grayvalue Invariants for Image Retrieval. PAMI 19(5), 530–534 (1997) 8. Veltkamp, R.C.: Shape Matching: Similarity Measures and Algorithms. In: Proc. Int’l Conf. on Shape Modeling and Applications, Genova, Italy, pp. 188–197 (2001) 9. Hu, M.K.: Visual Pattern Matching by Moment Invariants. IRE Trans. Information Theory IT-8, 179–187 (1962)
10. Teague, M.R.: Image Analysis via the General Theory of Moments. J. Opt. Soc. Amer. 70, 920–930 (1980) 11. Mukundan, R.: Image Analysis by Tchebichef Moments. IEEE Trans. on Image Proc. 10(9), 1357–1364 (2001) 12. Zahn, C., Roskies, R.: Fourier Descriptors for Plane Closed Curves. IEEE Trans. Computers 21(3), 269–281 (1972) 13. Dorigo, M., Maniezzo, V., Colorni, A.: The Ant System: Optimization by a Colony of Cooperating Agents. IEEE Transactions on Systems, Man, and Cybernetics B 26(1), 29–41 (1996)
A Simulation Study on Fuzzy Markov Chains
Juan C. Figueroa García¹, Dusko Kalenatic², and Cesar Amilcar Lopez Bello³
¹ Universidad Distrital Francisco José de Caldas, Bogotá, Colombia. [emailprotected]
² Universidad de la Sabana, Chía, Colombia; Universidad Católica de Colombia, Bogotá, Colombia. [emailprotected]
³ Universidad Distrital Francisco José de Caldas, Bogotá, Colombia; Universidad de la Sabana, Chía, Colombia. [emailprotected]
Abstract. This paper presents a simulation study on fuzzy Markov chains to identify some characteristics of their behavior, based on matrix analysis. Experimental evidence shows that most fuzzy Markov chains do not behave ergodically, so several sizes of Markov chains are simulated and some statistics are collected. Two methods for obtaining the stationary distribution of a Markov chain are implemented: the Greatest Eigen Fuzzy Set and the Powers of a Fuzzy Matrix. Some convergence theorems and two new definitions for ergodic fuzzy Markov chains are presented and discussed, allowing this fuzzy stochastic process to be viewed with more clarity.
1 Introduction and Motivation
Recently, the use of fuzzy sets for handling uncertainty in statistical analysis has given rise to a new discipline called Fuzzy Statistics, in which many researchers are dedicating their efforts to defining correct expressions for solving different data analysis problems. An appropriate treatment of the fuzzy Markov chains approach is given by Sanchez in [1] and [2], by Avrachenkov and Sanchez in [3], and by Araiza, Xiang, Kosheleva and Skulj in [4], who define different algorithms, fuzzy relations and compositions to compute their stationary distribution. The main motivation for this study is that preliminary experimental evidence about fuzzy Markov chains revealed periodical behavior and non-ergodic solutions in many cases. To that effect, a simulation study is performed to identify whether fuzzy Markov chains tend toward particular behaviors.
2 Basic Definitions for Fuzzy Markov Chains
As in the analysis of crisp Markov chains, a Fuzzy Markov Chain is defined by a square matrix that represents the possibility that any discrete state at the instant t turns into any state at the next time instant t + 1. According to Avrachenkov and Sanchez in [3], the basic definitions about Fuzzy Markov Chains are: D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 109–117, 2008. c Springer-Verlag Berlin Heidelberg 2008
J.C. Figueroa Garc´ıa, D. Kalenatic, and C.A. Lopez Bello
Definition 1. A finite fuzzy set, or fuzzy distribution, on S is defined by a mapping x from S to [0, 1], represented by a vector x = {x_1, x_2, …, x_n} with 0 ≤ x_i ≤ 1, i ∈ S. The set of all fuzzy sets is denoted by F(S). In this definition, x_i is the membership grade of state i in the fuzzy set S, i ∈ S, with cardinality C(S) = m. All relations, operations and compositions are defined by the theory of fuzzy sets. Now, a fuzzy relational matrix P on the cartesian product S × S is defined by a matrix {p_ij}_{i,j=1}^{m}, where 0 ≤ p_ij ≤ 1, i, j ∈ S. This fuzzy matrix P defines all transitions among the m states of the Markov chain. In other words:
Definition 2. At each instant t, t = 1, 2, …, n, the state of the stochastic process is described by a fuzzy set x(t) ∈ F(S). The transition law of a Markov chain at instant t, t = 1, 2, …, n, is given by the fuzzy relation P as follows:

x_j^(t+1) = max_{i∈S} { x_i^(t) ∧ p_ij },  j ∈ S    (1)
where i and j are the initial and final states of the transition, i, j = 1, 2, …, m, and x(0) is the initial fuzzy set, also known as the initial distribution.
Definition 3 (Markovian Property). Let {X_0, X_1, …, X_n} be a sequence of random variables which take values in a countable set S, called the state space. Each X_n is a discrete random variable that takes one of N possible values, where N = |S|; it may be the case that N = ∞. Then {X} is a Markov chain if

P(X_n = s | X_0 = x_0, X_1 = x_1, …, X_{n−1} = x_{n−1}) = P(X_n = s | X_{n−1} = x_{n−1})    (2)

for all n ≥ 1 and all {s, x_0, x_1, …, x_{n−1} ∈ S}. For further information see Grimmett & Stirzaker [5], Ross [6] and Ching & Ng [7]. In the crisp Markov chain case, P is a probability matrix where Σ_{j=1}^{m} p_ij = 1. In the fuzzy Markov chain case, P is a fuzzy matrix defined by the membership degree of x_i in a fuzzy set S, where max_{i∈S} μ_S(x_i) ≤ 1.
Now, some convergence laws of fuzzy random variables must be given to identify their properties. First, the powers of the fuzzy transition matrix P are

p_ij^t = max_{k∈S} { p_ik ∧ p_kj^(t−1) }    (3)

where p_ij^1 = p_ij and p_ij^0 = δ_ij, a Kronecker delta. In matrix form,

P^t = P ∘ P^(t−1)    (4)

Any state j of x(t) at the instant t = 1, 2, …, n can be calculated as

x_j^(t) = max_{i∈S} { x_i^(0) ∧ p_ij^t },  j ∈ S    (5)

(The vector x(t) is also known as the fuzzy distribution of x.)
Or, in matrix form:

x(t) = x(0) ∘ P^t    (6)
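Equations (3)-(6) can be sketched in a few lines of Python; the helper names are illustrative.

```python
def maxmin(A, B):
    """Max-min composition: (A o B)_ij = max_k min(A_ik, B_kj)."""
    n = len(A)
    return [[max(min(A[i][k], B[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]

def power(P, t):
    """P^t under the max-min composition, Eqs. (3)-(4)."""
    R = P
    for _ in range(t - 1):
        R = maxmin(R, P)
    return R

def propagate(x0, Pt):
    """x^(t)_j = max_i min(x^(0)_i, P^t_ij), Eqs. (5)-(6)."""
    n = len(Pt)
    return [max(min(x0[i], Pt[i][j]) for i in range(n)) for j in range(n)]

P = [[0.8, 0.3],
     [0.5, 0.9]]
x1 = propagate([1.0, 0.2], P)   # -> [0.8, 0.3]
```

For this particular P, P ∘ P = P, so the matrix is already idempotent, which anticipates the stationarity discussion below.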
Thomason [8] shows that the powers of a fuzzy matrix exhibit stable behavior under the max-min operator; Chin-Tzong Pang [9] analyzes their powers under max-archimedean compositions. Based on these results, the following theorem ensures the existence of a stable behavior of a fuzzy matrix.
Theorem 1 (Powers of a Fuzzy Matrix). The powers of the fuzzy transition matrix {p_ij}_{i,j=1}^{m} either converge to an idempotent matrix {p_ij^τ}_{i,j=1}^{m}, where τ ≤ n, or oscillate with a finite period υ starting from some finite power.
A definition of the stationary distribution of a fuzzy matrix is given next.
Theorem 2 (Stationary Distribution). Let the powers of the fuzzy transition matrix P converge in τ steps to a non-periodic solution; then the chain is called an Aperiodic Fuzzy Markov Chain and P* = P^τ is its stationary fuzzy transition matrix.
Definition 4 (Ergodicity). A fuzzy Markov chain is called Ergodic if it is aperiodic and its stationary distribution matrix has identical rows.
Some fuzzy matrices exhibit periodical behavior. These cases have recently been treated by Martin Gavalec in [10], [11] and [12], and his results can be applied to fuzzy Markovian processes to identify the period of a fuzzy Markov chain.
2.1 General Discussion
The main discussion lies in the convergence of P: if P converges to a steady state at some power τ, then the process is clearly stationary. Using fuzzy operations it is possible to obtain stationary fuzzy distributions with non-identical rows; yet any matrix that is aperiodic, irreducible and has a stationary distribution should be an ergodic Markov chain (the fact that a Markov chain has an idempotent distribution ensures its ergodicity; if the chain behaves periodically, it is not ergodic). We therefore define two new concepts for Markov processes in a fuzzy environment:
Definition 5 (Strong Ergodicity for Markov Chains). A fuzzy Markov chain is called Strong Ergodic if it is aperiodic and its stationary transition matrix has identical rows.
Definition 6 (Weak Ergodicity for Markov Chains). A fuzzy Markov chain is called Weakly Ergodic if it is aperiodic and its stationary transition matrix is stable with non-identical rows.
This means that a fuzzy Markov chain whose stationary distribution is given by an idempotent matrix P^τ with non-identical rows, obtained from τ powers of an initial distribution P, is an ergodic Markov chain in a weak sense. That is:
Proposition 3. Denote by P_i^τ the ith row of the stationary distribution of P obtained from its τ-th power. If P is Strong Ergodic then

P_{i1}^τ = P_{i2}^τ  for all i1 ≠ i2, i1, i2 ∈ m,    (7)

and P is Weak Ergodic iff

P_{i1}^τ ≠ P_{i2}^τ  for some i1 ≠ i2, i1, i2 ∈ m.    (8)

All fuzzy Markov chains agree with at least one of the two previous statements.
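Definitions 5 and 6 suggest a direct test: iterate max-min powers until they repeat, then check whether the resulting idempotent matrix has identical rows. The following sketch illustrates that procedure; the names and return labels are illustrative, not the authors' code.

```python
def maxmin(A, B):
    """Max-min composition: (A o B)_ij = max_k min(A_ik, B_kj)."""
    n = len(A)
    return [[max(min(A[i][k], B[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]

def classify(P, max_iter=1000):
    """Iterate powers of P until a power repeats.  If P^(t+1) == P^t the
    chain is aperiodic: Strong Ergodic when the stationary matrix has
    identical rows, Weak Ergodic otherwise.  A longer cycle is periodic."""
    seen, R = [], P
    while R not in seen and len(seen) < max_iter:
        seen.append(R)
        R = maxmin(R, P)
    if R not in seen:
        return "undecided"
    if R == seen[-1]:   # idempotent stationary matrix P^tau
        rows_equal = all(row == R[0] for row in R)
        return "strong ergodic" if rows_equal else "weak ergodic"
    return "periodic"
```

For instance, P = [[0.8, 0.3], [0.5, 0.9]] is already idempotent with distinct rows (weakly ergodic), while the crisp swap matrix [[0, 1], [1, 0]] oscillates with period 2.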
3 Computation of the Fuzzy Stationary Distribution
Several methods can be used to compute the limiting distribution of the process. A first method uses the max-min relation on P:

P^n = P ∘ P^(n−1) = P ∘ P ∘ P^(n−2) = ⋯ = P ∘ P ∘ ⋯ ∘ P  (n times)    (9)

Now, if the stationary distribution of P is given by P* = P^τ, where lim_{n→τ} P^n = P*, then P becomes an idempotent matrix as described in Theorem 2. Sánchez in [13], [2] and [1] defines the necessary conditions for three efficient algorithms to compute the stationary fuzzy distribution of P. These results are based on the definition of an eigen fuzzy set, which is similar to the concept of an eigenvector or eigenvalue. These definitions are described below:
Definition 7. Let P be a fuzzy relation in a given matrix form. Then x is called an eigen fuzzy set of P iff

x ∘ P = x    (10)

Definition 8. The fuzzy set x ∈ F(S) is contained in the fuzzy set y ∈ F(S), written x ⊆ y, iff x_i ≤ y_i for all i ∈ S.
Definition 9. Let X be the set of eigen fuzzy sets of the fuzzy relation P, namely

X = {x ∈ F(S) | x ∘ P = x}    (11)

The elements of X are invariants of P under the ∘ (max-min) composition. Then, if there exists x̌ ∈ F(S) such that x ⊆ x̌ for any x ∈ X, it is called the Greatest Eigen Fuzzy Set of the relation P. The idea now is to find a maximal eigen fuzzy set that is idempotent and stable:

x̌_j = max_{i∈S} P_ij^n    (12)

It is important to recall that if P is a Strong Ergodic fuzzy Markov chain, then its greatest eigen fuzzy set converges to an idempotent matrix P^τ. Both the greatest eigenvector and the greatest eigen fuzzy set describe the major part of the inertia of the process, each in a different space: the direction that envelops the major part of the variability of the matrix, obtained with different operators on different spaces.
4 Methodology of Simulation
Some important aspects of the simulation process are presented next.
Size of the Markov chain: The size m of P is the cardinality of S, C(S), for any S ∈ F(S). Four sizes of P are simulated in this paper: m = {5, 10, 50, 100}.
Random number generator: All elements {p_ij} of P are obtained using the uniform generator x_n = (a_1 x_{n−1} + ⋯ + a_k x_{n−k}) mod m′, U_i = x_n / m′, where m′ is the modulus and k is the order of the polynomial. This means that {p_ij} ∈ [0, 1] ↔ U_ij ∈ [0, 1].
Algorithms: Two algorithms are applied to find the steady state of a Markov chain. The first is given by (9) and Theorem 2; the second is Method III proposed by Sánchez in [1], [2] and [13], which is:
(i) Determine x̄^(1) from the greatest element in each column of P.
(ii) Compute P² = P ∘ P and determine the greatest elements in each column of P². They give x̄^(2), where max_{i∈S} P_ij^k = (x̄^(1) ∘ P^(k−1))_j = x̄_j^(k), j = 1, …, n, for all k ≥ 0; here k = 2 and, in the five-state example, j = 1, …, 5.
(iii) Compare x̄^(2) with x̄^(1): if they differ, compute P³ = P² ∘ P to get x̄^(3), where max_{i∈S} P_ij^3 = (x̄^(1) ∘ P²)_j = x̄_j^(3), j = 1, …, 5.
(iv) Compare x̄^(3) with x̄^(2): if they differ, compute P⁴ = P³ ∘ P to get x̄^(4), where max_{i∈S} P_ij^4 = (x̄^(1) ∘ P³)_j = x̄_j^(4), j = 1, …, 5. And so on; stop when an n is found such that x̄^(n+1) = x̄^(n), that is, x̌ = x̌ ∘ P.
While the first algorithm shows whether P is Strong or Weak ergodic, the second does not identify this and only obtains the greatest eigen fuzzy set.
Number of runs: 1000 runs are simulated per size of P, so 4000 simulations were performed in total. Some interesting statistics are collected and analyzed jointly; their description is presented below.
Statistics of interest: All collected statistics are described next:
a) Number of powers of P: the number τ of powers of P needed to reach its steady state; if P is periodic then τ does not exist, and the type of Markov chain is registered instead.
b) Type of fuzzy Markov chain: if P is Strong Ergodic (see Definition 5) it is registered as SE; if Weak Ergodic (see Definition 6), as WE; and if P is periodic, it is registered as such.
c) Number of iterations to obtain x̌_j: the number of iterations n needed to obtain the greatest eigen fuzzy set {x̌_j} according to Method III proposed by Sánchez.
d) Computing time: the time to obtain τ, or the time needed to identify periodic behavior, computed only for m = {50, 100}.
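A simplified sketch of Sánchez's Method III as outlined above: take the column maxima of successive max-min powers of P until they stop changing. The names are illustrative, and the sketch omits the bookkeeping of the original method.

```python
def maxmin(A, B):
    """Max-min composition: (A o B)_ij = max_k min(A_ik, B_kj)."""
    n = len(A)
    return [[max(min(A[i][k], B[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]

def greatest_eigen_fuzzy_set(P, max_iter=100):
    """Method III sketch: x^(1) = column maxima of P; then x^(k) = column
    maxima of P^k, iterated until x^(k+1) == x^(k)."""
    n = len(P)
    Pk = P
    x = [max(Pk[i][j] for i in range(n)) for j in range(n)]
    for _ in range(max_iter):
        Pk = maxmin(Pk, P)
        x_new = [max(Pk[i][j] for i in range(n)) for j in range(n)]
        if x_new == x:
            return x      # stable: x o P == x
        x = x_new
    return x
```

For P = [[0.8, 0.3], [0.5, 0.9]] this returns [0.8, 0.9], which indeed satisfies the eigen condition x ∘ P = x of Eq. (10).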
114
J.C. Figueroa Garc´ıa, D. Kalenatic, and C.A. Lopez Bello
All simulations were computed using MATLAB 2007b on an AMD Turion TL-64 machine with 4 GB of RAM. See the Appendix for examples of both methods of obtaining the steady state of P, and for Definitions 5 and 6.
5 Simulation Results
Certain behaviors were observed on P, inherent to its random nature. Table 1 shows the number of Markov chains with Strong Ergodic (SE), Weak Ergodic (WE) or Periodic behavior for each size of P. It is clear that most of the chains show a periodic oscillation and the remaining ones are ergodic.

Table 1. Amount of SE, WE or Periodic chains

Size    | SE | WE  | Periodic | Avg. computing time (sec) | Total
m = 5   | 33 | 396 | 571      | N.A.                      | 1000
m = 10  | 7  | 215 | 778      | N.A.                      | 1000
m = 50  | 2  | 43  | 955      | 1.542                     | 1000
m = 100 | -  | 8   | 992      | 19.231                    | 1000
Now, Table 2 shows the number of powers of P, namely τ, needed to reach P^τ for the Strong and Weak ergodic chains, per size of P.

Table 2. Amount of iterations τ (each entry "τ: count" gives the number of chains reaching P^τ at that power; "Per." is the number of periodic chains)

m = 5:   3: 12, 4: 105, 5: 107, 6: 98, 7: 67, 8: 32, 9: 7, 10: 1; Per.: 571
m = 10:  5: 1, 6: 19, 7: 58, 8: 61, 9: 31, 10: 16, 11: 7, 12: 11, 13: 3, 14: 5, 15: 1, 16: 3, 18: 1, 19: 5; Per.: 778
m = 50:  12: 1, 16: 2, 17: 3, 18: 2, 19: 1, 20: 2, 21: 6, 22: 1, 23: 2, 24: 1, 25: 3, 28: 1, 29: 2, 31: 2, 32: 1, 34: 1, 35: 1, 36: 1, 37: 2, 38: 1, 39: 1, 41: 2, 42: 2, 44: 1, 58: 1, 59: 1, 77: 1; Per.: 955
m = 100: 77: 1, 84: 2, 93: 5; Per.: 992
We can see that as m increases, τ and the number of periodic chains also increase; for m = 100 the behavior is almost entirely periodic. Table 3 shows that all chains achieve their Greatest Eigen Fuzzy Set in fewer than m iterations; for instance, if P is 10 × 10 then most of the chains achieve {x_j} in fewer than 10 iterations. A graphical representation of the distribution of ∨ is shown next. In a general context, Method III for computing the Greatest Eigen Fuzzy Set is faster than taking classical powers P^t to find P^τ. In contrast, Method III does not show whether the process is periodic or not; moreover, a decision-making process based on this method alone could be inconsistent. Note that as m increases, P has a stronger tendency to be periodic: when the size of P is larger than m = 10, periodic behavior is more probable than for sizes below m = 10.
A Simulation Study on Fuzzy Markov Chains
115
Table 3. Amount of iterations ∨ (each entry "∨: count" gives the number of chains whose Greatest Eigen Fuzzy Set is reached at that iteration)

m = 5:   2: 24, 3: 387, 4: 351, 5: 231, 6: 7
m = 10:  3: 63, 4: 313, 5: 309, 6: 200, 7: 74, 8: 30, 9: 11
m = 50:  5: 6, 6: 58, 7: 115, 8: 143, 9: 165, 10: 138, 11: 122, 12: 80, 13: 62, 14: 41, 15: 27, 16: 21, 17: 7, 18: 6, 19: 2, 20: 4, 21: 2, 22: 1
m = 100: 7: 11, 8: 14, 9: 35, 10: 60, 11: 67, 12: 82, 13: 97, 14: 109, 15: 94, 16: 89, 17: 88, 18: 75, 19: 67, 20: 49, 21: 24, 22: 19, 25: 8, 31: 8, 33: 4
[Figure: two histograms of the frequency of occurrence, left for the number of iterations τ (0 to 100), right for the number of iterations ∨ (0 to 35), each with curves for m = 5, 10, 50 and 100.]

Fig. 1. IT2 FM Stationary Distribution
A cautionary question about both methods is: what does the analyst want to obtain from P? If the analyst only requires a generalized measure of the steady state of P, then the Greatest Eigen Fuzzy Set is appropriate; but if the idea is to perform a Markovian decision-making process, then the powers of P are more appropriate, keeping their size in mind in order to identify periodic oscillations.
6 Concluding Remarks
Some concluding remarks of the study can be given.
1. The most important conclusion of this study is that the fuzzy approach to the Markov chain process has a strong inclination to be periodic, while the crisp approach does not commonly present this behavior.
2. An important fact is that the larger P is, the harder it is for P to converge to P^τ; if P is small, it more easily reaches an ergodic behavior.
3. An important disadvantage of the fuzzy max-min operator is that it leads to periodic distributions of P, but an important advantage is that the fuzzy Markov chain approach is less sensitive to perturbations than the crisp approach. For further references see Sanchez in [1] and [2], Avrachenkov & Sanchez in [3], and Araiza, Xiang, Kosheleva and Skulj in [4].
4. The study reveals that the method proposed by Sanchez is faster than the computation of P^τ for finding the stationary distribution of the process. It is faster not only in the sense that it converges in fewer iterations than the powers of P; it also performs far fewer computations per iteration.
5. For large-scale problems, a cautionary issue is: if the Markov process has a periodic behavior and x^∨_j is used as its stationary distribution, then a decision-making process based on x^∨_j would be wrong.
6. As always in cases with thousands, millions or even billions of states, the computation of P^τ can become an expensive process. This study points to the necessity of designing efficient methods to compute P^τ with accuracy.

Finally, it is important to emphasize that this study provides valuable information about the asymptotic behavior of discrete-time fuzzy Markov chain processes.

Acknowledgements. The authors would like to thank all people who are part of the Laboratory for Automation, Microelectronics and Computational Intelligence (LAMIC) and the Mathematical Modeling Applied to Industry (MMAI) groups of the Universidad Distrital Francisco José de Caldas, Bogotá, Colombia.
References
1. Sanchez, E.: Resolution of Eigen Fuzzy Sets Equations. Fuzzy Sets and Systems 1, 69–74 (1978)
2. Sanchez, E.: Eigen Fuzzy Sets and Fuzzy Relations. J. Math. Anal. Appl. 81, 399–421 (1981)
3. Avrachenkov, K.E., Sanchez, E.: Fuzzy Markov Chains and Decision-making. Fuzzy Optimization and Decision Making 1, 143–159 (2002)
4. Araiza, R., Xiang, G., Kosheleva, O., Skulj, D.: Under Interval and Fuzzy Uncertainty, Symmetric Markov Chains Are More Difficult to Predict. In: Proceedings of the IEEE NAFIPS 2007 Conference, vol. 26, pp. 526–531 (2007)
5. Grimmett, G., Stirzaker, D.: Probability and Random Processes. Oxford University Press, Oxford (2001)
6. Ross, S.M.: Stochastic Processes. John Wiley and Sons, Chichester (1996)
7. Ching, W.K., Ng, M.K.: Markov Chains: Models, Algorithms and Applications. Springer, Heidelberg (2006)
8. Thomason, M.: Convergence of Powers of a Fuzzy Matrix. J. Math. Anal. Appl. 57, 476–480 (1977)
9. Pang, C.T.: On the Sequence of Consecutive Powers of a Fuzzy Matrix with Max-archimedean t-norms. Fuzzy Sets and Systems 138, 643–656 (2003)
10. Gavalec, M.: Computing Orbit Period in Max-min Algebra. Discrete Appl. Math. 100, 49–65 (2000)
11. Gavalec, M.: Periods of Special Fuzzy Matrices. Tatra Mountains Mathematical Publications 16, 47–60 (1999)
12. Gavalec, M.: Reaching Matrix Period is NP-complete. Tatra Mountains Mathematical Publications 12, 81–88 (1997)
13. Avrachenkov, K.E., Sanchez, E.: Fuzzy Markov Chains: Specificities and Properties. In: 8th IEEE IPMU 2000 Conference, Madrid, Spain (2000)
Appendix: On Computing the Steady State of P

Computation of P^τ: The Strong Ergodic Case

The following example illustrates Definition 5 and Proposition 3.
Example 1 (Avrachenkov and Sanchez in [3]). Let a fuzzy Markov chain have the following transition matrix:

        | 0.1  0.7  0.2  0.8  0.7 |
        | 0    0.6  0.4  0.3  0.5 |
    P = | 0.3  1.0  0    0.1  0.4 |
        | 0.3  0.3  0.8  0.1  0   |
        | 0    0    0.7  0.5  0   |

P^τ is obtained by computing P^4, with the following result:

          | 0.3  0.6  0.5  0.5  0.5 |
          | 0.3  0.6  0.5  0.5  0.5 |
    P^τ = | 0.3  0.6  0.5  0.5  0.5 |
          | 0.3  0.6  0.5  0.5  0.5 |
          | 0.3  0.6  0.5  0.5  0.5 |

By using the Method III proposed by Sánchez in [1], [2], [3] and [13], the greatest eigen fuzzy set is obtained in three iterations, that is, x^∨_j = x^3_j:

    x^∨ = [ 0.3  0.6  0.5  0.5  0.5 ]
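Assuming the column-garbled matrix above has been read back row-wise correctly, the strong-ergodic claim can be checked numerically. This is a verification sketch, not part of the original paper:

```python
import numpy as np

# Example 1 transition matrix, as reconstructed from the extracted columns
P = np.array([
    [0.1, 0.7, 0.2, 0.8, 0.7],
    [0.0, 0.6, 0.4, 0.3, 0.5],
    [0.3, 1.0, 0.0, 0.1, 0.4],
    [0.3, 0.3, 0.8, 0.1, 0.0],
    [0.0, 0.0, 0.7, 0.5, 0.0],
])

def maxmin(A, B):
    # fuzzy (max-min) matrix composition
    return np.max(np.minimum(A[:, :, None], B[None, :, :]), axis=1)

P4 = P
for _ in range(3):          # P^4 = P o P o P o P
    P4 = maxmin(P4, P)
print(P4[0])                              # [0.3 0.6 0.5 0.5 0.5], all rows equal
print(np.array_equal(maxmin(P4, P), P4))  # True: P^4 is stationary
```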
As in Definition 5, this example is a Strong Ergodic Markov chain, since all rows of P^τ are equal and they converge to x^∨.

Computation of P^τ: The Weak Ergodic Case

This section illustrates Definition 6 and Proposition 3.

Example 2. Let a fuzzy Markov chain have the following transition matrix:

        | 0.583  0.226  0.209  0.568  0.415 |
        | 0.424  0.580  0.380  0.794  0.305 |
    P = | 0.516  0.760  0.783  0.059  0.874 |
        | 0.334  0.530  0.681  0.603  0.015 |
        | 0.433  0.641  0.461  0.050  0.768 |

Its stationary fuzzy transition matrix is reached at P^τ = P^5, but it does not show equal rows:

          | 0.583  0.568  0.568  0.568  0.568 |
          | 0.516  0.681  0.681  0.681  0.681 |
    P^5 = | 0.516  0.760  0.783  0.760  0.783 |
          | 0.516  0.681  0.681  0.681  0.681 |
          | 0.516  0.641  0.641  0.641  0.769 |

This is a case where the fuzzy transition matrix P is a Weak Ergodic process. Its greatest eigen fuzzy set is

    x^∨ = x^3 = [ 0.583  0.760  0.783  0.760  0.783 ]

Note that x^∨ does not converge to any row of P^τ.
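The greatest eigen fuzzy set of Example 2 can also be re-checked numerically (matrix again reconstructed from the garbled columns; a verification sketch, not the original computation):

```python
import numpy as np

P = np.array([
    [0.583, 0.226, 0.209, 0.568, 0.415],
    [0.424, 0.580, 0.380, 0.794, 0.305],
    [0.516, 0.760, 0.783, 0.059, 0.874],
    [0.334, 0.530, 0.681, 0.603, 0.015],
    [0.433, 0.641, 0.461, 0.050, 0.768],
])

x = P.max(axis=0)          # Method III starts from the column maxima
while True:
    x_new = np.max(np.minimum(x[:, None], P), axis=0)   # x o P
    if np.array_equal(x_new, x):
        break
    x = x_new
print(x)   # [0.583 0.76  0.783 0.76  0.783]
```

The result matches the stated eigen fuzzy set and, as the text notes, it is not equal to any single row of P^τ.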
A Tentative Approach to Minimal Reducts by Combining Several Algorithms

Ning Xu 1,2, Yunxiang Liu 1, and Ruqi Zhou 2
1 School of Computer Science and Information Engineering, Shanghai Institute of Technology, Shanghai, 200235, China
2 Dept. of Computer Science, Guangdong Institute of Education, Guangzhou, 510303, China

Abstract. Finding minimal reducts is an NP-hard problem. To obtain a feasible solution, depth-first search is mainly used, and a feasible reduct can always be obtained. Whether that feasible reduct is a minimal reduct, and how far it is from a minimal reduct, are both unknown; it only tells how many attributes it has and that it is a reduct. Based on rough set reduction theory and the data structure of the information system, the least number of condition attributes needed to describe the system's classification characteristics can be determined, which fixes an area in which to search for minimal reducts. By binary search in this area, the minimal reducts can be obtained quickly and with certainty.

Keywords: rough sets, algorithm, attribute reduction, minimal reduct.
1 Introduction

Attribute reduction, also called dimensionality reduction or feature selection, is important in data mining, pattern recognition, machine learning, artificial intelligence, and so on. It is one of the key techniques in data pre-processing and data compression, and many years of research have produced plentiful results. Rough sets, proposed by the Polish mathematician Zdzislaw Pawlak [1,2] in 1982, is one of the most important of these results. The research field of rough sets is attribute reduction, and it has established a reduction theory based on data classification knowledge; this changed the attribute reduction situation, and the theory is widely used in practical fields. A minimal reduct, having the fewest attributes, is the most valuable result of attribute reduction. Experts have proved that obtaining a minimal reduct is still an NP-hard problem [3] even with rough sets, but deeper research continues, especially on combining several methods to deal with the problem. This study discusses the set operations of rough set reduction theory and the data structure of the information system. Relying on relational database theory, a discrete and validated data structure can be used to describe a finite set of data objects that are distinguished one by one. From this, the algorithm PARA() is obtained. It gives the lower limit: the least number of attributes needed to distinguish every two objects in the dataset, which is the lowest bound for searching for minimal reducts. A heuristic algorithm of attribute significance then yields a feasible reduct, which gives the number of

D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 118–124, 2008. © Springer-Verlag Berlin Heidelberg 2008
attributes in a reduct, which will be the upper bound for searching for minimal reducts. Thus a definite area in which to find the minimal reducts is determined, and the two bounds drive a binary search. Examples show that the search is efficient and quick, and more algorithms can be designed for high-dimensionality reduction. The binary search greatly improves on breadth-first search for obtaining minimal reducts; if the heuristic algorithm is efficient enough, i.e. it comes close to a minimal reduct, the search area shrinks even further relative to breadth-first search.
2 Rough Sets Reduction Theory

A dataset is called an information system [1], described as S = {U, A, V, f}: U, the universe, U = {x1, x2, ..., xn}; A, the set of all attributes; V, the set of all attribute values; f, the map function, f: U×A → V. Generally A = C ∪ D, where C is the condition attribute set and D the decision attribute set. For any P ⊆ A, ∩P gives an equivalence relation, denoted ind(P) and called the indiscernibility relation:

ind(P) = {(x, y) ∈ U^2 | ∀a ∈ P, a(x) = a(y)}.

ind(P) generates a partition of U, usually a group of equivalence classes, denoted U/ind(P). For a set X ⊆ U and R ⊆ A, rough set theory defines the lower approximation of X in R as

R_(X) = ∪{Y ∈ U/R | Y ⊆ X}

and the upper approximation of X in R as

R^-(X) = ∪{Y ∈ U/R | Y ∩ X ≠ Ø}.

The lower approximation is also written pos_R(X) = R_(X), called the positive region. If U/ind(D) = {Y1, Y2, ..., Yk} is the partition given by the decision attributes and P ⊆ C, then the positive region of P with respect to D is

pos_ind(P)(D) = ∪_{i=1}^{k} pos_ind(P)(Yi).   (2-1)

If c ∈ C and

pos_ind(C−c)(D) = pos_ind(C)(D),   (2-2)

then c is dispensable (can be reduced) with respect to D; otherwise c is necessary. Reduction is defined as follows: for P ⊆ C, if every c in P is necessary with respect to D, then P is independent with respect to D. If P is independent with respect to D and

pos_ind(P)(D) = pos_ind(C)(D),   (2-3)

then P is a reduct of C with respect to D, denoted red_D(C). Generally an information system has several reducts meeting (2-3); the intersection of all these reducts is called the core, denoted core_D(C) = ∩red_D(C). (2-3) shows that redundant attributes can be removed from the dataset as long as the information system S keeps the positive region unchanged.
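The definitions above translate directly into code. A minimal sketch (the toy table and function names are my own, not from the paper): partition U by ind(P) and collect the decision-pure classes into the positive region.

```python
def partition(rows, attrs):
    """Equivalence classes of U/ind(attrs); rows is a list of dicts."""
    classes = {}
    for i, row in enumerate(rows):
        classes.setdefault(tuple(row[a] for a in attrs), []).append(i)
    return list(classes.values())

def positive_region(rows, cond, dec):
    """pos_ind(cond)(dec): objects whose ind(cond)-class is pure in the decision."""
    pos = []
    for cls in partition(rows, cond):
        if len({rows[i][dec] for i in cls}) == 1:   # one decision value only
            pos.extend(cls)
    return sorted(pos)

# hypothetical 4-object table with condition attributes a, b and decision D
rows = [{'a': 0, 'b': 0, 'D': 0}, {'a': 0, 'b': 1, 'D': 1},
        {'a': 1, 'b': 0, 'D': 1}, {'a': 1, 'b': 1, 'D': 1}]
print(positive_region(rows, ['a', 'b'], 'D'))   # [0, 1, 2, 3]
print(positive_region(rows, ['a'], 'D'))        # [2, 3]: a alone loses objects 0, 1
```

Here both a and b are necessary: dropping either one shrinks the positive region, so by (2-2) neither attribute is dispensable in this toy table.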
3 A Discussion of Set Theory

Every indiscernibility relation on U is an equivalence relation and gives some equivalence classes; U/ind(P) is also called a quotient set. It can easily be proved that if R1 ∈ C, R2 ∈ C and R1 ≠ R2, then |U/(R1 ∩ R2)| ≥ |U/R1| and |U/(R1 ∩ R2)| ≥ |U/R2|. It follows that for P ⊆ C, |U/ind(C)| ≥ |U/ind(P)|. If |U/ind(C)| ≠ |U|, the system S has objects that are indiscernible by the condition attributes, i.e. identical objects on the condition set C (supposing no incompatible data in S). All but one of each group of identical objects can be removed; this is the necessary step of object reduction. After the identical objects are removed, the system S has |U'| objects and always satisfies |U/ind(C)| = |U'|. For simplicity of expression, the system S is assumed to satisfy |U/ind(C)| = |U|. This means every two objects can be distinguished by C, and also pos_ind(C)(D) = U. Because a general information system has |U/ind(D)| ≤ |U|, it is possible to find P ⊆ C with pos_ind(P)(D) = pos_ind(C)(D). When, in addition, pos_ind(P−R)(D) ≠ pos_ind(C)(D) for every R ∈ P, then P is a reduct. Adding an equivalence relation to an indiscernibility relation increases, or at least does not decrease, the cardinality of the resulting partition. The heuristic algorithm of attribute significance adds attributes one by one to the attribute core, refining the equivalence classes until the classification knowledge of the decision attributes is met; it can always obtain a feasible solution.
4 Data Structure of Information System

An information system S = {U, A, V, f} is a discrete and crisp dataset, and V is a finite set. From relational database theory, once the number of attributes and the number of values of every attribute are fixed, the number of objects that can be distinguished by them is determined; this is the structure of the relation table. If U = {x1, x2, ..., xn}, C = {c1, c2, ..., cm} and V_ci = {c_i1, c_i2, ..., c_i,mi}, the number of objects the data structure can distinguish is

N = ∏_{i=1}^{m} m_i.

The objects of an information system S that meets |U/ind(C)| = |U| must satisfy N ≥ |U|. If N >> |U| = n, the information structure has enough attributes and enough different attribute values to discern all objects, so that in attribute reduction more attributes can be removed while maintaining its classes. If instead |U| ≈ N, only a few attributes can be removed from the information system, because the attributes are just sufficient to discern all objects.
The number N gives much information about attribute reduction. On the other side, when C = {c1, c2, ..., cm}, V_ci = {c_i1, c_i2, ..., c_i,mi} and |U| = n are known, the number of attributes needed to distinguish the n objects can also be estimated. For an information system S with m condition attributes, suppose that among {c1, c2, ..., cm}:

t1 attributes have s1 different values (|U/c| = s),
t2 attributes have s2 different values,
t3 attributes have s3 different values,
..., and
tr attributes have sr different values.

Then the system can describe N different objects:

N = ∏_{i=1}^{r} s_i^{t_i}.   (4-1)

In (4-1), 2 ≤ s_i ≤ |U| and ∑_{i=1}^{r} t_i = m.

If |U| = n and c_i ∈ C with |U/c_i| = s_i, index the s_i from big to small, s1 ≥ s2 ≥ s3 ≥ ... ≥ sm. Then there exists p0 ∈ I, I = {1, 2, 3, ..., m}, making the two formulas true:

∏_{i=1}^{p0−1} s_i ≤ n  and  ∏_{i=1}^{p0} s_i ≥ n.   (4-2)

Formulas (4-2) show that at least p0 attributes are needed to describe the different objects in the system: with fewer than p0 attributes the system certainly does not reach |U/ind(C)| = n. Because a reduct must meet (2-3), the number of attributes in a reduct generally satisfies

|red_D(C)| ≈ p0 or |red_D(C)| ≥ p0.

This result gives the lower limit p0: the least number of attributes needed to discern any two objects in the system, and also the least size at which to search for minimal reducts.
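The lower bound p0 of (4-2) takes only a few lines to compute. A sketch (the function name is mine): sort the value counts |U/c_i| in descending order and multiply until the product reaches n.

```python
def p0_lower_bound(value_counts, n):
    """Smallest p0 such that the product of the p0 largest |U/c_i|
    reaches n, following formula (4-2)."""
    s = sorted(value_counts, reverse=True)
    prod = 1
    for p0, si in enumerate(s, start=1):
        prod *= si
        if prod >= n:
            return p0
    return len(s)   # even all attributes cannot separate n objects

# CTR dataset of Section 5: two attributes with 3 values, seven with 2 values
print(p0_lower_bound([3, 3, 2, 2, 2, 2, 2, 2, 2], 21))   # 4
```

For the CTR example: 3 · 3 = 9 < 21, 9 · 2 = 18 < 21, 18 · 2 = 36 ≥ 21, so p0 = 4, matching the worked example below.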
5 Algorithm and Examples

The following is the reduction analysis algorithm PARA (Pre-Analysis of Attribute Reduction Algorithm). Input: information system S = {U, A, V, f}.

1. Get t_i and s_i of C, and compute N by (4-1);
2. Index the s_i in descending order and compute p0 by (4-2);
3. IF |C| − p0 = 0 THEN stop and exit, ELSE;
4. Compute ind(C) of the system;
5. Compute pos_ind(C)(D) of the system;
6. For every c ∈ C: IF pos_ind(C−c)(D) ≠ pos_ind(C)(D) THEN core(C) = core(C) ∪ {c};
7. C' = C − core(C);
8. Output N, n, p0, |C| − p0, core(C), p0 − |core(C)|, C'.

The following is the classical CTR (Car Test Result) dataset [6], discretized as Table 1. Its reduction analysis by PARA proceeds as follows.
Table 1. Classified CTR Dataset

 U | a b c d e f g h i | D
 1 | 0 1 1 1 1 1 1 1 0 | 1
 2 | 0 1 0 1 1 0 1 0 0 | 1
 3 | 0 1 0 1 1 1 1 0 0 | 1
 4 | 0 0 1 1 1 1 1 0 1 | 2
 5 | 0 1 0 1 1 0 0 0 0 | 1
 6 | 0 1 0 0 1 0 0 1 2 | 0
 7 | 0 1 0 1 1 0 1 0 2 | 0
 8 | 1 0 0 0 0 1 2 0 1 | 2
 9 | 0 0 0 0 0 1 2 0 0 | 1
10 | 0 0 0 0 0 1 0 1 0 | 1
11 | 1 0 0 1 0 1 2 0 1 | 2
12 | 1 0 0 1 1 0 0 0 0 | 2
13 | 0 0 0 0 1 0 0 0 0 | 1
14 | 1 0 1 1 0 1 1 0 0 | 2
15 | 1 0 0 0 0 0 2 0 0 | 2
16 | 0 0 1 1 1 0 1 0 0 | 1
17 | 0 1 0 1 1 0 1 1 0 | 1
18 | 0 0 0 1 1 0 1 1 0 | 1
19 | 1 0 0 1 0 1 0 0 0 | 2
20 | 0 0 0 1 0 1 0 0 0 | 2
21 | 0 0 0 0 0 1 0 0 0 | 1
1. Get t1 = 2, s1 = 3; t2 = 7, s2 = 2; by (4-1): N = 3^2 · 2^7 = 1152.
2. The s_i in descending order are 3, 3, 2, 2, 2, 2, 2, 2, 2; using (4-2) with n = 21: p0 = 4.
3. m = 9, |C| − p0 = 5.
4. |U/ind(C)| = |{1}, {2}, {3}, {4}, ..., {21}| = |U|.
5. pos_ind(C)(D) = {1, 2, 3, 4, ..., 21} = U.
6. pos_ind(C−d)(D) ≠ U and pos_ind(C−i)(D) ≠ U, so core_D(C) = {d, i}.
7. C' = C − core_D(C) = {a, b, c, e, f, g, h}.
8. Output: N = 1152, n = 21, p0 = 4, |C| − p0 = 5, core_D(C) = {d, i}, p0 − |core(C)| = 2, |C'| = 7.

Because N >> n (1152 >> 21), the system has enough condition attributes, and perhaps half of them are redundant. Because p0 = 4, four attributes could form a reduct; as the core has two attributes, two more attributes are needed for a minimal reduct.
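The key claims of this worked example, that d and i are necessary and that {a, d, e, i} keeps the classification, can be re-checked mechanically. This sketch uses the CTR table as transcribed above (a reconstruction of the garbled columns, so treat the data as an assumption):

```python
# CTR rows as (a, b, c, d, e, f, g, h, i, D), transcribed from Table 1
CTR = [
    (0,1,1,1,1,1,1,1,0,1), (0,1,0,1,1,0,1,0,0,1), (0,1,0,1,1,1,1,0,0,1),
    (0,0,1,1,1,1,1,0,1,2), (0,1,0,1,1,0,0,0,0,1), (0,1,0,0,1,0,0,1,2,0),
    (0,1,0,1,1,0,1,0,2,0), (1,0,0,0,0,1,2,0,1,2), (0,0,0,0,0,1,2,0,0,1),
    (0,0,0,0,0,1,0,1,0,1), (1,0,0,1,0,1,2,0,1,2), (1,0,0,1,1,0,0,0,0,2),
    (0,0,0,0,1,0,0,0,0,1), (1,0,1,1,0,1,1,0,0,2), (1,0,0,0,0,0,2,0,0,2),
    (0,0,1,1,1,0,1,0,0,1), (0,1,0,1,1,0,1,1,0,1), (0,0,0,1,1,0,1,1,0,1),
    (1,0,0,1,0,1,0,0,0,2), (0,0,0,1,0,1,0,0,0,2), (0,0,0,0,0,1,0,0,0,1),
]
ATTRS = 'abcdefghi'

def consistent(subset):
    """True if no two objects agree on `subset` yet differ in the decision D."""
    seen = {}
    for row in CTR:
        key = tuple(row[ATTRS.index(a)] for a in subset)
        if seen.setdefault(key, row[-1]) != row[-1]:
            return False
    return True

# an attribute is in the core iff dropping it breaks consistency
core = [a for a in ATTRS if not consistent([b for b in ATTRS if b != a])]
print(core)                       # the paper reports core_D(C) = {d, i}
print(consistent(list('adei')))   # True: {a, d, e, i} keeps the classification
print(consistent(list('di')))     # False: the core alone is not a reduct
```

For instance, objects 20 and 21 agree everywhere except on d but have different decisions, and objects 2 and 7 differ only in i, which is exactly why both attributes are necessary.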
So now, the lower bound for searching for a minimal reduct is 4 attributes. Because |core_D(C)| = 2, finding a reduct among the 511 subsets of the 7 remaining attributes reduces first to checking the C(7,2) = 21 two-attribute extensions of the core. If a reduct can be found among them, any further depth-first search is pointless: it is the minimal reduct of the information system. If there is no reduct among the 21 subsets, this confirms a lower bound on the number of attributes in a reduct; a depth-first search then yields a reduct whose number of attributes becomes the upper bound, and the binary search can proceed between the two attribute numbers. For the CTR dataset, the minimal reduct is red_D(C) = {d, i, a, e}.

Another example is from [7]. The dataset comes from medical treatment records; there are 20 inspection attributes and 568 cases, which five experts divided into 5 classes. Using the reduction pre-analysis algorithm PARA, the output is: N ≈ 8.0 × 10^14, N >> n, p0 = 4, |core_D(C)| = 1 and |C'| = 19.

The lower bound for finding minimal reducts is to check C(19,3) = 969 attribute subsets. Any heuristic algorithm of attribute significance (for example, one using attribute dependency as the significance measure) yields a reduct with 14 attributes, which becomes the upper bound for searching for minimal reducts. The next step searches C(19,8) = 75582 attribute subsets, where several reducts are obtained; then C(19,7) = 50388 attribute subsets, where 13 reducts are obtained; the last step searches C(19,6) = 27132 attribute subsets and finds no reduct among them, so the 13 reducts found are all minimal reducts. Up to this point, 154071 attribute subsets have been searched, far fewer than the roughly 19! ≈ 1.2 × 10^17 possibilities, i.e. less than 1/(7.89 × 10^11) of them. A quicker heuristic algorithm that finds a reduct of 8 attributes would need only the searches of C(19,5) + C(19,6) = 38760 attribute subsets, less than 1/(3.13 × 10^12) of 19!.
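The binary search itself only needs a monotone predicate, "some attribute subset of size k keeps the table consistent", since any superset of a consistent subset stays consistent. A generic sketch on a hypothetical toy table (in practice `lo` and `hi` would come from PARA and a heuristic reduct):

```python
from itertools import combinations

def minimal_consistent_size(rows, n_attrs, lo, hi):
    """Binary search for the smallest attribute-subset size that keeps the
    decision table consistent. Valid because the predicate is monotone in k."""
    def consistent(subset):
        seen = {}
        for row in rows:
            key = tuple(row[j] for j in subset)
            if seen.setdefault(key, row[-1]) != row[-1]:
                return False
        return True

    def exists(k):   # does any size-k subset keep the table consistent?
        return any(consistent(s) for s in combinations(range(n_attrs), k))

    while lo < hi:
        mid = (lo + hi) // 2
        if exists(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo

# hypothetical toy table (A, B, C, D): the minimal consistent subset is {A, B}
toy = [(0, 0, 0, 0), (0, 1, 0, 1), (1, 0, 0, 1), (1, 1, 1, 0)]
print(minimal_consistent_size(toy, 3, 1, 3))   # 2
```

Each probe of `exists(k)` corresponds to one "search C(m,k) subsets" step of the example above, and the binary search visits O(log(hi − lo)) such levels rather than all of them.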
6 Conclusion

Attribute reduction, especially high-dimensionality reduction, has many important uses. This paper discusses a reduction analysis algorithm based on the indiscernibility relation and equivalence classes. When attributes are added to an indiscernibility relation, its equivalence classes increase; ind(C) always reaches the largest number of equivalence classes, with |U/ind(C)| = |U|. From this point, the data structure of the information system can be used to analyze its ability to distinguish any two objects. From that data structure, the reduction pre-analysis algorithm PARA gives much information and determines the lower limit for searching for minimal reducts, while any heuristic algorithm provides the upper limit. Examples show that the algorithm is efficient and quick; it may play a role in dimensionality reduction and in finding minimal reducts. The study tries to give a simple line of thought on dimensionality reduction.
Acknowledgements. The project is supported by China Guangdong Natural Science Foundation (No.06301299) and Professor & Doctor Special Research Funds of Guangdong Institute of Education.
References
1. Pawlak, Z.: Rough Sets. Int. J. Comput. Inform. Sci. 11(5), 341–356 (1982)
2. Pawlak, Z.: Rough Sets and Their Applications. Microcomputer Applications 13(2), 71–75 (1994)
3. Wong, S.K.M., Ziarko, W.: On Optimal Decision Rules in Decision Tables. Bullet. Polish Acad. Sci. 33, 693–696 (1995)
4. Xu, N.: The Theory and Technique Research of Attribute Reduction in Data Mining Based on Rough Sets. PhD dissertation, Guangdong University of Technology (2005)
5. Ni, Z., Cai, J.: Discrete Mathematics. Science Press (2002)
6. Zhang, W., Wu, W., Liang, J., Li, D.: Theory and Method of Rough Sets. Science Press (2001)
7. Guo, J.: Rough Set-based Approach to Data Mining. PhD dissertation, Department of Electrical Engineering and Computer Science, Case Western Reserve University, USA (2003)
8. Hu, X.: Knowledge Discovery in Database: An Attribute-oriented Rough Set Approach (Rules, Decision Matrices). PhD dissertation, The University of Regina, Canada (1995)
9. Wang, J., Miao, D.: Analysis on Attribute Reduction Strategies of Rough Set. J. Comput. Sci. Technol. 13(2), 189–193 (1998)
10. Shi, Z.: Knowledge Discovery. Tsinghua University Press, Beijing (2002)
11. Duntsch, I., Gediga, G., Orlowska, E.: Relational Attribute Systems II: Reasoning with Relations in Information Structures. In: Peters, J.F., Skowron, A., Marek, V.W., Orłowska, E., Słowiński, R., Ziarko, W. (eds.) Transactions on Rough Sets VII. LNCS, vol. 4400, pp. 16–35. Springer, Heidelberg (2007)
Ameliorating GM (1, 1) Model Based on the Structure of the Area under Trapezium

Cuifeng Li

Zhejiang Business Technology Institute, 315012, Zhejiang Ningbo
[emailprotected]

Abstract. Based on research into the structure of the background value in the GM(1,1) model, an exact formula for the background value of x^(1)(t) on the region [k, k+1], which is used when establishing GM(1,1), is constructed by integrating x^(1)(t) from k to k+1. The modeling precision and prediction precision of the model with the ameliorated background value are improved, and the application area of the GM(1,1) model is enlarged. Finally, a model of Chinese per-capita power consumption is set up; simulation examples show the effectiveness of the proposed approach.

Keywords: grey theory, background value, precision.
1 Introduction

The grey system theory has attracted great attention from researchers since 1982 and has been widely used in many fields, such as industry, agriculture, zoology, market economy and so on. GM(1,1) has been greatly improved by many scholars at home and abroad. The grey system theory can effectively deal with incomplete and uncertain information systems, and the background value is an important factor in fitting precision and prediction precision. Based on research into the structure of the background value in the GM(1,1) model, an exact formula for the background value of x^(1)(t) on the region [k, k+1], used when establishing GM(1,1), is constructed by integrating x^(1)(t) from k to k+1. The modeling precision and prediction precision with the ameliorated background value are improved, and the application area of the GM(1,1) model is enlarged. Finally, a model of Chinese per-capita power consumption is set up; simulation examples show the effectiveness of the proposed approach.
2 Modeling Mechanism of the Ameliorating GM (1, 1) Model 2.1 GM(1,1) Model
Let the non-negative original data sequence be denoted by:

X^(0) = {x^(0)(1), x^(0)(2), ..., x^(0)(n)}.   (1)

D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 125–131, 2008. © Springer-Verlag Berlin Heidelberg 2008
Then the 1-AGO (accumulated generation operation) sequence X^(1) can be obtained as follows:

X^(1) = {x^(1)(1), x^(1)(2), ..., x^(1)(n)},   (2)

where

x^(1)(k) = ∑_{i=1}^{k} x^(0)(i), k = 1, 2, ..., n.   (3)
The grey GM(1,1) model can be constructed by establishing a first-order differential equation for x^(1)(t):

dx^(1)(t)/dt + a x^(1)(t) = u.   (4)
Here a and u are the parameters to be estimated. Integrating (4) from k to k+1, we get the following equation:

∫_k^{k+1} dx^(1)(t) + a ∫_k^{k+1} x^(1)(t) dt = u ∫_k^{k+1} dt.   (5)

Then

∫_k^{k+1} dx^(1)(t) = x^(1)(t) |_k^{k+1} = x^(1)(k+1) − x^(1)(k) = x^(0)(k+1).   (6)
Suppose

z^(1)(k+1) = ∫_k^{k+1} x^(1)(t) dt   (7)

is the background value of x^(1)(t) on the region [k, k+1]. Thus, (5) can be rewritten in the following form:

x^(0)(k+1) + a z^(1)(k+1) = u.   (8)
From (7), it is observed that the value of z^(1)(k+1) can be established by integrating x^(1)(t) from k to k+1. Solve for a and u by means of LS (least squares):

(â, û)^T = (B^T B)^{-1} B^T Y,   (9)

where

    B = | −∫_1^2 x^(1)(t)dt          1 |
        | −∫_2^3 x^(1)(t)dt          1 |
        |        ...               ... |
        | −∫_k^{k+1} x^(1)(t)dt      1 |

and Y = [x^(0)(2), x^(0)(3), ..., x^(0)(n)]^T.
Therefore, we can obtain the time response function by solving (4) as follows:

x̂^(1)(k+1) = [x^(1)(1) − û/â] e^{−âk} + û/â.   (10)
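Equations (1) to (10) condense into a few lines of code. The following is a sketch of the classical GM(1,1) with the traditional mean background z^(1)(k+1) = [x^(1)(k) + x^(1)(k+1)]/2 (not the improved model of Section 2.2); the function name and the synthetic test series are my own:

```python
import numpy as np

def gm11(x0, horizon=0):
    """Classical GM(1,1): 1-AGO (Eq. 3), mean-value background,
    least-squares parameters (Eq. 9), time response function (Eq. 10)."""
    x0 = np.asarray(x0, dtype=float)
    x1 = np.cumsum(x0)                            # accumulated sequence
    z = 0.5 * (x1[:-1] + x1[1:])                  # traditional background value
    B = np.column_stack([-z, np.ones_like(z)])
    Y = x0[1:]
    a, u = np.linalg.lstsq(B, Y, rcond=None)[0]   # (a^, u^) = (B^T B)^-1 B^T Y
    k = np.arange(len(x0) + horizon)
    x1_hat = (x0[0] - u / a) * np.exp(-a * k) + u / a    # Eq. (10)
    return np.concatenate([[x0[0]], np.diff(x1_hat)])    # back to the x^(0) scale

# a geometric series obeys the grey model almost exactly
data = np.array([100.0, 110.0, 121.0, 133.1, 146.41])
fit = gm11(data)
print(np.max(np.abs(fit - data) / data))   # about 1e-3: a very close fit
```

On a purely exponential series the relation (8) is exactly affine in z, so the least-squares step recovers a and u with essentially no residual; the small remaining error comes from the mean-value approximation of the background, which is precisely what Section 2.2 sets out to reduce.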
2.2 The Improved Structure of the Background Value
In the traditional GM(1,1) model, z^(1)(k+1) is the average of x^(1)(k) and x^(1)(k+1). We can see from Fig. 1 that this traditional background value can also be regarded as the area of the trapezium abcd, whereas the real background value z^(1)(k+1) is the integral of x^(1)(t) over the region [k, k+1]. Building the model with the traditional background value therefore brings lower precision and higher error. A background value reconstructed by the method of rectangles was proposed by Tan Guanjun; that method achieves better precision, but it still has a rather large error, as can be seen from Fig. 2. This paper proposes a new background value that uses a high-precision interpolation formula together with the trapezium method, which can improve the prediction precision of GM(1,1). The idea of the method is as follows: the interval from k to k+1 is divided equally into N subintervals of length Δt = 1/N, and the values of the function x^(1)(t) at the division points are x^(1)(k), x1, x2, x3, ..., x_{N−1}, x^(1)(k+1), as in Fig. 3.
Fig. 1. z^(1)(k+1) using the traditional background value
Fig. 2. Z (1) (k + 1) reconstructed by the method of rectangle
Fig. 3. Z (1) (k + 1) reconstructed by the method of trapezium
The total of the N areas under the trapezia is regarded as an approximation of the actual area. Obviously, the bigger N is, the closer the total of the N areas is to the actual area, as in Fig. 3. Thus the background value computed with the method proposed in this paper is nearer to the actual area than the traditional one. The total of the N areas, denoted S_N, is now deduced as follows.
In every subinterval, the area of a narrow trapezium is substituted for the region under the curve. According to the formula for the area of a trapezium, we obtain:

S_N = ∫_k^{k+1} x^(1)(t) dt
    ≈ (1/2)[x^(1)(k) + x1]Δt + (1/2)(x1 + x2)Δt + (1/2)(x2 + x3)Δt + ... + (1/2)[x_{N−1} + x^(1)(k+1)]Δt
    = (1/(2N)) [x^(1)(k) + 2x1 + 2x2 + 2x3 + ... + 2x_{N−1} + x^(1)(k+1)].   (11)

Suppose

z_N^(1)(k+1) = S_N = (1/(2N)) [x^(1)(k) + 2x1 + 2x2 + 2x3 + ... + 2x_{N−1} + x^(1)(k+1)],   (12)

where k = 1, 2, ..., n−1 and x_i is the ordinate value of the curve at the abscissa k + i/N, i.e.

x_i = x^(1)(k + i/N), i = 1, 2, ..., N−1.   (13)

Obviously, the following equality is obtained when N = 1:

z_N^(1)(k+1) = S_N = (1/2)[x^(1)(k) + x^(1)(k+1)].   (14)
(14)
2.3 Calculate the Background Value
From above, if the new background value is to be restructured, we should get the value xi firstly. But the value xi is not exit. Now Newton-Cores interpolation formula is introduced to get it. Suppose Y ( k ) = k , k = 1,2,..., n , let [Y ( k ), x (1) ( k )], k = 1,2,..., n be the point of the corresponding curve, then using Newton-Cores interpolation formula to get the value
i i x(1) (k + ) in light of its corresponding abscissa Y(k+ ) i =1,2...,n −1 . N N Definition 3.1 [6]. The function f [ x 0 , x k ] = f ( x k ) − f ( x 0 ) is defined as a first-order xk − x0
mean-variance of f (x ) about x0 , xk .
Ameliorating GM (1, 1) Model Based on the Structure of the Area under Trapezium
The function f [ x 0 , x1 , x k ] =
129
f [ x 0 , x k ] − f [ x 0 , x1 ] is defined as a second-order meanx k − x1
variance of f (x ) about x0 , xk . The function f [x0 , x1,...,xk−1] =
f [x0 ,...,xk−3, xk−1] − f [x0 , x1,...,xk−2 ] is defined as a (k-1) order xk−1 − xk−2
mean-variance of f (x ) about x0 , xk . The function
f [x0 , x1,...,xk ] =
f [x0 ,...,xk−2 , xk ] − f [x0 , x1,...,xk −1] is defined as a k order xk − xk−1
mean-variance of f (x ) about x0 , xk . Newton-Cores interpolation formula in [6] is as follow: Suppose x is a point in [ a, b] , then we can get: f ( x ) = f ( x 0 ) + f [ x , x 0 ]( x − x 0 ) f [ x , x 0 ] = f [ x 0 , x1 ] + f [ x , x 0 , x1 ]( x − x1 )
.
(15)
... f [ x , x 0 , x1 ,..., x n −1 ] = f [ x 0 , x1 ,..., x n ] + f [ x , x 0 ,..., x n ]( x − x n )
Substituting each latter formula into the former one gives

f(x) = f(x_0) + f[x_0, x_1](x - x_0) + f[x_0, x_1, x_2](x - x_0)(x - x_1) + ...
     + f[x_0, x_1, ..., x_n](x - x_0)(x - x_1)...(x - x_{n-1}) + f[x, x_0, ..., x_n]ω_{n+1}(x)
     = N_n(x) + R_n(x),                                                                         (16)

where

R_n(x) = f(x) - N_n(x) = f[x, x_0, x_1, ..., x_n]ω_{n+1}(x)                                     (17)

and ω_{n+1}(x) = (x - x_0)(x - x_1)...(x - x_n). The Newton interpolating polynomial is

N_n(x) = f(x_0) + f[x_0, x_1](x - x_0) + f[x_0, x_1, x_2](x - x_0)(x - x_1) + ...
       + f[x_0, x_1, ..., x_n](x - x_0)(x - x_1)...(x - x_{n-1}).                               (18)
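For illustration, the divided-difference table and the evaluation of N_n(x) in Eq. (18) can be coded compactly; this is a standard implementation sketch of ours, where in the GM(1,1) setting one would pass xs = [1, 2, ..., n] and ys = [x^(1)(1), ..., x^(1)(n)] and evaluate at k + i/N:

```python
def newton_interp(xs, ys, x):
    """Evaluate the Newton interpolating polynomial N_n of Eq. (18) at x.
    The divided differences are built in place with the standard recursion."""
    n = len(xs)
    coef = list(ys)  # coef[j] becomes f[x_0, ..., x_j]
    for j in range(1, n):
        for i in range(n - 1, j - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - j])
    # Horner-like evaluation of the Newton form
    result = coef[-1]
    for i in range(n - 2, -1, -1):
        result = result * (x - xs[i]) + coef[i]
    return result
```

For a quadratic function the interpolant through three points is exact, which gives a quick sanity check.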
Then the new background value is obtained easily as

z^(1)(k+1) = (1/(2N))[x^(1)(k) + 2x_1 + 2x_2 + 2x_3 + ... + 2x_{N-1} + x^(1)(k+1)].             (19)

Generally, the bigger N is, the more accurate the GM(1,1) model is.
3 Example

Per-capita power consumption is a measure of economic development level and people's living standards. Thus, it is necessary to build a model of per-capita power consumption and to predict its developmental tendency. The method proposed in this paper is now used to model
C. Li

Table 1. Comparison of two modeling methods

        Real      Method proposed in [2]         Method proposed in this paper
Year    value     Model value  Rel. error (%)    Model value  Rel. error (%)
1980    306.35    306.35        0                306.35        0
1981    311.2     303.04        2.62             300.07        3.58
1982    324.9     325.16       -0.07             321.83        0.94
1983    343.4     348.89       -1.60             345.18       -0.52
1984    361.61    374.35       -3.52             370.22       -2.38
1985    390.76    401.67       -2.79             397.08       -1.62
1986    421.36    430.99       -2.29             425.88       -1.07
1987    458.75    462.45       -0.81             456.78        0.43
1988    494.9     496.2        -0.26             489.91        1.01
1989    522.78    532.42       -1.81             525.45       -0.51
1990    547.22    571.28       -4.40             563.57       -2.99
1991    588.7     612.97       -4.11             604.45       -2.68
1992    647.18    657.71       -1.63             648.30       -0.17
1993    712.34    705.71        0.93             695.33        2.39
1994    778.32    757.22        2.71             745.78        4.18
1995    835.31    812.49        2.79             799.87        4.24
1996    888.1     871.79        1.84             857.90        3.40
1997    923.16    935.42       -1.33             920.14        0.33
1998    939.48    1003.7       -6.83             986.89       -5.05
1999*   988.60    1076.9       -8.94             1058.49      -7.07
2000*   1073.62   1155.5       -7.65             1135.27      -5.74
2001*   1164.29   1239.9       -6.49             1217.63      -4.58

(* predicted values)
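For reference, the relative-error columns in Table 1 follow the usual convention 100·(real − model)/real; a small illustrative helper (ours, not the paper's) reproduces the 1981 entries to rounding:

```python
def relative_error_pct(real, model):
    """Relative error (%) as tabulated: 100 * (real - model) / real."""
    return 100.0 * (real - model) / real
```

For 1981, the method of [2] gives 100·(311.2 − 303.04)/311.2 ≈ 2.62% and this paper's method 100·(311.2 − 300.07)/311.2 ≈ 3.58%, matching the table.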
China's per-capita power consumption from 1980 to 1998 and to predict it from 1999 to 2001. The model obtained by the method proposed in this paper is as follows (Table 1 gives the comparison of the two modeling methods):

x̂^(1)(k) = 4136.36 e^{0.070033(k-1)} - 3830.00,  k ≥ 1,
x̂^(0)(k+1) = 279.77 e^{0.070033 k},  k ≥ 1,
x̂^(0)(1) = 306.35.
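As a quick numeric check (an illustration of ours), evaluating the fitted model above reproduces the "this paper" column of Table 1 to within rounding:

```python
import math

def x0_hat(k):
    """Restored (predicted) series x^(0)^(k+1) of the fitted model, k >= 1."""
    return 279.77 * math.exp(0.070033 * k)

def x1_hat(k):
    """Cumulated (AGO) series x^(1)^(k) of the fitted model, k >= 1."""
    return 4136.36 * math.exp(0.070033 * (k - 1)) - 3830.00
```

For example, k = 1 gives the 1981 model value 300.07 and k = 18 gives the 1998 model value 986.9.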
The error inspection of the post-sample method can be used to inspect the quantified approach. The post-sample error ratio c = S_1/S_0 (where S_1 is the standard deviation of the errors and S_0 is the standard deviation of the original sequence) of the model proposed in this paper is c_1 = 0.0867, while the post-sample error of the model proposed in [2] is c_2 = 0.1186. We can therefore conclude that the method proposed in this paper improves the fitted precision and is much better than the method proposed in [2]. The small-error probability is p = P{|e^(0)(i) - ē^(0)| < 0.6745 S_0} = 1. Thus, the practical application results show the effectiveness of the proposed approach.
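The post-sample ratio c = S_1/S_0 can be sketched as follows (an illustration of ours; population standard deviations are assumed, and a smaller ratio means a better fit):

```python
import math

def std(xs):
    """Population standard deviation."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def posterior_error_ratio(real, model):
    """Post-sample error ratio c = S1/S0: S1 is the standard deviation of
    the residuals, S0 that of the original series."""
    resid = [r - m for r, m in zip(real, model)]
    return std(resid) / std(real)
```

A perfect fit gives c = 0, and a good grey model is conventionally expected to have c well below 0.35.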
4 Conclusion

Based on research into the structure of the background value in the GM(1,1) model, an exact formula for the background value of x^(1)(t) in the region [k, k+1], which is used when establishing GM(1,1), is established by integrating x^(1)(t) from k to k+1. The ameliorated background value advances both the modeling precision and the prediction precision, and the application area of the GM(1,1) model can be enlarged. Finally, a model of China's per-capita power consumption is set up. Simulation examples show the effectiveness of the proposed approach.
References

1. Liu, S.F., Guo, T.B., Dang, Y.G.: Grey System Theory and Its Application. Science Press, Beijing (1999)
2. Tan, G.J.: The Structure Method and Application of Background Value in Grey System GM(1,1) Model (I). Systems Engineering-Theory & Practice, 98–103 (2000)
3. Chen, T.J.: A New Development of Grey Forecasting Model. Systems Engineering, 50–52 (1990)
4. Fu, L.: Grey Systematic Theory and Application. Technical Document Publishing House, Beijing (1992)
5. Shi, G.H., Yao, G.X.: Application of Grey System Theory in Fault Tree Diagnosis Decision. Systems Engineering Theory & Practice 144, 120–123 (2001)
6. Gong, W.W., Shi, G.H.: Application of Gray Correlation Analysis in the Fe-spectrum Analysis Technique. Journal of Jiangsu University of Science and Technology (Natural Science) 1, 59–61 (2001)
Comparative Study with Fuzzy Entropy and Similarity Measure: One-to-One Correspondence

Sanghyuk Lee, Sangjin Kim, and DongYoup Lee

School of Mechatronics, Changwon National University, #9 Sarim-dong, Changwon, Gyeongnam 641-773, Korea
{leehyuk,aries756,dongyeuplee}@changwon.ac.kr
Abstract. In this paper we survey the relation between fuzzy entropy measures and similarity measures, which quantify the uncertainty and the similarity of data, respectively. By one-to-one correspondence, distance measures and similarity measures have complementary characteristics. First we construct a similarity measure using a distance measure and prove its usefulness. Furthermore, the derivation of fuzzy entropy from a similarity measure is also discussed.

Keywords: Similarity measure, distance measure, fuzzy entropy, one-to-one correspondence.
1 Introduction

Fuzzy entropy and similarity measures are both used for quantifying the uncertainty and similarity of data [1,2]. Data uncertainty and certainty are usually expressed from a probabilistic point of view: the probability of an event carries the meaning of certainty and uncertainty simultaneously. The degree of similarity between two or more data sets plays a central role in fields such as decision making and pattern classification [3-8]. The design of similarity measures has been studied by numerous researchers [8-12]. Two design methods have been introduced: the fuzzy-number approach [8-11] and the distance-measure approach [12]. The fuzzy-number method makes it easy to design a similarity measure; however, the resulting similarity measures are restricted to triangular or trapezoidal membership functions [8-11]. In contrast, a similarity measure based on a distance measure is applicable to general fuzzy membership functions, including non-convex ones [12].

For a fuzzy set, uncertain knowledge is contained in the fuzzy set itself. Hence the uncertainty of the data can also be obtained by analyzing the fuzzy membership function; this uncertainty is described by fuzzy entropy. The characterization and quantification of fuzziness are important issues that affect the management of uncertainty in many system models and designs. That the entropy of a fuzzy set is a measure of the fuzziness of that fuzzy set has been established by previous researchers [14-16]. Liu proposed axiomatic definitions of entropy, distance measure, and similarity measure, and discussed the relations between these three concepts. Kosko considered

D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 132–138, 2008.
© Springer-Verlag Berlin Heidelberg 2008
Comparative Study with Fuzzy Entropy and Similarity Measure
the relation between distance measure and fuzzy entropy. Bhandari and Pal provided a fuzzy information measure for the discrimination of a fuzzy set relative to some other fuzzy set. Pal and Pal analyzed the classical Shannon information entropy. In this paper we analyze the relations between fuzzy entropy and similarity. With the help of a distance measure, we design a similarity measure. The obtained similarity measure produces a fuzzy entropy based on the one-to-one correspondence between distance measures and similarity measures. The fuzzy entropy obtained from the similarity measure is proved by verifying the definition of fuzzy entropy. We also discuss the derivation of similarity from fuzzy entropy. In the following chapter, we present the definitions of fuzzy entropy and similarity measure of a fuzzy set, and introduce previously obtained fuzzy entropy and similarity measures. In Chapter 3, fuzzy entropy is induced from a similarity measure and vice versa. Conclusions follow in Chapter 4.
2 Fuzzy Entropy and Similarity Measure Analysis

Fuzzy entropy represents the fuzziness of a fuzzy set. The fuzziness of a fuzzy set is represented through its degree of ambiguity; hence the entropy is obtained from the fuzzy membership function itself. Liu presented axiomatic definitions of fuzzy entropy and similarity measure [13], and these definitions carry the meaning of difference or closeness between different fuzzy membership functions. First we introduce fuzzy entropy, designed from a distance measure so as to satisfy the definition of fuzzy entropy. The notation of Liu is used in this paper [13].

Definition 2.1 [13]. A real function e : F(X) → R+ is called an entropy on F(X) if e has the following properties:

(E1) e(D) = 0, ∀D ∈ P(X)
(E2) e([1/2]) = max_{A∈F(X)} e(A)
(E3) e(A*) ≤ e(A), for any sharpening A* of A
(E4) e(A) = e(A^c), ∀A ∈ F(X)

where [1/2] is the fuzzy set in which the value of the membership function is 1/2, R+ = [0, ∞), X is the universal set, F(X) is the class of all fuzzy sets of X, P(X) is the class of all crisp sets of X, and D^c is the complement of D.

Many fuzzy entropies satisfying Definition 2.1 can be formulated; we designed fuzzy entropies in our previous literature [1]. Two fuzzy entropies are now illustrated without proofs.

Fuzzy Entropy 1. If the distance d satisfies d(A, B) = d(A^C, B^C), A, B ∈ F(X), then

e(A) = 2d((A ∩ A_near), [1]) + 2d((A ∪ A_near), [0]) − 2
is a fuzzy entropy.
S. Lee, S. Kim, and D. Lee
Fuzzy Entropy 2. If the distance d satisfies d(A, B) = d(A^C, B^C), A, B ∈ F(X), then

e(A) = 2d((A ∩ A_far), [0]) + 2d((A ∪ A_far), [1])

is also a fuzzy entropy.

The exact meaning of the fuzzy entropy of a fuzzy set A is the fuzziness of A with respect to a crisp set; we commonly take the crisp set to be A_near or A_far. In the above fuzzy entropies, the well-known Hamming distance is commonly used as the distance between fuzzy sets A and B:

d(A, B) = (1/n) Σ_{i=1}^{n} |μ_A(x_i) − μ_B(x_i)|,

where X = {x_1, x_2, ..., x_n}, |k| is the absolute value of k, and μ_A(x) is the membership function of A ∈ F(X). Basically, fuzzy entropy means the difference between two fuzzy membership functions. Next we introduce the similarity measure, which describes the degree of closeness between two fuzzy membership functions; it is also found in the literature of Liu.
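The two entropies above can be spot-checked numerically. The sketch below is our illustration: membership functions are vectors over X, the Hamming distance is normalized by n, and A_near/A_far are taken in the usual way as the nearest crisp set (μ ≥ 0.5 → 1, else 0) and its complement:

```python
def hamming(a, b):
    """Normalized Hamming distance between membership vectors (Liu [13])."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def entropy1(a):
    """Fuzzy Entropy 1 with A_near the nearest crisp set of A."""
    near = [1.0 if x >= 0.5 else 0.0 for x in a]
    ones, zeros = [1.0] * len(a), [0.0] * len(a)
    cap = [min(x, y) for x, y in zip(a, near)]  # A ∩ A_near
    cup = [max(x, y) for x, y in zip(a, near)]  # A ∪ A_near
    return 2 * hamming(cap, ones) + 2 * hamming(cup, zeros) - 2

def entropy2(a):
    """Fuzzy Entropy 2 with A_far the complement of A_near."""
    far = [0.0 if x >= 0.5 else 1.0 for x in a]
    ones, zeros = [1.0] * len(a), [0.0] * len(a)
    cap = [min(x, y) for x, y in zip(a, far)]   # A ∩ A_far
    cup = [max(x, y) for x, y in zip(a, far)]   # A ∪ A_far
    return 2 * hamming(cap, zeros) + 2 * hamming(cup, ones)
```

Both functions return 0 for any crisp set (property (E1)) and 1 for the maximally fuzzy set [1/2] (property (E2)).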
Definition 2.2 [13]. A real function s : F² → R+ is called a similarity measure if s has the following properties:

(S1) s(A, B) = s(B, A), ∀A, B ∈ F(X)
(S2) s(D, D^c) = 0, ∀D ∈ P(X)
(S3) s(C, C) = max_{A,B∈F} s(A, B), ∀C ∈ F(X)
(S4) ∀A, B, C ∈ F(X), if A ⊂ B ⊂ C, then s(A, B) ≥ s(A, C) and s(B, C) ≥ s(A, C).

With Definition 2.2, we propose the following similarity measures.

Similarity Measure 1. For any sets A, B ∈ F(X), if d satisfies the Hamming distance measure and d(A, B) = d(A^C, B^C), then

s(A, B) = 1 − d((A ∩ B^C), [0]) − d((A ∪ B^C), [1])                                             (1)

is a similarity measure between set A and set B. This similarity measure is induced from a distance measure and is useful for non-interacting pairs of fuzzy membership functions. Another similarity measure is also obtained; it can be found in our previous literature [2].

Similarity Measure 2. For any sets A, B ∈ F(X), if d satisfies the Hamming distance measure, then

s(A, B) = 2 − d((A ∩ B), [1]) − d((A ∪ B), [0])                                                 (2)
is also a similarity measure between set A and set B. To be similarity measures, (1) and (2) do not need the assumption d(A, B) = d(A^C, B^C). Liu also pointed out that there is a one-to-one relation between all
distance measures and all similarity measures, d + s = 1. In the next chapter, we derive a similarity measure generated by a distance measure; furthermore, an entropy is derived through the similarity measure by the properties of Liu. It is obvious that the Hamming distance can be represented as

d(A, B) = d((A ∩ B), [1]) − (1 − d((A ∪ B), [0])),                                              (3)

where A ∩ B = min(μ_A(x_i), μ_B(x_i)) and A ∪ B = max(μ_A(x_i), μ_B(x_i)). With Proposition 3.4 of Liu [13], we generate the similarity measure (or distance measure) from the distance measure (or similarity measure) [13].

Proposition 2.1 [13]. There exists a one-to-one correlation between all distance measures and all similarity measures, and a distance measure d and its corresponding similarity measure s satisfy s + d = 1.

With the property s = 1 − d, we can construct the similarity measure from the distance measure d, denoted s<d>. From (3) it is natural to obtain the following result:

d(A, B) = d((A ∩ B), [1]) + d((A ∪ B), [0]) − 1 = 1 − s(A, B).

Therefore we propose the similarity measure

s<d> = 2 − d((A ∩ B), [1]) − d((A ∪ B), [0]).                                                   (4)

This similarity measure is exactly the same as (2). At this point, we have verified the one-to-one relation between distance measure and similarity measure. In the next chapter, we verify that the fuzzy entropy is derived through similarity (2).
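The correspondence s + d = 1 can be checked numerically. In this illustrative sketch of ours, `dist` implements Eq. (3) and `sim` implements Eq. (2)/(4); since min(a,b) + max(a,b) = a + b pointwise, `dist` collapses to the normalized Hamming distance, and s + d = 1 holds identically:

```python
def hamming(a, b):
    """Normalized Hamming distance between membership vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def dist(a, b):
    """Distance measure of Eq. (3): d(A,B) = d(A∩B,[1]) + d(A∪B,[0]) - 1."""
    ones, zeros = [1.0] * len(a), [0.0] * len(a)
    cap = [min(x, y) for x, y in zip(a, b)]
    cup = [max(x, y) for x, y in zip(a, b)]
    return hamming(cap, ones) + hamming(cup, zeros) - 1

def sim(a, b):
    """Similarity measure of Eq. (2)/(4): s<d> = 2 - d(A∩B,[1]) - d(A∪B,[0])."""
    ones, zeros = [1.0] * len(a), [0.0] * len(a)
    cap = [min(x, y) for x, y in zip(a, b)]
    cup = [max(x, y) for x, y in zip(a, b)]
    return 2 - hamming(cap, ones) - hamming(cup, zeros)
```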
3 Entropy Derivation with Similarity Measure

Liu also suggested propositions about entropy and similarity measure. He insisted that an entropy can be generated by a similarity measure or a distance measure, denoted by e<s> and e<d> respectively.

3.1 Entropy Generation by Similarity
Propositions 3.5 and 3.6 of reference [13] are summarized as follows.

Proposition 3.1 [13]. If s is a similarity measure on F, define

e(A) = s(A, A^C), ∀A ∈ F.

Then e is an entropy on F.

Now we check whether our similarities (1) and (2) satisfy Proposition 3.1. The proof is obtained by checking whether

s(A, A^C) = 2 − d((A ∩ A^C), [1]) − d((A ∪ A^C), [0])

satisfies (E1) to (E4) of Definition 2.1.
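Before the analytic proof, the claim can be spot-checked numerically; this sketch of ours evaluates e(A) = s(A, A^C) with similarity (2) and the normalized Hamming distance:

```python
def hamming(a, b):
    """Normalized Hamming distance between membership vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def entropy_from_similarity(a):
    """e(A) = s(A, A^c) with the similarity of Eq. (2) (Proposition 3.1)."""
    comp = [1.0 - x for x in a]  # A^c
    ones, zeros = [1.0] * len(a), [0.0] * len(a)
    cap = [min(x, y) for x, y in zip(a, comp)]  # A ∩ A^c
    cup = [max(x, y) for x, y in zip(a, comp)]  # A ∪ A^c
    return 2 - hamming(cap, ones) - hamming(cup, zeros)
```

Numerically, a crisp set yields 0, the set [1/2] yields the maximum 1, and sharpening a set (pushing memberships toward 0 or 1) does not increase the value, in line with (E1)-(E3).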
For (E1), ∀D ∈ P(X),

s(D, D^C) = 2 − d((D ∩ D^C), [1]) − d((D ∪ D^C), [0]) = 2 − d([0], [1]) − d([1], [0]) = 0.

(E2) represents that the fuzzy set [1/2] has the maximum entropy value. Therefore,

s([1/2], [1/2]^C) = 2 − d(([1/2] ∩ [1/2]^C), [1]) − d(([1/2] ∪ [1/2]^C), [0])
                  = 2 − d([1/2], [1]) − d([1/2], [0])
                  = 2 − 1/2 − 1/2 = 1.

In the above equation, [1/2]^C = [1/2] is satisfied.
(E3) shows that the entropy of a sharpened version A* of fuzzy set A, e(A*), is less than or equal to e(A):

s(A*, A*^C) = 2 − d((A* ∩ A*^C), [1]) − d((A* ∪ A*^C), [0])
            ≤ 2 − d((A ∩ A^C), [1]) − d((A ∪ A^C), [0]) = s(A, A^C).
Finally, (E4) is proved directly:

s(A, A^C) = 2 − d((A ∩ A^C), [1]) − d((A ∪ A^C), [0])
          = 2 − d((A^C ∩ A), [1]) − d((A^C ∪ A), [0]) = s(A^C, A).

From the above proof, our similarity measure

s(A, A^C) = 2 − d((A ∩ A^C), [1]) − d((A ∪ A^C), [0])

generates a fuzzy entropy. Next, the other similarity (1) between A and A^C,

s(A, A^C) = 1 − d((A ∩ A), [0]) − d((A ∪ A), [1]) = 1 − d(A, [0]) − d(A, [1]),

is also satisfied and proved easily.

3.2 Relation of Similarity and Distance
With the property of one-to-one correspondence between similarity and distance, we have derived a similarity measure from a distance measure, and from the similarity measure we have also obtained a fuzzy entropy. For the derivation of the similarity measure, s = 1 − d is used. If we use the distance measure (3),

d(A, B) = d((A ∩ B), [1]) − (1 − d((A ∪ B), [0])),

we obtain the corresponding similarity measure

s<d> = 2 − d((A ∩ B), [1]) − d((A ∪ B), [0]),

and this similarity is identical to (2).
From the other similarity (1), s(A, B) = 1 − d((A ∩ B^C), [0]) − d((A ∪ B^C), [1]); is

d(A, B) = d((A ∩ B^C), [0]) + d((A ∪ B^C), [1])

satisfied as a distance measure? By the definition of distance measure of Liu [13],

d(A, B) = d((A ∩ B^C), [0]) + d((A ∪ B^C), [1])
        = d((A^C ∪ B), [1]) + d((A^C ∩ B), [0])
        = d(B, A),

and

d(A, A) = d((A ∩ A^C), [0]) + d((A ∪ A^C), [1]) = d([0], [0]) + d([1], [1]) = 0.

For any A, B and crisp D ∈ P(X),

d(A, B) = d((A ∩ B^C), [0]) + d((A ∪ B^C), [1]) ≤ d((D ∩ D^CC), [0]) + d((D ∪ D^CC), [1]) = d(D, [0]) + d(D, [1]) = 1.

Hence it is natural that the distance between a crisp set and its complement attains the maximal value. Finally,

d(A, B) = d((A ∩ B^C), [0]) + d((A ∪ B^C), [1]) ≤ d((A ∩ C^C), [0]) + d((A ∪ C^C), [1]) = d(A, C)

and

d(B, C) = d((B ∩ C^C), [0]) + d((B ∪ C^C), [1]) ≤ d((A ∩ C^C), [0]) + d((A ∪ C^C), [1]) = d(A, C)

are satisfied because of the inclusion property A ⊂ B ⊂ C.
4 Conclusions

We have discussed the similarity measure derived from a distance measure, and the usefulness of the proposed similarity measure has been proved. Furthermore, using the relation between fuzzy entropy and similarity measure, we have verified that a fuzzy entropy can be induced through a similarity measure. In this paper, the proposed similarity measures are provided for the design of fuzzy entropy. Among the proposed similarity measures, one satisfies the fuzzy entropy definition trivially; even for similarity measures satisfying the similarity definition, trivial fuzzy entropies can exist. Finally, the proposed similarity measures can be applied to general types of fuzzy membership functions.
Acknowledgments. This work was supported by the 2nd BK21 Program, which is funded by the KRF (Korea Research Foundation).
References

1. Lee, S.H., Cheon, S.P., Kim, J.: Measure of Certainty with Fuzzy Entropy Function. In: Huang, D.-S., Li, K., Irwin, G.W. (eds.) ICIC 2006. LNCS (LNAI), vol. 4114, pp. 134–139. Springer, Heidelberg (2006)
2. Lee, S.H., Kim, J.M., Choi, Y.K.: Similarity Measure Construction Using Fuzzy Entropy and Distance Measure. In: Huang, D.-S., Li, K., Irwin, G.W. (eds.) ICIC 2006. LNCS (LNAI), vol. 4114, pp. 952–958. Springer, Heidelberg (2006)
3. Yager, R.R.: Monitored Heavy Fuzzy Measures and Their Role in Decision Making under Uncertainty. Fuzzy Sets and Systems 139(3), 491–513 (2003)
4. Rébillé, Y.: Decision Making over Necessity Measures through the Choquet Integral Criterion. Fuzzy Sets and Systems 157(23), 3025–3039 (2006)
5. Sugumaran, V., Sabareesh, G.R., Ramachandran, K.I.: Fault Diagnostics of Roller Bearing Using Kernel Based Neighborhood Score Multi-class Support Vector Machine. Expert Syst. Appl. 34(4), 3090–3098 (2008)
6. Kang, W.S., Choi, J.Y.: Domain Density Description for Multiclass Pattern Classification with Reduced Computational Load. Pattern Recognition 41(6), 1997–2009 (2008)
7. Shih, F.Y., Zhang, K.: A Distance-based Separator Representation for Pattern Classification. Image Vis. Comput. 26(5), 667–672 (2008)
8. Chen, S.M.: New Methods for Subjective Mental Workload Assessment and Fuzzy Risk Analysis. Cybern. Syst. 27(5), 449–472 (1996)
9. Hsieh, C.H., Chen, S.H.: Similarity of Generalized Fuzzy Numbers with Graded Mean Integration Representation. In: Proc. 8th Int. Fuzzy Systems Association World Congr., vol. 2, pp. 551–555 (1999)
10. Lee, H.S.: An Optimal Aggregation Method for Fuzzy Opinions of Group Decision. In: Proc. 1999 IEEE Int. Conf. Systems, Man, Cybernetics, vol. 3, pp. 314–319 (1999)
11. Chen, S.J., Chen, S.M.: Fuzzy Risk Analysis Based on Similarity Measures of Generalized Fuzzy Numbers. IEEE Trans. Fuzzy Syst. 11(1), 45–56 (2003)
12. Lee, S.H., Kim, Y.T., Cheon, S.P., Kim, S.S.: Reliable Data Selection with Fuzzy Entropy. In: Wang, L., Jin, Y. (eds.) FSKD 2005. LNCS (LNAI), vol. 3613, pp. 203–212. Springer, Heidelberg (2005)
13. Liu, X.: Entropy, Distance Measure and Similarity Measure of Fuzzy Sets and Their Relations. Fuzzy Sets and Systems 52, 305–318 (1992)
14. Bhandari, D., Pal, N.R.: Some New Information Measures of Fuzzy Sets. Inform. Sci. 67, 209–228 (1993)
15. Kosko, B.: Neural Networks and Fuzzy Systems. Prentice-Hall, Englewood Cliffs (1992)
16. Pal, N.R., Pal, S.K.: Object-background Segmentation Using New Definitions of Entropy. IEEE Proc. 36, 284–295 (1989)
Low Circle Fatigue Life Model Based on ANFIS

Changhong Liu1, Xintian Liu1, Hu Huang1, and Lihui Zhao1,2

1 College of Automobile Engineering, Shanghai University of Engineering Science, 201620, Shanghai, China
[emailprotected]
2 School of Mechanical Engineering, Shanghai Jiao Tong University, 200240, Shanghai, China
Abstract. Using the adaptive network-based fuzzy inference system (ANFIS), this paper presents a method of building a model of low circle fatigue life. According to real data obtained in a low circle fatigue experiment, a fatigue life model is built. Finally, by comparison with the Manson-Coffin equation, it can be concluded that the ANFIS model is accurate and effective.
1 Introduction

A Fuzzy Inference System (FIS) is based on expertise expressed in terms of 'IF-THEN' rules [1, 2]. An FIS can be used to predict uncertain systems, and its application does not require knowledge of the underlying physical process as a precondition [3]. ANNs are inspired by the biological sciences, attempting to emulate the behavior and complex functioning of the human brain in recognizing patterns; they are based on a schematic representation of biological neurons in the human brain and attempt to emulate the processes of thinking, remembering and problem solving [4, 5]. ANNs have many inputs and outputs and allow nonlinearity in the transfer function of the neurons; therefore they can be used to solve multivariate and nonlinear modeling problems. In recent years the two methods have been combined with one another, and a popular research field has appeared. In 1993, a hybrid ANFIS algorithm based on the Sugeno system, improved by Jang, was used to acquire optimal output data. ANFIS is an outstanding method in this field. At present, ANFIS applications are generally encountered in the areas of function approximation, fault detection, medical diagnosis and control, and so on.

Estimating a material's low circle fatigue life is a frequent and important problem in the engineering field, and it has long attracted attention in science and engineering. There are many effective formulae for low circle fatigue life estimation, for example the Manson-Coffin formula. This paper presents a method for low circle fatigue life estimation built through ANFIS.
2 Adaptive Network Based Fuzzy Inference Systems (ANFIS)

An adaptive network based fuzzy inference system (ANFIS) is an FIS implemented in the framework of an adaptive fuzzy neural network. Such a framework makes the

D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 139–144, 2008.
© Springer-Verlag Berlin Heidelberg 2008
C. Liu et al.
ANFIS modeling more systematic and less reliant on expert knowledge. The main aim of ANFIS is to optimize the parameters of the equivalent FIS by applying a learning algorithm using input-output data sets [6-8]. The parameter optimization is done in such a way that the error measure between the target and the actual output is minimized [9]. To present the ANFIS architecture, fuzzy if-then rules based on a complex learning process are considered [10, 11]:

Rule 1: if (x is A_1) and (y is B_1), then (f_1 = p_1 x + q_1 y + r_1);                        (1)
Rule 2: if (x is A_2) and (y is B_2), then (f_2 = p_2 x + q_2 y + r_2),                        (2)

where x and y are the inputs, A_i and B_i are the fuzzy sets, f_i are the outputs within the fuzzy region specified by the fuzzy rule, and p_i, q_i and r_i are the design parameters that are determined during the training process. The ANFIS architecture implementing these two rules is shown in Fig. 1, in which a circle indicates a fixed node whereas a square indicates an adaptive node. ANFIS is a 5-layer feed-forward neural network [8].
Fig. 1. The architecture of ANFIS
Layer 1: All the nodes are adaptive nodes. The outputs of layer 1 are the fuzzy membership grades of the inputs, given by

O_i^1 = μ_{Ai}(x),  i = 1, 2,                                                                  (3)
O_i^1 = μ_{Bi-2}(y),  i = 3, 4,                                                                (4)

where μ_{Ai}(x) and μ_{Bi-2}(y) can adopt any fuzzy membership function (MF), and O_i^1 indicates the output of layer 1. For example, if the bell-shaped membership function is employed, μ_{Ai}(x) is given by

μ_{Ai}(x) = 1 / (1 + {[(x − c_i)/a_i]^2}^{b_i}),                                               (5)
Low Circle Fatigue Life Model Based on ANFIS
where
141
ai , bi and ci are the parameters of the membership function, governing the
bell-shaped functions accordingly. Layer 2: Every node in this layer is a fixed node with the task of multiplying incoming signals and sending the product out. This product represents the firing strength of a rule. For example, in Fig. 1
Oi2 = wi = μ Ai ( x) μ Bi ( y ) i = 1,2
(6)
Layer 3: The nodes are fixed nodes. They play a normalization role to the firing strengths from the previous layer. The outputs of this layer can be represented as
Oi3 = w i =
wi w1 + w2
i = 1,2
(7)
which are the so-called normalized firing strengths. Layer 4: The nodes are adaptive nodes. The output of each node in this layer is simply the product of the normalized firing strength and a first-order polynomial (for a first-order Sugeno model). Thus the outputs of this layer are given by
O4i = w i f i = w i ( pi x + qi y + ri ) i = 1,2
(8)
Layer 5: There is only one single fixed node, which performs the summation of all incoming signals. Hence the overall output of the model is given by 2
2
∑w f
i =1
w1 + w2
O = ∑ wi f i = 5 i
i
i
i =1
(9)
It can be observed that there are two adaptive layers in this ANFIS architecture, namely the first layer and the fourth layer. In the first layer there are three modifiable parameters {a_i, b_i, c_i}, which are related to the input membership functions; these are the so-called premise parameters. In the fourth layer there are also three modifiable parameters {p_i, q_i, r_i}, pertaining to the first-order polynomial; these are the so-called consequent parameters. In a word, the model based on this algorithm is built from the corrective parameters supplied as input, so that it exports the corresponding simulation of the low circle fatigue life. Users need not know the working principle or possess fuzzy theory; in other words, a favorable and convenient precondition is provided.
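The forward pass of Eqs. (3)-(9) for the two-rule first-order Sugeno model can be sketched as follows; this is our illustration of the architecture, not the paper's implementation, and parameter values are placeholders:

```python
def bell_mf(x, a, b, c):
    """Generalized bell membership function, Eq. (5)."""
    return 1.0 / (1.0 + (((x - c) / a) ** 2) ** b)

def anfis_forward(x, y, premise, consequent):
    """Forward pass of the two-rule first-order Sugeno ANFIS, Eqs. (3)-(9).
    premise: [(a,b,c) for A1, A2, B1, B2]; consequent: [(p,q,r) for each rule]."""
    muA = [bell_mf(x, *premise[0]), bell_mf(x, *premise[1])]  # layer 1, Eq. (3)
    muB = [bell_mf(y, *premise[2]), bell_mf(y, *premise[3])]  # layer 1, Eq. (4)
    w = [muA[0] * muB[0], muA[1] * muB[1]]                    # layer 2, Eq. (6)
    wn = [wi / (w[0] + w[1]) for wi in w]                     # layer 3, Eq. (7)
    f = [p * x + q * y + r for (p, q, r) in consequent]       # rule outputs
    return sum(wi * fi for wi, fi in zip(wn, f))              # layers 4-5, Eqs. (8)-(9)
```

Because the firing strengths are normalized, two rules with identical constant consequents always yield that constant, which is a convenient sanity check.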
3 Low Circle Fatigue Life Estimate Model Based on ANFIS

Low circle fatigue belongs to the class of short-life fatigue problems and involves a high stress level [13]. The break stress often exceeds the yield limit, and in every cycle considerable plastic deformation may occur [14, 15]. Because the material lies in the plastic yielding period, stress is one of the important control parameters in the low circle fatigue test
[16, 17]. According to the literature [18], low circle fatigue experiment results are available for 2.25Cr-1Mo steel at 500 °C. The Manson-Coffin formula is

Δε_p / 2 = C N_f^d,                                                                            (10)

where Δε_p is the plastic strain range, C and d are material constants, and N_f is the cycle life.
According to the data in Table 1, the parameters can be confirmed as C = 1.566×10^5 and d = −0.6576.

Table 1. Experiment results of low circle fatigue
(Δε/2)(με)   (Δε_p/2)(με)   N_f
1280          387           9437
1454          768           3664
1600         1456           1000
1746         2143            609
1790         2654            514
1868         3668            323
1935         5287            175
2032         8801             84
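The constants C and d of Eq. (10) can be recovered from the Table 1 data by a least-squares fit in log-log coordinates. This is an illustrative sketch of ours (the paper does not state its fitting procedure), and it reproduces the quoted values closely:

```python
import math

# Table 1 data: plastic strain range (micro-strain) and cycle life
dep2 = [387, 768, 1456, 2143, 2654, 3668, 5287, 8801]
nf = [9437, 3664, 1000, 609, 514, 323, 175, 84]

def fit_manson_coffin(dep2, nf):
    """Least-squares fit of log(dep/2) = log C + d * log Nf (Eq. (10))."""
    xs = [math.log(n) for n in nf]
    ys = [math.log(e) for e in dep2]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    d = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    C = math.exp(my - d * mx)
    return C, d

def life(dep2_value, C, d):
    """Predicted cycle life Nf = (dep2 / C) ** (1 / d)."""
    return (dep2_value / C) ** (1.0 / d)
```

With the quoted constants, a plastic strain range of 1112 με gives a predicted life of about 1851 cycles, matching the Manson-Coffin row of Table 2.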
Using ANFIS, the model of low cycle fatigue life is built. First, the plastic strain range is taken as the input sample and the corresponding fatigue cycle life as the output. The membership function of the input variable adopts gbellmf with nine fuzzy rules, and the output membership function is of constant type; the fuzzy inference system is generated by the grid method. The hybrid learning algorithm is used to train the network with an error tolerance of zero. Then, comparing the trained low cycle fatigue life model with the Manson-Coffin model, the difference between them is small (Table 2). In addition, using the two parameters of Table 1, namely the elastic strain range and the plastic strain range, as input parameters yields a result close to both of those mentioned above. In essence, Table 1 indicates that the elastic strain part increases as the test load enlarges but has less influence on cycle fatigue life than the plastic strain range. Therefore, the low cycle fatigue life model built using ANFIS is feasible. According to Table 2, the results of Manson-Coffin are similar to those of ANFIS, so it can be concluded that the ANFIS model is accurate and effective.

Table 2. The results from two low circle fatigue experiments
(Δε_p/2)(με)    1112    1799.5    2398.5    3171
Manson-Coffin   1851     890       575       376
ANFIS           1817     828       573       384
4 Discussions

To sum up, the characteristics of the low cycle fatigue life model are described as follows.

The fatigue life model is built easily because ANFIS merely needs training input and output data, so it is not necessary to analyze the internal mechanism. But the model is a black box with respect to input and output; that is to say, the internal mechanism remains dim.

Although the model involves membership functions of fuzzy variables and other fuzzy concepts, it is not necessary to understand the related fuzzy knowledge deeply in practical running. Generally speaking, the bell-form membership function is suitable for non-fuzzy parameters, and the number of fuzzy rules is related to the number of iterations and the training precision. Commonly, the more fuzzy rules are used, the fewer iterations are needed and the higher the training precision is, but the more time every training run costs; this is especially obvious with multivariable input.

The more variables there are, the longer the time needed to train the model. Sometimes adding a single variable increases the training time greatly, so it is better to reduce the number of input variables as much as possible.

The thesis puts forward a method to build a model as a black box, which differs from the traditional establishment of a constitutive relationship. It is not necessary to analyze the internal mechanism; indeed, only the relevant parameters are taken as input data, and there is no need to know which parameters are the main variables.

Although a spline function can be used to fit the relationship of low cycle fatigue life, the ANFIS model is better at fitting fluctuant conditions with the interrelated present data. Among spline functions, ANNs and ANFIS, the ANFIS model has very good adaptability and precision.

Acknowledgment. This work was supported by the Research Fund for University Excellent Young Teachers in Shanghai (GJD-07021) and the Shanghai Leading Academic Discipline Project (P1045).
References 1. Kazazian, H.H., Phillips, J.A., Boehm, C.D., Vik, T.A., Mahoney, M.J., Ritchey, A.K.: Prenatal Diagnosis of Beta-thalassemia by Amniocentesis: Linkageanalysis using Multiple Polymorphic Restriction Endonuclease Sites. Blood 56, 926–930 (1980) 2. Esragh, F., Mamdani, E.H.: A General Approach to Linguistic Approximation. In: Fuzzy Reasoning and Its Application, London (1981) 3. Kazeminezhad, M.H., Etemad-Shahidi, A., Mousavi, S.J.: Application of Fuzzy Inference System in the Prediction of Wave Parameters. Ocean Engin. 32, 1709–1725 (2005) 4. Haykin, S.: Neural Networks: A Comprehensive Foundation. Macmillan Publishing, New York (1999) 5. Fu, J.Y., Liang, S.G., Li, Q.S.: Prediction of Wind-induced Pressures on a Large Gymnasium Roof using Artificial Neural Networks. Computers and Structures 85, 179–192 (2007) 6. Guler, I.: Adaptive Neuro-fuzzy Inference System for Gap Discontinuities in Coplanar Waveguides. Int. J. Electron. 92, 173–188 (2005) 7. Übeyli, E.D., Güler, İ.: Adaptive Neuro-Fuzzy Inference Systems for Analysis of Internal Carotid Arterial Doppler Signals. Comput. Biol. Med. 35, 687–702 (2005)
144
C. Liu et al.
8. Shalinie, S.M.: Modeling Connectionist Neuro-Fuzzy Network and Applications. Neural Comput. Applic. 14, 88–93 (2005)
9. Jang, J.S.R., Sun, C.T., Mizutani, E.: Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. Prentice-Hall, Upper Saddle River (1997)
10. Jang, J.S.R.: ANFIS: Adaptive-Network-based Fuzzy Inference System. IEEE Trans. Systems, Man and Cybernetics 23(3), 665–685 (1993)
11. Stepnowski, A., Moszyński, M., Tran, V.D.: Adaptive Neuro-Fuzzy and Fuzzy Decision Tree Classifiers as Applied to Seafloor Characterization. Acoust. Physics 49(2), 193–202 (2003)
12. Ertuğrul, Ç., Osman, Y.: Prediction of Wind Speed and Power in the Central Anatolian Region of Turkey by Adaptive Neuro-Fuzzy Inference Systems (ANFIS). J. Eng. Env. Sci. 30, 35–41 (2006)
13. Miyano, Y., Nakada, M., McMurray, M.K., Muki, R.: Prediction of Flexural Fatigue Strength of CFRP Composites under Arbitrary Frequency, Stress Ratio and Temperature. Journal of Composite Materials 31, 619–638 (1997)
14. Miyano, Y., McMurray, M.K., Enyama, J., Nakada, M.: Loading Rate and Temperature Dependence on Flexural Fatigue Behavior of a Satin Woven CFRP Laminate. Journal of Composite Materials 28, 1250–1260 (1994)
15. Qi, H.Y., Wen, W.D., Sun, L.W.: Fatigue Life Prediction and Experiment Research for Composite Laminates with Circular Hole. J. Cent. South Univ. Technol. 11(1), 19–22 (2004)
16. Caprino, G., Amore, A.: Fatigue Life of Graphite/Epoxy Laminates Subjected to Tension-Compression Loadings. Mechanics of Time-Dependent Materials 4, 139–154 (2000)
17. Novozhilov, N.I.: Prediction of Fatigue Life and the Technicoeconomic Efficiency of High-Strength Steel Railway Bridge Structures. Strength of Materials 10(1), 43–47 (1978)
18. Dai, Z.Y.: Fatigue Damage Critical and Damage Locality. In: Wang, G.G., Gao, Q. (eds.) Solid Damage and Destroy, pp. 75–81. Chengdu University Science and Technology Press, Chengdu (1993)
New Structures of Intuitionistic Fuzzy Groups Chuanyu Xu Department of Math, Zhejiang Gongshang University 310035 Hangzhou, China [emailprotected]
Abstract. An intuitionistic fuzzy (ℐ℉) set is a generalization of the concept 'fuzzy set'. An intuitionistic fuzzy group is an ℐ℉ set with a kind of operation. However, few structures of intuitionistic fuzzy groups (ℐ℉Gs) are known. Aimed at this, this paper gives and proves four theorems about some structures, as follows: 1. The Cauchy theorem of ℐ℉ groups. 2. The sufficient and necessary condition for an ℐ℉ p-group is that the order of the ℐ℉ group is a power of p. 3. The number of elements of a conjugate class in an ℐ℉ group equals the number of cosets in an ℐ℉ quotient group. 4. The condition under which there exist fixed elements in a conjugate class of an ℐ℉ group, and the number of fixed elements. Compared with related works: the sets and operations of classical groups are classical, whereas in this paper the sets are ℐ℉Ss and the operations are based on ℐ℉ relations. Similar work has not been seen in the available ℐ℉ group literature.
1 Introduction
After intuitionistic fuzzy sets (simply, ℐ℉Ss) were presented [1,2], a new type of ℐ℉ groups was put forward [3-5]. However, among their structures only homomorphisms have been studied; other structures have not been reported. Some important structures should be studied, for example: what is the relation between the structure and the order of ℐ℉ groups? How many elements are there in a conjugate class of an ℐ℉ group? Is there any fixed element in the sets on which ℐ℉ groups act, and how many are there? To solve these problems, this paper gives and proves four theorems about some structures, as follows:
1. The Cauchy theorem of ℐ℉ groups.
2. The sufficient and necessary condition for an ℐ℉ p-group is that the order of the ℐ℉ group is a power of p.
3. The number of elements of a conjugate class in an ℐ℉ group equals the number of cosets in an ℐ℉ quotient group.
4. The condition under which there exist fixed elements in a conjugate class of an ℐ℉ group, and the number of fixed elements.
A comparison of this paper with related works is as follows:
1. The difference from classical groups: the sets and operations of classical groups are classical, while in this paper the sets are ℐ℉Ss and the operations are based on ℐ℉ relations.
2. The difference from available ℐ℉ groups: similar work has not been seen in the available ℐ℉ group literature.
The rest of the paper is organized as follows: Section 2, preliminaries; Section 3, some structures of ℐ℉ groups; Section 4, conclusion.
D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 145–152, 2008. © Springer-Verlag Berlin Heidelberg 2008
2 Preliminaries
Definition 2.1 [1,2] (Intuitionistic Fuzzy Set, ℐ℉S). Let a set E be fixed. An ℐ℉S A in E is an object having the form A={<x, μA(x), νA(x)>⏐x∈E}, where the functions μA(x): E→[0,1] and νA(x): E→[0,1] define the degree of membership and the degree of nonmembership of the element x∈E to the set A, which is a subset of E, respectively, and for every x∈E: 0≤μA(x)+νA(x)≤1.
Note. Obviously every fuzzy set has the form {<x, μA(x), 1−μA(x)>⏐x∈E}. ┃
In Definitions 2.2-2.4, 0≤μ+ν≤1.
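The constraint in Definition 2.1 can be illustrated with a minimal sketch (the function names here are illustrative, not from the paper):

```python
# An intuitionistic fuzzy set over a finite universe, stored as
# x -> (mu, nu) with the constraint 0 <= mu + nu <= 1.
def make_ifs(pairs):
    for x, (mu, nu) in pairs.items():
        if not (0.0 <= mu <= 1.0 and 0.0 <= nu <= 1.0 and mu + nu <= 1.0):
            raise ValueError(f"invalid (mu, nu) for {x}: {(mu, nu)}")
    return pairs

def hesitation(ifs, x):
    # pi(x) = 1 - mu(x) - nu(x): the hesitation margin; it is 0
    # exactly when the IFS degenerates to an ordinary fuzzy set.
    mu, nu = ifs[x]
    return 1.0 - mu - nu

A = make_ifs({"a": (0.6, 0.3), "b": (0.2, 0.5)})
print(hesitation(A, "a"))
```

An ordinary fuzzy set corresponds to the special case ν(x) = 1 − μ(x) noted above, i.e. zero hesitation at every point.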
Definition 2.2 (Intuitionistic Fuzzy mapping, ℐ℉ mapping). Let X and Y be two nonvoid sets, (x, y)∈X×Y, and ∃θ1>0, θ2>0. If (1) ∀x∈X, ∃y∈Y, such that μ(x,y)>θ1 and ν(x,y)>θ2; (2) ∀x∈X, ∀y1, y2∈Y, μ(x,y1)>θ1 and ν(x,y1)>θ2, μ(x,y2)>θ1 and ν(x,y2)>θ2 ⇒ y1=y2, then the vector function (μ,ν) is called an ℐ℉ mapping (μ,ν): X→Y, x↦y, denoted as (μ,ν)(x)=y, or for simplicity, f(x)=y. ┃
Definition 2.3. If (μ,ν) satisfies that ∀y∈Y, ∃x∈X and ∃θ1>0, θ2>0 such that μ(x,y)>θ1, ν(x,y)>θ2, then (μ,ν) is called an ℐ℉ surjection. If ∀x1, x2∈X, ∀y∈Y, μ(x1,y)>θ1, ν(x1,y)>θ2, and μ(x2,y)>θ1, ν(x2,y)>θ2 ⇒ x1=x2, then (μ,ν) is called an ℐ℉ injection. If (μ,ν) is both an ℐ℉ surjection and an ℐ℉ injection, then (μ,ν) is called an ℐ℉ bijection. ┃
Definition 2.4 [4,5,9-11] (ℐ℉ Binary operation). Let G be a nonvoid set, and let (μ,ν): G×G×G→<θ1,θ2>, θ1,θ2∈[0,1], be an ℐ℉ mapping. If (1) ∀x,y∈G, ∃z∈G, such that μ(x,y,z)>θ1 and ν(x,y,z)>θ2; (2) ∀x,y∈G, ∀z1,z2∈G, μ(x,y,z1)>θ1, ν(x,y,z1)>θ2, and μ(x,y,z2)>θ1, ν(x,y,z2)>θ2 ⇒ z1=z2, then the vector function (μ,ν) is called an ℐ℉ binary operation on G. Denote (x○y)(z) ≜ <μ(x,y,z), ν(x,y,z)>; here '○' is called the ℐ℉ binary operator. ┃
In Definitions 2.5, 2.6 and 2.10, 0≤*μ+*ν≤1 and 0≤μ*+ν*≤1.
Definition 2.5. The ℐ℉ composition operation between elements in G is defined as follows:
((x○y)○z)(a) = <∨_{b∈G}(μ(x,y,b)∧μ(b,z,a)), ∧_{c∈G}(ν(x,y,c)∨ν(c,z,a))> ≜ <*μ, *ν>,
(x○(y○z))(a) = <∨_{b∈G}(μ(y,z,b)∧μ(x,b,a)), ∧_{c∈G}(ν(y,z,c)∨ν(x,c,a))> ≜ <μ*, ν*>.
Definition 2.6 [1,2,4,5] (ℐ℉ group). Let G be a nonvoid set and ∃θ1>0, θ2>0. If
(1) ((x○y)○z)(a1) = <*μ, *ν>, (x○(y○z))(a2) = <μ*, ν*>, *μ, μ*>θ1, *ν, ν*>θ2 ⇒ a1=a2; '○' is said to satisfy the association law;
(2) ∀x∈G, ∃e∈G, (e○x)(x) = <μ(e,x,x), ν(e,x,x)>, (x○e)(x) = <μ(x,e,x), ν(x,e,x)>, μ(•,•,•)>θ1, ν(•,•,•)>θ2; e is called an identity element;
(3) ∀x∈G, ∃y∈G, (x○y)(e) = <μ(x,y,e), ν(x,y,e)>, (y○x)(e) = <μ(y,x,e), ν(y,x,e)>, μ(•,•,•)>θ1, ν(•,•,•)>θ2; y is called an inverse element of x, denoted x⁻¹;
then G is called an ℐ℉ group. ┃
Definition 2.7 [1,2,4,5] (ℐ℉ subgroup). If a nonempty subset H of an ℐ℉ group G is itself an ℐ℉ group about the operation '○', then H is called an ℐ℉ subgroup of G, denoted by H ≤ G. ┃
Definition 2.8 [1,2,4,5]. Suppose H is an ℐ℉ subgroup of an ℐ℉ group G, x, z∈G. Define
(xH)(z) = <∨_{h∈H} μ(x,h,z), ∧_{h∈H} ν(x,h,z)>,
(Hx)(z) = <∨_{h∈H} μ(h,x,z), ∧_{h∈H} ν(h,x,z)>.
xH and Hx are called the ℐ℉ left coset and the ℐ℉ right coset of H in x, respectively. ┃
Definition 2.9. Suppose H is an ℐ℉ subgroup of an ℐ℉ group G, a, b∈G. aH ∽ bH ⇔ ∃h∈H, (a⁻¹○b)(h) = <μ(a⁻¹,b,h), ν(a⁻¹,b,h)>, μ(a⁻¹,b,h)>θ1, ν(a⁻¹,b,h)>θ2. Then "∽" is called the ℐ℉ equivalent relation on ∑={aH⏐∀a∈G}, or simply the ℐ℉ equivalent relation. ┃
Definition 2.10 [1,2,4,5]. Suppose H is an ℐ℉ subgroup of an ℐ℉ group G. If ∀a, b∈G, ∀h∈H, (a○(h○a⁻¹))(b) = <μ*, ν*>, μ*>θ1, ν*>θ2 ⇒ b∈H, then H is called an ℐ℉ normal subgroup of G, denoted H⊳G. ┃
Definition 2.11. Suppose H is an ℐ℉ normal subgroup of an ℐ℉ group G, ∀x∈G,
G/H ≜ {xH⏐∀x∈G}. Let the operation on G/H be
(xH○yH)(zH) = <∨ μ(x′,y′,z′), ∧ ν(x″,y″,z″)>, where x′H∽x″H∽xH, y′H∽y″H∽yH, z′H∽z″H∽zH. Then G/H is an ℐ℉ group about this operation, and G/H is called the ℐ℉ quotient group. ┃
Definition 2.12 (Homomorphism and isomorphism of ℐ℉ groups). Suppose G1 and G2 are two ℐ℉ groups, and φ: G1→G2 is an ℐ℉ mapping. If, whenever (x○y)(z) = <μ, ν>, there is (φ(x)○φ(y))(φ(z)) = <μ, ν>, μ>θ1, ν>θ2, then φ is called an ℐ℉ homomorphism. If φ is the ℐ℉ injection, surjection, or bijection, respectively, then φ is called an ℐ℉ injection homomorphism, surjection homomorphism, or isomorphism, respectively. ┃
Lemma 2.1 [6-8]. If an ℐ℉ group H with order pⁿ (p is a prime) acts on a finite set S, and S0={x∈S⏐hx=x for all h∈H}, then |S|≡|S0| (mod p). ┃
3 Some Structures of ℐ℉ Groups
In this section, θ1*>0, θ2*>0, 0≤θ1*+θ2*≤1.
Definition 3.1 (Order of element). For an element a in an ℐ℉ group G, if there is a positive integer p such that (…(a1○a2)○…○ap)(e) = <θ1*, θ2*> with a1=a2=…=ap=a, then p is called the order of the element a. If there is no such p, then a is called an element of infinite order. ┃
Definition 3.2. Suppose the action of an ℐ℉ group G upon a nonempty set X is G×X→X, such that (a,x)↦a(x). Then Gx = {a(x)⏐a∈G, (a○x)(a(x)) = <θ1*, θ2*>} is called the orbit of x. If Gx={x}, then x is called a fixed element of G. If X=G, and
((a○x)○a⁻¹)(a(x)) = <θ1*, θ2*>, a, x∈G, then the orbit of x is called the conjugate class of x. Because e∈G and ((e○x)○e⁻¹)(x) = <θ1*, θ2*>, the orbit includes x. Denote x~y ⇔ ∃a∈G such that ((a○x)○a⁻¹)(y) = <θ1*, θ2*>; "~" is an equivalent relation, the orbit Gx is just the equivalent class determined by "~", and x is its representative element. ┃
Remark. The notation "~" is different from the notation "∽" of the equivalent relation on cosets.
Definition 3.3 (ℐ℉ centralizer). Suppose G is an ℐ℉ group. For any element x in G, Stab_G x = {a∈G⏐(a○(x○a⁻¹))(x) = <θ1*, θ2*>} is an ℐ℉ subgroup; it is called the stable ℐ℉ subgroup, or centralizer, and is denoted by Z_G(x). ┃
Definition 3.4 (ℐ℉ index). The number of left (right) ℐ℉ cosets of H is called the index of H in G, denoted [G:H]. ┃
Definition 3.5. For an ℐ℉ group, if the order of each of its elements is a power of some constant prime p, then the group is called an ℐ℉ p-group. ┃
The Cauchy theorem describes the relation between the structure and the order of ℐ℉ groups.
Theorem 3.1 (Cauchy theorem of ℐ℉ groups). If G is a finite ℐ℉ group and p⏐|G|, where p is a prime, then there is an element whose order is p.
Proof. Let n=|G|. Construct a set of p-dimensional vectors
S = {(a1, a2, …, ap)⏐ai∈G, 1≤i≤p, (…(a1○a2)○…○ap)(e) = <θ1*, θ2*>},
where (…(a1○a2)○…○ap)(e) = <θ1*, θ2*> ⇔ ((…(a1○a2)○…○a_{p−1})○ap)(e) = <θ1*, θ2*>.
∴ |S| = n^{p−1}. It is known that p⏐n, ∴ |S| ≡ 0 (mod p).
Suppose Zp is the residue class modulo-p additive group, where the set of elements of Zp is denoted {0, 1, 2, …, p−1}. For k∈Zp, (a1, a2, …, ap)∈S, let the action of Zp upon the set S be the following cyclic permutation (indices mod p):
(k, (a1, a2, …, ap)) ↦ k((a1, a2, …, ap)) = (a_{k+1}, a_{k+2}, …, a_{k+p}) ∈ S.
The action satisfies:
0((a_i, a_{i+1}, …, a_p, a_1, …, a_{i−1})) = (a_i, a_{i+1}, …, a_p, a_1, …, a_{i−1}), where the unit element 0∈Zp;
(k+k′)((a_i, a_{i+1}, …, a_p, a_1, …, a_{i−1})) = k(k′((a_i, a_{i+1}, …, a_p, a_1, …, a_{i−1}))), k, k′∈Zp.
That (a_{k+1}, a_{k+2}, …, a_{k+p})∈S can be verified: because each element in an ℐ℉ group has an inverse element,
(a1○(a2○…○(a_{p−1}○ap)…))(e) = (a2○(a3○…○(ap○a1)…))(e) = … = (a_{k+1}○(a_{k+2}○…○(a_{k−1}○a_k)…))(e) = <θ1*, θ2*>
⇒ (a_{k+1}, a_{k+2}, …, a_{k+p})∈S; the last step is due to the definition of S.
On the other hand, let S0 = {x∈S⏐hx=x, ∀h∈Zp}, where x=(a1, a2, …, ap). ∵ (e, e, …, e)∈S0, ⇒ |S0|≠0. And (a1, a2, …, ap)∈S0 ⇔ a1=a2=…=ap.
From Lemma 2.1, 0 ≡ |S| ≡ |S0| (mod p).
∵ |S0|≠0, ∴ |S0|≥p, ∴ ∃a≠e such that (a, a, …, a)∈S0. ∵ S0⊂S, ∴ (a○(a○…(a○a)…))(e) = <θ1*, θ2*>. ∴ |a|=p. ┃
Theorem 3.2. A finite ℐ℉ group G is an ℐ℉ p-group ⇔ |G| is a power of p.
Proof. Suppose G is an ℐ℉ p-group, and q is a prime dividing |G|. By the ℐ℉ Cauchy theorem, G contains an element of order q. Because the order of each element in G is a power of p, p=q; thus |G| is a power of p. ┃
Theorem 3.3. Suppose x is an element of an ℐ℉ group G, and H = Z_G(x). Then the number of elements of the conjugate class of x in G, |Gx|, equals the number of cosets aH in the ℐ℉ quotient group G/H = {aH⏐∀a∈G}.
Proof. Let x, y, y′∈G. If ∃a, b∈G satisfying ((a○x)○a⁻¹)(y) = <θ1*, θ2*> and ((b○x)○b⁻¹)(y′) = <θ1*, θ2*>, then x and y are conjugate elements, and x and y′ are also conjugate elements. The two conjugate elements are equal iff a○x○a⁻¹ = b○x○b⁻¹
⇔ ((b⁻¹○((a○x)○a⁻¹))○b)(x) = <θ1*, θ2*>
⇔ (((b⁻¹○a)○x)○(a⁻¹○b))(x) = <θ1*, θ2*>
⇔ (((a⁻¹○b)⁻¹○x)○(a⁻¹○b))(x) = <θ1*, θ2*>,
that is, ∃z∈G such that
(a⁻¹○b)(z) = <θ1*, θ2*>    (i)
and
((z⁻¹○x)○z)(x) = <θ1*, θ2*>.    (ii)
From Definition 2.9, (i) and (ii) mean that z∈Z_G(x) and b∈aZ_G(x), i.e., aH ∽ bH, where H = Z_G(x). Equal conjugate elements thus make a, b lie in the same coset aH; that is, the number of elements of the conjugate class of x equals the number of cosets aH in the ℐ℉ quotient group G/H = {aH⏐∀a∈G}. ┃
Definition 3.5′ (ℐ℉ p-group). When the order of a finite ℐ℉ group G is a power of a prime p, G is called a p-ℐ℉ group. ┃
Note. Definition 3.5 is equivalent to Definition 3.5′.
Theorem 3.4. Suppose an ℐ℉ p-group G acts on a finite set X, and |X|=n. Then
(1) when n and p are coprime, (n, p)=1, there is a fixed element in X;
(2) if there are t fixed elements in X, then t ≡ n (mod p).
Proof. Assume that there are r conjugate classes Gx1, Gx2, …, Gxr, where Gxi, 1≤i≤t, includes only one element xi; that is, xi is a fixed element. Therefore
n = |X| = ∑_{i=1}^{r} |Gxi| = t + ∑_{i=t+1}^{r} |Gxi|.
When t+1 ≤ i ≤ r, Stab_G xi ⫋ G, and by Theorem 3.3
|Gxi| = [G : Stab_G xi] = |G/H|, where H = Z_G(xi), and |G| = |G/H| × |Stab_G xi|.
∵ G is an ℐ℉ p-group, its order |G| is a power of p, ∴ p⏐|G/H|, that is, p⏐|Gxi| for t+1 ≤ i ≤ r.
From the above process, t ≡ n (mod p). Especially, when (n, p)=1, t ≢ 0 (mod p), which means that the number of fixed elements is greater than or equal to 1. ┃
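The counting argument behind Theorem 3.1 mirrors the classical Cauchy theorem. A brute-force check of the classical analogue on a small group (the additive group Z6 with p = 3, chosen here purely for illustration):

```python
from itertools import product

n, p = 6, 3                      # |G| = 6, p divides 6; G = Z_6 under addition mod 6
G = range(n)
e = 0

# S = set of p-tuples whose "product" (here: sum mod n) is the identity.
S = [t for t in product(G, repeat=p) if sum(t) % n == e]

assert len(S) == n ** (p - 1)    # the first p-1 entries are free
assert len(S) % p == 0           # hence |S| ≡ 0 (mod p)

# Constant tuples in S correspond to elements a with a^p = e; the theorem
# promises one with a != e, i.e. an element of order p.
fixed = [t[0] for t in S if len(set(t)) == 1]
assert any(a != e for a in fixed)
print(sorted(fixed))             # [0, 2, 4]: 2 and 4 have order 3 in Z_6
```

The cyclic Zp-action on S used in the proof permutes each tuple; its fixed points are exactly the constant tuples counted at the end.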
4 Conclusions
This paper gives and proves four theorems about some structures of ℐ℉Gs. Similar work has not been seen in the available ℐ℉G literature.
References
1. Atanassov, K.T.: Intuitionistic Fuzzy Sets. Fuzzy Sets and Systems 20, 87–96 (1986)
2. Atanassov, K.T.: More on Intuitionistic Fuzzy Sets. Fuzzy Sets and Systems 33, 37–45 (1989)
3. Li, X.P., Wang, G.J.: Intuitionistic Fuzzy Group and its Homomorphic Image. Fuzzy Systems and Mathematics 14(1), 45–50 (2000)
4. Ban, X.G., Liu, F.: Intuitionistic Fuzzy Group Based on Bi-factor Operation. Fuzzy Systems and Mathematics 20(4), 16–21 (2006)
5. Ban, X.G., Liu, F.: The Sub-intuitionistic Fuzzy Group and Normal Sub-intuitionistic Fuzzy Group of an Intuitionistic Fuzzy Group. Fuzzy Systems and Mathematics 20(3), 43–46 (2006)
6. Cao, X.H., Ye, J.C.: Representation Theory of Groups. Peking University Press, Beijing (1998)
7. Cao, X.H.: Basis of Finite Groups Theory. Higher Education Press, Beijing (1992)
8. Wang, E.F.: Basis of Finite Groups. Qinghua University Press, Beijing (2002)
9. Yuan, X.H., Ren, Y.H., Lin, L.: A New Kind of Fuzzy Group. Journal of Liaoning Normal University (Natural Science Edition) 25(1), 3–6 (2002)
10. Yuan, X.H., Zhang, Y.H., Yang, J.H.: Homomorphism of Fuzzy Group. Journal of Liaoning Normal University (Natural Science Edition) 25(4), 340–342 (2002)
An Illumination Independent Face Verification Based on Gabor Wavelet and Supported Vector Machine Xingming Zhang, Dian Liu, and Jianfu Chen School of Computer Science and Engineering, South China University of Technology 381#, Wushan Road, Guangzhou, Guangdong, China, 510640 [emailprotected], [emailprotected], [emailprotected]
Abstract. Face verification technology is widely used in the fields of public safety, e-commerce and so on. Owing to the gabor wavelet's insensitivity to varying illumination, a new illumination-invariant face verification method based on the gabor wavelet is presented in this paper. First, the ATICR method is used for light preprocessing of the images. Second, certain gabor wavelet filters, selected on the basis of an experiment showing that different gabor wavelet filters do not have the same effect in verification, are used to extract image features, whose dimension is then reduced by Principal Component Analysis. Finally, SVM classifiers are modeled on the reduced-dimension data. The experimental results on the IFACE database and the NIRFACE database indicate that the algorithm, named "Selected Paralleled Gabor Method", achieves higher verification performance and better adaptability to variable illumination. Keywords: gabor wavelet, Supported Vector Machine, Face Verification, Illumination.
1 Introduction
Face verification is widely used in the fields of public security, e-commerce, access control and so on. It is a 1-to-1 problem, where a user's biometric data are compared to his/her corresponding biometric template in order to verify whether or not the person is who he/she claims to be. The support vector machine has been studied deeply for face verification, but prior to verification it is crucial how to extract features. In recent years, the gabor wavelet has been used in face recognition because of its insensitivity to varying illumination conditions and image texture. PCA or LDA is usually combined with it to reduce dimension because of the huge amount of gabor data in face identification; the performance of LDA is better than that of PCA. However, Yang et al. proposed that an algorithm which has good performance in face identification may not have good performance in face verification [1]. Constructing a group of classifiers is a good method for better face verification performance, since it can improve the generalization effect. [2] proposed a paralleled gabor method to make up classifiers, which extracts gabor features with 40 gabor filters and uses PCA to reduce dimension. But that method uses too many gabor wavelets, generating too much data and taking too long for real-time application. D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 153–160, 2008. © Springer-Verlag Berlin Heidelberg 2008
154
X. Zhang, D. Liu, and J. Chen
A new verification algorithm based on SVM and gabor is presented in this paper, which combines light pretreatment method and constructs group of special classifiers by selected gabor filters. And it has good performance on the experiments. The remainder of this paper is organized as follows. In Section 2, we present the flow of algorithm. In Section 3, we propose pretreatment of ATICR method. In Section 4, PCA and SVM is introduced. In section 5, we explain how to select the gabor filters to construct group of classifiers and the strategy of fusing classifiers. In section 6, we present and discuss our results. Section 7 summarizes the conclusions.
2 Algorithm Design First, an independent sample set is selected as negative for SVM training, which and enrolling sample set as positive form a 2-label question. In the flow of training, first, the training images are pretreated using ATICR method to reduce affect of illumination. Second, the images processed are extracted features by 16 gabor wavelet filters selected. Third, the features are processed with PCA to reduce dimension. At last, the features processed are trained with features of negative sample set to construct 16 classifiers. In the flow of testing, the testing images are processed as first, second and third steps in training flowing, then the features are judged by these 16 classifiers corresponding to the claimed to obtain 16 results, which decide whether accepting or rejecting by majority strategy. The detailed is as followed.
Fig. 1. Flow of algorithm for training and testing
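The PCA step of the flow above can be sketched with a plain SVD; the array sizes below are illustrative, not those used in the paper:

```python
import numpy as np

def pca_fit(X, k):
    """Fit PCA on rows of X (n_samples x n_features); keep k components."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return mean, Vt[:k]

def pca_transform(X, mean, components):
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
feats = rng.normal(size=(50, 200))   # e.g. 50 images, 200-dim gabor features
mean, comps = pca_fit(feats, k=16)
reduced = pca_transform(feats, mean, comps)
print(reduced.shape)                 # (50, 16)
```

At enrollment the projection basis is fitted once; test features are projected with the stored mean and components before being scored by the SVM classifiers.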
3 ATICR Illumination Normalization
In order to achieve better performance, an illumination normalization is designed. For any image P to be processed, first do histogram equalization (HE) to improve the grey distribution and get image Ph. Then the image Ph is processed along two paths: one uses the Affine Transformation (AT) illumination model [6] to get the AT-effect image Pa; the other utilizes the ICR algorithm [3] to process Ph, and the processed image is passed through the AT algorithm to get the effect image Pb. Finally, we calculate the means of images Pa and Pb, use them as the weights to synthesize Pa and Pb, and the resulting image is the illumination-normalized image P. The procedure is described as follows.
Fig. 2. Flow of AT-ICR algorithm
The effect of the face sets in the YaleB database processed by the method is shown in Fig. 3. It can be concluded that images with bad illumination are turned into ones of good quality without degradation by ATICR processing.
Fig. 3. The effect of ATICR processing (YaleB database)
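The first step of the AT-ICR chain, histogram equalization, can be sketched in NumPy. The AT and ICR stages follow refs. [6] and [3] and are omitted here; this is only the HE building block, not the authors' full implementation:

```python
import numpy as np

def hist_equalize(img):
    """Histogram-equalize an 8-bit grayscale image (H x W uint8 array)."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    # Map grey levels through the normalized CDF, ignoring empty low bins.
    cdf_min = cdf[cdf > 0].min()
    lut = np.round((cdf - cdf_min) / max(cdf[-1] - cdf_min, 1) * 255.0)
    return np.clip(lut, 0, 255).astype(np.uint8)[img]

dark = np.tile(np.arange(64, dtype=np.uint8), (64, 1))  # low-contrast ramp
eq = hist_equalize(dark)
print(eq.min(), eq.max())  # 0 255: the grey range is stretched to full scale
```

Equalization alone does not remove directional shadows, which is why the AT model and ICR correction are layered on top of it in the flow above.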
4 Modeling for Support Vector Machine
SVM is a linear machine which is able to separate positive and negative examples using a decision surface constructed from an optimal separating hyperplane. In theory,
SVM achieves good generalization performance by producing a zero training-error rate while minimizing the Vapnik-Chervonenkis (VC) confidence. The decision surface function can be written as follows:

f(x) = sgn(∑_{i∈S} y_i α_i K(x, s_i) + b)    (1)

where x is the input to be classified and S = {i | α_i ∈ R+} indexes a set of positive coefficients known as Lagrange multipliers. The support vectors s_i, i∈S, constitute a small subset of the training data extracted during the optimization process. K is a user-specified kernel function. For SVM training, a 1-to-1 problem is constructed: an independent sample set is selected as the negative set, and enrolling sets are viewed as positive sets. A model of 16 SVM classifiers is proposed for training and verifying.
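Eq. (1) can be sketched directly in code. The support vectors, labels, multipliers and bias below are made-up values for illustration, not learned parameters, and the RBF kernel is one common choice of K:

```python
import numpy as np

def rbf_kernel(x, s, gamma=0.5):
    return np.exp(-gamma * np.sum((x - s) ** 2))

def svm_decision(x, support_vecs, labels, alphas, b, kernel=rbf_kernel):
    """Eq. (1): f(x) = sgn( sum_i y_i * alpha_i * K(x, s_i) + b )."""
    total = sum(y * a * kernel(x, s)
                for y, a, s in zip(labels, alphas, support_vecs))
    return 1 if total + b >= 0 else -1

# Toy model: two support vectors, one per class.
sv = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
y = [+1, -1]
alpha = [1.0, 1.0]
b = 0.0
print(svm_decision(np.array([0.2, 0.1]), sv, y, alpha, b))   # prints 1
print(svm_decision(np.array([2.9, 3.2]), sv, y, alpha, b))   # prints -1
```

In the verification setting, a +1 output from a classifier is one "accept" vote for the claimed identity; the 16 votes are then fused as described in Section 5.3.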
5 Gabor Wavelets Selection
5.1 Gabor Feature
Gabor wavelets exhibit optimal joint frequency and spatial locality; these properties enable gabor wavelets to capture salient visual features [11]. The gabor wavelets are defined as follows, with δx = δy = δ as suggested:

ψ(x, y, ω0, θ) = (1/(2πσ²)) e^{−((x cos θ + y sin θ)² + (−x sin θ + y cos θ)²)/(2σ²)} × [e^{i(ω0 x cos θ + ω0 y sin θ)} − e^{−ω0²σ²/2}]    (2)

where (x, y) is the pixel position in the spatial domain, ω0 is the radial center frequency, θ is the orientation of the gabor wavelet, and σ is the standard deviation of the Gaussian function along the x- and y-axes, with σ = κ/ω0, κ = √(2 ln 2) · (2^φ + 1)/(2^φ − 1). Here φ = 1, so σ = κ/ω0 ≈ π/ω0. 40 gabor wavelets of different frequencies and orientations can be obtained by combining ω0 and θ: ω0 can be selected as ω01 = π/2, ω02 = π/(2√2), ω03 = π/4, ω04 = π/(4√2), or ω05 = π/8, and θ can be selected as θ = 0, π/8, π/4, 3π/8, π/2, 5π/8, 3π/4, or 7π/8
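Eq. (2) can be realized directly on a sampled grid; a NumPy sketch (the grid size is an illustrative choice):

```python
import numpy as np

def gabor_kernel(omega0, theta, size=33):
    """Sample the gabor wavelet of Eq. (2) on a size x size grid."""
    sigma = np.pi / omega0                       # sigma = kappa/omega0 ~ pi/omega0 for phi = 1
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)   # rotated coordinate along theta
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + yr ** 2) / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2)
    carrier = np.exp(1j * omega0 * xr) - np.exp(-omega0 ** 2 * sigma ** 2 / 2.0)
    return envelope * carrier

# The 5 x 8 bank of Section 5.1: five radial frequencies, eight orientations.
omegas = [np.pi / 2, np.pi / (2 * np.sqrt(2)), np.pi / 4,
          np.pi / (4 * np.sqrt(2)), np.pi / 8]
thetas = [k * np.pi / 8 for k in range(8)]
bank = [gabor_kernel(w, t) for w in omegas for t in thetas]
print(len(bank), bank[0].shape)  # 40 (33, 33)
```

A feature is typically the magnitude of the convolution of the face image with each complex kernel; the subtracted e^{−ω0²σ²/2} term makes each wavelet (approximately) DC-free so that uniform illumination contributes nothing.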
5.2 Gabor Feature Selection
As above, we can construct 40 different gabor wavelet filters to extract features, which will produce 40 different classifiers using SVM. But the method has many disadvantages: it wastes too much time and memory, and the quality of the image strongly affects the result. In further study, it was discovered that features extracted by different kernel functions do not have the same effect in verification. The experiment is done on the IFACE database. IFACE is a database of 106 persons with more than 50 images each. Those images are under three different illuminating conditions, A, B, C, as shown in Fig. 4. 8 persons with 10 images each under condition A are picked as the negative set. Apart from the 8 persons of the negative set, 10
persons with 10 images each under condition A are picked as the enrolling set, and 60 persons, containing the 10 enrolled persons, with 30 images each separately under conditions B and C are picked as the testing set. Table 1 gives the FAR and FRR of the 40 SVM classifiers corresponding to kernel functions of different frequencies and directions.
Fig. 4. Images of IFACE database under A, B, C conditions
Table 1. FAR and FRR of SVM with different frequency and direction
(rows: frequencies ω01=π/2, ω02=π/(2√2), ω03=π/4, ω04=π/(4√2), ω05=π/8; columns: orientations θ; each cell: FRR / FAR)

       θ=0        θ=π/8      θ=π/4      θ=3π/8     θ=π/2      θ=5π/8     θ=3π/4     θ=7π/8
ω01    25/0.5     21/0.4     20/0.8     24.1/0.9   25/0.7     25/0.2     25/0.3     22/0.4
ω02    13/0.2     12/0.1     18.7/0.3   19.2/0.8   23.3/0.7   22/0.4     14/0.3     15/0.2
ω03    12.5/0.2   2.5/0.1    15.1/0.1   17.5/0.3   19.5/0.5   17/0.5     17.5/0.2   13/0.0
ω04    12.5/0.3   9/0.2      11.1/0.1   14.3/0.3   16.8/0.4   14/0.2     13/0.4     10/0.2
ω05    10.4/0.4   8.5/0.3    12.2/0.4   13.3/0.2   15.1/0.4   13/0.2     12/0.0     10/0.3
In Table 1, ω0 is the frequency and θ is the direction of the gabor wavelet, and the FRR and FAR results for the test set are presented. A conclusion is drawn that not all gabor functions have the same effect in face verification; some have higher FRR and FAR. Considering the number of PCA kernels and the restrictions on memory and processing speed, the 16 gabor functions which have better identifying rates and lower FAR are picked for the algorithm, together with the 16 corresponding SVM classifiers.
5.3 Fusion Strategy
There are many methods of fusing classifiers, such as average, Maximum, Majority, Borda, Nash and so on [4]. This algorithm selects 16 classifiers with better performance and almost the same cost. But because of image noise and the information compression of the PCA projection, the scores of the SVMs are not accurate weighting values, which leads to the majority strategy being selected for fusion.
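The majority fusion of the 16 accept/reject decisions can be sketched as follows (the vote counts are illustrative):

```python
def majority_fuse(decisions):
    """Fuse binary accept(+1)/reject(-1) votes from the 16 SVM classifiers.

    The raw SVM scores are deliberately ignored (the paper argues they are
    not reliable weights after PCA compression); only the signs are counted,
    and a tie falls back to rejection.
    """
    accepts = sum(1 for d in decisions if d > 0)
    return +1 if accepts > len(decisions) / 2 else -1

votes = [+1] * 11 + [-1] * 5          # 11 of 16 classifiers accept
print(majority_fuse(votes))           # prints 1 -> accept the claimed identity
```

Resolving ties toward rejection is an assumption made here for the sketch; it is the conservative choice for a verification system, where a false accept is usually costlier than a false reject.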
6 Experiments and Discussion
Two databases, IFACE and NIRFACE, are used to demonstrate the effectiveness of SPGM in comparison with the Parallel Gabor method [2], Eigenflow[25], IESM[25], MACE [8] proposed by Carnegie Mellon University, and the FLD+SVM method [9]. In IFACE, 8 persons with 10 images each under condition A are selected as the negative set. Apart from the 8 persons, 30 persons with 10 images each under condition A form the register set, and 60 persons, containing the 30 enrollers, with 30 images each separately under conditions A, B, C form the test set. The result is shown in Table 2.
Fig. 5. Images under A,B,C illumination condition in NIRFACE Table 2. FAR and FRR on IFACE databace
(each cell: FRR / FAR; "-" marks a cell whose value did not survive in this copy)

algorithm          light pretreatment  gabor number  fusion     A            B            C
Paralleled gabor   ----                40            average    10.00 / -    17.67 / -    60.07 / -
Paralleled gabor   ATICR               40            average    6.67 / -     10.00 / -    41.33 / -
SPGM               ATICR               16            average    2.11 / -     7.44 / -     30.56 / -
SPGM               ATICR               16            majority   0.5 / 0.01   2.22 / 0.02  9.56 / 1.67
FLD+SVM            -                   -             -          - / -        - / -        - / -
Eigenflow          -                   -             -          - / -        - / -        - / -
IESM               -                   -             -          - / -        - / -        - / -
MACE               -                   -             -          - / -        - / -        - / -
Table 3. FAR and FRR on NIRFACE database
(each cell: FRR / FAR)

algorithm   A            B            C
SPGM        0 / 0        0.25 / 0     2.25 / 0.01
FLD+SVM     1.25 / 1.44  7.00 / 3.56  17.25 / 10.44
different illuminating conditions, A, B, C, as in IFACE. The captured images are shown in Fig. 5. 8 persons with 10 images each under condition A are selected as the negative set. Apart from the 8 persons, 20 persons with 10 images each under condition A are selected as enrollers, while 40 persons, containing the 20 enrollers, with 40 images each under conditions A, B, C serve as testers. The result is shown in Table 3. Table 2 shows the evolution of FAR and FRR for the IFACE image tests and the five methods considered here, while Table 3 does so for the NIR image tests and two methods. In Table 2, we can see that the ATICR method is effective in reducing the effect of varied illumination. For the selected gabor method, the raw average already produces low error; with the majority principle of the selected gabor method, however, FRR reduces greatly to yield the lowest error while FAR hardly rises. Especially on the NIR test sets, the algorithm has a performance suitable for application.
7 Conclusion
By analyzing existing face verification algorithms with SVM, this paper proposes a new illumination-insensitive method based on Gabor and SVM. 16 Gabor wavelets are selected, on the basis of FAR and FRR statistics, to construct the group of classifiers for better performance. According to the experimental results above, the algorithm performs well under varying illumination in contrast to standard algorithms.
References
1. Yang, F., Shan, S., Ma, B., Chen, X., Gao, W.: Using Score Normalization to Solve the Score Variation Problem in Face Authentication. In: Li, S.Z., Sun, Z., Tan, T., Pankanti, S., Chollet, G., Zhang, D. (eds.) IWBRS 2005. LNCS, vol. 3781, pp. 31–38. Springer, Heidelberg (2005)
2. Serrano, A., Diego, I., Conde, C., Cabello, E., Bai, L., Shen, L.: Fusion of Support Vector Classifiers for Parallel Gabor Methods Applied to Face Verification. In: Haindl, M., Kittler, J., Roli, F. (eds.) MCS 2007. LNCS, vol. 4472, pp. 141–150. Springer, Heidelberg (2007)
3. Han, J., Bhanu, B.: Statistical Feature Fusion for Gait-based Human Recognition. CVPR (2004)
4. Kamel, M., Wanas, N.: Data Dependence in Combining Classifiers. In: Windeatt, T., Roli, F. (eds.) MCS 2003. LNCS, vol. 2709, pp. 1–14. Springer, Heidelberg (2003)
5. Turk, M., Pentland, A.: Eigenfaces for Recognition. Journal of Cognitive Neuroscience 3, 71–86 (1991)
6. Zhu, J., Liu, B., Schwartz, S.: General Illumination Correction and its Application to Face Normalization. In: Proceedings of AMFG (2003)
7. Ko, J., Kim, E., Byun, H.: A Simple Illumination Algorithm for Face Recognition. In: Ishizuka, M., Sattar, A. (eds.) PRICAI 2002. LNCS (LNAI), vol. 2417. Springer, Heidelberg (2002)
8. Venkatargmani, K., Qidwai, S., Vijayakumar, B.: Face Authentication from Cell Phone Camera Images with Illumination and Temporal Variations. IEEE Trans. on SMC, 411–418 (2005)
9. Zhang, X., Li, H.: A Face Verification Based on Negative Independent Sample Set and SVM. J. Comput. Res. Dev. 2138–2143 (2006)
10. Li, S., Chu, R., Liao, S., Zhang, L.: Illumination Invariant Face Recognition Using Near-Infrared Images. IEEE Trans. Pattern Anal. Mach. Intell. 29(4) (2007)
11. Daugman, J.G.: Complete Discrete 2-D Gabor Transforms by Neural Networks for Image Analysis and Compression. IEEE Trans. Acoustics, Speech, Signal Processing 36(7), 1169–1179 (1988)
Hardware Deblocking Filter and Impact Hao Lian and Mohammed Ghanbari Department of Computing and Electronic Systems, University of Essex, UK [emailprotected], [emailprotected]
Abstract. With the use of digital video, video block errors, including watermarking errors, in the digital video may cause serious error propagation and quality degradation. A deblocking technique is applied for digital watermarking by graphics processing. In this study the processes of constructing and reconstructing a digital video frame are described on the basis of the theory of video coding. The conditions for deblocking digital video frames are investigated in detail. The validity of the present method is verified by experimental data, and the simulation results show that GPU deblocking supports us in improving the video quality effectively. In this paper we introduce a new deblocking filter based on a hardware GPU, which is applied in our watermarking system. Keywords: Copyright, Hardware, Deblocking, Watermarking.
1 Introduction
Nowadays, as graphics processing is getting more complicated, hardware designers have invented a new concept called the GPU (Graphics Processing Unit), which is designed to handle most graphics processing in applications, including, of course, digital video processing. On the other hand, many error recovery techniques have been applied to video coding. We therefore analyzed typical video errors and then applied the GPU deblocking filter to watermarked video which contains coding errors. Accordingly, we applied our GPU-based deblocking filter to a Pay-TV system in order to gain an edge. The filter is intended for recovering the quality of compressed video such as digital TV programs. The filter automatically determines the block strength on the frame and removes the blocks effectively.
2 Video Error Origins
The process of data compression by redundancy removal or reduction is called source encoding. There are two basic types of redundancy: statistical redundancy, which is spatial and temporal and is present because certain spatial patterns are more probable than others; and psycho-visual redundancy, which arises because the human eye is insensitive to certain spatial frequencies.
D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 161–168, 2008. © Springer-Verlag Berlin Heidelberg 2008
In a video coding system, if any error is produced, it spreads in both the spatial domain and the temporal domain (as the figure below shows), so the video quality is degraded, and the errors may cause serious error propagation. To protect video frames against encoding errors, we must first find the origin of the errors.
Fig. 1. Error propagation in the video sequence
After analyzing different kinds of error blocks, we sort them into two categories according to their origin: error blocks that can be tracked, and those that cannot [1-2].
*Errors that can be tracked
2.1 Wrong Macroblock Number (using the Foreman test video as an example)
Fig. 2. Macroblock errors (left: no error; right: error occurred but corrected)
2.2 False Alarms
Fig. 3. False-alarm errors (left: no error; right: error occurred but uncorrected)
2.3 Incorrect Code Words
For example, if we cannot find an exact match in the VLC table for a codeword in the video bitstream, that codeword must be wrong.
2.4 The Range of Quantization Coefficients
According to the H.263 rules, the quantization coefficient is denoted by a five-bit word with a range of 1–31. First, if a coefficient is out of this range, it must be an error word. Second, the five bits cannot all be zero.
2.5 Unusual DC Coefficients
According to the intra-mode table, the DC coefficient cannot be 1000 0000 or 0000 0000. If we find such a coefficient, an error has occurred.
2.6 Coefficient Count
The number of coefficients must be 64 if the 8x8 standard is used.
2.7 Wrong MB Blocks
Each MB contains four luminance blocks and two colour blocks, so if the number of blocks is not a multiple of six, an error has occurred.
*Errors that cannot be tracked
2.8 Wrong Macroblock Number (in a GOP)
For example, in a 176x144 video sequence there are 11 macroblocks in one GOP; if the MB number is wrong after coding, an error has occurred but it is untraceable.
2.9 Illogical GOP Lengths
If the length between GOP start codes changes after coding, we can judge that at least one error occurred in the coding process. Usually error correction codes can be used so that these errors are detected and corrected; however, this may increase the bit rate. After this analysis of errors, let us see what GPU deblocking can do with them.
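Several of the checks above reduce to simple predicates on decoded values. The following sketch (hypothetical helper names, not the authors' implementation) illustrates the quantizer-range, DC-coefficient, and block-count checks for an H.263-style decoder:

```python
def quantizer_valid(qp: int) -> bool:
    # H.263 codes the quantizer in five bits; legal values are 1..31,
    # and the all-zero codeword is forbidden (Section 2.4).
    return 1 <= qp <= 31

def intra_dc_valid(dc_byte: int) -> bool:
    # The intra DC coefficient may not be 0000 0000 or 1000 0000 (Section 2.5).
    return dc_byte not in (0x00, 0x80)

def block_count_valid(num_blocks: int) -> bool:
    # Each macroblock carries four luminance and two chrominance blocks,
    # so the total block count must be a multiple of six (Section 2.7).
    return num_blocks > 0 and num_blocks % 6 == 0

def frame_has_error(qp, dc_byte, num_blocks):
    # A frame is flagged as erroneous if any single check fails.
    return not (quantizer_valid(qp) and intra_dc_valid(dc_byte)
                and block_count_valid(num_blocks))
```

In a real decoder these predicates would run per macroblock as the bitstream is parsed, marking failing regions for concealment or for the deblocking stage described below.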
3 Deblocking Ability of GPU A Graphics Processing Unit or GPU is a dedicated graphics rendering device for a personal computer or game console. Modern GPUs are very efficient at manipulating
and displaying computer graphics, and their highly parallel structure makes them more effective than typical CPUs for a range of complex algorithms. Because the GPU processes transform and lighting, those calculations are offloaded from the CPU, decreasing the CPU load; for example, for digital video encoded with VLC-based codecs such as MPEG-2 and MPEG-4, geometry transformations are accomplished via hardware acceleration, and this is mathematically intensive work.
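A geometry transform of this kind is simply a matrix-vector product. As a minimal illustration (not tied to any particular GPU API), transforming one homogeneous 3D point by a 4x4 matrix costs exactly the 16 multiplications and 12 additions the text discusses:

```python
def transform_point(m, p):
    # m: 4x4 matrix as nested lists; p: homogeneous point (x, y, z, w).
    # Each of the 4 output components needs 4 multiplications and
    # 3 additions: 16 multiplications and 12 additions in total.
    return tuple(sum(m[r][c] * p[c] for c in range(4)) for r in range(4))

# Translate a point by (2, 3, 4) using a standard translation matrix.
T = [[1, 0, 0, 2],
     [0, 1, 0, 3],
     [0, 0, 1, 4],
     [0, 0, 0, 1]]
print(transform_point(T, (1, 1, 1, 1)))  # (3, 4, 5, 1)
```

A GPU performs millions of such products per frame in parallel, which is why offloading them from the CPU matters.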
Fig. 4. Basic structure of deblocking process
As shown above, the process covers the whole MPEG-2 decoding chain, and the GPU can accelerate all the stages in the green parts, including the IDCT (Inverse Discrete Cosine Transform) and MC (Motion Compensation); both are very important in MPEG-2 processing. The transform process is also an essential part of video coding. In its simplest terms the word transform, as it applies to 3D graphics, means handling scenes changing from one frame to the next. Moving an object is a transformation referred to as translation; other types of transformation include moving the point of view, zooming, scaling (changing an object's size) and rotation. As objects are transformed in a digital video, their positions must be calculated at rates of millions of times per minute. The calculation is based on linear algebra (in particular, matrix multiplication): the math required to compute a transform is straightforward, but a single transform consists of 16 multiplication and 12 addition operations. The idea of video deblocking is that the graphics card features hardware deblocking, reducing the appearance of pixel blocks in real time when displaying highly compressed video streams. Deblocking filter mode: this mode introduces a deblocking filter inside the coding loop. Unlike post-processing filtering, predicted pictures are computed from filtered versions of the previous ones. A filter is applied to the edge boundaries of the four luminance blocks and two chrominance blocks: it is applied to a window of four edge pixels in the horizontal direction, and then similarly in the vertical direction. The weight of the filter's coefficients depends on the quantization step size for a given macroblock, with stronger coefficients used for coarser quantization.
This mode also allows the use of four motion vectors per macroblock,
as specified in the advanced prediction mode of H.263, and also allows motion vectors to point outside picture boundaries, as in the unrestricted motion vector mode. These techniques, together with the filtering, result in better prediction and a reduction in blocking artifacts. The computationally expensive overlapping motion compensation of the advanced prediction mode is not used here, in order to keep the additional complexity of this mode minimal [5]. The deblocking filter mode improves subjective quality by removing the blocking artifacts common to block-based video coding at low bit rates. Many applications use a post-processing filter to reduce these artifacts; such a filter is usually present at the decoder and lies outside the coding loop. In the Windows operating system, the VMR (Video Mixing Renderer) is well supported by DirectX. Since different graphics cards have different VMR (smoothing and deblocking) capabilities, we classify them as follows: VMR7 always demands a high CPU frequency. VMR9, a product released after DirectX 9.0, supports the pixel shader technique; it is designed not only for still images but also for digital video, so it has better precision than its predecessors. In short, we obtain a much better effect by running the VMR9 option on a graphics card that supports DirectX 9.0c.
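The core idea of quantizer-dependent edge filtering can be sketched in a few lines. The example below is a simplified stand-in, not the H.263 Annex J filter: it smooths the four pixels spanning a horizontal 8x8 block boundary, with a strength that grows with the quantization step:

```python
def deblock_edge(a, b, c, d, qp, max_strength=4):
    # a, b: last two pixels of the upper block; c, d: first two of the lower.
    # The correction applied across the edge is clipped to a strength
    # proportional to the quantizer, so coarser quantization is filtered harder.
    strength = min(qp // 2, max_strength)
    delta = (c - b) // 4                       # soften half of the edge step
    delta = max(-strength, min(strength, delta))
    return a, b + delta, c - delta, d

# A sharp block boundary (... 90 | 110 ...) is softened; with a low
# quantizer the same edge is barely touched, preserving real detail.
print(deblock_edge(88, 90, 110, 112, qp=16))  # (88, 94, 106, 112)
print(deblock_edge(88, 90, 110, 112, qp=2))   # (88, 91, 109, 112)
```

The clipping step is what distinguishes an in-loop deblocking filter from plain low-pass smoothing: genuine image edges produce deltas far larger than the quantization-induced steps and are therefore left mostly intact.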
4 Deblocking Impact
The example below shows the impact of hardware deblocking on a video frame. Deblocking reduces the appearance of pixel "blocks" in real time when displaying noisy video, including, of course, watermarked video.
Fig. 5. Deblocking effect
There is also another special function supported by the most recent graphics cards, called FSAA (full-scene anti-aliasing), which also works effectively on watermarked video [12]. From all the examples above we can see that once the deblocking function is active, the quality of the processed video is enhanced considerably. We now link this hardware deblocking with our watermarking algorithm; the exact effect can be seen in the experimental results:
Fig. 6. Deblocking filter impact on the video image (left: deblocking disabled, 107th frame; right: deblocking enabled)
On the other hand, GPU deblocking also affects the watermark embedded in the video.
Fig. 7. Deblocking filter impact on the copyright watermark (left: deblocking disabled; right: deblocking enabled)
As the example shows, the detected watermark is acceptable, and we can still recognize it even though the deblocking filter introduces additional noise bits. The simulation is performed using MPEG-2 for DCT-based video watermarking, with the H.263 quantization method. Each test video sequence has 50 frames and only the first frame is coded as an intra frame. The proposed deblocking filter is applied to all block boundaries along the horizontal edges first, and if a pixel value is changed by the previous filtering operation, the updated value is used for the next filtering [11, 9, 5, 2]. In the chart of Fig. 8, the blue line represents the PSNR of the original FOREMAN video sequence with the deblocking function; the red line represents the
Fig. 8. PSNR chart of the deblocking effect
PSNR of the same video sequence without the deblocking filter. As we can see, the video sequence is enhanced by deblocking; the maximum gain is more than 5 dB (PSNR). Under the effect of deblocking, the quality of our watermarked video is improved, while the impact on the watermark signature remains acceptable. Decoding MPEG-2 streams to uncompressed video requires a lot of computing power, so graphics chip manufacturers integrated parts of the MPEG-2 decoding algorithm into their chips to aid decoding. Microsoft then created a common API through which MPEG-2 decoding programs can use the graphics chip's MPEG-2 decoding capability regardless of the chip used; this driver interface is called DxVA (DirectX Video Acceleration). Not all combinations of graphics chip and driver support DxVA. According to the DxVA standard, there are three acceleration levels for graphics chips. Level 1: Motion Compensation (MC) acceleration. Level 2: Inverse DCT (IDCT) acceleration + MC acceleration. Level 3: Variable Length Decoding (VLD) acceleration + IDCT acceleration + MC acceleration. At present even the most current graphics cards support only up to level 2, and only a few types support level 3; some older cards, such as Nvidia's GeForce3 Ti or GeForce4 Ti, support only level 1.
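The PSNR figures in this comparison follow the standard definition for 8-bit frames; a minimal sketch (frames given as flat pixel lists, purely illustrative):

```python
import math

def psnr(ref, test, peak=255):
    # Mean squared error between two equally sized 8-bit frames,
    # given as flat lists of pixel values.
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    if mse == 0:
        return float("inf")          # identical frames
    return 10 * math.log10(peak ** 2 / mse)

ref  = [100, 120, 140, 160]
test = [101, 119, 141, 159]          # off by one everywhere -> MSE = 1
print(round(psnr(ref, test), 2))     # 48.13
```

Applying this per frame to the filtered and unfiltered decodes of the same sequence yields the two curves of Fig. 8, whose gap is the reported deblocking gain.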
5 Conclusions
Using a graphics card that supports the Microsoft DirectX 9.0c standard produces an excellent effect on MPEG processing, and on watermarking as well; good choices include the ATI X800, the NVIDIA GeForce 6600 and the S3 DeltaChrome S8. Many deblocking methods for video watermarking were discussed; among them, the GPU deblocking method is easy to compute and its effect is impressive.
References
1. ITU-T Recommendation H.263: Video Coding for Low Bit Rate Communication, ITU (May 1996)
2. ITU-T Recommendation H.263+: Video Coding for Low Bit Rate Communication, ITU (1997)
3. MPEG-2 Video and System International Standard, ISO-IEC/JTC1/SC29/WG11 MPEG94 (November 1996)
4. ITU-T/SG15, Video Codec Test Model, TMN5 (January 1995)
5. Andreadis, A., Benelli, G., Garzelli, A., Sudini, S.: FEC Coding for H.263 Compatible Video Transmission. In: International Conference on Video Processing, vol. 3, pp. 579–581 (1997)
6. Shyu, H.C., Leou, J.J.: Detection and Concealment of Transmission Errors in MPEG-2: A Genetic Algorithm Approach. IEEE Trans. Circ. Syst. Vid. Technol. 9(6), 937–948 (2000)
7. Pickering, M.R., Frater, M.R., Arnold, J.F.: A Statistical Error Detection Technique for Low Rate Video. In: TENCON 1997, IEEE Region 10 Annual Conference, Speech and Image Tech. for Computing and Telecommunication, vol. 2, pp. 773–776 (1999)
8. Aign, S., Fazel, K.: Temporal and Spatial Error Concealment Techniques for Hierarchical MPEG-2 Video Codes. In: Process. Inform. 1778–1783 (1995)
9. Lee, T.H., Chang, P.C.: Error Robust H.263 Video Coding with Video Segment Regulation and Precise Error Tracking. Conditionally accepted by IEICE Trans. on Communications (2000)
10. Kim, C.S., Kim, R.C., Lee, S.U.: Robust Transmission of Video Sequence over Noisy Channel Using Parity-Check Motion Vector. IEEE Trans. Circ. Syst. Vid. Technol. 9(7), 1063–1074 (2000)
11. Chen, M.J., Chen, L.G., Weng, R.M.: Error Concealment of Lost Motion Vectors with Overlapped Motion Compensation. IEEE Trans. Circ. Syst. Vid. Technol. 7(3), 564–568 (1999)
12. Chang, P.C., Lee, T.H.: Precise and Fast Error Tracking for Error-Resilient Transmission of H.263 Video. IEEE Trans. Circ. Syst. Vid. Technol. 10(4) (2000)
Medical Image Segmentation Using Anisotropic Filter, User Interaction and Fuzzy C-Mean (FCM)

M.A. Balafar1, Abd. Rahman Ramli1, M. Iqbal Saripan1, Rozi Mahmud2, and Syamsiah Mashohor1
1 Dept of Computer & Communication Systems, Faculty of Engineering, University Putra Malaysia, 43400 Serdang, Selangor, Malaysia
[emailprotected], [emailprotected], [emailprotected], [emailprotected]
2 Faculty of Medicine, Universiti Putra Malaysia, 43400 Serdang, Selangor, Malaysia
[emailprotected]
Abstract. We propose a new clustering method based on an anisotropic filter, user interaction and fuzzy c-mean (FCM). In the proposed method, the color image is converted to a grey-level image and an anisotropic filter is applied to decrease noise; the user selects training data for each target class, and the image is then clustered using ordinary FCM. Due to in-homogeneity and unknown noise, some clusters contain training data for more than one target class; these clusters are partitioned again, and the process continues until no such clusters remain. Then, clusters containing training data for a target class are assigned to that target class. The mean intensity of each class is taken as the feature of that class; the feature distance of each unassigned cluster from each class is computed, and unassigned clusters are assigned to the target class with the least distance. Experimental results demonstrate the effectiveness of the new method. Keywords: Anisotropic filter, medical image segmentation, user interaction, fuzzy c-mean (FCM)
1 Introduction
Due to advances in computer technologies, the number of applications of digital image processing has been increasing, especially in recent years [2]. Medical images are now mostly stored and represented digitally [2]. The main medical image types are ultrasound images, X-ray computed tomography, digital mammography, magnetic resonance images (MRI), and so on [3]. Data acquisition, processing and visualization techniques facilitate diagnosis. Medical image segmentation plays a very important role in many computer-aided diagnostic tools, which can save clinicians time by simplifying a time-consuming process [4]. The main part of such tools is an efficient segmentation algorithm. Medical images mostly contain unknown noise, in-homogeneity and complicated structure; therefore, segmentation of medical images is a challenging and complex task. Medical image segmentation has been an active research area for a long time. There are many segmentation algorithms [15], but there is
D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 169–176, 2008. © Springer-Verlag Berlin Heidelberg 2008
not a generic algorithm that successfully segments all medical images. Fuzzy clustering is the most popular unsupervised learning approach; expectation-maximization (EM) and fuzzy c-mean (FCM) are the most popular fuzzy clustering algorithms. The EM algorithm has been used for segmentation of brain MR images [16]; it models the intensity distribution of the image as a normal distribution, which is untrue, especially for noisy images [16]. FCM considers only the intensity of the image, and in noisy images the intensity is not trustworthy; therefore, the algorithm performs poorly on low-contrast, inhomogeneous and noisy images. Many algorithms have been introduced to make FCM robust against noise, but most of them are still flawed to some extent [5, 6, 7, and 8]. In this paper, a new clustering method based on FCM is proposed. A combination of image smoothing by an anisotropic filter, user interaction and FCM is used to build a clustering method more robust against noise and in-homogeneity. In the rest of this paper the details of the new method, the anisotropic filter and ordinary FCM are explained, and experimental results are then used to demonstrate the effectiveness of our method.
2 Methodology
Noise is decreased in the image; the image is converted to a grey-level image, and the grey level of the pixels is used for clustering. Sometimes, due to inequality of content with semantics, in-homogeneity, low contrast or noise, automatic clustering methods fail to segment the image correctly, and the image clustered by automatic methods has either two or more target classes in one cluster or one target class in two or more clusters. Fig. 1 shows a PD brain image, the training data the user selected for each class, and its different clusters using ordinary FCM. As is obvious, FCM clustered target classes 1 and 2 into one cluster. In this paper, user interaction is used to solve this problem. The user selects several training data points inside each target class; for each target class, the number of pixels to be selected is determined by the in-homogeneity of the target class and the accuracy requested by the user. After user interaction, ordinary FCM is applied to the image. To solve the problem of two or more target classes in one cluster, clusters with training data for more than one target class are partitioned again using FCM. Fig. 2 shows cluster 1 and its two partitions. The sub-cluster in fig. 2 (c) still has training data for the two classes 1 and 2; therefore, it is again partitioned using FCM. Fig. 3 shows the sub-cluster of fig. 2 (c) and its two partitions using FCM. This process continues until no cluster contains training data for more than one target class. Then, clusters containing training data for a target class are assigned to that target class. The sub-clusters in fig. 2 (b)
Fig. 1. A PD image, the user-selected data for each class, and the different clusters of the image using FCM
Fig. 2. Parts (a)-(c) show cluster 1 of the image in fig. 1 (a) and its two partitions using FCM
Fig. 3. Parts (a)-(c) show the sub-cluster of fig. 2 (c) and its two partitions using FCM
Fig. 4. From left to right: the combination of the sub-clusters of fig. 2 (b) and fig. 3 (b); the sub-cluster partitioned from cluster 1 in fig. 1 (a); the combination of clusters 2 and 3 in fig. 1
and fig. 3 (b) contain training data for class 1; therefore, they are assigned to class 1 (first image in fig. 4). The sub-clusters 2 and 3 in fig. 1 (b) contain training data for class 3; therefore, they are assigned to class 3 (second image in fig. 4). Clusters 1 and 2 in fig. 1 contain training data for class 3; therefore, they are assigned to class 3 (third image in fig. 4). Sometimes some clusters are very small, so the user does not select them as patterns. To solve this problem, the mean intensity of the clusters of each class is taken as the feature of that class; the feature distance of each unassigned cluster from each class is computed, and unassigned clusters are assigned to the target class with the least distance. The steps of our method are as follows: 1. An anisotropic filter is used to decrease noise in the image; the output is an image with less noise. 2. The user selects several training data points inside each target class. The number of training data points for each target class depends on the in-homogeneity of the image and the requested segmentation accuracy.
3. Ordinary FCM is applied to the image. 4. Sometimes, due to in-homogeneity, automatic methods fail to separate target classes. To solve this problem, the user-selected data is used: clusters containing training data for more than one target class are partitioned again by FCM. 5. The previous process continues until there is no such cluster. 6. Clusters containing training data for a target class are assigned to that target class. 7. It is possible that, after the partitioning in step 4, some clusters are left without any training data; these would remain unassigned. To solve this problem, the mean intensity of each class is taken as the feature of that class; the feature distance of each unassigned cluster from each class is computed, and unassigned clusters are assigned to the target class with the least distance.
2.1 Noise Reduction
Perona and Malik [10] proposed the anisotropic diffusion process

$I_t(x, y, t) = \nabla \cdot \big( C(|\nabla I(x, y, t)|)\, \nabla I(x, y, t) \big)$,   (1)

where $I(x, y, t)$ is the intensity of the input image, $t$ is the iteration number, and $C(|\nabla I(x, y, t)|)$ is a monotonically decreasing diffusion function of the image gradient magnitude. The gradient magnitude at region boundaries is higher than in region interiors, and the diffusion function is monotonically decreasing; therefore, diffusion happens faster in region interiors. You et al. [11] proposed the following diffusion function:

$C(x) = \begin{cases} 1/T, & x < T \\ 1/x, & x \ge T \end{cases}$   (2)

Zhigeng et al. [12] used the following Gaussian diffusion function:

$C(\nabla I) = e^{-|\nabla I|^2 / 2K^2}$,   (3)

where the parameter $K$ is the average gradient magnitude in the neighbourhood of each pixel and specifies the degree of diffusion. Catté et al. [9, 13] used $\nabla |G_\sigma * u|$ as the input to the diffusion function, which smooths the image with a Gaussian filter. Ren and He [14] proposed the equation

$C(s) = 1/(1 + K)$   (4)
where the parameter $K$ is the average difference between the gradient magnitude and the maximum gradient magnitude in the neighbourhood of each pixel.
2.2 FCM
FCM is a clustering algorithm introduced by Bezdek, based on minimizing an objective function [8]:

$J_q = \sum_{i=1}^{n} \sum_{j=1}^{m} u_{ij}^q \, d(x_i, \theta_j)$   (5)
where $d(x_i, \theta_j)$ is the distance between data point $x_i$ and the centre $\theta_j$ of cluster $j$, and $u_{ij}$ is the fuzzy membership of $x_i$ in the cluster with centre $\theta_j$, satisfying

$u_{ij} \in [0, 1], \quad \sum_{j=1}^{m} u_{ij} = 1, \quad 0 < \sum_{i=1}^{n} u_{ij} < n$.   (6)
The membership function and the centre of each cluster are obtained as follows:

$u_{ij} = 1 \Big/ \sum_{k=1}^{m} \big( d(x_i, \theta_j) / d(x_i, \theta_k) \big)^{2/(q-1)}$   (7)

$\theta_j = \sum_{i=1}^{N} u_{ij}^q x_i \Big/ \sum_{i=1}^{N} u_{ij}^q$   (8)
where $q$ specifies the degree of fuzziness of the clustering. FCM optimizes the objective function by repeatedly updating the membership function and the cluster centres until the change between iterations is less than a threshold.
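The update rules of Eqs. (7) and (8) translate directly into code. The following is a minimal 1-D sketch over grey-level values (squared Euclidean distance, fixed initial centres; an illustration, not the authors' MATLAB implementation):

```python
def fcm(data, centres, q=2.0, iters=50, eps=1e-12):
    # data: grey-level values; centres: initial cluster centres; q: fuzziness.
    centres, m = list(centres), len(centres)
    for _ in range(iters):
        # Eq. (7): with d as the *squared* distance, the exponent becomes
        # 1/(q-1), equivalent to 2/(q-1) on the unsquared distance.
        u = []
        for x in data:
            d = [max((x - c) ** 2, eps) for c in centres]
            u.append([1.0 / sum((d[j] / d[k]) ** (1.0 / (q - 1))
                                for k in range(m)) for j in range(m)])
        # Eq. (8): each centre is a membership-weighted mean of the data.
        centres = [sum(u[i][j] ** q * data[i] for i in range(len(data))) /
                   sum(u[i][j] ** q for i in range(len(data)))
                   for j in range(m)]
    return centres, u

centres, u = fcm([10, 12, 11, 90, 92, 91], [0.0, 255.0])
print(sorted(round(c) for c in centres))  # [11, 91]
```

With two well-separated grey-level groups the centres converge to the group means; on real images the same loop runs over all pixel intensities, and a convergence test on the centre shift replaces the fixed iteration count.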
Fig. 5. Experimental results of applying ordinary FCM (b) and our algorithm (c) to a PD image (a). Part (b), from left to right, shows the different parts of the image after applying ordinary FCM, and part (c), from left to right, shows the different parts of the image after applying our algorithm.
3 Implementation
We implemented our algorithm in MATLAB. The simulated brain images from BrainWeb [1] are used to evaluate it. Figure 1 shows an experimental result using a PD brain image: part (a) shows the original image and the same image after conversion to grey level; part (b), from left to right, shows the different clusters of the image after applying ordinary FCM; and part (c), from left to right, shows the segmented image and its 4 different clusters after applying our algorithm. Our algorithm does better in this experiment, and ordinary FCM fails to segment the image properly. In fig. 5 (b) it is obvious that ordinary FCM fails to separate the different parts of the brain, which are clustered jointly in one cluster; the reason is low contrast. Our algorithm solves this problem by using the user-selected data to further separate clusters with training data for more than one target class. Of course, the quality of the separation depends on the accuracy of the training data. The results are without de-noising. Figure 2 shows another experimental result using a PD brain image; the order is the same as in the previous
Fig. 6. Experimental results of applying ordinary FCM (b) and our algorithm (c) to a PD image (a). Part (b), from left to right, shows the different parts of the image after applying ordinary FCM, and part (c), from left to right, shows the different parts of the image after applying our algorithm.
experiment. Our algorithm does better in this experiment too, and ordinary FCM fails to segment the image properly. In fig. 6 (b) it is obvious that ordinary FCM fails to separate the different parts of the brain, which are clustered jointly in one cluster. Our algorithm solves this problem by using the user-selected data to further separate clusters with training data for more than one target class and to join clusters with training data for the same target class.
4 Conclusion
A new clustering method based on an anisotropic filter, user interaction and FCM has been proposed. The smoothed image is used as the input to FCM; the user selects training data for each target class, and FCM is applied to pre-cluster the image. Due to in-homogeneity and unknown noise, some clusters contain training data for more than one target class; the user-selected training data is used to identify these clusters, which are partitioned again, and the clusters are then mapped to target classes based on the training data. Experimental results show the effectiveness of the new method. In future work, we plan to apply the new method to different types of medical images and to compare its effectiveness with other clustering methods. We also plan to perform segmentation based on mixtures of our method with other methods, such as active contour, multi-scale FCM and statistical methods, and to combine the results to obtain more accurate segmentation for abnormality diagnosis and other important tasks in medical imaging, as well as to add useful aspects of other methods to our method.
References
1. BrainWeb [Online], www.bic.mni.mcgill.ca/brainweb/
2. Chang, P.L., Teng, W.G.: Exploiting the Self-Organizing Map for Medical Image Segmentation. In: CBMS, pp. 281–288 (2007)
3. Jan, J.: Medical Image Processing, Reconstruction and Restoration: Concepts and Methods. CRC, Taylor (2005)
4. Jiang, Y., Meng, J., Babyn, P.: X-ray Image Segmentation Using Active Contour Model with Global Constraints. pp. 240–245 (2007)
5. Hall, L.O., Bensaid, A.M., Clarke, L.P., Velthuizen, R.P., Silbiger, M.S., Bezdek, J.C.: A Comparison of Neural Network and Fuzzy Clustering Techniques in Segmenting Magnetic Resonance Images of the Brain. IEEE Trans. Neural Netw. 3(5), 672–682 (1992)
6. Acton, S.T., Mukherjee, D.P.: Scale Space Classification Using Area Morphology. IEEE Trans. Image Process. 9(4), 623–635 (2000)
7. Zhang, D.Q., Chen, S.C.: A Novel Kernelized Fuzzy C-means Algorithm with Application in Medical Image Segmentation. Artif. Intell. Med. 32, 37–52 (2004)
8. Dave, R.N.: Characterization and Detection of Noise in Clustering. Pattern Recognit. Lett. 12, 657–664 (1991)
9. Catte, F., Coll, T., Lions, P.L., Morel, J.M.: Image Selective Smoothing and Edge Detection by Nonlinear Diffusion. 92(12), 182–193 (1992)
10. Perona, P., Malik, J.: Scale-Space and Edge Detection Using Anisotropic Diffusion. IEEE Trans. Pattern Anal. Mach. Intell. 12(7), 629–639 (1990)
11. You, Y.L., Xu, W., Tannenbaum, A., Kaveh, M.: Behavioral Analysis of Anisotropic Diffusion in Image Processing. IEEE Trans. Image Process. 5(11), 1539–1553 (1996)
12. Pan, Z.G., Lu, J.F.: A Bayes-Based Region-Growing Algorithm for Medical Image Segmentation. IEEE Computing in Science & Engineering 9(4), 32–38 (2007)
13. Catte, F., Coll, T., Lions, P.L., Morel, J.M.: Image Selective Smoothing and Edge Detection by Nonlinear Diffusion. SIAM J. Numer. Anal. 92(12), 182–193 (1992)
14. Ren, J.J., He, M.Y.: A Level Set Method for Image Segmentation by Integrating Channel Anisotropic Diffusion Information. In: Second IEEE Conf. IEA, pp. 2554–2557 (2007)
15. Pohle, R., Toennies, K.D.: Segmentation of Medical Images Using Adaptive Region Growing. Proc. SPIE Medical Imaging 4322 (2001)
16. Shen, S., Sandham, W., Granat, M., Sterr, A.: MRI Fuzzy Segmentation of Brain Tissue Using Neighbourhood Attraction with Neural-Network Optimization. IEEE Trans. Inf. Technol. Biomed. 9(3), 459–467 (2005)
Medical Image Segmentation Using Fuzzy C-Mean (FCM), Learning Vector Quantization (LVQ) and User Interaction

M.A. Balafar1, Abd. Rahman Ramli1, M. Iqbal Saripan1, Rozi Mahmud2, and Syamsiah Mashohor1
1 Dept of Computer & Communication Systems, Faculty of Engineering, University Putra Malaysia, 43400 Serdang, Selangor, Malaysia
[emailprotected], [emailprotected], [emailprotected], [emailprotected]
2 Faculty of Medicine, Universiti Putra Malaysia, 43400 Serdang, Selangor, Malaysia
[emailprotected]
Abstract. Accurate segmentation of medical images is essential in medical applications. We propose a new method, based on a combination of Learning Vector Quantization (LVQ), FCM and user interaction, to make segmentation more robust against inequality of content with semantics, low contrast, in-homogeneity and noise. In the proposed method, noise is decreased using the Stationary Wavelet Transform (SWT); the input image is clustered using FCM into n clusters, where n is the number of target classes; the user then selects some of the clusters to be partitioned again, and each user-selected cluster is clustered into two sub-clusters using FCM. This process continues until the user is satisfied. Then the user selects the clusters for each target class, and the selected clusters are used to train an LVQ network. After training, the image pixels are clustered by the LVQ. Segmentation of simulated and real images is presented to demonstrate the effectiveness of the new method. Keywords: Learning Vector Quantization (LVQ), medical image segmentation, user interaction.
1 Introduction
Image segmentation is an essential stage and a fundamental task in many computer vision applications [5]. It is very important in object-oriented coding, intelligent video surveillance, robotic vision and so on [18]. Much research has been done in the field of image segmentation, and various methods have been suggested, most of which are not flawless [12]. Segmentation is performed in the preliminary stage of most computer-aided diagnosis. Medical image segmentation is a very important step in many medical applications such as 3D visualization, quantitative analysis and image-guided surgery [6], quantification of tissue volumes, diagnosis based on anatomical structures, tissue characterization [7], medical diagnosis and computer-aided surgical operations. Despite the improvements in the treatment and diagnosis of
D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 177–184, 2008. © Springer-Verlag Berlin Heidelberg 2008
disease brought by medical imaging techniques, accurate segmentation remains a major obstacle in medical applications [10]. Medical images usually have unknown noise, in-homogeneity and low contrast, which makes image segmentation a challenging and complex task. Segmentation of brain images is particularly complicated and challenging; however, accurate segmentation of these images is very important for detecting tumors, edema and necrotic tissues, and accurate detection of these tissues is very important in diagnosis systems. Magnetic resonance imaging (MRI) is an important imaging technique for detecting abnormal changes in different parts of the brain at an early stage. MRI is popular for obtaining images of the brain with high contrast: MRI acquisition parameters can be adjusted to give different grey levels for different tissues and various types of neuropathology [2], and MRI images have good contrast compared with computerized tomography (CT). Therefore, most research in medical image segmentation uses MRI images. Clustering methods are widely used for medical image segmentation. Expectation-maximization (EM) and fuzzy c-mean (FCM) are the most popular clustering algorithms. The EM algorithm is used for segmentation of brain MR images [12, 17]; it models the intensity distribution of the image as a normal distribution, which is untrue, especially for noisy images [12]. The fuzzy c-mean algorithm gives good results on noise-free images, but its accuracy on noisy images is not sufficient [9], and medical images are mostly noisy. FCM considers only the intensity of the image, and in noisy images the intensity is not trustworthy; therefore, the algorithm does not give good results on noisy images [12]. FCM has been used by many researchers for medical image segmentation [12]. The accuracy of FCM and neural networks is compared in [9]: FCM had better results on normal images but worse on abnormal ones.
Medical images mostly contain noise, low contrast, and inhomogeneity, and like other intensity-based segmentation methods, FCM is very sensitive to these problems. Therefore, for segmentation of medical images, FCM should be improved to be robust against them. Many algorithms have been proposed to make FCM robust against noise, low contrast, and inhomogeneity [3, 4, 11–16], but most of them are still not robust [12]. Sometimes, due to inhomogeneity, low contrast, noise, and inequality of content with semantics, automatic methods fail to segment an image correctly. Therefore, for these images, it is necessary to use user interaction to correct the method's errors. However, robust semi-automatic methods can be developed in which user interaction is minimized. When user interaction is necessary, segmentation becomes supervised. Supervised methods need training data consisting of samples with known classes. The disadvantage of supervised methods is the need for user interaction; the advantage is reduced clustering error. Learning Vector Quantization (LVQ) is a supervised competitive learning method which, based on training data, learns the classes existing in an image; LVQ then clusters the image based on the training data. We propose a new method based on FCM, user interaction, and LVQ. In the rest of this paper, the algorithms and methods used in this work are explained; then experimental results, the conclusion, and references are presented.
Medical Image Segmentation Using FCM, LVQ and User Interaction
179
2 Methodology
First, noise is decreased in the image; the image is converted to a grey-level image, and the grey levels of pixels are used for clustering. Afterwards, FCM clusters the input image into n clusters, where n is the number of target classes. Fig. 1 demonstrates (a) a real brain image and (b) its 4 different clusters using FCM.
Fig. 1. (a) A real brain image, (b) its 4 clusters using FCM and (c) two sub clusters of Cluster 3
Sometimes, due to inhomogeneity, low contrast, or noise, the clustered image either has two or more target classes in one cluster (white matter and grey matter of the brain in cluster number 3) or one target class spread over two or more clusters (white matter in clusters number 1, 2, and 3). To solve this problem, the user selects clusters containing several classes (cluster number 3) to be partitioned again; afterwards, FCM clusters each user-selected cluster into two sub-clusters. Fig. 1(c) demonstrates the sub-clusters of cluster number 3, which is clustered into two sub-clusters numbered 31 and 32. This process continues until the user is satisfied, which means the quality of segmentation depends on the user. Then, to solve the problem of several clusters for one class, the user selects the clusters for each target class (clusters 1, 2 and sub-cluster 32 are selected for white matter). The user-selected clusters are the existing patterns for each class. Sometimes some of the clusters are very small, so the user does not select them as patterns. To solve this problem, LVQ is first trained on the selected clusters and then assigns the pixels of unselected clusters to the most similar cluster. LVQ is popular for supervised clustering of input data. The user-selected clusters are used to train LVQ, and each output class of LVQ corresponds to one user-selected cluster; for example, LVQ will have three output classes for white matter, corresponding to clusters 1, 2 and sub-cluster 32. After training, image pixels are clustered by LVQ into patterns (the user-selected clusters for each target class, white matter and so on). Patterns have been assigned to target classes by the user; therefore, the clustered image is labeled with target classes based on the user-selected patterns for each target class. The steps of our method are as follows:
1. Noise is decreased in the input image. The output is a noise-reduced image.
2. The image is converted to a grey-level image. The output is a grey-level image.
3. FCM is applied to the grey levels of pixels to cluster the input image into n clusters, where n is the number of target classes. The output is the image clustered by FCM.
4. Sometimes some clusters contain more than one target class (under-segmentation). The user selects such clusters to be partitioned further; FCM clusters each user-selected cluster into two sub-clusters. This process continues until the user is satisfied. The output is a clustered image without under-segmentation.
5. Sometimes several clusters correspond to one target class (over-segmentation). The user selects clusters (patterns) for each target class. The output is a mapping of clusters to target classes.
6. The patterns are used to train LVQ; each output class of LVQ corresponds to one pattern. The output is a trained LVQ that assigns pixels to user-selected clusters.
7. The image pixels are clustered by LVQ into patterns (user-selected clusters). The output is a clustered image containing only patterns.
8. The clustered image is labeled with target classes based on the mapping of patterns to target classes (the output of step 5). The output is the segmented image.
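The dataflow of the eight steps can be sketched as a short driver function. This is only a skeleton of the flow described above, not the authors' code: the five callables are placeholders for the denoising, FCM, user-interaction, and LVQ operations, and their names are our own.

```python
import numpy as np

def segment(image, n_classes, denoise, fcm_cluster, user_split, user_group,
            lvq_fit_predict):
    """High-level flow of steps 1-8; each callable stands in for one of the
    operations described in the text (placeholder names, not the authors' code)."""
    grey = denoise(image)                    # steps 1-2: noise-reduced grey image
    clusters = fcm_cluster(grey, n_classes)  # step 3: FCM clustering
    clusters = user_split(clusters)          # step 4: split under-segmented clusters
    patterns = user_group(clusters)          # step 5: user maps clusters to classes
    return lvq_fit_predict(grey, patterns)   # steps 6-8: LVQ labels every pixel
```

The point of the skeleton is that the user-driven steps (4 and 5) sit between the two automatic stages, so they can be iterated independently of FCM and LVQ.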
2.1 Noise Reduction
We use the Stationary Wavelet Transform (SWT) for noise reduction in the image. For this purpose, we follow the work of R. R. Coifman et al. [19]. Their algorithm is as follows:
1. The image is transformed to wavelet coefficients.
2. A soft or hard threshold is applied to the detail coefficients, and coefficients smaller than the threshold are eliminated.
3. The inverse stationary wavelet transform is applied to the approximation and detail coefficients.
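The thresholding in step 2 is the core of the denoising scheme. As a minimal sketch (the full SWT decomposition and reconstruction of steps 1 and 3 would require a wavelet library; the function names here are our own), the soft and hard threshold rules applied to the detail coefficients are:

```python
import numpy as np

def soft_threshold(coeffs, t):
    """Shrink detail coefficients toward zero by t; values below t vanish."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)

def hard_threshold(coeffs, t):
    """Keep detail coefficients whose magnitude reaches t, zero out the rest."""
    return np.where(np.abs(coeffs) >= t, coeffs, 0.0)

# Small detail coefficients (mostly noise) are eliminated, large ones kept.
detail = np.array([0.1, -0.05, 2.0, -1.5, 0.02])
denoised = soft_threshold(detail, 0.2)
```

Soft thresholding additionally shrinks the surviving coefficients by t, which tends to give smoother reconstructions than hard thresholding.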
2.2 LVQ
Learning Vector Quantization (LVQ) is a supervised competitive learning method; it is a supervised version of a vector quantization network. Vector quantization approximates the density functions of classes, whereas LVQ obtains decision boundaries in the input space based on training data. LVQ defines class boundaries by prototypes, a nearest-neighbour rule, and a winner-takes-all paradigm. LVQ has three layers: an input layer, a competitive layer, and an output layer. Each target class has several patterns. The number of neurons in the competitive layer is equal to the number of patterns and, correspondingly, the number of neurons in the output layer is equal to the number of target classes. The centre of each neuron in the competitive layer is called a codebook vector (CV). In the learning stage, the Euclidean distance between the input vector and the codebook vector of each neuron in the competitive layer is calculated, and the neuron with the smallest distance is the winner. LVQ networks with enough neurons in the competitive layer (patterns) for each class can classify any set of input vectors. A neuron in the competitive layer belongs to just one target class, but a target class can have an arbitrary number of neurons in the competitive layer. The space of CVs is partitioned by hyperplanes perpendicular to the line linking two CVs. The competitive layer learns to classify input data in the same way as a self-organizing map, and the output layer maps competitive-layer classes to target classes.
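The competitive learning described above can be condensed into a few lines. The following is a minimal sketch of the LVQ1 rule (the function names and learning-rate value are our own choices, not from the paper): the winning codebook vector moves toward the input when their labels agree and away from it when they disagree.

```python
import numpy as np

def lvq1_train(X, y, codebooks, cb_labels, lr=0.1, epochs=20):
    """LVQ1: per sample, find the nearest codebook vector (the winner)
    and move it toward the input if labels agree, away otherwise."""
    W = codebooks.astype(float).copy()
    for _ in range(epochs):
        for x, label in zip(X, y):
            dists = np.linalg.norm(W - x, axis=1)   # Euclidean distances
            k = int(np.argmin(dists))               # winner neuron
            sign = 1.0 if cb_labels[k] == label else -1.0
            W[k] += sign * lr * (x - W[k])
    return W

def lvq_classify(X, codebooks, cb_labels):
    """Label each sample with the class of its nearest codebook vector."""
    d = np.linalg.norm(X[:, None, :] - codebooks[None, :, :], axis=2)
    return np.asarray(cb_labels)[np.argmin(d, axis=1)]
```

In the segmentation setting, X would hold pixel grey levels and each codebook vector would correspond to one user-selected cluster.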
Learning means adjusting the weights of neurons based on training data. During the learning stage, the training data, consisting of input data and their target classes, are given to the network, and the number of neurons in the competitive layer for each target class is specified. The winner neuron in the competitive layer is determined by Euclidean distance, and the weight of the winner neuron is then adjusted. There are several algorithms for training LVQ networks; we use LVQ1 [2] in this paper.
2.3 FCM
FCM is a clustering algorithm introduced by Bezdek, based on minimizing an objective function as follows [8]:

J_q = \sum_{i=1}^{n} \sum_{j=1}^{m} u_{ij}^{q} \, d(x_i, \theta_j)   (1)

where d(x_i, θ_j) is the distance between data point x_i and the centre θ_j of cluster j, and u_{ij} is the fuzzy membership of x_i in the cluster with centre θ_j:

u_{ij} \in [0,1], \quad \sum_{j=1}^{m} u_{ij} = 1, \quad 0 < \sum_{i=1}^{n} u_{ij} < n.   (2)

The membership function and the centre of each cluster are obtained as follows:

u_{ij} = 1 \Big/ \sum_{k=1}^{m} \big( d(x_i, \theta_j) / d(x_i, \theta_k) \big)^{2/(q-1)}   (3)

\theta_j = \sum_{i=1}^{n} u_{ij}^{q} \, x_i \Big/ \sum_{i=1}^{n} u_{ij}^{q}   (4)

where q specifies the degree of fuzziness of the clustering. FCM optimizes the objective function by continuously updating the membership function and the cluster centres until the change between iterations is less than a threshold.
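The alternating updates (3) and (4) can be sketched directly in NumPy. This is a minimal illustration of the iteration (not the authors' MATLAB code); for image segmentation, X would hold the grey levels of the pixels as one-dimensional feature vectors.

```python
import numpy as np

def fcm(X, m, q=2.0, iters=100, tol=1e-5, seed=0):
    """Minimal fuzzy c-means: X is (n_samples, n_features), m is the
    number of clusters, q the fuzziness degree."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((n, m))
    U /= U.sum(axis=1, keepdims=True)        # memberships sum to 1 per sample
    for _ in range(iters):
        Um = U ** q
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]               # Eq. (4)
        d = np.fmax(np.linalg.norm(X[:, None, :] - centers[None, :, :],
                                   axis=2), 1e-12)
        # ratio[i, j, k] = d(x_i, c_j) / d(x_i, c_k); sum over k gives Eq. (3)
        U_new = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (q - 1.0))).sum(axis=2)
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    Um = U ** q
    centers = (Um.T @ X) / Um.sum(axis=0)[:, None]   # final centres via Eq. (4)
    return U, centers
```

A hard segmentation is then obtained by assigning each pixel to the cluster with the largest membership, `np.argmax(U, axis=1)`.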
3 Implementation
We implemented our algorithm in MATLAB. Simulated brain images from BrainWeb [1] and a real image from the internet are used to evaluate it. Fig. 2 demonstrates two experimental results using a real brain image (a) and a simulated brain image (b). Part (c), from top to bottom, demonstrates 4 different classes of the real image obtained with the ordinary FCM algorithm, and part (d) demonstrates 4 different parts of the real image obtained with our algorithm. Our algorithm does better in the first experiment, where ordinary FCM fails to segment the image properly. It is obvious in Fig. 2(c) that ordinary FCM fails to separate the grey matter of the brain, which is clustered jointly with white matter (first image in Fig. 2(c)); the reason is low contrast. Our algorithm solves this problem by using user interaction to further separate the joint clusters of white and grey matter. Of course, the quality of separation
Fig. 2. Experimental results of applying ordinary FCM and our algorithm to a real image (a) and a simulated image (b). Part (c) demonstrates 4 different clusters of the real image (a) using ordinary FCM, and part (d) demonstrates 4 different clusters of the real image using our algorithm. Part (e) demonstrates 4 different clusters of the simulated image (b) using ordinary FCM, and part (f) demonstrates 4 different clusters of the simulated image using our algorithm.
depends on the user. Moreover, ordinary FCM assigns the white matter of the brain to three different clusters (first, second, and third images in Fig. 2(c)). The reason ordinary FCM assigns one target class (white matter) to two or more different clusters (white matter in clusters number 1, 2, and 3) is inhomogeneity in the white matter. Our method separates white matter more correctly (second image in Fig. 2(d)). Moreover,
to solve the problem of several clusters existing for one target class, in our method user interaction helps assign several clusters to one target class. The second experiment uses a PD brain image. Part (e) demonstrates 4 different classes of the PD image obtained with the ordinary FCM algorithm, and part (f) demonstrates 4 different parts of the PD image obtained with our algorithm. Our algorithm does better in this experiment too. The reason FCM fails is the weak contrast of the image: as is obvious in Fig. 2(e), ordinary FCM fails to separate the white matter of the brain, and part of it is clustered jointly with grey matter (first image in Fig. 2(e)). The reason is low contrast. Again, our algorithm solves this problem through user interaction to further separate the joint clusters of white and grey matter. Moreover, user interaction helps assign several clusters to one target class.
4 Conclusion
Image segmentation is a very important process in most computer vision and image processing tools. Segmentation of medical images is challenging due to low contrast, unknown noise, and inhomogeneity. FCM is one of the most popular clustering methods for image segmentation. FCM takes the intensity of pixels as its input for clustering; therefore, it fails on images with inequality of content and semantics, low contrast, unknown noise, and inhomogeneity. Several studies have tried to make FCM more robust, but none of them is flawless. We proposed a new method for image segmentation based on FCM, user interaction, and LVQ. To demonstrate the effectiveness of our method, it was applied to several medical images. For comparison, FCM and our method were applied to the same images. The experiments demonstrate the effectiveness of our method compared with ordinary FCM. In the future, we will consider performing segmentation with different methods, such as active contours, together with our method, and then fusing the results to obtain more accurate segmentation for diagnosing abnormal or other important matter in medical images. We will also consider adding useful aspects of other methods to FCM. For example, SOM has a good ability to detect topology and distribution at the same time, and we will consider changing our method to add this feature of SOM. Another approach for this purpose is statistical methods, and we are working to add good features of statistical methods to our method.
References
1. Brain Web [Online], http://www.bic.mni.mcgill.ca/brainweb/
2. Dan, T., Linan, F.: A Brain MR Images Segmentation Method Based on SOM Neural Network. In: ICBBE, pp. 686–689 (2007)
3. Pham, D.L.: Spatial Models for Fuzzy Clustering. Comput. Vis. Imag. Understand. 84, 285–297 (2001)
4. Zhang, D.Q., Chen, S.C.: A Novel Kernelized Fuzzy C-means Algorithm with Application in Medical Image Segmentation. Artif. Intell. Med., 37–52 (2004)
5. Farhang, S., Tizhoosh, H.R., Salama, M.M.A.: Application of Opposition-Based Reinforcement Learning in Image Segmentation. In: ADPRL, pp. 246–251 (2007)
6. Foued, D., Abdelmalik, T.-A., Azzeddine, C., Fethi, B.-R.: MR Images Segmentation Based on Coupled Geometrical Active Contour Model to Anisotropic Diffusion Filtering. In: ICBBE, pp. 721–724 (2007)
7. Yu, J.-H., Wang, Y.-Y., Chen, P., Xu, H.-Y.: Two-Dimensional Fuzzy Clustering for Ultrasound Image Segmentation. In: ICBBE, pp. 599–603 (2007)
8. Bezdek, J.C.: Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum, New York (1981)
9. Hall, L.O., Bensaid, A.M., Clarke, L.P., Velthuizen, R.P., Silbiger, M.S., Bezdek, J.C.: A Comparison of Neural Network and Fuzzy Clustering Techniques in Segmenting Magnetic Resonance Images of the Brain. IEEE Trans. Neural Netw. 3, 672–682 (1992)
10. Ceccarelli, M., De Luca, N., Morganella, A.: Automatic Measurement of the Intima-Media Thickness with Active Contour Based Image Segmentation. In: IEEE International Workshop on Medical Measurement and Applications, Sannio Univ., Benevento, pp. 1–5 (2007)
11. Ahmed, M.N., Yamany, S.M., Mohamed, N., Farag, A.A., Moriarty, T.: A Modified Fuzzy C-means Algorithm for Bias Field Estimation and Segmentation of MRI Data. IEEE Trans. Med. Imag. 21, 193–199 (2002)
12. Shen, S., Sandham, W., Granat, M., Sterr, A.: MRI Fuzzy Segmentation of Brain Tissue Using Neighbourhood Attraction with Neural-Network Optimization. IEEE Trans. Inform. Tech. Biomedicine 9, 459–467 (2005)
13. Acton, S.T., Mukherjee, D.P.: Scale Space Classification Using Area Morphology. IEEE Trans. Image Process. 9, 623–635 (2000)
14. Dave, R.N.: Characterization and Detection of Noise in Clustering. Pattern Recognit. Lett. 12, 657–664 (1991)
15. Krishnapuram, R., Keller, J.M.: A Possibilistic Approach to Clustering. IEEE Trans. Fuzzy Syst. 1, 98–110 (1993)
16. Tolias, Y.A., Panas, S.M.: On Applying Spatial Constraints in Fuzzy Image Clustering Using a Fuzzy Rule-based System. IEEE Signal Process. Lett. 5, 245–247 (1998)
17.
Wells III, W.M., Grimson, W.E.L., Kikinis, R., Jolesz, F.A.: Adaptive Segmentation of MRI Data. IEEE Trans. Med. Imag. 15, 429–442 (1996)
18. Zhang, J., Liu, J.: Image Segmentation with Multi-Scale GVF Snake Model Based on B-Spline Wavelet. In: ACIS, pp. 259–263 (2007)
19. Coifman, R.R., Donoho, D.L.: Translation Invariant De-noising. Lecture Notes in Statistics, vol. 103, pp. 125–150. Springer, New York (1995)
New Data Pre-processing on Assessing of Obstructive Sleep Apnea Syndrome: Line Based Normalization Method (LBNM) Bayram Akdemir1, Salih Güneş1, and Şebnem Yosunkaya2 1
Department of Electrical and Electronics Engineering, Selcuk University, 42075 Konya, Turkey {bayakdemir,sgunes}@selcuk.edu.tr 2 Faculty of Medicine, Sleep Laboratory, Selcuk University, 42080 Konya, Turkey [emailprotected]
Abstract. Sleep disorders are very common but often unrecognized illnesses among the public. Obstructive Sleep Apnea Syndrome (OSAS) is characterized by a decreased oxygen saturation level and repetitive upper respiratory tract obstruction episodes during full-night sleep. In the present study, we propose a novel data normalization method called the Line Based Normalization Method (LBNM) to evaluate OSAS using a real dataset obtained from a polysomnography device used as a diagnostic tool in patients clinically suspected of suffering from OSAS. Here, we combine LBNM with classification methods comprising the C4.5 decision tree classifier and an Artificial Neural Network (ANN) to diagnose OSAS. First, each clinical feature in the OSAS dataset is scaled by LBNM into the range [0,1]. Second, the normalized OSAS dataset is classified using different classifier algorithms, namely the C4.5 decision tree classifier and ANN. The proposed normalization method was compared with the min-max normalization, z-score normalization, and decimal scaling methods existing in the literature on the diagnosis of OSAS. LBNM has produced very promising results in the assessment of OSAS, and it could also be applied to other biomedical datasets. Keywords: Obstructive Sleep Apnea Syndrome; Data Scaling; Line Based Normalization Method; C4.5 Decision Tree Classifier; Levenberg-Marquardt Artificial Neural Network.
1 Introduction
Obstructive Sleep Apnea Syndrome (OSAS) is a very common sleep disorder. OSAS is a syndrome characterized by a lack of oxygen saturation and repetitive upper respiratory tract obstruction events during full-night sleep. OSAS is considered clinically significant when the breathing pauses last 10 seconds or more and occur more than five times per hour of sleep. Breathing interruptions may occur up to 300 times in a night. Due to the decrease of the oxygen
D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 185–191, 2008. © Springer-Verlag Berlin Heidelberg 2008
186
B. Akdemir, S. Güneş, and Ş. Yosunkaya
level, the brain becomes aware of the situation and takes control of breathing until the oxygen level returns to normal. This event repeats every period. In the morning, a subject suffering from OSAS never remembers these awakenings and so may believe he slept restfully; in fact the sleep was not restful, due to the brain's awakenings. Two kinds of apnea events may cause inadequate pulmonary ventilation during sleep. Apnea is explained as a total absence of airflow and a lack of oxygen in the arterial blood circulation. When, despite a lack of oxygen, there is no breathing stoppage (possibly a reduction of breath volume of over 50%) during sleep, the episode is called hypo-apnea. SAS is present mainly in adults and in 11% of children, especially among males [1-3]. In the literature related to OSAS, there are several papers. Among these, Al-Ani et al. used an ANN with respiratory and cardiac activities (Nasal Airway Flow (NAF) and Pulse Transit Time (PTT)) obtained by polysomnography to diagnose OSAS [3]. Haitham et al. used a combination of an entropy approach and heart rate variability and obtained 72.9% classification accuracy [4]. Campo et al. assessed the validity of approximate entropy (ApEn) analysis of oxygen saturation (SaO2) data obtained from pulse oximetric recordings as a diagnostic test for OSAS in patients suffering from OSAS [5]. Kwiatkowska et al. studied OSAS using pulse oximetry and clinical prediction rules with a fuzzy logic approach [6]. In this paper, we combine LBNM with classification methods comprising the C4.5 decision tree classifier and ANN to diagnose OSAS. The clinical features are the Arousal Index (ARI), the Apnea and Hypo-apnea Index (AHI), the SaO2 minimum value in the REM stage, and the Percent Sleep Time (PST) in SaO2 intervals bigger than 89%.
In our experiments, a total of 83 subjects, including 58 patients with a positive OSAS diagnosis (AHI>5) and 25 healthy persons with a negative OSAS diagnosis, were examined. First, each clinical feature in the OSAS dataset is scaled by LBNM into the range [0,1]. Second, the normalized OSAS dataset is classified using different classifier algorithms, namely the C4.5 decision tree classifier and ANN. The proposed normalization method was compared with the min-max normalization, z-score normalization, and decimal scaling methods existing in the literature on the diagnosis of OSAS. While combining the C4.5 decision tree classifier with min-max normalization, z-score normalization, or decimal scaling obtained a classification accuracy of 95.89% using 10-fold cross-validation, combining the C4.5 decision tree classifier with LBNM achieved an accuracy of 100% under the same conditions.
2 Subjects
In this paper, 83 subjects (59 men and 24 women) who were referred for clinical suspicion of OSAS were studied. The patients were consecutively recruited from the outpatient clinic. Subjects suffering from OSAS ranged in age from 17 to 67 and non-OSAS subjects from 17 to 70. The mean body mass index (BMI) was 36.83 kg/m2. In our experiments, a total of 83 subjects (58 with a positive OSAS diagnosis (AHI>5) and 25 with a negative diagnosis, i.e., normal subjects) were examined. The Review Board on Human Studies at our institution approved the protocol, and each patient gave his or
New Data Pre-processing on Assessing of Obstructive Sleep Apnea Syndrome
187
Table 1. Mean values of the statistical measures of the clinical features and characteristics of the subjects

Feature                                    Non-OSAS   OSAS
Age                                        49         49
BMI (kg/m2)                                30.85      38.15
ARI index                                  24.666     150.45
AHI index                                  4.05       33.51
SaO2 minimum value in REM stage            87.24      79.35
PST in SaO2 intervals bigger than 89%      94.81      62.92
her informed consent to participate in the study. Table 1 presents the mean values of the statistical measures of the clinical features used and the subjects' characteristics [7]. Readers can refer to [7] for more information about the OSAS dataset.
3 The Proposed Method
In this work, we propose a data normalization method called the Line Based Normalization Method and combine it with classifier methods, namely the C4.5 decision tree and the LM artificial neural network, for the diagnosis of OSAS. The data normalization method (LBNM) is used for data pre-processing, and the classifier is then run to classify the normalized OSAS dataset. Both processes run offline. The method used is shown in Figure 1.
Fig. 1. Block diagram of the proposed method
The proposed method consists of two stages: to pre-process the data, LBNM is used to transform the OSAS dataset into values in the range [0,1]; as classifier algorithms, the C4.5 decision tree and an ANN trained with LM are used to classify the normalized OSAS dataset.
3.1 Line Based Normalization Method (LBNM) and Data Scaling Methods
Not all attributes in a dataset always have a linear distribution among classes. If a non-linear classifier system is not used, data scaling or cleaning methods are needed to transform the data from its original format to another space to
improve classification performance in pattern recognition applications. In this study, we propose LBNM as a new data pre-processing method for pattern recognition and medical decision-making systems. The proposed data scaling method consists of two steps. In the first step, we weight the data using the following equation (1). In the second step, the weighted data are normalized into the range [0,1]. In this way, the data are scaled on the basis of the features used in the dataset. An advantage of LBNM is that it can be used on datasets with missing class labels; it can also be used to find missing feature values. Figure 2 shows the pseudo-code of LBNM.
Input: d matrix with n rows and m columns
Output: weighted d matrix obtained by the row-based data weighting method
1. The data are weighted by means of the following equation:
   for i = 1 to n (n is the number of rows in the d matrix)
     for j = 1 to m (m is the number of features (attributes) in the d matrix)
       D_column(i, j) = d(i, j) / sqrt( d(i,1)^2 + d(i,2)^2 + ... + d(i,m)^2 )   (1)
     end
   end
2. Apply the data normalization process to the weighted 'D_column' matrix.
Fig. 2. The pseudo code of LBNM
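Assuming Eq. (1) divides each element by the Euclidean norm of its row (line), and that step 2 is an ordinary min-max scaling to [0,1] (the paper does not name the exact scaling, so min-max is our assumption), the whole procedure vectorizes to a few lines of NumPy:

```python
import numpy as np

def lbnm(D, eps=1e-12):
    """Line Based Normalization: weight each element by the Euclidean norm
    of its row (Eq. 1), then scale each column to [0, 1] (min-max assumed)."""
    row_norm = np.sqrt((D ** 2).sum(axis=1, keepdims=True))
    W = D / np.fmax(row_norm, eps)              # step 1: row-wise weighting
    lo, hi = W.min(axis=0), W.max(axis=0)
    return (W - lo) / np.fmax(hi - lo, eps)     # step 2: normalize to [0, 1]
```

Every output value lies in [0,1], and rows that differ only in overall magnitude (one subject's feature vector being a scaled copy of another's) become identical after step 1.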
3.2 C4.5 Decision Tree Classifier
A decision tree is a hierarchical data structure using the divide-and-conquer method. Decision trees can be used for both classification and regression and are non-parametric methods. Here, we use the C4.5 decision tree, a type of decision tree with pruning and the ability to work with missing data. C4.5 decision tree learning is one of the most often used and practical methods for inductive inference. It is a method for approximating discrete-valued functions that is robust to noisy data and capable of learning disjunctive expressions [8, 9], and the learned decision tree represents the learned function. Learned tree structures can be expressed as sets of if-then rules to improve human readability. These learning methods are among the most popular inductive inference algorithms and have been successfully applied to a wide range of problems [10]. C4.5 decision tree learning is a heuristic, hill-climbing, non-backtracking search through the space of all possible decision trees [7, 8]. The objective of C4.5 decision tree learning is to recursively partition the data into sub-groups. At the end of learning, C4.5 generates if-then rules to perform the classification; consequently, if-then rules make the tree classifier fast and simple.
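The recursive partitioning is driven by an impurity measure. As a small illustration (C4.5 itself uses the gain ratio, a normalized variant of the information gain shown here), entropy and the gain of a candidate split can be computed as:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Entropy reduction obtained by splitting `labels` into `groups`."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - remainder

# A split that perfectly separates two balanced classes gains the full 1 bit.
gain = information_gain(["osas", "osas", "healthy", "healthy"],
                        [["osas", "osas"], ["healthy", "healthy"]])
```

The tree grows by repeatedly choosing the attribute test whose split maximizes this criterion on the current sub-group.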
3.3 Levenberg-Marquardt Artificial Neural Network (ANN)
An ANN is constructed for a specific application, such as pattern recognition or data classification, by way of a learning process. ANNs are inspired by human brain activity. An ANN has a number of nodes, called neurons, and connections between them; after data are applied to the inputs, the ANN tries to obtain the best result by reducing the output error level via adjusting the weights. The back-propagation (BP) algorithm is the most widely used training procedure for adjusting the connection weights of a Multi-Layer Perceptron (MLP) [11]. The LM algorithm is a least-squares estimation algorithm that uses the maximum neighborhood idea to obtain the desired weights. The smallest MLP is composed of three layers: an input layer, an output layer, and one hidden layer. Input signals spread from the first neuron to the output neuron, affecting each other according to the estimated weights. Each layer consists of a predefined number of neurons; the neurons in the input layer act as a buffer, distributing the input signals to the neurons in the hidden layer [12]. In our application, the input layer, hidden layer, and output layer consist of 4, 10, and 2 neurons, respectively. We also used values of 0.9 and 0.8 as the learning rate and momentum rate in the ANN with LM.
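For concreteness, the forward pass of the 4-10-2 network described above looks as follows. This is only a sketch with random weights; in the paper the weights are fitted with the Levenberg-Marquardt algorithm, which is not reproduced here, and the sigmoid hidden activation is our assumption.

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """One forward pass: 4 inputs -> 10 sigmoid hidden units -> 2 outputs."""
    h = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))   # hidden-layer activations
    return W2 @ h + b2                          # one score per output class

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(10, 4)), np.zeros(10)   # input -> hidden weights
W2, b2 = rng.normal(size=(2, 10)), np.zeros(2)    # hidden -> output weights
scores = mlp_forward(np.array([0.2, 0.5, 0.1, 0.9]), W1, b1, W2, b2)
```

Each of the two output scores corresponds to one class (OSAS-positive or OSAS-negative); the predicted class is the one with the larger score.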
4 Empirical Results and Discussion
Data normalization is an important issue in many classifier systems, since many classifier algorithms work only on normalized or scaled data. In this study, we propose a novel data scaling method called the Line Based Normalization Method and apply it to the diagnosis of obstructive sleep apnea syndrome, a common and important disease among the public. Here, we investigate the effect of LBNM on the classification accuracy of the classifiers used in the diagnosis of OSAS. To compare against the proposed normalization method, various normalization methods, namely min-max normalization, z-score normalization, and decimal scaling, were used. In diagnosing OSAS, we used the clinical features ARI, AHI, the SaO2 minimum value in the REM stage, and the PST in SaO2 intervals bigger than 89%, obtained from polysomnography device records. Table 2 shows the results obtained from the C4.5 decision tree, the LM back-propagation algorithm, the combination of the C4.5 decision tree classifier and LBNM, and the combination of the LM back-propagation algorithm and LBNM, using 10-fold cross-validation on the diagnosis of OSAS. The best method for diagnosing OSAS was the combination of the C4.5 decision tree classifier and LBNM. The effect of the proposed normalization method on the classification accuracy of the classifiers was also shown. LBNM was compared with other normalization and scaling methods, including min-max normalization, z-score normalization, and decimal scaling; classifier accuracy and the 95% confidence interval were used to compare these methods. Table 3 presents the results obtained from the C4.5 decision tree classifier on the classification of OSAS using LBNM and the various scaling or normalization methods with 10-fold cross-validation.
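The three baseline scalings compared against LBNM are standard and easy to state. A compact sketch (our own helper names, applied per feature column; the sample values are illustrative, not taken from the dataset) is:

```python
import numpy as np

def min_max(col):
    """Scale a feature column linearly to [0, 1]."""
    return (col - col.min()) / (col.max() - col.min())

def z_score(col):
    """Center a feature column to zero mean and unit standard deviation."""
    return (col - col.mean()) / col.std()

def decimal_scaling(col):
    """Divide by the smallest power of ten that brings all magnitudes <= 1."""
    k = int(np.ceil(np.log10(np.abs(col).max())))
    return col / 10.0 ** k

# Illustrative AHI-like values spanning a wide range.
ahi = np.array([4.05, 33.51, 12.0, 150.45])
scaled = min_max(ahi)
```

Unlike LBNM, all three operate on one column at a time and ignore the relationship between features within a row.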
Table 2. Results obtained from the classifiers used on the classification of OSAS

Method                         PD (Recall)   Precision   Prediction Accuracy (%)   F-measure   AUC
C4.5 Decision Tree             0.965         0.965       95.12                     0.965       0.941
ANN with LM                    0.933         0.965       92.68                     0.948       0.899
C4.5 Decision Tree and LBNM    1.00          1.00        100                       1.00        1.00
ANN with LM and LBNM           0.966         1.00        97.56                     0.982       0.958
Table 3. Comparison of the results obtained from the C4.5 decision tree classifier on the classification of OSAS using LBNM and various normalization methods

Method                                    Prediction Accuracy (%)
Min-Max Normalization                     95.89
Z-score Normalization                     95.89
Decimal Scaling                           95.89
Line Based Normalization Method (LBNM)    100
These results show that the LBNM data normalization method can be useful in many pattern recognition and medical diagnostic applications, as can be seen in the diagnosis of obstructive sleep apnea syndrome. The method can also be used in many other applications, such as speech recognition, text categorization, image processing, etc. We believe the proposed method can be very helpful to physicians in reaching their final decision on their patients. Acknowledgments. This study has been supported by the Scientific Research Project of Selcuk University (Project number: 08701258).
5 Conclusion
In this paper, we have proposed a novel data normalization method, LBNM, to assess obstructive sleep apnea syndrome using clinical features obtained from a polysomnography device used as a diagnostic tool in patients clinically suspected of suffering from a sleep disorder. The proposed normalization method was compared with the min-max normalization, z-score normalization, and decimal scaling methods existing in the literature via the diagnosis of OSAS. While combining the C4.5 decision tree classifier with min-max normalization, z-score normalization, or decimal scaling obtained a classification accuracy of 95.89% using 10-fold cross-validation, combining the C4.5 decision tree classifier with LBNM achieved
the accuracy of 100% under the same conditions. Here, we have presented a medical application of this normalization method. In the future, this data pre-processing method can be used in many other pattern recognition applications.
References
1. AASM: Sleep-Related Breathing Disorders in Adults: Recommendations for Syndrome Definition and Measurement Techniques in Clinical Research. The Report of an American Academy of Sleep Medicine Task Force. SLEEP 22(5) (1999)
2. Eliot, S., Janita, K., Cheryl Black, L., Carole, L. Marcus: Pulse Transit Time as a Measure of Arousal and Respiratory Effort in Children with Sleep-Disordered Breathing. Pediatric Research 53(4), 580–588 (2003)
3. Al-Ani, T., Hamam, Y., Novak, D., Pozzo Mendoza, P., Lhotska, L., Lofaso, F., Isabey, D., Fodil, R.: Noninvasive Automatic Sleep Apnea Classification System. In: Bio. Med. Sim. 2005, Linköping, Sweden, May 26–27 (2005)
4. Haitham, M., Al-Angari, A., Sahakian, V.: Use of Sample Entropy Approach to Study Heart Rate Variability in Obstructive Sleep Apnea Syndrome. IEEE Transactions on Biomedical Engineering 54(10), 1900–1904 (2007)
5. del Campo, F., Hornero, R., Zamarrón, C., Abásolo, D.E., Álvarez, D.: Oxygen Saturation Regularity Analysis in the Diagnosis of Obstructive Sleep Apnea. Artificial Intelligence in Medicine 37, 111–118 (2006)
6. Kwiatkowska, M., Schmittendorf, E.: Assessment of Obstructive Sleep Apnea Using Pulse Oximetry and Clinical Prediction Rules: A Fuzzy Logic Approach. In: BMT (2005)
7. Polat, K., Yosunkaya, Ş., Güneş, S.: Pairwise ANFIS Approach to Determining the Disorder Degree of Obstructive Sleep Apnea Syndrome. Journal of Medical Systems 32(3), 243–250 (2008)
8. Mitchell, T.M.: Machine Learning. McGraw-Hill, Singapore (1997)
9. Quinlan, J.R.: Induction of Decision Trees. Machine Learning 1, 81–106 (1986)
10. Akdemir, B., Polat, K., Güneş, S.: Prediction of E.Coli Promoter Gene Sequences Using a Hybrid Combination Based on Feature Selection, Fuzzy Weighted Pre-processing, and Decision Tree Classifier. In: Apolloni, B., Howlett, R.J., Jain, L. (eds.) KES 2007, Part I. LNCS (LNAI), vol. 4692, pp. 125–131. Springer, Heidelberg (2007)
11. Haykin, S.: Neural Networks: A Comprehensive Foundation.
Macmillan College Publishing Company, NewYork (1994) 12. Kara, S., Guven, A.: Neural Network-Based Diagnosing for Optic Nerve Disease from Visual-Evoked Potential 31, 391–396 (2007)
Recognition of Plant Leaves Using Support Vector Machine

Qing-Kui Man1,2, Chun-Hou Zheng3,*, Xiao-Feng Wang2,4, and Feng-Yan Lin1,2

1 Institute of Automation, Qufu Normal University, Rizhao, Shandong 276826, China
2 Intelligent Computing Lab, Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei, Anhui 230031, China
3 College of Information and Communication Technology, Qufu Normal University
4 Department of Computer Science and Technology, Hefei University, Hefei 230022, China
[emailprotected], [emailprotected]
Abstract. A method using both color and texture features to recognize plant leaf images is proposed in this paper. After image preprocessing, the color and texture features of the plant images are obtained, and a support vector machine (SVM) classifier is then trained and used for plant image recognition. Experimental results show that using both color and texture features to recognize plant images is feasible, and that the recognition accuracy is encouraging. Keywords: Support vector machine (SVM), Image segmentation, Digital wavelet transform.
1 Introduction There are many kinds of plants living on the earth. Plants play an important part both in human life and in the other lives existing on the earth. Unfortunately, the number of plant categories is becoming smaller and smaller. Fortunately, people are realizing the importance of protecting plants, and they try every way they can to protect the plants that still exist on the earth; but how can they do this work without knowing which category a plant belongs to? Since computers are more and more widely used in our daily life, a natural question is: how can we recognize the different kinds of leaves using a computer? Plant classification is an old subject in human history, which has developed rapidly, especially after human beings entered the computer era. Plant classification not only recognizes different plants and their names, but also tells the differences between plants and builds systems for classifying them. It can also help researchers find origins, relations of species, and trends in evolution. At present, there are many modern experimental methods in the plant classification area, such as plant cellular taxonomy, plant cladistics, and so on. Yet all these methods are difficult for non-professional staff, because they cannot be easily used and their operation is very complex. * Corresponding author. D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 192–199, 2008. © Springer-Verlag Berlin Heidelberg 2008
With the development of computer technology, digital image processing has developed rapidly, so people want to use image processing and pattern recognition techniques to make up for the deficiency of our recognition ability, so that non-professional staff can use computers to recognize a variety of plants. According to the theory of plant taxonomy, plant leaves are the most useful and direct basis for distinguishing one plant from others; what is more, leaves can be easily found and collected everywhere. By computing some efficient features of leaves and using a suitable pattern classifier, it is possible to recognize different plants successfully. Until now, many works have focused on leaf feature extraction for plant recognition. In [1], a method of recognizing leaf images based on shape features using a hyper-sphere classifier was introduced. In [5], the authors gave a method that combines different features based on the centroid-contour distance curve, and adopted the fuzzy integral for leaf image retrieval. Gu et al. [6] used the result of segmenting the leaf's skeleton for leaf recognition. Among these methods, using leaf shape features is the best way to recognize plant images [1], and the recognition accuracy is encouraging. Since color and texture are the two image features most sensitive to human vision, we select both of them in this paper as the features for recognizing plant images. That is, we use color moments as the color feature and extract the texture feature from the plant leaf image after wavelet high-pass filtering. Usually, the wavelet transform has the capability of mapping an image into a low-resolution image space and a series of detail image spaces. For the majority of images, the detail images indicate the noise or useless parts of the original ones.
In this paper, the information of the leaf vein was extracted as the texture feature. Therefore, after extracting these features of the leaves, different species of plants can be classified by using an SVM. The remainder of this paper is organized as follows: Section 2 describes image segmentation and the definitions of the color moments and texture features, in particular the wavelet transform. Section 3 describes the support vector machine (SVM) in detail. Section 4 presents the experimental results and demonstrates the feasibility and validity of the proposed method. Conclusions are included in Section 5.
2 Extracting Leaf Features In this section, we first introduce image segmentation. After the segmentation, color moments and the wavelet transform are introduced to represent the images of plant leaves. 2.1 Image Segmentation The images of plant leaves, which were captured with a camera, always come with a background. The purpose of image segmentation is to obtain the region of interest (ROI), which will be used to extract the color moments and the texture features. There are two kinds of background in the leaf images: one is simple, and the other is complicated. In this paper we select leaf images with a simple background to test our algorithm for
recognizing leaf images. After the procedure of image segmentation, a binary image is obtained in which the ROI is displayed as 1 and the background as 0. For a leaf image with a simple background, the gray level of pixels within the leaf object is distinctly different from that of pixels within the background. Since the leaf images we collected ourselves have a simple background, we use an adaptive threshold method [10] to segment them, and experimental results show that this method works very well. There are many kinds of image features that can be used to recognize leaf images, such as shape features [1], color features and texture features. In this paper, we select color and texture features to represent the leaf image. 2.2 Color Feature Extraction Color moments have been successfully used in many color-based image retrieval systems [2], especially when the image contains just the image of a leaf. The first-order (mean), second-order (variance) and third-order (skewness) color moments have been proved to be efficient and effective in representing the color distributions of images. Mathematically, the first three moments can be defined as:
\mu_k = \frac{1}{sum} \sum_{i=1}^{sum} p_{ik}    (1)

\sigma_k = \left( \frac{1}{sum} \sum_{i=1}^{sum} (p_{ik} - \mu_k)^2 \right)^{1/2}    (2)

\delta_k = \left( \frac{1}{sum} \sum_{i=1}^{sum} (p_{ik} - \mu_k)^3 \right)^{1/3}    (3)

where p_{ik}
is the value of the k-th color component of the i-th pixel of the image, and sum is the number of pixels that the region of interest contains. Since the HSV color space is much closer to human vision than the HSI color space [12], we extract the color moments from the HSV color space in this paper. 2.3 Image Normalization The texture feature is another important feature for representing an image. In this paper, we use the wavelet transform to obtain the leaf vein, on which the texture feature is based. Before the wavelet transform, we do some preprocessing to normalize the leaf image [4]. The normalization method is summarized as follows: (1) Compute the center coordinate A(x0, y0) of the plant image. (2) Find the coordinate B(x1, y1) which is farthest from the center coordinate.
(3) From the coordinates A(x0, y0) and B(x1, y1), compute θ = arctan((y1 − y0)/(x1 − x0)). (4) Rotate the plant image by θ.
The results of this preprocessing are shown in Fig.1.
Fig. 1. Leaf image after normalization
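The normalization steps (1)–(4) above can be sketched as follows. This is a minimal illustration under the assumption that the segmented leaf is given as a binary mask; the function name and the use of arctan2 (which keeps the quadrant, unlike a bare arctan) are our own choices, not from the paper:

```python
import numpy as np

def normalization_angle(mask):
    """Rotation angle for the normalization of Sec. 2.3:
    (1) centroid A(x0, y0) of the leaf region, (2) farthest
    boundary point B(x1, y1), (3) the angle between them."""
    ys, xs = np.nonzero(mask)          # foreground pixel coordinates
    x0, y0 = xs.mean(), ys.mean()      # (1) center coordinate A(x0, y0)
    d2 = (xs - x0) ** 2 + (ys - y0) ** 2
    i = int(np.argmax(d2))             # (2) farthest point B(x1, y1)
    x1, y1 = xs[i], ys[i]
    # (3) theta = arctan((y1 - y0) / (x1 - x0)), quadrant-aware
    return np.arctan2(y1 - y0, x1 - x0)
```

Step (4) is then completed by rotating the image by the returned angle with any standard image-rotation routine.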
2.4 Texture Feature Extraction The wavelet transform (WT), a linear integral transform that maps L^2(R) → L^2(R), has emerged over the last two decades as a powerful theoretical framework for the analysis and decomposition of signals and images at multiple resolutions [7]. Moreover, owing to its localization in both time/space and frequency, this transform is completely different from the Fourier transform [8, 9]. The wavelet transform is defined as the decomposition of a signal f(t) using a series of elemental functions called wavelets and scaling factors, which are created by scaling and translating a kernel function ψ(t) referred to as the mother wavelet:

\psi_{a,b}(t) = \frac{1}{\sqrt{|a|}} \psi\left( \frac{t - b}{a} \right)    (4)
where a, b ∈ R, a ≠ 0, and the discrete wavelet transform (DWT) can be defined as:
W_f^d(j, k) = \int_{-\infty}^{\infty} \bar{\psi}_{j,k}(x) \, f(x) \, dx = \langle \psi_{j,k}, f \rangle, \quad j, k \in Z    (5)
In this paper, we use the wavelet transform in 2D, which simply applies the 1D wavelet transform separately along each dimension. The 2D transform of an image I = A_0 = f(x, y) of size M × N is:
A_j = \sum_x \sum_y f(x, y) \varphi(x, y),
D_{j1} = \sum_x \sum_y f(x, y) \psi^H(x, y),
D_{j2} = \sum_x \sum_y f(x, y) \psi^V(x, y),
D_{j3} = \sum_x \sum_y f(x, y) \psi^D(x, y).
That is, four quarter-size output sub-images, A_j, D_{j1}, D_{j2} and D_{j3}, are generated by the wavelet transform. After the discrete wavelet transform (DWT), we use a high-pass filter to obtain the leaf vein. Then we calculate the leaf image's co-occurrence matrix, which is used to compute the texture features. The result of this transform is shown in Fig. 2. In the image after wavelet high-pass filtering, it is easy to see that the leaf vein is more distinctive than in the original image, and that the approximation part of the original image has been filtered out.
Fig. 2. Leaf image after wavelet high pass filter transform
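As an illustration of the decomposition above, one level of the separable 2D Haar wavelet transform can be written out directly. This is a hedged sketch (the paper does not state which mother wavelet was used); it produces the approximation A_j and the three detail sub-images D_{j1}, D_{j2}, D_{j3}, the high-frequency parts from which the vein texture would be computed:

```python
import numpy as np

def haar_dwt2(img):
    """One level of the separable 2D Haar DWT: applies the 1D Haar
    low/high-pass pair along rows, then along columns, returning the
    quarter-size sub-images (A, D1, D2, D3)."""
    x = np.asarray(img, dtype=float)
    lo = (x[0::2, :] + x[1::2, :]) / 2.0      # vertical low-pass
    hi = (x[0::2, :] - x[1::2, :]) / 2.0      # vertical high-pass
    A  = (lo[:, 0::2] + lo[:, 1::2]) / 2.0    # LL: approximation A_j
    D1 = (lo[:, 0::2] - lo[:, 1::2]) / 2.0    # LH: horizontal detail D_j1
    D2 = (hi[:, 0::2] + hi[:, 1::2]) / 2.0    # HL: vertical detail D_j2
    D3 = (hi[:, 0::2] - hi[:, 1::2]) / 2.0    # HH: diagonal detail D_j3
    return A, D1, D2, D3
```

For a constant image all three detail sub-images vanish, which matches the intuition that the detail spaces carry only high-frequency content.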
Then we use the transformed image to extract the co-occurrence matrix. The texture features we use can be defined as follows:

Entropy:  ent = -\sum_{i=0}^{L-1} \sum_{j=0}^{L-1} p(i, j) \log_2 p(i, j)    (6)

Homogeneity:  h = \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} \frac{p(i, j)}{0.1 + |i - j|}    (7)

Contrast:  cont = \sum_{i=0}^{L-1} \sum_{j=0}^{L-1} p(i, j) \, |i - j|    (8)
Based on the co-occurrence matrix computed in four different directions of the image, i.e. with the angle taking the values 0, 45, 90 and 135 degrees, we obtain the texture features of the plant images. All the data extracted as described in Section 2 are raw data. Both the color feature data and the texture feature data will be processed before training the classifier.
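The computation above can be sketched as follows: build a normalized co-occurrence matrix for a displacement (dx, dy) corresponding to one of the four angles, then evaluate Eqs. (6)–(8). The function names and the displacement convention are our own assumptions:

```python
import numpy as np

def cooccurrence(img, dx, dy, levels):
    """Normalized co-occurrence matrix p(i, j) for quantized image `img`
    and displacement (dx, dy): count all pixel pairs (y, x) -> (y+dy, x+dx)."""
    P = np.zeros((levels, levels))
    h, w = img.shape
    for y in range(max(0, -dy), min(h, h - dy)):
        for x in range(max(0, -dx), min(w, w - dx)):
            P[img[y, x], img[y + dy, x + dx]] += 1
    return P / P.sum()

def texture_features(P):
    """Entropy, homogeneity and contrast of Eqs. (6)-(8)."""
    i, j = np.indices(P.shape)
    nz = P > 0                                   # avoid log2(0)
    ent = -np.sum(P[nz] * np.log2(P[nz]))        # Eq. (6)
    hom = np.sum(P / (0.1 + np.abs(i - j)))      # Eq. (7)
    con = np.sum(P * np.abs(i - j))              # Eq. (8)
    return ent, hom, con
```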
3 Support Vector Machine (SVM) The support vector machine (SVM) [11] is a popular technique for classification, and using SVMs to handle multi-class problems is one of the present research focuses. A classification task usually involves training and testing data consisting of data instances. Each instance in the training set contains one "target value" (class label) and several "attributes" (features). The goal of the SVM is to produce a model which predicts the target value of the data instances in the testing set, given only the attributes.
Given a training set of instance-label pairs (x_i, y_i), i = 1, ..., l, where x_i ∈ R^n and y_i ∈ {1, −1}^l, the support vector machine (SVM) requires the solution of the following optimization problem:

\min_{w, b, \xi} \; \frac{1}{2} w^T w + c \sum_{i=1}^{l} \xi_i

subject to  y_i (w^T \phi(x_i) + b) \geq 1 - \xi_i, \quad \xi_i \geq 0.

Here the training vectors x_i are mapped into a higher (maybe infinite) dimensional space
by the function φ . Then SVM finds a linear separating hyperplane with the maximal margin in this higher dimensional space. c > 0 is the penalty parameter of the error term. Furthermore,
K(x_i, x_j) ≡ \phi(x_i)^T \phi(x_j) is called the kernel function. Though new kernels are being proposed by researchers, the following are the four basic kernels:

Linear: K(x_i, x_j) = x_i^T x_j.
Polynomial: K(x_i, x_j) = (\gamma x_i^T x_j + r)^d, \gamma > 0.
Radial basis function (RBF): K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2), \gamma > 0.
Sigmoid: K(x_i, x_j) = \tanh(\gamma x_i^T x_j + r).

Here, \gamma, r and d are kernel parameters.
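The four basic kernels translate directly into code; a small NumPy sketch (the function names are ours, not from any particular SVM library):

```python
import numpy as np

# The four basic SVM kernels for a pair of feature vectors xi, xj.
def linear(xi, xj):
    return xi @ xj                                   # x_i^T x_j

def poly(xi, xj, gamma, r, d):
    return (gamma * (xi @ xj) + r) ** d              # (gamma x_i^T x_j + r)^d

def rbf(xi, xj, gamma):
    return np.exp(-gamma * np.sum((xi - xj) ** 2))   # exp(-gamma ||xi - xj||^2)

def sigmoid(xi, xj, gamma, r):
    return np.tanh(gamma * (xi @ xj) + r)            # tanh(gamma x_i^T x_j + r)
```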
4 Experimental Results In this section, we select some of the features extracted through the procedures described above, such as image segmentation and the wavelet transform, to perform classification experiments, and select the RBF kernel K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2), \gamma > 0, as the SVM kernel.
The following experiments are programmed using Microsoft Visual C++ 6.0, and run on a Pentium 4 with a 2.6 GHz clock and 2 GB of RAM under the Microsoft Windows XP environment. All of the results in the following figures and tables are the average of 50 experiments. The database of leaf images was built by ourselves in our lab using a scanner and a digital camera, and includes twenty-four species. In this section, we take 500 leaf samples corresponding to the 24 classes collected by ourselves, such as seatung, ginkgo, etc. (as shown in Fig. 3). We selected the color and texture feature data as the input for training the SVM classifier. Before training the classifier, we do some processing on the raw data [3]. We use z-score normalization for data preprocessing, which is defined as:

v' = (v - \bar{A}) / \sigma_A    (9)

where \bar{A} and \sigma_A are the mean and standard deviation of component A, respectively.
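Eq. (9) applied column-wise to a feature matrix can be sketched as follows (a minimal version, assuming each column of X is one feature component A and no column is constant):

```python
import numpy as np

def z_score(X):
    """Eq. (9): v' = (v - mean(A)) / std(A), per feature column of X."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / std
```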
Fig. 3. Leaf images used for experiment
Firstly, we use only the color features and find that the accuracy is more than 90 percent when the number of categories is small; yet when the number grows to five or six, the accuracy drops to about 60 percent. This is because the color of most plant leaves is green, and in the HSV color space, which is similar to human vision, the difference between any two plant leaf images is very small; that is to say, the color feature alone is not a good feature for plant leaf image recognition. Secondly, we take only the texture features as the experimental data. The result is that the recognition rate is satisfactory; from this we can conclude that texture is a good feature for recognizing plant images. Thirdly, because color is still an important feature of plant images, we also use both image features, color and texture, in the experiments. The result is encouraging: the recognition accuracy reaches 92%. Table 1. Results of leaf image recognition
Accuracy               | 4 categories | 6 categories | 10 categories | 24 categories
Using color feature    | 90%          | 63%          | 40%           | Very low
Using texture feature  | 98%          | 96%          | 93.5%         | 84.6%
Using both features    | 100%         | 100%         | 97.9%         | 92%
The results of our experiment are shown in Table 1. From the table we can see that our method is competitive. In [1], the authors proposed a method using shape features that can recognize more than 20 categories of plants with an average correct recognition rate of up to 92.2%. Compared to that method, our approach using the color and texture features of plant images performs very well.
5 Conclusions In this paper, a way of using color and texture features to recognize plant images was proposed, i.e. using color moments together with the texture features of the plant leaf image after wavelet high-pass filtering. The wavelet transform has the capability of mapping an image into a low-resolution image space and a series of detail
image spaces. In this paper, the information of the leaf vein was extracted after wavelet high-pass filtering to represent the texture feature. After computing these features of the leaves, different species of plants were classified by using an SVM, and the recognition rate of this method is satisfactory. Our future work includes selecting the most suitable color and texture features, as well as preprocessing the raw data selected from the leaf images, which will further improve the accuracy. Acknowledgements. This work was supported by the grants of the National Science Foundation of China, Nos. 60772130 & 60705007, the grant of the Graduate Students' Scientific Innovative Project Foundation of CAS (Xiao-Feng Wang), the grant of the Scientific Research Foundation of Education Department of Anhui Province, No. KJ2007B233, and the grant of the Young Teachers' Scientific Research Foundation of Education Department of Anhui Province, No. 2007JQ1152.
References
1. Wang, X.F., Du, J.X., Zhang, G.J.: Recognition of Leaf Images Based on Shape Features Using a Hypersphere Classifier. In: Huang, D.-S., Zhang, X.-P., Huang, G.-B. (eds.) ICIC 2005. LNCS, vol. 3644, pp. 87–96. Springer, Heidelberg (2005)
2. Han, J.H., Huang, D.S., Lok, T.M., Lyu, M.R.: A Novel Image Retrieval System Based on BP Neural Network. In: The 2005 International Joint Conference on Neural Networks (IJCNN 2005), Montreal, Quebec, Canada, vol. 4, pp. 2561–2564 (2005)
3. Liu, Z.W., Zhang, Y.J.: Image Retrieval Using Both Color and Texture Features. J. China Instit. Commun. 20(5), 36–40 (1999)
4. Liu, J.L., Gao, W.R., Tao, C.K.: Distortion-invariant Image Processing with Standardization Method. Opto-Electronic Engin. 33(12), 75–78 (2006)
5. Wang, Z., Chi, Z., Feng, D.: Fuzzy Integral for Leaf Image Retrieval. Proc. Fuzzy Syst., 372–377 (2002)
6. Gu, X., Du, J.X., Wang, X.F.: Leaf Recognition Based on the Combination of Wavelet Transform and Gaussian Interpolation. In: Huang, D.-S., Zhang, X.-P., Huang, G.-B. (eds.) ICIC 2005. LNCS, vol. 3644, pp. 253–262. Springer, Heidelberg (2005)
7. Vetterli, M., Kovacevic, J.: Wavelets and Subband Coding. Prentice Hall, Englewood Cliffs (1995)
8. Akansu, A.N., Richard, A.H.: Multiresolution Signal Decomposition: Transforms, Subbands, and Wavelets. Academic Press Inc., London (1992)
9. Vetterli, M., Herley, C.: Wavelets and Filter Banks: Theory and Design. IEEE Trans. on Signal Proc. 40, 2207–2231 (1992)
10. Chan, F.H.Y., Zhu, F.K.H.: Adaptive Thresholding by Variational Method. IEEE Trans. Image Proc., 468–473 (1998)
11. Cortes, C., Vapnik, V.: Support-Vector Networks. Machine Learn. 20, 273–297 (1995)
12. Plataniotis, K.N., Venetsanopoulos, A.N.: Color Image Processing and Applications. Springer, Heidelberg (2000)
Region Segmentation of Outdoor Scene Using Multiple Features and Context Information Dae-Nyeon Kim, Hoang-Hon Trinh, and Kang-Hyun Jo Graduate School of Electrical Engineering, University of Ulsan, San 29, Mugeo-Dong, Nam-Gu, Ulsan, 680 - 749, Korea {dnkim2005,hhtrinh,jkh2008}@islab.ulsan.ac.kr
Abstract. This paper presents a method to segment the regions of objects in outdoor scenes for autonomous robot navigation. The method segments objects from an image taken by a moving robot in an outdoor scene. It begins with object segmentation, which uses multiple features to obtain the segmented regions of objects. The multiple features are color, edge, line segments, Hue Co-occurrence Matrix (HCM), Principal Components (PCs) and Vanishing Points (VPs). We model the objects of the outdoor scene by defining their characteristics individually, and segment the regions using a mixture of the proposed features and methods. Objects can be detected when we combine the predefined multiple features. The next stage classifies the objects into natural and artificial ones: we detect sky and trees as natural objects and buildings as artificial objects. Finally, the last stage combines appearance and context information. We confirm the results of object segmentation through experiments using multiple features and context information. Keywords: object segmentation, outdoor scene, multiple features, context information.
1
Introduction
When an autonomous robot navigates in an outdoor scene, it typically has a specific target. It also needs to avoid objects when it encounters obstacles, and to know where it is and which path to take next. For object segmentation, we classify objects into the artificial and the natural [9], and then define their characteristics individually. The method begins with object segmentation, which uses multiple features to obtain the segmented regions of objects. The multiple features are color, edge, line segments, PCs, VPs and HCM. Among the multiple features, we present a method to exploit texture and color information. Image segmentation can become very difficult, as the image gray value or color alone is rarely a good indicator of object boundaries, due to noise, texture, shading, occlusion, or simply because the colors of two objects are nearly the same. Zhang et al. [3] proposed a color image segmentation method based on intensity and color. Such methods give good results for images of a simple object in a single form, such as a building. But when one object is complex, or
different objects have an identical color, different objects may be merged into one region, or one object may be split into many regions. To overcome such defects, we present a method that combines various features in complex images. We propose a method for detecting the faces of buildings using line segments and their geometrical vanishing points [6, 9]. Haralick et al. [1] used statistical features extracted from objects using the gray-level co-occurrence matrix (GLCM) for texture analysis [1, 4, 2]. We developed and evaluated different implementations of the GLCM, using a co-occurrence matrix of hue values instead of gray levels. This paper shortens the processing time by taking into account a displacement vector with the specific direction of 135° in the HCM [9]. In addition, we use the HCM to detect the regions of trees. The method combines the features according to the characteristics of the objects and segments the images. We consider images of outdoor scenes and would like to segment each pixel as sky, trees, building, etc. To achieve this goal, the object segmentation task requires knowledge of the objects contained in the image. We propose a probabilistic method taking contextual information into account to segment the regions belonging to the objects that primarily make up the scene. It is increasingly being recognized in the vision community that context information is necessary for a reliable extraction of image regions and objects. This paper is organized as follows. Section 2 describes feature extraction for the objects of an image, presenting color, edge, line segments, PCs, VPs and HCM. Section 3 describes a probabilistic method using contextual information to segment object regions. Section 4 presents the methods of region segmentation. Experimental results are shown in Section 5. Section 6 concludes the paper.
2
Multiple Features
When the robot navigates in an outdoor scene, we classify the objects known from the acquired image as prior knowledge, and then apply the knowledge of each object. We present the candidates for segmented regions as natural and artificial objects such as sky, trees and buildings, and segment the regions by using multiple features: color, edge, line segments, PCs, VPs and HCM. The color feature uses the Hue, Saturation and Intensity (HSI) color model. Many line segment components can be seen in artificial objects such as buildings. The PCs are formed by merging neighborhoods of basic parallelograms which have similar colors [6], and the regions of PCs are detected. An edge is a boundary between two regions with relatively distinct gray-level properties [9]. We use M-estimator SAmple Consensus (MSAC) to group parallel line segments which share a common vanishing point [6]. We calculate one dominant vanishing point for the vertical direction and at most five dominant vanishing points for the horizontal direction. The HCM records spatial dependence frequencies as a function of the angular relationship between neighboring resolution pixels as well as of the distance between them [9]. We use the six extracted features in combination. To extract the regions of sky and cloud, we use the color feature and context
information. The extraction of tree regions uses the color feature, context information and HCM. We also use color, edge, line segments, PCs and VPs to extract buildings.
3
Contextual Probability
For each object we search its habitual location in the image, which is described by the percentages of being at the top, middle and bottom of an image (LT_i, LM_i and LB_i, respectively). The y position of all pixels is obtained and the probability of each of them belonging to a certain position is computed. The main drawback of not using context is the overlap between classes, e.g. sky and water, both blue. The system can then easily confuse a water region at the bottom of the image with sky, since they have a very similar appearance: two small image patches may be ambiguous at a very local scale but clearly identifiable inside their context. Specifically, we distinguish two kinds of context information: (i) absolute context, referring to the location of objects in the image (sky is at the top of the image, and water at the bottom); (ii) relative context, the position of objects with respect to other objects in the image (grass tends to be next to the road, and clouds in the sky). Some proposals consider both kinds of context [5], while only the relative context is considered by He et al. [7]. Fuzzy rules are used to provide the position of pixels in a fuzzy way. The probabilities PT(y_j), PM(y_j) and PB(y_j) are the beliefs that a pixel at position y_j belongs to a certain location (top, middle and bottom) in the image. Therefore, Eq. (1) gives us the probability that a pixel j at position y_j belongs to an object Ø_Li considering its absolute position: PL(j|Ø_Li) = max(LT_i ∗ PT(y_j), LM_i ∗ PM(y_j), LB_i ∗ PB(y_j))
(1)
In this paper, the pixels with the highest probability of belonging to an object (PL > 0.8) constitute the region.
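Eq. (1) can be sketched as follows. The paper does not specify its fuzzy rules, so the piecewise-linear position memberships below are an assumed stand-in, and all names are hypothetical:

```python
def position_memberships(y, height):
    """Fuzzy beliefs PT, PM, PB that a pixel at row y lies at the top,
    middle or bottom of an image of the given height (assumed
    piecewise-linear memberships, not the paper's exact rules)."""
    t = y / float(height - 1)               # 0 at the top row, 1 at the bottom
    p_top = max(0.0, 1.0 - 2.0 * t)
    p_bottom = max(0.0, 2.0 * t - 1.0)
    p_middle = 1.0 - p_top - p_bottom
    return p_top, p_middle, p_bottom

def absolute_context(y, height, LT, LM, LB):
    """Eq. (1): P_L(j | O_Li) = max(LT*PT(y), LM*PM(y), LB*PB(y))."""
    pt, pm, pb = position_memberships(y, height)
    return max(LT * pt, LM * pm, LB * pb)
```

A "sky" object with LT = 0.9 then scores highly only for pixels near the top of the image, which is exactly the disambiguation between sky and water described above.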
Fig. 1. Flowchart for segmentation of natural and artificial object
4
Segmentation of Object Region
The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. We consider images of outdoor scenes and would like to segment each object as sky, trees, building, etc. Region segmentation uses a mixture of multiple features according to the characteristics of the objects. The flowchart of the process for segmentation of natural and artificial objects is described in Fig. 1. 4.1
Segmentation of Sky and Cloud Region
Several color spaces are in wide use, including RGB, HSI, CIE, YIQ, YCbCr, etc. We convert the RGB color space to HSI [3]. This paper uses the HSI color model to find the values of sky and cloud in the image; the HSI ranges were found through repeated experiments. We also use absolute context information referring to the location of objects in the image. The image is divided into three parts: top, middle and bottom. If the robot travels at regular intervals in the outdoor scene, we may assume that sky and cloud appear at the top of the image, so we add the context information that the sky position is at the top of the image. If a different object appears inside the sky, it is regarded as part of the sky region. The ranges of sky and cloud correspond to the hue, saturation and intensity values of Table 1. Region segmentation extracts the cloud region after the sky region has been extracted. The segmented sky and cloud regions are seen in Fig. 2(b) and Fig. 2(c), and their merger is shown in Fig. 2(d). 4.2
Segmentation of Trees Region
We use the HSI color model to find the parts of the image corresponding to the values of trees; the HSI ranges for trees were again found through repeated experiments. Additionally, in order to estimate the similarity between different gray-level co-occurrence matrices (GLCM), Haralick [1] proposed statistical features extracted from them. The GLCM, one of the best-known texture analysis methods,
Fig. 2. Segmentation of sky and cloud region: (a) original image (b) sky (c) cloud (d) the merger of sky and cloud
Fig. 3. Comparison results of diverse cues of segments region with trees: (a) original images (b) trees detection using HSI (c) trees detection using HCM (d) trees detection using HSI+HCM
estimates image properties related to second-order statistics. Each entry (i, j) in the GLCM corresponds to the number of occurrences of the pair of gray levels i and j which are a given distance apart in the original image. We use a co-occurrence matrix of hue values instead of gray levels. To reduce the computational complexity, only some of these features were selected. We analyze the spatial characteristics using the HCM [9]. The HCM P[i, j] is defined by specifying a displacement vector and counting all pairs of pixels separated by distance d and direction φ having hue levels i and j. Kim et al. [9] illustrated how to obtain the HCM in the 135° diagonal direction from a simple original image having hue levels 0, 1 and 2. We obtain the image segmentation using the displacement vector of the 135° diagonal direction in the HCM. This paper thus attempts to outline an alternative reading of the GLCM, and proposes the HCM algorithm: it analyzes the appearance counts of hue-value pixel pairs in the original image. First, we use HSI and find the range of hue. Then, we define a range of the co-occurrence matrix for high-frequency regions. Finally, we obtain the value of the HCM and use HCM and HSI together. The method using HSI alone produces much noise in the image segmentation; we decrease this noise by using HCM and HSI together. The HSI ranges have been derived through repeated experimental trials to segment the tree regions of the natural object. 4.3
Segmentation of Building Face Region
A face of a building is a plane surface which contains PCs such as doors, windows, wall regions and columns. The first step detects the regions of trees by the HSI and HCM algorithm described in Section 4.2. The second step detects line segments using the Canny edge detector. A line segment is a part of an edge which satisfies two conditions [9]. In the experiments, we choose T1 and T2 as 10 and √2 pixels, respectively. The result of line segment detection is shown in Fig. 4(b). Most of the low-contrast lines usually do not lie on the edges of PCs, because the edge of a PC divides the image into two regions of high color contrast. We use the intensities of the two regions beside a line to discard the low-contrast lines [9]. The result is illustrated in Fig. 4(c). The vertical group contains line segments which form an acute angle of 20° at
Table 1. Region segmentation of objects using ranges of HSI values

Object           | Hue     | Saturation | Intensity
Sky (1)          | 170∼300 | 10∼50      | I ≥ 160
Cloud (2)        | 170∼300 | S ≤ 15     | I ≥ 200
Merge of (1),(2) | 170∼300 | S ≤ 10     | I ≥ 160
Trees            | 60∼140  | S ≤ 15     | I ≥ 65
Fig. 4. The result of building detection: (a) original images (b) line segments detection and trees region (c) survived line segments reduction (d) dominant vanishing points detected by MSAC (e) mesh of basic parallelograms of face
maximum with the vertical axis. The remaining lines are treated as the horizontal group. For the fine separation stage, we use MSAC [6] to robustly estimate the vanishing point. Suppose the end points of a line segment are x1 = (x́1, ý1, 1)^T and x2 = (x́2, ý2, 1)^T; the line through them in homogeneous coordinates is l = (a, b, c)^T = x1 × x2 [2]. Given two lines, a common normal is determined by v = l_i × l_j, where v = (v1, v2, v3)^T. Hence, given a set of n line segments belonging to lines parallel in 3D, the vanishing point v is obtained by solving Eq. (2):

l_i^T v = 0,  i = 1, 2, ..., n.    (2)
The robust estimation of v by MSAC has proven the most successful. We calculate at most five dominant vanishing points for the horizontal direction [9]. The algorithm proceeds in three steps [8,9]. The priority of a horizontal vanishing point depends on the number N_i of parallel lines in the corresponding group. The groups are marked by color as red, green, blue, yellow and magenta, as illustrated in Fig. 4(d). The vertical line segments are extended
D.-N. Kim, H.-H. Trinh, and K.-H. Jo
to detect the vertical vanishing point. We use the number of intersections between vertical lines and horizontal segments to detect and separate planes as the faces of the building. Fig. 4(e) shows the results of face detection. The boundaries of faces are defined in three steps by Kim et al. [9]. Let N_l be the minimum number of horizontal lines in the left and right faces and N_i the number of intersection points; their ratio must exceed a given threshold, satisfying Eq. (3) with N_T = 0.35:

N = N_i / N_l ≥ N_T.    (3)
Finally, the mesh of basic parallelograms is created by extending the horizontal lines. Each mesh represents one face of building. Fig. 4(e) shows the results of mesh of face detection.
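Eq. (2) above is a homogeneous least-squares problem, commonly solved with an SVD. The sketch below shows only this plain least-squares step; MSAC would wrap it in random sampling with an inlier cost, which we omit here.

```python
import numpy as np

def vanishing_point(segments):
    """Vanishing point v solving l_i^T v = 0 in the least-squares sense.

    Each segment is ((x1, y1), (x2, y2)); the line through its end points
    in homogeneous coordinates is l = x1 x x2 (cross product).
    """
    lines = [np.cross([x1, y1, 1.0], [x2, y2, 1.0])
             for (x1, y1), (x2, y2) in segments]
    # Right singular vector of the smallest singular value minimizes ||L v||
    _, _, Vt = np.linalg.svd(np.vstack(lines))
    v = Vt[-1]
    return v / v[2] if abs(v[2]) > 1e-12 else v  # dehomogenize if finite
```

For line groups that are parallel in the image itself, v[2] stays near zero and the point is returned at infinity, which is why the robust MSAC wrapper matters in practice.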
5 Experiment
The image database used in the experiments consists of about 1300 images. Since the region around tree leaves normally has high frequency, we search for trees with the proposed HCM algorithm; the result is shown in Fig. 3. We also convert RGB to the HSI color model and find the parts of the image whose values correspond to trees. Finally, we find tree regions by combining the HSI features, context information and the HCM. The segmentation of tree regions is a preprocessing step for building detection: we remove the high-frequency content in the tree regions, which reduces noise for the line segments used to detect the building. The faces of the building are detected from the line segments and their vanishing points. The MSAC algorithm is used to find the vanishing points, not only for the multiple faces of a building but also for faces with noise such as tree branches or electrical lines. A good result can be seen in Fig. 4(e). The meshes of parallelograms can help us detect more PCs such as windows and doors. In addition, geometrical properties such as the height and the number of windows can be exploited to extract more information about the building, for example how many rooms it has.
6 Conclusion
This paper proposed a method of object segmentation in outdoor scenes using multiple features and context information. The multiple features are color, edges, line segments, the HCM, PCs and VPs. Mixing those features, we segment the image into several regions, such as sky and trees among natural objects and buildings among artificial objects. We use color features and absolute context information to extract sky and cloud regions; color, edge and HCM features to extract tree regions; and color, edges, line segments, PCs and VPs to extract buildings. We then remove the high-frequency content in tree regions. The meshes of parallelograms help us detect more PCs such as windows and doors. Overall, the system segments the object regions by
using multiple features. We accomplished the preprocessing needed to recognize objects in an image taken by a mobile robot in an outdoor scene. In the future, we will study how objects relate geometrically to one another in outdoor scenes, and apply the method to sets of images containing more objects (cars, people, animals, etc.). In addition, we want to characterize the properties of trees accurately according to the season, the time of day and the weather.

Acknowledgments. The authors would like to thank Ulsan Metropolitan City and the MOCIE and MOE of the Korean Government, which partly supported this research through the NARC and post-BK21 project at the University of Ulsan.
References

1. Haralick, R.M., Shanmugam, K., Dinstein, I.: Texture Features for Image Classification. IEEE Trans. on Syst. Man Cybern. SMC 3(6), 610–621 (1973)
2. Li, J., Wang, J.Z., Wiederhold, G.: Classification of Textured and Non-textured Images Using Region Segmentation. In: Int'l Conf. on Image Processing, pp. 754–757 (2000)
3. Zhang, C., Wang, P.: A New Method of Color Image Segmentation Based on Intensity and Hue Clustering. In: Int'l Conf. on Pattern Recognition 3, 613–616 (2000)
4. Partio, M., Cramariuc, B., Gabbouj, M., Visa, A.: Rock Texture Retrieval Using Gray Level Co-occurrence Matrix. In: Proc. of 5th Nordic Signal Processing Symposium (2002)
5. Singhal, A., Jiebo, L., Weiyu, Z.: Probabilistic Spatial Context Models for Scene Content Understanding. In: IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, vol. 1, pp. 235–241 (2003)
6. Hartley, R., Zisserman, A.: Multiple View Geometry in Computer Vision. Cambridge University Press, Cambridge (2004)
7. He, X., Zemel, R.S., Carreira-Perpinan, M.A.: Multiscale Conditional Random Fields for Image Labeling. In: IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, vol. 2, pp. 695–702 (2004)
8. Zhang, W., Kosecka, J.: Localization Based on Building Recognition. In: Int'l Conf. on Computer Vision and Pattern Recognition, vol. 3, pp. 21–28 (2005)
9. Kim, D.N., Trinh, H.H., Jo, K.H.: Object Recognition by Segmented Regions Using Multiple Cues on Outdoor Environment. International Journal of Information Acquisition 4(3), 205–213 (2007)
Two-Dimensional Partial Least Squares and Its Application in Image Recognition

Mao-Long Yang 1,2, Quan-Sen Sun 1, and De-Shen Xia 1

1 Institute of Computer Science, Nanjing University of Science & Technology, Nanjing 210094, China
2 International Studies University, Nanjing 210031, China
[emailprotected]

Abstract. The problem of extracting optimal discriminant features is a critical step in image recognition. Algorithms such as classical iterative partial least squares (NIPALS and CPLS), non-iterative partial least squares based on orthogonal constraints (NIPLS), and partial least squares based on conjugate orthogonality constraints (COPLS) are introduced briefly. NIPLS and COPLS methods based on original image matrices are discussed, where the image covariance matrix is constructed directly from the original image matrices, just as in 2DPCA and 2DCCA; we call them 2DNIPLS and 2DCOPLS in this paper. In theory, any two optimal discriminant features extracted by 2DCOPLS are uncorrelated because of the uncorrelated score constraints. At the same time, it is pointed out that the 2DCOPLS algorithm is more complicated than other PLS-based algorithms. The results of experiments on the ORL face database, the Yale face database, and a partial FERET face sub-database show that the presented 2DPLS algorithms are efficient and robust.

Keywords: Partial Least Squares (PLS), Uncorrelated Constraints, 2DPCA, Optimal Projection, Image Recognition.
D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 208–215, 2008. © Springer-Verlag Berlin Heidelberg 2008

1 Introduction

Partial Least Squares Regression (PLSR) is a multivariable analysis method that arose from application fields; it was conceived by Herman Wold for econometric modeling of multivariate time series, in order to reduce the impact of noise in the data and to obtain a robust model [1]. It has become a tool widely used in chemometrics [2]. PLS has developed quickly in theory, algorithms and applications since the 1980s. Its properties make PLS a powerful tool for regression analysis and dimension reduction, with good employment in many fields such as process control, data analysis and prediction, and image processing and classification [3]. Classical iterative PLS (CPLS) based on singular value decomposition (SVD) was proposed because of the uncertain solutions of nonlinear iterative PLS (NIPALS) [4,5]. The first d (d = rank(X)) projective (loading) vectors α_1, ..., α_d produced by CPLS are orthogonal, and the PLS components corresponding to them are orthogonal, too. On the other hand, non-iterative PLS
(NIPLS) based on orthogonal constraints can extract PLS scores (PLS projective features) effectively by solving one SVD, but the PLS scores may be correlated. PLS based on conjugate orthogonality constraints (COPLS) instead of orthogonal constraints can extract uncorrelated PLS scores in theory [6-13]. The criterion function of two-dimensional PLS (2DPLS) can be established with the original image covariance matrix directly, similar to 2DPCA [14] and 2DCCA [15], instead of reshaping the images into vectors. In the case of image matrices, 2DPLS involves iterative and eigenvalue problems for much smaller matrices than 1DPLS, which reduces the complexity dramatically, and the PLS scores can be extracted more effectively. We introduce the basic ideas of CPLS, NIPLS and COPLS briefly, then present a 2D extension of PLS, referred to as 2DNIPLS and 2DCOPLS, which is used to extract the PLS scores of images for recognition. The results of experiments on the ORL face database, the Yale face database, and a partial FERET face sub-database show that the presented algorithms are more efficient and robust than 1DPLS.
2 Partial Least Squares

Consider two centered sample sets with n samples, (X, Y) = {(x_i, y_i)}_{i=1}^{n} ∈ R^p × R^q. PLS finds pairs of projective (loading) vectors, α and β, which make the projections x* = Xα and y* = Yβ cover the variation information as much as possible while the correlation between x* and y* is maximized. In general, PLS creates orthogonal score vectors by CPLS. In other words, the criterion function to be maximized is given by

Cov(x*, y*) = α^T E(X^T Y) β = α^T G_xy β → max,    (1)

where G_xy = E(X^T Y) denotes the covariance matrix between X and Y. Then PLS is formulated as

J_PLS(α, β) = α^T G_xy β = α^T G_xy β / (α^T α · β^T β)^{1/2},    (2)

subject to α^T α = β^T β = 1. The unit projective vectors α and β which maximize this function are called PLS loading vectors. The projections x* and y* have the largest covariance when the original sample vectors are projected onto the loading vectors. From the idea of PLS modeling, it is easy to see how PCA and canonical correlation analysis (CCA) work within PLS, and how the advantages of PCA and CCA are integrated in PLS. Besides, PLS can be thought of as "penalized" CCA, with
basically two PCAs (one in the X space and the other in the Y space) providing the penalties [3,9]. With the orthogonal constraints α_k^T α_i = β_k^T β_i = 0, formula (2) can be transformed into the eigenvalue problem G_xy G_yx α = λα (or G_yx G_xy β = λβ). The first k (k ≤ r) pairs of PLS projective vectors are the eigenvectors of G_xy G_yx (and G_yx G_xy) corresponding to the first k largest eigenvalues. We call this algorithm NIPLS. If, instead of orthogonal constraints, conjugate orthogonality constraints are imposed, formula (2) can be transformed into the eigenvalue problem

(I − (G_x D_x)((G_x D_x)^T (G_x D_x))^{-1} (G_x D_x)^T) G_xy G_yx α_{k+1} = λ α_{k+1},

where I is a unit matrix and D_x = (α_1, α_2, ..., α_k)^T. There is a completely similar expression for the Y space structure [3,9]. We call this algorithm COPLS.
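In the 1D case, the NIPLS loading pairs can be obtained from a single SVD of G_xy, since the eigenvectors of G_xy G_yx and G_yx G_xy are exactly its left and right singular vectors. A minimal NumPy sketch follows; estimating E(X^T Y) by the sample average X^T Y / n is our assumption.

```python
import numpy as np

def nipls_loadings(X, Y, k):
    """First k NIPLS loading pairs for centered data X (n x p), Y (n x q).

    The eigen-problems Gxy Gyx a = lam a and Gyx Gxy b = lam b are both
    solved by one SVD of Gxy: its left/right singular vectors are the
    alpha/beta loadings, ordered by decreasing singular value.
    """
    n = X.shape[0]
    Gxy = X.T @ Y / n                      # sample covariance between X and Y
    U, s, Vt = np.linalg.svd(Gxy, full_matrices=False)
    return U[:, :k], Vt[:k].T, s[:k]       # alphas, betas, singular values
```

Both returned loading sets are orthonormal by construction, which is precisely the orthogonality constraint of NIPLS.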
3 2DPLS

3.1 2DNIPLS

Let X = [x_{1,1}, ..., x_{c,n_c}] be the image sample matrices, where x_{i,j} is an image matrix of size h × l, n_i (i = 1, ..., c) is the number of samples belonging to the i-th class, and N = n_1 + n_2 + ... + n_c is the total number of samples. Thus we can obtain the mean matrix of the samples,

X̄ = (1/N) Σ_{i=1}^{c} Σ_{j=1}^{n_i} x_{i,j}.

For image recognition tasks, the sample images can be considered as one variable set in 2DPLS, called the sample matrix. The other variable set is the class membership matrix, which represents the relationship between samples and classes. Similar to the definition in traditional CCA and PLS methods [3], the class membership matrix can be coded in two equally reasonable ways [3,15]: Z1 is the block-diagonal matrix diag(P_1, P_2, ..., P_c), of size (h×c) × (l×N), and Z2 keeps only the first c−1 block rows of Z1, of size (h×(c−1)) × (l×N),    (3)

where P_i indicates that there are n_i samples in the i-th class, each sample here corresponding to a matrix Q of size h × l, as large as the sample image (in general, we presume that the number of rows is larger than the number of columns, namely h > l). So the matrix P_i can be denoted as
P_i = [Q, ..., Q], of size h × (l × n_i), i = 1, ..., c. Such a class membership matrix not only shows the membership between samples and classes but also maintains the spatial information of the sample images. To obtain the mean of the class membership matrix in the sense of two-dimensional sample representation, the matrix Y is rewritten as Y = [y_{1,1}, ..., y_{c,n_c}], where y_{i,j} is a matrix of size (h × c) × l. Then the mean of the class membership matrix is

Ȳ = (1/N) Σ_{i=1}^{c} Σ_{j=1}^{n_i} y_{i,j},

and the covariance matrices of X and Y are denoted as

G_x = (1/N) Σ_{i=1}^{c} Σ_{j=1}^{n_i} (x_{i,j} − X̄)(x_{i,j} − X̄)^T,

G_y = (1/N) Σ_{i=1}^{c} Σ_{j=1}^{n_i} (y_{i,j} − Ȳ)(y_{i,j} − Ȳ)^T,

G_xy = G_yx^T = (1/N) Σ_{i=1}^{c} Σ_{j=1}^{n_i} (x_{i,j} − X̄)(y_{i,j} − Ȳ)^T,

respectively.
Then formula (2) can be transformed into the two eigenvalue problems

G_xy G_yx α = λ² α,    (4)

G_yx G_xy β = λ² β.    (5)

Under the orthogonal constraints α_k^T α_i = β_k^T β_i = 0 (1 ≤ i < k), the number of available projective vectors is r pairs (r is the number of nonzero eigenvalues of the matrix G_xy G_yx), and each subsequent pair of PLS projective vectors α_k, β_k (k ≤ r) is computed as the eigenvector of equations (4) and (5) corresponding to the k-th largest eigenvalue. Since G_xy G_yx and G_yx G_xy are symmetric matrices and rank(G_xy G_yx) = rank(G_yx G_xy) ≤ rank(G_xy), we conclude that the nonzero eigenvalues of eigen-equations (4) and (5) coincide, and their number is not greater than rank(G_xy). Let λ_1² ≥ λ_2² ≥ ... ≥ λ_r² > 0; the r pairs of eigenvectors corresponding to them are orthogonal, namely α_i^T α_j = β_i^T β_j = δ_ij, and we can also deduce

α_i = λ_i^{-1} G_xy β_i,    (6)

β_i = λ_i^{-1} G_yx α_i,    (7)
α_i^T G_xy β_j = α_i^T G_xy (λ_j^{-1} G_yx α_j) = λ_j^{-1} α_i^T (λ_j² α_j) = λ_j δ_ij.    (8)

Generally we solve whichever of equations (4) and (5) has the smaller rank, and compute the other eigenvector with formula (6) or (7). We call the method described above 2D non-iterative PLS (2DNIPLS).

3.2 2DCOPLS
The covariance matrix of the sample feature vectors x_i* and x_j* (y_i* and y_j*) obtained with 2DNIPLS can be defined as

E[(x_i* − E(x_i*))^T (x_j* − E(x_j*))] = α_j^T G_x α_i,    (9)

E[(y_i* − E(y_i*))^T (y_j* − E(y_j*))] = β_j^T G_y β_i.    (10)

Generally equations (9) and (10) are not equal to 0; that is, the feature vectors projected by the loading vectors of NIPLS may be correlated. In order to obtain uncorrelated projective features, the (k+1)-st (k ≥ 1) pair of optimal projective directions, {α_{k+1}; β_{k+1}}, should satisfy the conjugate orthogonality constraints (11) and maximize criterion function (2), after the first pair of optimal discriminative projective directions is obtained by 2DNIPLS as given in Section 3.1:

α_{k+1}^T G_x α_i = β_{k+1}^T G_y β_i = 0    (i = 1, 2, ..., k).    (11)

If we calculate r (r ≤ n) pairs of optimal projective directions with this method, the improved optimal projective features x*, y* are obtained, and any two projective features x_i* and x_j* are uncorrelated. The optimal projective directions {α_{k+1}; β_{k+1}} which satisfy the conjugate orthogonality constraints (11) and maximize criterion function (2) are the eigenvectors corresponding to the largest eigenvalues of the two eigen-equations (12) and (13) [3,7]:

P G_xy G_yx α_{k+1} = λ α_{k+1},    (12)

Q G_yx G_xy β_{k+1} = λ β_{k+1},    (13)

where P = I − (G_x D_x)((G_x D_x)^T (G_x D_x))^{-1} (G_x D_x)^T, Q = I − (G_y D_y)((G_y D_y)^T (G_y D_y))^{-1} (G_y D_y)^T, I is a unit matrix, D_x = (α_1, α_2, ..., α_k)^T, and D_y = (β_1, β_2, ..., β_k)^T.
After the optimal projections {α_i; β_i}_{i=1}^{k} are calculated, we can use the matrix W_x = (α_1, α_2, ..., α_d) (d = 1, ..., k) to extract 2D features of images. For example, for a given image x_{n_i} of size h × l, we have x̃_{n_i} = W_x^T x_{n_i}. The sizes of the matrices W_x and x̃_{n_i} are h × d and d × l respectively, and we call the matrix x̃_{n_i} the projective feature matrix of the given image.
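As a minimal sketch of the 2D construction, the covariance G_xy can be built directly from the image matrices and its singular vectors taken as the loadings; treating the loadings as left/right singular vectors of G_xy is our shortcut for solving Eqs. (4)-(7), and it does not enforce the conjugate orthogonality constraints of 2DCOPLS.

```python
import numpy as np

def twod_nipls(Xs, Ys, d):
    """2DNIPLS loadings from image matrices Xs (each h x l) and class
    membership matrices Ys (each (h*c) x l), via one SVD of Gxy."""
    Xbar = np.mean(Xs, axis=0)
    Ybar = np.mean(Ys, axis=0)
    Gxy = sum((x - Xbar) @ (y - Ybar).T
              for x, y in zip(Xs, Ys)) / len(Xs)      # h x (h*c)
    U, s, Vt = np.linalg.svd(Gxy, full_matrices=False)
    return U[:, :d], Vt[:d].T                         # W_x (h x d), W_y

def feature_matrix(Wx, image):
    """Projective feature matrix of a given h x l image, size d x l."""
    return Wx.T @ image
```

Note how the eigen-problem lives in the small h-dimensional space rather than the (h*l)-dimensional space of vectorized images, which is the complexity advantage claimed for 2DPLS.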
4 Experiments and Discussion

In this section, we design image recognition experiments to test the performance of the 2DCOPLS method on the ORL database, the Yale database and a partial FERET sub-database. All experiments are carried out on a PC with an Intel Core 2 (1.83 GHz) and 1.5 GB memory, using the MATLAB 7.5 software platform. The ORL database (http://www.cam-orl.co.uk) contains images from 40 individuals, each providing 10 different images. For some subjects, the images were taken at different times. The facial expressions (open or closed eyes, smiling or non-smiling) and facial details (glasses or no glasses) also vary. The images were taken with a tolerance for some tilting and rotation of the face of up to 20 degrees, and there is also some variation in scale of up to about 10 percent. All images are grayscale and normalized to a resolution of 92×112 pixels. The Yale database (http://cvc.yale.edu/projects/yalefaces/yalefaces.html) contains 165 grayscale images of 15 individuals, one for each of the following facial expressions or configurations: center-light, happy, left-light, with/without glasses, normal, right-light, sad, sleepy, surprised, and wink. All images are cropped to a size of 120×91 pixels. The partial FERET face sub-database comprises 400 gray-level frontal-view face images from 100 individuals, and each individual has two images (fa and fb) with different facial expressions. The images are pre-processed by the methods presented in [16]: they are normalized with respect to eye locations and cropped to a size of 130×150 pixels. We select five image samples randomly per individual for training and the remaining five for testing on the ORL and Yale databases; two samples are selected randomly for training and two for testing on the FERET sub-database. In the experiments with 1DPCA, 1DCCA and 1DPLS, we first reduce the image dimension by PCA until 90% of the image energy is kept. In the 2D case, the image size is reduced to 1/4 of the original on the ORL and Yale databases and to 1/5 on the FERET database. In the experiments, we use the image samples and the class membership matrix Z1 given in Section 3.1, and the nearest neighbor classifier is employed. The experiments are repeated 20 times and the best average results are shown in Table 1; data in parentheses are the feature dimensions corresponding to the accuracies. The results obtained by PCA and CCA with the same samples and conditions are also shown in the table. The time elapsed corresponding to the best accuracy on each database is shown in Table 2.
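The nearest neighbor step over projective feature matrices can be sketched as below; the paper does not state which matrix distance is used, so the Frobenius norm is our assumption.

```python
import numpy as np

def nn_classify(test_feat, train_feats, train_labels):
    """1-NN over projective feature matrices using the Frobenius distance."""
    dists = [np.linalg.norm(test_feat - f) for f in train_feats]
    return train_labels[int(np.argmin(dists))]
```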
Table 1. The best results on databases
Database | PCA         | CCA         | CPLS        | NIPLS       | COPLS       | 2DPCA       | 2DCCA       | 2DNIPLS     | 2DCOPLS
ORL      | 0.9405 (39) | 0.9535 (18) | 0.9490 (38) | 0.9490 (38) | 0.9533 (36) | 0.9545 (16) | 0.9590 (6)  | 0.9503 (23) | 0.9638 (2)
Yale     | 0.7433 (17) | 0.8893 (41) | 0.7647 (14) | 0.7700 (15) | 0.7900 (15) | 0.8160 (22) | 0.9453 (23) | 0.8213 (16) | 0.9093 (17)
FERET    | 0.7876 (62) | 0.8255 (66) | 0.7874 (61) | 0.7884 (63) | 0.7942 (61) | 0.8354 (21) | 0.8388 (25) | 0.8202 (24) | 0.8331 (15)
From Table 1, we find that both 2DNIPLS and 2DCOPLS work effectively in image recognition. Their efficiency is comparable to 2DPCA and 2DCCA, since each method has its strong points on the three databases. Compared with 1DPLS, the best recognition accuracy of 2DCOPLS rises by 1%, 12% and 4% on the ORL, Yale and FERET databases respectively. In the experiments we also find that the error rate of 2DCOPLS falls more quickly than that of the other PLS methods as the feature dimension increases. From Table 2, we find that the time elapsed is smaller than for the other PLS methods when 2DNIPLS is employed. 2DCOPLS becomes less efficient as the number of training samples and the image size increase. For example, with 200 training and 200 testing samples and image sizes of 28×23 (scale 1/4) on ORL and 30×26 (scale 1/5) on FERET, the time 2DCOPLS needs for feature extraction on FERET is more than tenfold that on ORL!

Table 2. Time elapsed corresponding to the best accuracy on each database (s)

Database | Sample numbers | Image size | CPLS | NIPLS | COPLS | 2DNIPLS | 2DCOPLS
ORL      | 400            | 112×92     | 14.7 | 15.3  | 17.8  | 8.8     | 226.7
Yale     | 165            | 120×91     | 3.3  | 3.7   | 4.0   | 3.2     | 19.5
FERET    | 400            | 150×130    | 19.1 | 19.4  | 20.8  | 15.1    | 2458
From the process of solving the projective vectors, we know that 2DCOPLS is an effective method for image recognition whether or not the total-class scatter matrices are singular. On the other hand, 2DCOPLS consumes more spatial and temporal resources than the other PLS methods mentioned in this paper. So we should consider factors such as image size and sample number when selecting an appropriate method for recognition; for example, 2DNIPLS may be the better choice in some cases.
5 Conclusion

We present reformative PLS methods called 2DNIPLS and 2DCOPLS, which are efficient and robust methods for image recognition. The proposed methods use the image matrix directly to extract features, instead of a matrix-to-vector transformation, which effectively avoids the singularity of total-class scatter matrices. Furthermore, 2DCOPLS can achieve better recognition accuracy than other
PLS-based methods, since conjugate orthogonality constraints are imposed on the directions in both the X and Y spaces. In theory, 2DCOPLS can extract uncorrelated projective vectors, so the optimal discriminant projective features can be extracted. Besides, we point out that 2DCOPLS is more complicated than the other PLS-based methods, and its spatial and temporal cost grows quickly as the sample size and number increase.

Acknowledgements. We wish to thank the National Science Foundation of China, under Grant No. 60773172, for supporting our research.
References

1. Wold, H.: Estimation of Principal Components and Related Models by Iterative Least Squares. In: Multivariate Analysis. Academic Press, New York (1966)
2. Wold, S., Sjölström, M., Erikson, L.: PLS_Regression: A Basic Tool of Chemometrics. Chemometrics and Intelligent Laboratory Systems 58, 109–130 (2001)
3. Barker, M., Rayens, W.: Partial Least Squares for Discrimination. Journal of Chemometrics 17, 166–173 (2003)
4. Wold, H.: Path with Latent Variables: The NIPALS Approach. In: Blalock, H.M. (ed.) Quantitative Sociology: International Perspectives on Mathematical and Statistical Model Building, pp. 307–357. Academic Press, London (1975)
5. Höskuldsson, A.: PLS Regression Methods. Journal of Chemometrics 2, 211–228 (1988)
6. Liu, Y.-S., Rayens, W.: PLS and Dimension Reduction for Classification. Computational Statistics 22, 189–208 (2007)
7. Yang, J., Yang, J.-Y., Jin, Z.: A Feature Extraction Approach Using Optimal Discriminant Transform and Image Recognition. Journal of Computer Research & Development 38, 1331–1336 (2001)
8. Frank, I.E., Friedman, J.H.: A Statistical View of Some Chemometrics Regression Tools. Technometrics 35, 109–135 (1993)
9. Han, L.: Kernel Partial Least Squares for Scientific Data Mining. PhD thesis, Rensselaer Polytechnic Institute, Troy, New York (2007)
10. Arenas-García, J., Petersen, K.B., Hansen, L.K.: Sparse Kernel Orthonormalized PLS for Feature Extraction in Large Data Sets. In: Advances in Neural Information Processing Systems, vol. 19. MIT Press, Cambridge (2007)
11. Baek, J.-S., Kim, M.: Face Recognition Using Partial Least Squares Components. Pattern Recognition 37, 1303–1306 (2004)
12. Jacob, A.: A Survey of Partial Least Squares Methods, with Emphasis on the Two-block Case. Technical Report, Department of Statistics, University of Washington, Seattle (2000)
13. Trygg, J., Wold, S.: Orthogonal Projections to Latent Structures. Journal of Chemometrics 16, 119–128 (2002)
14. Yang, J., Zhang, D., Frangi, A.F., Yang, J.-Y.: Two-dimensional PCA: A New Approach to Appearance-based Face Representation and Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 26, 131–137 (2004)
15. Lee, S.-H., Choi, S.: Two-Dimensional Canonical Correlation Analysis. IEEE Signal Processing Letters 14, 735–738 (2007)
16. Bolme, D.S., Beveridge, J.R., Teixeira, M., Draper, B.A.: The CSU Face Identification Evaluation System: Its Purpose, Features, and Structure. In: Proceedings of 3rd International Conference on Computer Vision Systems (ICVS), pp. 304–313 (2003)
A Novel Method of Creating Models for Finite Element Analysis Based on CT Scanning Images

Liulan Lin, Jiafeng Zhang, Shaohua Ju, Aili Tong, and Minglun Fang

Rapid Manufacturing Engineering Center, Shanghai University, 99 Shang Da Road, 200444 Shanghai, China
{linliulan}@staff.shu.edu.cn
Abstract. A novel method of creating models for finite element analysis (FEA) from medical images is proposed in this paper. CT scanning images of a human right hand were imported into the medical image processing software Mimics, and the 3D STL model of the bone framework was reconstructed by selecting a proper threshold value. A piece of the radius was cut from the bone framework model and remeshed in Magics to obtain triangles of higher quality and optimized quantity. The remeshed radius model was exported to the FEA software ANSYS to create the volume mesh, and a unidirectional loading simulation was analyzed. This method eliminates the need for extensive and time-consuming experiments and provides a helpful tool for biomedicine and tissue engineering.

Keywords: Finite element analysis; CT scanning images; STL.
D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 216–221, 2008. © Springer-Verlag Berlin Heidelberg 2008

1 Introduction

Recently, finite element (FE) modeling in combination with computed tomography (CT) imaging has become an important tool for the characterization of bone mechanics [1,2]. Although the resolution of CT images is not as good as that obtained from micro-imaging techniques, it is sufficient to provide a basis for the generation of FE models that represent bones in vivo. FEA methods have been used for the determination of mechanical stresses during anatomical function, the strength of tissue segments, and the prediction of failure modes and causes, as well as the suggestion of possible remedies [3,4]. To generate the FE models, traditional meshing procedures have been developed, the most commonly applied being the voxel conversion technique, which provides meshes with hexahedral elements, and the marching cubes algorithm, which provides meshes with tetrahedral elements [5-7]. However, this approach is inefficient, exceeding the desired time of optimal clinical treatment, and it creates a large number of elements and nodes to represent the FE model, which hampers interactive operation by the users. An alternative meshing strategy for creating FEA models from CT images is proposed in this paper, comprising an area mesh optimization and solid mesh
creation method. The entire process of the method is shown in Fig. 1. The method was demonstrated by a uniaxial pressure analysis simulating the stress distribution in a human hand bone, with accurate results.

Fig. 1. Schematic of the computer assisted analysis method
2 Methods

2.1 Modeling for FE

One human hand was scanned by computed tomography (CT) with a slice distance of 0.1 mm; a total of 208 slices were taken in about 10 min. The different bone tissues visible on the scans were segmented using an interactive medical image control system (MIMICS 10.01, Materialise, Leuven, Belgium). MIMICS imports CT data in a wide variety of formats and allows extended visualization and segmentation functions based on image density thresholding. 3D objects were automatically created by growing a threshold region on the entire stack of scans (Fig. 2A). These objects were then exported as STL files into the rapid prototyping software Magics X (Materialise, Leuven, Belgium), where a part of the radius was cut off with the cutting operation (Fig. 2B).
Fig. 2. FE modeling. (A) CT-scan data as seen in MIMICS 10, and a 3D representation of the human hand bone as reconstructed in MIMICS. (B) A part of the radius (green) cut off with the cutting operation in Magics.
2.2 Mesh Generation

The REMESH module attached to Magics was used to reduce the number of triangles automatically while simultaneously improving the quality of the triangles and maintaining the geometry. During remeshing, the tolerated variation from the original data can be specified, and local optimizations were applied to the loaded and constrained faces. The overall quality is defined as a measure of the triangle height/base ratio, so that the file can be imported into the finite element analysis package without generating any problem (Fig. 3B). This step took about 15 min.
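The surface element counts reported later for the STL before and after REMESH can be checked directly from the file itself: a binary STL stores an 80-byte header, a little-endian uint32 facet count, and 50 bytes per facet. A small sketch (the file path is hypothetical):

```python
import struct

def stl_facet_count(path):
    """Facet (triangle) count of a binary STL file.

    Layout: 80-byte header, little-endian uint32 facet count, then
    50 bytes per facet (normal + 3 vertices as float32, plus a 2-byte
    attribute field).
    """
    with open(path, "rb") as f:
        f.seek(80)                         # skip the header
        (n,) = struct.unpack("<I", f.read(4))
    return n
```

An STL surface carries no shared-node structure, which is why the surface meshing steps below report elements but no nodes.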
Fig. 3. Meshing. (A) STL file of the part of the radius obtained through Magics. (B) Radius STL file optimized for FEA using the REMESH module within Magics.

Fig. 4. Volumetric meshes (element type: Solid186, 20-node)
The optimized STL file of the radius was then imported into the finite element analysis software ANSYS (ANSYS Inc., USA) for the generation of the volumetric mesh and the assignment of material properties (Fig. 4). Before the volumetric mesh operation, the loading and
constraining faces should each be Booleaned into a single entity. The radius model was meshed with tetrahedral elements (20 nodes).

2.3 FEA Validation

For validation of the model, a pressure of 1 MPa was applied to the top face of the radius (Fig. 5). Material properties obtained in other studies were also used. A finite element model for strength analysis of the radius under compression was thus established, and the characteristics of the stress distribution and its location are determined from the model.
Fig. 5. Loading and constraining model
3 Results and Discussion

A series of meshing operations on the STL file of the radius was carried out before the stress distribution analysis. The numbers of elements and nodes in these meshing operations are listed in Table 1. Before optimization, the STL file obtained through Magics (Fig. 3A) had 4806 elements but no nodes. Using the Magics REMESH module, the number of elements was reduced by 20%. The first two meshing steps have no nodes because the mesh is only a surface mesh. The optimized STL file of the radius was then imported into ANSYS for the generation of the volumetric mesh; the number of elements increased from 3826 to 25082, and the number of nodes is 37791. The meshing approach used in this study suggests that maximum anatomical detail is obtained by surface/interface-based meshing using stereolithography (STL) surface data. The different parts of the model featuring different mechanical properties are identified first (segmentation process) and meshed accordingly. The very
Table 1. Comparison of each meshing step by number of elements and nodes

         | Before optimization | After optimization | Volumetric mesh
Elements | 4806                | 3826               | 25082
Nodes    | —                   | —                  | 37791
Fig. 6. Stress distribution
user-friendly graphic interface allows rapid modification of the different parts and generation of new STL files that can be instantly exported and volumetrically meshed in the FEA program. The results of the finite element analysis show the stress distribution in the radius of the reconstructed model, which was based on CT scan data (Fig. 6). The applicability of the method is supported by the results of the stress distribution: the minimum stress was located on the constraining face of the model, and the stress distribution of this 3D digital model was continuous. The potential use of the model was demonstrated using nonlinear contact analysis to simulate compression loading. It has proven to be a useful tool in the thinking process for understanding the biomimetic approach in restorative bone grafts.
4 Conclusion

This method of creating models for finite element analysis (FEA) from medical images could eliminate the need for extensive and time-consuming experiments. The efficiency and accuracy of the image processing, 3D reconstruction, STL file remeshing, and FEA volume mesh generation steps of this method were validated in this paper. This methodology
could facilitate the optimization and understanding of biomedical devices prior to animal and human clinical trials.

Acknowledgments. The authors would like to acknowledge the support of the Shanghai Academic Excellent Youth Instructor Special Foundation and the Postdoctoral Science Fund (No. 20070410715).
Accelerating Computation of DNA Sequence Alignment in Distributed Environment

Tao Guo(1), Guiyang Li(1), and Russel Deaton(2)

(1) College of Computer Science, Sichuan Normal University, 610066 Chengdu, China
{tguo,gyli}@sicnu.edu.cn
(2) College of Computer Science and Engineering, University of Arkansas, 72701 Fayetteville, USA
[emailprotected]
Abstract. Sequence similarity and alignment are among the most important operations in computational biology. However, analyzing large sets of DNA sequences is impractical on a regular PC. Using multiple threads with the JavaParty mechanism, this project successfully extends the capabilities of regular Java to a distributed environment for the simulation of DNA computation. With the aid of JavaParty and the design of multiple threads, the results of this study demonstrate that a modified regular Java program can perform parallel computing without using RMI or socket communication. In this paper, an efficient method for modeling and comparing DNA sequences with dynamic programming and JavaParty is proposed for the first time. Additionally, results of this method in a distributed environment are discussed.
1 Introduction

DNA contains the genetic information of cellular organisms. It consists of polymer chains, or DNA strands, and each DNA strand contains a linear chain of nucleotides, or bases. With the development of modern methods for DNA sequencing, a huge amount of DNA sequence data has been generated. However, mining these voluminous sequence databases for useful information lags behind because of the problem complexity [1]. During the past years, various heuristic methods like FASTA [2] and BLAST [3], as well as the dynamic programming method of Smith-Waterman [4], have been reported for identifying homologous sequences. Some of these methods have shown great promise. Janaki pointed out that it is impossible for a current single-processor computer to handle such voluminous DNA sequences [5]. JavaParty provides a distributed platform and can be used for DNA computation when an appropriate computing algorithm is selected. Currently, no literature has reported DNA sequence comparison using JavaParty combined with a dynamic programming algorithm in a distributed environment for parallel computation. In this paper, dynamic programming running on a distributed JavaParty environment is proposed to accelerate DNA sequence computation. The dynamic programming algorithm, thread generation, concurrent DNA computation, and the validity of this method are addressed.

D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 222–228, 2008. © Springer-Verlag Berlin Heidelberg 2008
In this method, each generated thread is sent to a virtual machine in the JavaParty runtime environment to perform DNA sequence comparison concurrently. JavaParty classes in this method can be declared as remote objects targeted to different standard virtual machines, implementing a distributed computation of DNA sequences. The outline of this new method of DNA computing is described by the flow diagram in Figure 1.

Fig. 1. DNA computing in JavaParty distributed environment (components: JavaParty environment; dynamic programming for DNA sequence comparison; multiple threads; DNA sequence comparison)
2 Methods for Sequence Computation

Sequence alignment is one of the most important operations in computational biology, facilitating everything from identification of gene function to structure prediction of proteins. The alignment of two sequences shows how similar the two sequences are, where there are differences between them, and the correspondence between similar subsequences; a sequence alignment thus represents important information for biologists. To find the optimal alignment score Mij of two sequences X[1...i] and Y[1...j], three steps of sequence alignment computation are considered in this method:

1) Create a matrix and perform an initialization. To find the alignment, the first step in the dynamic programming approach is to create a matrix with M + 1 columns and N + 1 rows, where M and N correspond to the sizes of the sequences to be aligned.

2) Calculate the score of each cell in the matrix. The matrix fill step finds the maximum global alignment score by starting in the upper left-hand corner of the matrix and finding the maximal score Mi,j for each position. In order to find Mi,j for any i,j, it is necessary to know the scores for the matrix positions to the left of, above, and diagonal to i,j; in terms of matrix positions, these are Mi-1,j, Mi,j-1 and Mi-1,j-1.
For each position, Mi,j is defined to be the maximum score at position i,j:

  Mi,j = MAXIMUM[ Mi-1,j-1 + Si,j,  Mi,j-1 + w,  Mi-1,j + w ]        (1)

where Si,j is the match/mismatch score on the diagonal (Si,j = 1 for a match, Si,j = -1 for a mismatch), and w is the gap penalty in sequences M and N (default value 0).
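The initialization and fill steps can be sketched directly from Eq. (1). The following is a generic illustration (not the authors' code), using the scores given above: match = 1, mismatch = -1, gap penalty w = 0.

```python
def fill_matrix(x, y, match=1, mismatch=-1, w=0):
    """Fill the (len(x)+1) x (len(y)+1) score matrix of Eq. (1)."""
    m, n = len(x), len(y)
    M = [[0] * (n + 1) for _ in range(m + 1)]
    # initialization: first column and row accumulate the gap penalty w
    for i in range(1, m + 1):
        M[i][0] = M[i - 1][0] + w
    for j in range(1, n + 1):
        M[0][j] = M[0][j - 1] + w
    # matrix fill: each cell depends on its left, upper and diagonal neighbours
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1] + s,  # diagonal: match/mismatch
                          M[i][j - 1] + w,      # gap in x
                          M[i - 1][j] + w)      # gap in y
    return M

# the maximum global alignment score ends up in the bottom-right cell
best = fill_matrix("AATTCAGTCA", "ACAGTC")[-1][-1]
```

The traceback of step 3 would then follow the chain of maximizing neighbours from the bottom-right cell back to the origin.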
3) Trace back to get the sequence alignment and compute the length of a Longest Common Subsequence (LCS). After the matrix has been filled with scores, the maximum alignment score for the two test sequences is obtained. The traceback step determines the actual alignment(s) that result in the maximum score. Assume two DNA sequences X=
Fig. 2. A matrix of scores comparing two DNA sequences; continuous high-scoring matches are highlighted
One way to measure the similarity of strands M and N is to find a third strand L=
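The length of such a longest common strand can be computed with the standard LCS dynamic program; the following is a generic sketch, not the paper's own LCS-Length routine:

```python
def lcs_length(x, y):
    """Length of a Longest Common Subsequence of strands x and y."""
    m, n = len(x), len(y)
    # table[i][j] holds the LCS length of x[:i] and y[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[m][n]

similarity = lcs_length("AATTCAGTCA", "ACAGTC")
```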
3 JavaParty for Parallel Computing

JavaParty was first designed and built by Michael Philippsen and Matthias Zenger in 1996 [6]. It combines Java-like programming and the concepts of distributed shared
memory in heterogeneous networks. JavaParty is a "two-purpose platform" [7]. It serves as a programming environment for cluster or parallel applications and provides a basis for computer science research on optimization techniques to improve performance. With JavaParty, remote classes and their instances are visible and accessible throughout the entire distributed JavaParty environment. This mechanism allows objects to be used locally at the cost of a pointer indirection instead of expensive OS communication overhead [7]. JavaParty makes it easy to turn a multi-threaded Java program into a distributed one by identifying the relevant classes and threads. In this respect, JavaParty is an optimal way to program clusters of workstations and workstation-based parallel computers with Java. Haumacher [8] pointed out that JavaParty has already been used successfully for transparent distributed threads.
4 Multi-threads and Concurrent Computing

4.1 Multi-threads for Sequence Comparison

In this new method, a distributed DNA sequence comparison with dynamic programming was designed to run in separate multiple threads under the JavaParty environment. Each thread performs sequence alignment concurrently. The algorithm for generating multiple threads is shown below.

Function for multi-thread generation

Function GENERATE-THREAD(int_seq, tar_seq, parameter)
  interest_seq ← interestGen
  target_seq ← targetGen
  parameter ← Multi_Thread(GetObject, interest_seq, target_seq)
  go ← getOpt(UserInput)
  GetObject ← Put-IntoHash(go.optArgGet())
  FOR i ← 0 to generationNumber
    FOR node ← 0 to threadNumber
      DO Thread(parameter)
EndFunc

Procedure GENERATE-THREAD reads the targeted DNA sequence and the interested sequence as input, and then gets the user's option for the thread number from user input. Based on the number of threads, the function generates several threads. These threads are sent to different virtual machines to perform the DNA computation jobs independently.

4.2 DNA Concurrent Computation

In this method, the following algorithm is employed to generate subsections of the DNA sequence and realize the concurrent computation.
Function SEP-DNA(A, r)
  th ← A[r]
  genSize ← Read_targetGen
  FOR i ← 0 to threadNumber
    FOR j ← th*(genSize/threadNumber) to (th+1)*(genSize/threadNumber)
      FOR k ← j+1 to genSize
        DO LCS-Length(X, Y)
EndFunc

Before comparing an interested sequence with a targeted DNA sequence, function SEP-DNA() is called to construct subsections based on the number of threads generated by function GENERATE-THREAD(). Each subsection represents a segment of the targeted DNA sequence. Then, the interested sequence performs the sequence alignment operation with dynamic programming. The function LCS-Length(X, Y) [5] returns a table which contains the length of an LCS of X and Y.
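A minimal Python sketch of this subsection scheme (a hypothetical stand-in for SEP-DNA, assuming each thread scores the interested sequence against one contiguous slice of the target):

```python
import threading

def compare_in_threads(query, target, thread_number, compare):
    """Split target into thread_number subsections and score each
    against query in its own thread, as in SEP-DNA."""
    gen_size = len(target)
    chunk = gen_size // thread_number
    results = [None] * thread_number

    def worker(th):
        start = th * chunk
        # the last thread also takes any remainder of the sequence
        end = gen_size if th == thread_number - 1 else (th + 1) * chunk
        results[th] = compare(query, target[start:end])

    threads = [threading.Thread(target=worker, args=(th,))
               for th in range(thread_number)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# e.g. a trivial scoring function counting exact-position matches
score = lambda q, s: sum(a == b for a, b in zip(q, s))
parts = compare_in_threads("ACGT", "ACGTACGTACGTACGT", 4, score)
```

In JavaParty the threads would be remote objects placed on different virtual machines; plain Python threads here only illustrate the decomposition.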
5 Results and Discussion

This research project has successfully implemented multi-threads and JavaParty in extending the capabilities of regular Java to a distributed computing environment for comparing DNA sequences. From the results, we can draw the following conclusions:

1) A regular Java program can be modified with multi-threading and JavaParty to perform computation in a distributed environment.
2) The running time was between 2.371 and 3.73 ms when a DNA sequence size of 100 and 1-5 threads were given. Increasing the size to 500, the running time increased to 77.83 ms. When the DNA sequence size increased to 1000, the running time increased rapidly to over 647.90 ms. Figure 3 indicates that computation time increases as sequence size increases.
This research project demonstrated that, using multiple threads and JavaParty, a regular Java program can be modified to perform parallel computing without involving complex RMI or socket communication. JavaParty includes a preprocessor that helps with parallel computing; it generates effective multi-threads to deal with the distributed computation of DNA sequences. From Figure 3, it is clear that if the DNA sequence size is less than 500, increasing the number of threads has no significant impact on running time; if we increase the DNA sequence size (to greater than 1000), the advantage of multi-threaded distributed computing is fully reflected. Figure 3 shows that when the lengths of the DNA sequences increased and more threads were generated, the running time of the computation decreased significantly in a JavaParty distributed environment. Philippsen pointed out that increasing the number of threads offers an appropriate means either of improving performance or of shortening the running time when computing DNA sequences [9].
Fig. 3. Effect of using multiple threads and JavaParty on timing for simulation of DNA computing with five virtual machines (y-axis: time in ms; x-axis: library size_thread number)
6 Conclusion

Sequence alignment is an important research area in the field of bioinformatics. Since the volume of biological sequence data grows at an ever faster pace, the execution time of dynamic programming sequence similarity algorithms increases dramatically as the DNA sequence size gets larger. In this paper, a dynamic programming algorithm is implemented to find DNA sequence similarity and alignment, and multiple threads with JavaParty are used to extend the capabilities of regular Java to distributed computing environments, accelerating computation without using RMI or socket communication.
References

1. Russell, D., et al.: DNA Computing: A Review. Fundamental Informatics 30, 23–41 (1997)
2. Pearson, W.R., Lipman, D.J.: Improved Tools for Biological Sequence Comparison. Proc. Natl. Acad. Sci. 85, 2444–2448 (1988)
3. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic Local Alignment Search Tool. J. Mol. Biol. 215, 403–410 (1990)
4. Smith, T.F., Waterman, M.S.: Identification of Common Molecular Subsequences. J. Mol. Biol. 147, 195–197 (1981)
5. Chintalapati, J., Rajendra, R.J.: Accelerating Comparative Genomics Using Parallel Computing. Silico Biology 3, 36 (2003)
6. Philippsen, M.: Data Parallelism in Java. In: Schaefer, J. (ed.) High Performance Computing Systems and Applications, pp. 85–99. Kluwer Academic Publishers, Dordrecht (1998)
7. Philippsen, M., Zenger, M.: JavaParty-Transparent Remote Objects for Java. Practice and Experience 9(11), 1225–1242 (1997)
8. Haumacher, B.: Transparent Distributed Threads for Java. In: 5th International Workshop on Java for Parallel and Distributed Computing, in Conjunction with the International Parallel and Distributed Processing Symposium (IPDPS 2003), Nice, France (2003)
9. Philippsen, M.: Transparent Remote Objects in Java. Practice and Experience 9(11), 1225–1242 (1997)
Predicting Protein Function by Genomic Data-Mining

Changxin Song(1) and Ke Ma(2)

(1) Department of Computer, Qinghai Normal University, 810008 Xining, P.R. China
[emailprotected]
(2) Network Center, Qinghai Normal University, 810008 Xining, P.R. China
[emailprotected]
Abstract. In this paper, we investigate data analysis methods for discovering useful genomic data for predicting protein function. Non-SIM based bioinformatics methods are becoming popular. One such method is Data Mining Prediction (DMP), which is based on combining evidence from amino-acid attributes, predicted structure, and phylogenic patterns, and uses a combination of Inductive Logic Programming data mining and decision trees to produce prediction rules for functional class. We examined the scientific literature for direct experimental derivations of ORF function, which confirmed the DMP predictions. Accuracy varied between rules and with the detail of the prediction, but it was generally significantly better than random. These DMP predictions have been confirmed by direct experimentation. DMP is, to the best of our knowledge, the first non-SIM based prediction method to have been tested directly on new data.
1 Introduction

With the completion of multiple genome sequencing projects, the emphasis of research is now shifting to a functional understanding of genes. The most important problem in functional genomics is the assignment of function to sequenced open reading frames (ORFs). In the absence of direct experimental evidence of gene function, bioinformatics approaches must be applied. The most commonly used method is to infer orthologous homology using a statistically based sequence similarity (SIM) method, such as FASTA (Pearson and Lipman, 1988) or PSI-BLAST (Altschul et al., 1997). Without a recognized orthologous sequence detected by SIM, the bioinformatics prediction of ORF function is more problematic. The methods with the greatest promise are probably those based on data from high-throughput functional genomics experiments, e.g. microarrays (Brown et al., 2000) and phenotype analysis (Clare and King, 2002). However, if no experimental evidence is available, methods based only on information derived from sequence are required. Many different ways of doing this have been proposed, each based on a different type of information: 1) amino-acid attributes; 2) structure; 3) gene fusion; 4) chromosome proximity; 5) phylogenic patterns; 6) hybrid approaches.

D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 229–235, 2008. © Springer-Verlag Berlin Heidelberg 2008
2 Methods

We used two ways to test the predictions.

• We compared our predictions with the updated (20.02.02) Monica Riley genome group annotations. This test has the advantage of testing a large number of predictions.
• We examined the scientific literature for the direct experimental derivation of ORF functions for our predictions. This test has the advantage of directly, experimentally testing the predictions.
In the intervening period, the functional ontology used by this group has changed considerably in structure. This meant that it was not possible to use the new annotation classes directly to test the predictions, and therefore each predicted ORF had to be examined individually by hand to judge whether the prediction had been confirmed or not. The original classification scheme had three levels; however, we have only examined levels two and three. The reason is that the top level (one) is the least informative, and that this level of classification has changed most dramatically in the new Riley classification scheme. To classify the results of these predictions we used the following scheme:

• Correct—In the new annotation, the function is consistent with our predictions and the term 'putative' or a related term is not used.
• Wrong—In the new annotation, the function is inconsistent with our prediction and the term 'putative' or a related term is not used.
• New correct putative—In the new annotation, the function is consistent with our predictions, and the term 'putative' or a related term is used.
• New wrong putative—In the new annotation, the function is inconsistent with our prediction, and the term 'putative' or a related term is used.
• Same correct putative—The function in the new annotation is the same as previously present, it is consistent with our predictions, and the term 'putative' or a related term is used.
• Same wrong putative—The function in the new annotation is the same as previously present, it is inconsistent with our prediction, and the term 'putative' or a related term is used.
• Evidence for—No definite function is given, but the new annotation presents evidence that is consistent with our predictions.
• Evidence against—No definite function is given, but the new annotation presents evidence that is inconsistent with our predictions.
• No evidence—Either no annotation is given, or the new annotation provides no evidence to decide on the validity of our prediction.
• Near miss—The new annotation is inconsistent with our prediction, but the predicted class is functionally close to the annotated class.
To ensure consistency, every classification was examined at least twice by RDK, with a delay between the two examinations. In addition, the predictions of the different rules were examined separately, and any contradictions that arose were examined and resolved. This classification procedure is clearly, to some extent, subjective, as is much of
genome annotation, and it is to be expected that some of the classifications of the predictions will be incorrect. However, we believe that the great majority are correct, and statistical arguments ensure that the conclusions we draw in this paper are correct. In annotating the results of our predictions, our strategy has always been to err on the side of caution, and not to 'call' predictions if there was doubt. The strategy is also conservative in that there are inevitably mistakes in the Riley annotation (incorrectly asserted functions—false positives; and missing functions—false negatives), which will cause the annotated accuracy to be underestimated. We used two methods to find ORFs that had new wet experimental evidence about their function:

• We examined the updated Riley annotation for ORFs that were now annotated as having a definite function, i.e. not 'putative' etc. The literature on those ORFs was then examined to see if indeed there was direct experimental evidence about their function.
• We examined the Echobase database. This is a database of E. coli genes characterized since the completion of the genome sequence. For each ORF in Echobase with a characterized function, we checked the literature to confirm the Echobase assignment (for YdeP we found that our prediction was likely to be correct, and a mistake in Echobase had occurred).

To test the probability of our predictions occurring by chance we used a binomial test, with the probability of success being the probability of the most populous class. With few assumptions, this test is easy to calculate and is guaranteed to give an overestimate. The probability of obtaining this accuracy by chance is given by Eq. (1), where n is the number of trials, x is the number of successes and p is the probability of success. This test gives an overestimate because we use the probability of the most populous class as our probability of success by chance; in fact, for most of the classes the probability of success is lower than this.
  P = Σ_{i=x}^{n} C(n, i) p^i (1 − p)^(n−i)        (1)
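Equation (1) is simply the upper tail of the binomial distribution and can be evaluated directly; a sketch using only the standard library:

```python
from math import comb

def binomial_tail(n, x, p):
    """Probability of at least x successes in n trials, per Eq. (1)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(x, n + 1))

# e.g. the chance of 2 or more successes in 3 trials at p = 0.5
prob = binomial_tail(3, 2, 0.5)
```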
3 Results

Tables 1–4 show the results from the updated annotation. The levels refer to the class levels in the Riley group E. coli functional hierarchy. More than one prediction can be made for an ORF, as ORFs can be involved in more than one function. The results that we show come from voting and from simply using all predictions ('non-voting'). Predictions based on voting rules are those on which at least two rules agree. We consider the former more accurate because it filters out predictions that are not strongly supported. The non-voting predictions are simply all counted. No ranking of predictions is used in either case—ORFs may have more than one annotation, and all predictions for an ORF may be correct. For example, assume three rules make predictions for an ORF, and these predictions are class A, class A and class B. Counting by voting we would only count the class A prediction, as at least two rules agree on it, whereas counting by non-voting we would count both class A and class B as predictions for the ORF.
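The voting and non-voting counting rules in the example above can be sketched as follows (hypothetical helpers, not the authors' code):

```python
from collections import Counter

def voting_predictions(rule_predictions):
    """Keep only the classes on which at least two rules agree."""
    counts = Counter(rule_predictions)
    return sorted(cls for cls, n in counts.items() if n >= 2)

def non_voting_predictions(rule_predictions):
    """Count every distinct predicted class."""
    return sorted(set(rule_predictions))

# three rules predict class A, class A and class B for one ORF
kept_voting = voting_predictions(["A", "A", "B"])          # only class A survives
kept_non_voting = non_voting_predictions(["A", "A", "B"])  # both A and B counted
```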
The results in Tables 1–4 are statistically highly significant (see table legends). The given default accuracy is the accuracy obtained by predicting all ORFs to belong to the most populous class. As expected, the voting strategy produces significantly higher accuracy (correctly predicted ORFs/number of predicted ORFs) than non-voting, but has lower coverage (number of predicted ORFs/total number of ORFs). It is interesting that for the voting strategy the predictions for level 3 are more accurate than those for level 2, while for the non-voting strategy the level 2 accuracies are higher than level 3. This can be explained by the far greater number of classes and lower default accuracy in level 3 compared with level 2: it is more difficult to predict at level 3 than at level 2. For the voting rules, the probability of two or more rules agreeing by chance is much lower for level 3 than level 2; when this does occur, it is possible that the predictions are correct. It should also be stressed again that these accuracies are likely to be underestimates because they are based on the assumption that the Riley annotation is complete and correct. However, even if a different function than predicted has been determined, this does not absolutely exclude that our prediction is still potentially correct; so these results are conservative estimates. There is a significant bias toward metabolism in the functional classes with confirmed functions, but the reason is unclear. Note the subjective element in some classifications: e.g. should an isopentenyl diphosphate isomerase (b2889) be considered part of 'Energy metabolism carbon'? We have erred on the side of caution and said 'it isn't'.

Table 1. Level 2 voting rules

  Correct                13    Wrong                  1
  New correct putative   41    New wrong putative    10
  Same correct putative  37    Same wrong putative    1
  Evidence for           43    Evidence against       1
  No evidence            18    Near miss              1
Of the ORFs with unambiguous function (Correct, Wrong), DMP is 93% accurate. Of the ORFs with newly assigned function (New correct putative, New wrong putative), DMP is 80% accurate. Of all the ORFs with assigned function, DMP is 87% accurate. The default accuracy of the largest class is 19%. The probability of obtaining this accuracy on newly assigned functions by chance is estimated at less than 2.59e−22.

Table 2. Level 3 voting rules

  Correct                 4    Wrong                  0
  New correct putative   15    New wrong putative     1
  Same correct putative   5    Same wrong putative    1
  Evidence for            2    Evidence against       1
  No evidence             1    Near miss              2
Of the ORFs with unambiguous function (Correct, Wrong), DMP is 100% accurate. Of the ORFs with newly assigned function (New correct putative, New wrong putative), DMP is 94% accurate. Of all the ORFs with assigned function, DMP is 91% accurate. The default accuracy of the largest class is 5%. The probability of obtaining this accuracy on newly assigned functions by chance is estimated at less than 4.53e−19.

Table 3. Level 2 non-voting rules

  Correct                30    Wrong                  7
  New correct putative   84    New wrong putative    47
  Same correct putative  98    Same wrong putative   51
  Evidence for           57    Evidence against       7
  No evidence           147    Near miss              7
Of the ORFs with unambiguous function (Correct, Wrong), DMP is 81% accurate. Of the ORFs with newly assigned function (New correct putative, New wrong putative), DMP is 64% accurate. Of all the ORFs with assigned function, DMP is 65% accurate. The default accuracy of the largest class is 20%. The probability of obtaining this accuracy on newly assigned functions by chance is estimated at less than 4.25e−35.

Table 4. Level 3 non-voting rules

  Correct                14    Wrong                 16
  New correct putative   29    New wrong putative    77
  Same correct putative  51    Same wrong putative   23
  Evidence for           25    Evidence against      23
  No evidence           101    Near miss             29
Of the ORFs with unambiguous function (Correct, Wrong), DMP is 47% accurate. Of the ORFs with newly assigned function (New correct putative, New wrong putative), DMP is 27% accurate. Of all the ORFs with assigned function, DMP is 44% accurate. The default accuracy of the largest class is 6%. The probability of obtaining this accuracy on newly assigned functions by chance is estimated at less than 2.14e−19.

The rules that we have previously used as illustrations in publications (King et al., 2000a, 2001) achieve accuracies on the new experimental data that are consistent with the original claims. In addition, for the rules in Figures 1 and 3, plausible biological explanations are proposed. This gives additional confidence in the rules. It is important to note that these explanations result in testable hypotheses. Although the explanations for the rules in Figures 2 and 4 may be less convincing, these rules are nevertheless found to be empirically successful, so they must reflect some biological causation. The rule does not perform any better than random, and we now do not believe that it represents a true biological pattern.
4 Discussion

In this paper, we present strong evidence for one such method, DMP, that can accurately predict function. The evidence is in two forms: direct new experimental results taken from the literature on E. coli which confirm the predictions, and new annotations based on new sequences and experimental results in other species which also confirm the predictions. The success of DMP should also increase confidence in other non-SIM based prediction methods. Comparing SIM and non-SIM based function prediction methods, the advantages of non-SIM based methods are:

• Function can be predicted in the absence of homology to a sequence with known function (King et al., 2000a, 2001; Jensen et al., 2002).
• More general types of sequence similarity can be utilized, allowing more remote relationships to be detected.
• Explicit comprehensible rules can be produced that may provide genuinely novel and unexpected biological insights.

The disadvantages of non-SIM based methods are:

• The biological basis of the methods is poorly understood.
• The significance level of the predictions of many non-SIM methods is difficult to establish.
• Non-SIM methods may require SIM methods from which to bootstrap.
We have designed the 'Genepredictions' database for protein function predictions. This database was first designed to hold our E. coli ORF predictions and has been extended to hold predictions for any organism. It is intended to act as a free repository of function predictions. This database can be accessed by anyone who wants information about the possible function(s) of a gene without an annotation in the standard databases. The Genepredictions site presents a simple user interface through which the database of gene predictions can be searched. The database holds predictions of ORF functions as well as information about how those functions were predicted. The following criteria can be used when searching for predictions in the database: organism, functional class, single or multiple ORF names, date, and institute. In conclusion, the actual function(s) of a gene can only be fully determined by multiple 'wet' experiments. However, bioinformatics techniques that accurately predict function can make such experimental determination simpler. It is clear that testing a high-probability hypothesis is more efficient than randomly testing possible functions. We look forward to our predictions and the Genepredictions database becoming a useful tool in functional genomics studies.
References
1. Brown, M., Grundy, W.N., Lin, D., Cristianini, N., Sugnet, C.W., Furey, T., Ares, M.: Knowledge-based Analysis of Microarray Gene Expression Data by Using Support Vector Machines. Proc. Natl. Acad. Sci. USA 97, 262–267 (2000)
2. Clare, A., King, R.D.: Machine Learning of Functional Class from Phenotype Data. Bioinformatics 18, 160–166 (2002)
Predicting Protein Function by Genomic Data-Mining
3. Danchin, A.: From Function to Sequence, an Integrated View of the Genome Texts. Physica A 273, 92–98 (1999)
4. des Jardins, M., Karp, P., Krummenacker, M., Lee, T., Ouzounis, C.: Prediction of Enzyme Classification from Protein Sequence without the Use of Sequence Similarity. In: Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology, June 21–26. AAAI, Halkidiki (1997)
5. Aha, D., Kibler, D., Albert, M.: Instance-based Learning Algorithms. Machine Learn. 6, 37–66 (1991)
6. Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J.: Gapped BLAST and PSI-BLAST: a New Generation of Protein Database Search Programs. Nucleic Acids Res. 25, 3389–3402 (1997)
7. King, R., Karwath, A., Clare, A., Dehaspe, L.: Accurate Prediction of Protein Functional Class in the M.tuberculosis and E.coli Genomes Using Data Mining. Yeast 17, 283–293 (2000a)
8. King, R., Karwath, A., Clare, A., Dehaspe, L.: Genome Scale Prediction of Protein Functional Class from Sequence Using Data Mining. In: KDD (2000)
9. King, R., Karwath, A., Clare, A., Dehaspe, L.: The Utility of Different Representations of Protein Sequence for Predicting Functional Class. Bioinformatics 17, 445–454 (2001)
10. Klein, P., Kanehisa, M., DeLisi, C.: Prediction of Protein Function from Sequence Properties: Discriminant Analysis of a Data Base. Biochim. Biophys. Acta 787, 221–226 (1984)
11. Marcotte, E., Pellegrini, M., Thompson, M., Yeates, T., Eisenberg, D.: A Combined Algorithm for Genome-wide Prediction of Protein Function. Nature 402, 83–86 (1999a)
12. Marcotte, E.M., Pellegrini, M., Ng, H.L., Rice, D.W., Yeates, T.O., Eisenberg, D.: Detecting Protein Function and Protein-Protein Interaction from Genome Sequences. Science 285, 751–753 (1999b)
Tumor Classification Using Non-negative Matrix Factorization
Ping Zhang1, Chun-Hou Zheng2,3,*, Bo Li3, and Chang-Gang Wen2
1 Institute of Automation, Qufu Normal University, Rizhao, Shandong 276826, China [emailprotected]
2 College of Information and Communication Technology, Qufu Normal University [emailprotected]
3 Intelligent Computing Lab, Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei, Anhui 230031, China
Abstract. With the advent of DNA microarrays, it is now possible to use microarray data for tumor classification. Yet previous works have not used the non-negativity of gene expression data for classification. In this paper, we propose a new method for tumor classification using gene expression data. In this method, we first extract new features from the gene expression data by virtue of non-negative matrix factorization (NMF) and its extension, sparse NMF (SNMF), and then apply support vector machines (SVM) to classify the tumor samples using the extracted features. To better fit the classification task, a new SNMF algorithm is also proposed. Keywords: Gene expression data, Non-negative matrix factorization, SVM.
1 Introduction
With the advent of DNA microarrays, it is now possible to simultaneously monitor the expression of all genes in the genome. Increasingly, the challenge is to interpret such data to gain insight into biological processes and the mechanisms of human disease. Up to now, many studies have been reported on the application of microarray gene expression data analysis to the molecular classification of cancer [1, 2, 3]. Beyond its broader utility as an analysis method, principal component analysis (PCA) [4] is a frequently used and valuable approach for obtaining an up-front characterization of the structure of the data. However, due to the holistic nature of PCA, the resulting components are global interpretations and lack intuitive meanings [5]. Independent component analysis (ICA) [6, 7] is a useful extension of PCA, which was developed in the context of blind separation of independent sources from their linear mixtures [7]. Roughly speaking, rather than requiring that the coefficients of a linear expansion be merely uncorrelated, ICA requires that they be mutually independent. This implies that higher-order statistics are needed in determining the ICA expansion, and that ICA requires searching for the maxima of a target function in a large-dimensional configuration space. This problem was addressed in Chiappetta et al. (2004) [8]. A disadvantage of the two methods mentioned above is that they always
* Corresponding author.
D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 236–243, 2008. © Springer-Verlag Berlin Heidelberg 2008
need to normalize the expression data. To overcome the problem described above, we use a new technique, non-negative matrix factorization (NMF), to extract relevant biological correlations, or “molecular logic”, from gene expression data. NMF was first introduced in its modern formulation by Lee and Seung [9] as a method to decompose images, which yielded a decomposition of human faces into parts reminiscent of features such as eyes, nose, etc. In this study, we use NMF to describe the tens of thousands of genes in a genome in terms of a small number of metagenes. A number of algorithms for performing NMF have been used (see [5] for a review). Here, we employ basic NMF and sparse non-negative matrix factorization (SNMF), which was proposed in [5] and has proven successful in many applications.
2 Methods
In this paper, the method used to classify the gene expression data is subdivided into two steps: feature extraction (dimensionality reduction) and classification. For feature extraction, NMF as well as SNMF is used; support vector machines (SVM) are then used for classification.
2.1 Non-negative Matrix Factorization
NMF is a linear multivariate analysis method that works as follows. Let A denote an N × M matrix, each column of which contains an N-dimensional observed data vector with non-negative values. In order to compress the data or reduce its dimensionality, we can find two non-negative matrix factors W and H such that
$A \approx WH$    (1)
where W is an N × k matrix and H is a k × M matrix. The value of k is smaller than both N and M. We can call the k columns of W basis vectors and the columns of H encoding coefficients. We can also consider the observed data matrix A as the original features, and the encoding matrix H as new features based on the basis matrix W. From another viewpoint, if each row of A represents an M-dimensional observed sample vector with non-negative values, then every row of A can be seen as a linear mixture of the rows of H, i.e.
$a_j = w_{j1}h_1 + w_{j2}h_2 + \cdots + w_{jk}h_k$    (2)

where $a_j$ is a row of A, $h_i$ is a row of H, and $w_{ji}$ is an entry of W. In this model, we
can consider that the k rows of H are basis vectors and the rows of W are encoding coefficients. In this paper, we use this idea to find a good set of basis vectors (metagenes) to represent gene expression data so that the samples can be reasonably represented.
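The factorization in Eq. (1) can be sketched with the classical Lee–Seung multiplicative updates [9]; this is a minimal illustration (the choice of update rule and the toy matrix are our assumptions, not necessarily the algorithm used in the paper):

```python
import numpy as np

def nmf(A, k, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative N x M matrix A into W (N x k) and H (k x M),
    A ~= W H, using Lee-Seung multiplicative updates for the Frobenius
    objective ||A - W H||."""
    rng = np.random.default_rng(seed)
    N, M = A.shape
    W = rng.random((N, k)) + eps
    H = rng.random((k, M)) + eps
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)   # H <- H * (W^T A)/(W^T W H)
        W *= (A @ H.T) / (W @ H @ H.T + eps)   # W <- W * (A H^T)/(W H H^T)
    return W, H

# Toy non-negative "expression" matrix: 6 samples x 8 genes
A = np.random.default_rng(1).random((6, 8))
W, H = nmf(A, k=3)
err = np.linalg.norm(A - W @ H)
```

Here the rows of H play the role of the basis vectors (metagenes) and the rows of W the encoding coefficients, matching the second interpretation above.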
2.2 NMF Models for Gene Expression Data
Now we use an N × M matrix A to denote the gene expression data obtained from a typical microarray experiment. Each row represents the expression levels of all genes in one sample, and each column represents the expression levels of a gene across all samples (in the microarray literature, gene expression data are usually formulated using the transposed matrix $A^T$). All the entries in the gene expression matrix are non-negative. For gene expression studies, the number M of genes is typically in the thousands, and the number N of experiments is typically smaller than one hundred. Our goal is to find a small number of metagenes, each defined as a positive linear combination of the M genes. We can then approximate the gene expression pattern of samples as positive linear combinations of these metagenes. To do that, using equation (1), we can factor A into W and H. Generally speaking, H is a k × M matrix, with each of the k rows defining a metagene. Entry $h_{ij}$ represents the expression level of gene j in metagene i. W is an N × k matrix; every row of W represents the metagene expression pattern of the corresponding sample. Entry $w_{ij}$ represents the expression level of metagene j in sample i.
Fig. 1. The gene expression data synthesis model. Each sample in the data matrix A (the rows of A) is considered to be a linear combination of the metagene expression profiles in the matrix H (the rows of H).
The NMF model of gene expression data is shown in Fig. 1. In this approach, NMF is used to find the matrices W and H such that we can express the gene expression pattern of each sample as positive linear combinations of the metagenes. After factorization, normalization is performed on the rows of W before using them for classification. This is done by standardizing each row of W to have zero mean and unit standard deviation (SD).
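The standardization step above can be sketched in a few lines (the guard against constant rows is our own addition):

```python
import numpy as np

def standardize_rows(W):
    """Standardize each row of W to zero mean and unit standard deviation."""
    mu = W.mean(axis=1, keepdims=True)
    sd = W.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # guard: leave constant rows centered but unscaled
    return (W - mu) / sd

W = np.array([[1.0, 2.0, 3.0],
              [10.0, 10.0, 40.0]])
Z = standardize_rows(W)
```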
3 Experimental Results
In this section, we demonstrate the efficiency and effectiveness of the proposed methodology by classifying two datasets of human tumor samples. After processing the gene expression data using NMF and SNMF, the final step is to classify the data set. In this study, we use support vector machines (SVM), a more recent technique, as the classifier, which should be well suited to classifying gene expression data given the small number of samples. Furthermore, Furey et al. [11] have applied SVM to classify tumors using microarray data.
Tumor Classification Using Non-negative Matrix Factorization
239
3.1 Datasets
This paper studies two cancer classification problems; both datasets comprise two classes. For this purpose, two publicly available microarray datasets are used: the colon cancer data [1] and the acute leukemia data [10]. In the colon cancer data, all samples have already been assigned to a training set or test set. For datasets in which a training set and test set have not been predefined, two-thirds of each class are assigned to the training set and the rest to the test set. All gene expression data are positive. An overview of the characteristics of the datasets can be found in Table 1.
Table 1. Summary of the two datasets
Datasets               Training set        Test set           Genes
                       Class1   Class2     Class1   Class2
Colon cancer data        14       26          8       14       2000
Acute leukemia data       7       18          4        9       5000
3.2 Classification Results
We now use the proposed methodology to classify the tumor data. Since all data samples in these datasets have already been assigned to a training set or test set, we built the classification models using the training samples, and estimated the classification correct rates using the test set. We first performed NMF (SNMF) on $A_{tn}$ to produce two matrices $W_{tn}$ and $H$ such that

$A_{tn} = W_{tn} H$    (3)

Hence, the rows of $W_{tn}$ contain the coefficients (representation) of the linear combination of metagenes (rows of $H$) that comprise $A_{tn}$. When using SNMF, the sparseness-controlling parameter λ of W is set to 0.5. For the test set $A_{tt}$, we can achieve their representations by the following equation:

$W_{tt} = A_{tt} / H$    (4)
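One reading of Eq. (4) is a least-squares solve for $W_{tt}$ with the metagene matrix $H$ held fixed; the sketch below uses plain least squares (a faithful non-negative projection would additionally constrain $W_{tt} \ge 0$, which we omit for brevity):

```python
import numpy as np

def project_onto_metagenes(A_tt, H):
    """Solve W_tt @ H ~= A_tt for W_tt with H fixed (one reading of
    W_tt = A_tt / H).  Transposing turns it into the standard
    least-squares problem H.T @ W_tt.T ~= A_tt.T."""
    Wt, *_ = np.linalg.lstsq(H.T, A_tt.T, rcond=None)
    return Wt.T

# Two metagenes over four genes, and a test sample lying in their span
H = np.array([[1.0, 0.0, 2.0, 0.0],
              [0.0, 1.0, 0.0, 3.0]])
A_tt = np.array([[2.0, 3.0, 4.0, 9.0]])  # = 2*H[0] + 3*H[1]
W_tt = project_onto_metagenes(A_tt, H)
```

For a sample lying exactly in the span of the metagenes, the recovered coefficients are exact.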
After the representations of the training and test data have been achieved, normalization is performed on the rows of $W_{tn}$ and $W_{tt}$ before using them for classification. Finally, we used SFFS (readers who want the details can refer to [12]) and SVM to select metagenes for classification. The numbers of selected features are determined using leave-one-out cross-validation on the training dataset. To obtain reliable experimental results and to show comparability and repeatability across numerical experiments, this study not only uses the original division of each dataset into training and test sets, but also reshuffles all datasets randomly. In other
words, all numerical experiments were performed with 20 random splittings of the two original datasets; moreover, they are also stratified, which means that each randomized training and test set contains the same number of samples of each class as the original training and test set. We used our proposed methods (NMF+SVM, SNMF+SVM, NMF+SFFS+SVM and SNMF+SFFS+SVM) to analyze the two gene expression datasets. For comparison, we also directly used SVM, used PCA for feature extraction with SVM for classification (PCA+SVM), and additionally used SFFS for feature selection (PCA+SFFS+SVM) in the same tumor classification experiments. For each classification experiment, the experimental results give the statistical means and standard deviations of accuracy on the original data set and the 20 randomizations described above. Since the random splits for training and test set are disjoint, the results should be unbiased.
3.3 Colon Cancer Data
When we applied NMF and SNMF to the colon cancer data, we first had to select the value of k, which affects the result of the experiment. Figure 2 displays the LOO-CV performance with k changing from 2 to 40; the LOO-CV performance is best when k = 10. The test set accuracy with k changing from 2 to 40 is illustrated in Figure 3; the accuracy at k = 10 is also better than the others. Hence, we took k = 10 when we used NMF and SNMF to extract features.
Fig. 2. Illustration of the LOO-CV performance based on method 4 (NMF+SVM) for all values of k from 2 to 40
Fig. 3. Illustration of the test set accuracy based on method 4 (NMF+SVM) for all values of k from 2 to 40
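The k-selection scheme behind Figures 2 and 3 (scan candidate values of k and score each reduced representation by leave-one-out cross-validation) can be sketched as follows. To stay self-contained, a nearest-centroid classifier stands in for SVM and a trivial column truncation stands in for NMF/SNMF feature extraction; both are placeholders, not the paper's components:

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, x):
    """Tiny stand-in classifier: assign x to the class whose mean is nearest."""
    candidates = []
    for c in np.unique(y_train):
        dist = np.linalg.norm(x - X_train[y_train == c].mean(axis=0))
        candidates.append((dist, c))
    return min(candidates)[1]

def loo_cv_accuracy(X, y):
    """Leave-one-out cross-validation accuracy of the stand-in classifier."""
    idx = np.arange(len(y))
    hits = sum(nearest_centroid_predict(X[idx != i], y[idx != i], X[i]) == y[i]
               for i in range(len(y)))
    return hits / len(y)

def select_k(A, y, reduce_fn, ks):
    """Scan candidate k values, scoring each reduced representation by
    LOO-CV, and return the best k together with all scores."""
    scores = {k: loo_cv_accuracy(reduce_fn(A, k), y) for k in ks}
    return max(scores, key=scores.get), scores

# Toy data: 10 samples x 6 genes, two classes separated in the first two
# features; reduce_fn just keeps the first k columns here, a placeholder
# for NMF/SNMF feature extraction.
rng = np.random.default_rng(0)
y = np.array([0] * 5 + [1] * 5)
A = rng.random((10, 6)) + y[:, None] * np.array([2.0, 2.0, 0, 0, 0, 0])
best_k, scores = select_k(A, y, lambda A, k: A[:, :k], ks=range(1, 6))
```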
Table 2. Summary of the results of the experiments on colon classification problems (k=10), comprising the LOO-CV performance, the accuracy (ACC) on training and test set
No.  Method           LOO-CV performance   ACC on training set   ACC on test set
1    SVM              85.42±2.94           94.38±3.04            87.50±3.43
2    PCA+SVM          86.92±3.09           91.88±2.41            88.26±4.09
3    PCA+SFFS+SVM     87.67±3.90           91.88±2.17            87.12±3.79
4    NMF+SVM          87.71±3.28           91.92±3.09            90.53±3.60
5    NMF+SFFS+SVM     87.83±2.79           91.87±3.22            90.53±3.04
6    SNMF+SVM         86.96±2.80           91.50±4.36            91.19±3.47
7    SNMF+SFFS+SVM    86.04±2.73           91.88±3.39            90.15±3.26
The classification results for tumor and normal tissues using our proposed methods are listed in Table 2. From Table 2 we can see that the LOO-CV performances of methods 4 and 5 are better than those of all other methods. The accuracy results of methods 4 and 5 on the test set are also better. For the accuracy on the test set, method 6 may be the best one.
3.4 Acute Leukemia Data Set
When we applied NMF and SNMF to the acute leukemia data, we also first had to select the value of k; we take k = 10, the same as for the former dataset. The classification results are illustrated in Table 3.
Table 3. Summary of the results of the experiments on leukemia data classification problems (k=10), comprising the LOO-CV performance, the accuracy (ACC) on training and test set
No.  Method           LOO-CV performance   ACC on training set   ACC on test set
1    SVM              93.89±4.22           100±0.00              95.18±4.05
2    PCA+SVM          94.22±4.52           100±0.00              92.60±3.39
3    PCA+SFFS+SVM     92.72±3.29           100±0.00              93.86±3.29
4    NMF+SVM          94.67±3.46           100±0.00              96.58±4.05
5    NMF+SFFS+SVM     93.13±1.40           99.17±1.76            97.44±3.84
6    SNMF+SVM         94.00±3.42           99.56±1.33            96.58±4.05
7    SNMF+SFFS+SVM    93.75±2.16           99.17±1.76            96.58±4.05
From this table we can see that, in terms of accuracy on the test set, NMF-based dimensionality reduction is useful for classification. Among the methods, method 5 is the best one. This data set clearly poses an easy classification problem, since the variances in the results caused by the randomizations are quite small compared to the other data set.
5 Conclusion
NMF and SNMF have been used successfully in image analysis, text clustering, and cancer class discovery and classification. In this paper, we presented NMF-based methods for the classification of tumors based on non-negative microarray gene expression data. The methodology involves dimension reduction of high-dimensional gene expression data using NMF as well as SNMF, followed by feature selection using SFFS and classification using SVM. We have compared the experimental results of our method with three other methods; the results show that our method is effective and efficient in predicting normal and tumor samples from human tissues. Furthermore, these results hold under re-randomization of the samples. The challenge that remains is to define the value of k, which is related to the result of the experiment.
Acknowledgements. This work was supported by the grants of the National Science Foundation of China, No. 30700161, China Postdoctoral Science Foundation, No. 20070410223, and Scientific Research Startup Foundation of Qufu Normal University, No. Bsqd2007036.
References
1. Alon, U.: Broad Patterns of Gene Expression Revealed by Clustering Analysis of Tumor and Normal Colon Tissues Probed by Oligonucleotide Arrays. Proc. Natl. Acad. Sci. 96, 6745–6750 (1999)
2. Bittner, M.: Molecular Classification of Cutaneous Malignant Melanoma by Gene Expression Profiling. Nature 406, 536–540 (2000)
3. Furey, T.S., Cristianini, N., Duffy, N., Bednarski, D.W., Schummer, M., Haussler, D.: Support Vector Machines Classification and Validation of Cancer Tissue Samples Using Microarray Expression Data. Bioinformatics 16, 906–914 (2000)
4. Hoyer, P.O.: Non-negative Matrix Factorization with Sparseness Constraints. J. Mach. Learn. Res. 5, 1457–1469 (2004)
5. Gao, Y., Church, G.: Improving Molecular Cancer Class Discovery Through Sparse Non-negative Matrix Factorization. Bioinformatics 21, 3970–3975 (2005)
6. Huang, D.S., Zheng, C.H.: Independent Component Analysis Based Penalized Discriminant Method for Tumor Classification Using Gene Expression Data. Bioinformatics 22, 1855–1862 (2006)
7. Comon, P.: Independent Component Analysis, a New Concept? Signal Processing 36, 287–314 (1994)
8. Chiappetta, P., Roubaud, M.C., Torresani, B.: Blind Source Separation and the Analysis of Microarray Data. Journal of Computational Biology 11, 1090–1109 (2004)
9. Lee, D.D., Seung, H.S.: Learning the Parts of Objects by Non-negative Matrix Factorization. Nature 401, 788–793 (1999)
10. Brunet, J.P.: Metagenes and Molecular Pattern Discovery Using Matrix Factorization. Proc. Natl. Acad. Sci. 101, 4164–4169 (2004)
11. Furey, T.S., Cristianini, N., Duffy, N., Bednarski, D.W., Schummer, M., Haussler, D.: Support Vector Machines Classification and Validation of Cancer Tissue Samples Using Microarray Expression Data. Bioinformatics 16, 906–914 (2000)
12. Frank, I.E., Friedman, J.H.: A Statistical View of Some Chemometric Regression Tools. Technometrics 35, 109–143 (1993)
A Visual Humanoid Teleoperation Control for Approaching Target Object
Muhammad Usman Keerio1, Altaf Hussain Rajpar2, Attaullah Khawaja3, and Yuepin Lu4
1 Dept. of Electrical Engineering, QUEST Nawabshah, Pakistan
2 Dept. of Mechanical Engineering, QUEST Nawabshah, Pakistan
3 Dept. of Electrical Engineering, NED Karachi, Pakistan
4 Dept. of Mechatronic Engineering, Beijing Institute of Technology, Beijing 100081, China [emailprotected]
Abstract. Video information from a camera or robot vision is not enough for some typical humanoid applications such as telesurgery or picking up an object. The operator should know the accurate location of the robot and its target; VR technology makes it possible to monitor the robot through a virtual scene and to obtain vision and location information. In this paper, a visual teleoperation system for the humanoid BHR-2 is developed using the software Maya to create a realistic simulation environment and to observe the details of the robot's environment. In the current application, BHR-2 performs the task of approaching a target object, which may be moving, using visual teleoperation control; this helps the humanoid robot work safely and accurately even in a dark environment. The effectiveness of the proposed control technique is shown by simulations. Keywords: Virtual Reality, Teleoperation, Humanoid.
D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 244–251, 2008. © Springer-Verlag Berlin Heidelberg 2008

1 Introduction
By teleoperation, a supervisor can command the robot either from a remote site or by monitoring the interaction of the robot with its environment [1]. For aligning the object in teleoperated object manipulation, the most important sensing mode is vision feedback. Virtual environment (VE) techniques can provide a 3D environment of a teleoperation worksite to visualize the robot model, the corresponding objects, and the movement for controlling a real robot. The failure of a control algorithm, or failure due to sensor noise or control errors, can be checked under different conditions. Using teleoperation, the perceptions from a physically remote environment can be conveyed to the human operator in a realistic manner, and by virtual reality technology the perceptions from a simulated environment are conveyed to the user. Thus, the teleoperation and virtual environment communities share many of the same user interface issues, but in teleoperation the need for detailed world modeling is less central [2], [3]. Some researchers have focused on building virtual models of the robot and rendering their configuration [4], [5]. In the literature, the work mainly concerns grasping objects; for example, in [6] the software is used as a graphical user interface for directly
controlling or interacting with the humanoid robot H6 operating in the real world. The application includes carrying an object from a cupboard using the robot vision system. Sensing and actuation are noisy and uncertain in robot domains, resulting in partial knowledge about the world, so it is necessary that robots can be teleoperated to complete tasks even in dangerous environments. Research on virtual reality based teleoperation for interacting with objects, such as [7], [8], describes virtual telerobotic control for remotely controlled excavation systems, for applications in areas such as toxic waste removal and mining. In [9], virtual reality simulation for surgery and scientific experiments is discussed. The methods applied to interactive applications are not suitable for moving objects and are rather limited for VEs. In this paper, we develop a model-based virtual telerobotic control system, showing a complete simulation environment for the BHR-2 dynamic humanoid robot. Maya is selected for this work because it provides remote viewing, modeling and rendering functions, and an interface for adding data processing and other functions [10], [11]. Using the proposed virtual reality based teleoperation control, the human operator is able to interact with models of the remote world, on the basis of which commands intended for the real remote system can be formulated, rehearsed, and ultimately transmitted to the remote site, for an accurate simulation and interaction with the world.
2 Overview of Teleoperation System
The height and weight of the humanoid robot BHR-2 are 160 cm and 63 kg, respectively; it consists of a head, two arms, and two legs, and has 32 DOF (degrees of freedom) in total. BHR-2 has stereo cameras, stereo microphones, and speakers in the head, torque/force sensors at the wrists and ankles, and acceleration sensors and gyro sensors at the trunk. Two computers are built into the robot body: one is for motion control, the other for information processing (such as image processing and identifying object characteristics) and for transferring data with the remote cockpit [12], [13]. The two computers are connected with a shared mass memory for data sharing, called memolink. When BHR-2 receives instructions from the remote cockpit via wireless LAN, or acts independently according to its vision or other perceived exterior information (such as the robot head tracking a moving object in its view), the first step of BHR-2's work is that the computer for information processing and data exchange processes this information and writes the results into memolink. The second step is that the motion control computer reads the data from memolink, then calculates and generates the motion trajectory values used to control the corresponding DC motors. The control system of BHR-2 is a real-time position control system based on the RTLinux operating system. There are four kinds of feedback in the system: body sensor data of the robot, feedback from the robot vision system, the real scene of the overall workspace, and a virtual scene monitoring system based on virtual reality [13], [14] (see Fig. 1). For walking control, the basic walking trajectory data are obtained off-line and stored in the control computer. The operator inputs the walking instruction to the robot by keyboard/joystick.
246
M.U. Keerio et al.
The two commands, loose and hold tightly (an object), can be executed by the hand of BHR-2. To control the robot hand remotely, two graphic symbols are designed: one command is to open and the other is to close. The operator can press one of the two symbols with the mouse to generate the instruction. A master arm has been designed which has a mechanism similar to that of the human and humanoid robot arm, to manipulate/control the robot arm when it is teleoperated [12].
Fig. 1. Working sketch map of the remote cockpit
3 To Create Target Geometry in Maya
3.1 Virtual Robot
A virtual skeleton model like the robot has been developed. After building the skeleton system, the surface of the robot was built with the Maya tools and attached to the skeleton, completing the whole robot model. As the body of the robot is rigid, with the help of the coordinate data of 3 or 4 markers attached to the robot body, we can calculate the position and attitude data of the robot body. An algorithm is adopted to determine the position and attitude of the robot body. We have built a plug-in for Maya to obtain the motion data from a data file which is updated in real time by the teleoperation platform feedback module [14].
3.2 Virtual Furniture
The modeling operations used to make 3D shapes of tables, chairs, etc. include: sculpting (with the NURBS or polygon sculpting tool, or by moving vertices, faces, CVs, or edit points), lofting, revolving (lathing), and extruding [15].
3.3 Rendering/Animation of 3D Objects
The animation of a rigid body can be defined as an arrangement of two transforms, or a hierarchy of transforms. A transform is a way to position and orientate an object. It is easily expressed as the matrix product of (1), which contains both operators:

$positionMatrix * rotationMatrix$    (1)
A Visual Humanoid Teleoperation Control for Approaching Target Object
247
See [16] for the derivation. We can analyse the transform, denoting a frame as one that relates B's coordinates $p_B$ into A's coordinates $p_A$, as

$p_A = {}^{A}_{B}X \, p_B$    (2)

${}^{A}_{B}X = {}^{A}_{B}T \; {}^{A}_{B}R$    (3)

The translation matrix ${}^{A}_{B}T$ contains the vector offsetting the origin of coordinate system B, expressed in A. Inspecting the rotation matrix ${}^{A}_{B}R$, it holds the basis vectors of coordinate system B expressed in A. Inversion gives the final expression

$({}^{A}_{B}X)^{-1} = {}^{A}_{B}R^{-1} \; {}^{A}_{B}T^{-1}$    (4)
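Equations (2)–(4) can be checked numerically; a small sketch (the example rotation and offset are invented for the demo):

```python
import numpy as np

def make_transform(R, t):
    """Eq. (3): X = T R as 4x4 homogeneous matrices, mapping frame-B
    coordinates into frame A; R holds B's basis vectors expressed in A,
    t the offset of B's origin expressed in A."""
    T = np.eye(4); T[:3, 3] = t       # translation matrix
    Rh = np.eye(4); Rh[:3, :3] = R    # rotation matrix
    return T @ Rh

def invert_transform(R, t):
    """Eq. (4): X^-1 = R^-1 T^-1 (for a rotation, R^-1 = R^T)."""
    Rh = np.eye(4); Rh[:3, :3] = np.asarray(R).T
    T = np.eye(4); T[:3, 3] = -np.asarray(t)
    return Rh @ T

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
t = [1.0, 2.0, 3.0]
X = make_transform(R, t)
X_inv = invert_transform(R, t)
```

Composing the two recovers the identity, confirming Eq. (4).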
When rendering, Maya takes into account all the various objects and scene attributes, and performs mathematical calculations to produce the final image or image sequence. Once we render a sequence of images, we can then play them back in sequence, producing an animation. Rendering involves many components to produce a final image: • Necessary steps are utilized while drawing the 3D polygons into a 2D image as follows. 1) 3D modeling transformations are represented by 4x4 Matrices for scale, rotate, translate, shear, etc. For example to rotate around Z-axis the following matrix shown in (5) is used. ⎡cos θ ⎢ sin θ P =⎢ ⎢ 0 ⎢ ⎣ 0 /
− sin θ cos θ 0 0
x⎤ 0 y ⎥⎥ . 1 z⎥ ⎥ 0 1⎦
(5)
2) The viewing transformation is used to transform into 3D camera coordinates. We have the camera (in world coordinates); the camera transformation maps 3D world coordinates to 3D camera coordinates.
3) The projection transform maps 3D camera coordinates to 2D screen coordinates. For example, for perspective projection with view distance d, the following matrix can be applied to a homogeneous point:

$P = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1/d & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}$    (6)
This “division by d” (division by the last homogeneous coordinate, z/d) shows that the projected size varies inversely with distance.
4) The window-to-viewport transformation is used for drawing pixels (including texturing, hidden surface removal, etc.) and for clipping primitives outside the camera's view.
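The projection of Eq. (6) and the subsequent division can be verified numerically (assuming the standard perspective matrix with view distance d reconstructed above):

```python
import numpy as np

def perspective_project(p, d):
    """Apply the perspective matrix of Eq. (6) to the homogeneous point
    [x, y, z, 1], then divide by the last coordinate (z/d) to obtain the
    2D screen coordinates (x*d/z, y*d/z)."""
    P = np.array([[1.0, 0.0, 0.0,   0.0],
                  [0.0, 1.0, 0.0,   0.0],
                  [0.0, 0.0, 1.0,   0.0],
                  [0.0, 0.0, 1.0/d, 0.0]])
    q = P @ np.append(np.asarray(p, dtype=float), 1.0)
    return q[:2] / q[3]

near = perspective_project([2.0, 4.0, 10.0], d=1.0)
far = perspective_project([2.0, 4.0, 20.0], d=1.0)
```

Doubling the distance halves the projected coordinates, i.e. size varies inversely with distance.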
248
M.U. Keerio et al.
We have the camera in world coordinates: to take objects from world to camera coordinates, the transform shown in (3) can be used, and to take objects from camera back to world coordinates, its inverse shown in (4) can be used. Data structures are used to record the geometrical information for the environment, including the shape of the objects in the environment, their moving parts, and their physical properties. A data-server device is used to get this data.
3.3.1 Animating an Object
To animate a ball (the target object), for instance to move a particle object's position smoothly along a curve, the hermite function is used; the advantage is that we can create various curve shapes by altering the arguments to the hermite function. One important application of such curves is restricting 3D movements of simulated or real objects within a 3D real-world video scene. Suppose we want to create an object named ball of one particle at the origin. To guide its motion along a short upward-bound curve for the first four seconds of animation, we can write the following runtime expression (see Fig. 2):

ball.position = hermite(<<0,0,0>>, <<2,2,0>>, <<3,0,0>>, <<0,3,0>>, linstep(0,4,time))    (7)
Now, changing the fourth argument of expression (7) to <<0,-3,0>>, the particle (ball) moves in a pattern resembling a half-circle, as shown in Fig. 3.
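Maya's hermite() can be mimicked with the standard cubic Hermite basis; a sketch assuming the argument order of expression (7) is start point, end point, start tangent, end tangent, parameter:

```python
import numpy as np

def hermite(p1, p2, t1, t2, u):
    """Cubic Hermite interpolation between p1 and p2 with end tangents
    t1 and t2, for u in [0, 1]."""
    p1, p2, t1, t2 = (np.asarray(v, dtype=float) for v in (p1, p2, t1, t2))
    h1 = 2*u**3 - 3*u**2 + 1   # weight of p1
    h2 = -2*u**3 + 3*u**2      # weight of p2
    h3 = u**3 - 2*u**2 + u     # weight of t1
    h4 = u**3 - u**2           # weight of t2
    return h1*p1 + h2*p2 + h3*t1 + h4*t2

# The ball path of expression (7): from the origin to <<2,2,0>>
start = hermite([0, 0, 0], [2, 2, 0], [3, 0, 0], [0, 3, 0], 0.0)
end = hermite([0, 0, 0], [2, 2, 0], [3, 0, 0], [0, 3, 0], 1.0)
```

At u = 0 the curve passes through the first point and at u = 1 through the second; the tangent arguments bend the path in between, which is how the half-circle shape above is produced.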
Fig. 2. Particle’s position in a curve
Fig. 3. Moving pattern in half circle
Fig. 4. Position marker showing object location
Fig. 3 shows the tan1 vector <<0,3,0>> pulling the curve from the start point toward the positive Y direction, and the tan2 vector <<0,-3,0>> pulling it toward the negative Y direction as it approaches the end point. Fig. 4 shows a position marker appearing on the path curve, indicating that a key frame has been set. The object can be repositioned along the curve. The position marker is useful for determining where the object is at a given time, so the body can
easily be tracked. Position markers do not appear when we render the animation. For more details, see [10].
3.4 The Dynamics Simulation
Real-world physical interactions between objects, such as collisions between surfaces, can be simulated with the help of rigid body dynamics. Maya provides a means to do this type of computer animation. For example, we can simulate a bowling ball bouncing on the surface of a table, or simulate the effects of gravity when a ball falls onto the surface of a table, as shown in Fig. 5. Rigid bodies have a centre of mass and a mass distribution; they can tumble and rotate about their own axes [17]. A rigid body's state can be given as

$Y(t) = \begin{bmatrix} x(t) \\ R(t) \\ L(t) \\ \omega(t) \end{bmatrix}$    (8)
Here x(t) is the position vector; the rotation matrix R(t) gives the orientation of the body; the linear momentum L(t) is the mass of the body times the linear velocity v(t); and ω(t) is the angular momentum. The robot can find a target object by vision in real time, as shown in Fig. 6. An object manipulation approach has been used in which the object location is detected through stereo vision, and pre-reaching and alignment of the hand with the object are based on image features [18].
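A simple way to exercise the state vector of Eq. (8) is an explicit-Euler update in the style of the cited course notes [17]. The sketch below is illustrative only: the mass, time step and gravity values are assumptions, and ω is treated as an angular velocity for the rotation update (the common convention R'(t) = ω* R(t)), which is a simplification of the paper's description:

```python
import numpy as np

def skew(w):
    """Cross-product (star) matrix: skew(w) @ r == np.cross(w, r)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def euler_step(x, R, L, w, mass, gravity, dt):
    """One explicit-Euler update of the state Y(t) = (x, R, L, omega).

    x: position, R: 3x3 rotation, L: linear momentum, w: angular velocity.
    """
    v = L / mass                      # linear velocity from linear momentum
    x_new = x + dt * v                # x'(t) = v(t)
    R_new = R + dt * (skew(w) @ R)    # R'(t) = w* R(t)
    L_new = L + dt * mass * gravity   # only gravity acts in this sketch
    return x_new, R_new, L_new, w     # torque-free: w is unchanged
```

Starting a ball at rest and stepping this update reproduces the falling-ball behaviour of Fig. 5 (until a collision response, not sketched here, takes over).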
Fig. 5. Target object simulation
Fig. 6. Object manipulation process
4 Controlling System for Virtual Scene
In this virtual reality based system, the real-time body sensor data and the motion data are transferred to the teleoperation platform. These feedbacks are processed by the real-time data fusion module into integrated data that the 3D interface can render. An Order Simulation Generation Module has been developed which includes all the information of the robot whole-body order, comprising the motion type and the motion parameter information; it is the task-space order [14]. The order data are sent to the virtual scene first, where they are rendered as a predictive scene. Through the teleoperation platform, the joint angle data are sent back to the operator in
M.U. Keerio et al.
Fig. 7. BHR-2 walking
Fig. 8. BHR-2 approaching target object
real time. These data represent the motion status of the robot in real time; by rendering them, the operator can monitor the robot's motion state. When the robot executes a task, all the real-time sensor data rendered in the virtual scene are kept on file, and all of these data can be rendered again at any time. The motion capture system obtains the rigid body motion data in real time. Fig. 7 shows the robot starting to walk towards a target object, and Fig. 8 shows the robot approaching the target object, which is close to its hand. For carrying an object in real time, graphic symbols or text numbers are used, as described in Section 2.
5 Conclusions
We have analysed the behaviour and performance of a robot's simulated activity, based on a task such as approaching a moving object, using visual teleoperation control. It can help: 1) the humanoid to approach the target object safely and accurately; 2) collision detection and correction for possible grasping of an object. This simulation environment will help researchers to implement and study different behaviours of the robot in performing a target task.
References
1. Sheridan, T.B.: Musings on Telepresence and Virtual Presence. Presence 1(1), 120–125 (1992)
2. Papasin, R., Betts, B.J., Del Mundo, R., Guerrero, M.: Intelligent Virtual Station. In: Proc. 7th Int. Symp. on Artificial Intelligence, Robotics and Automation in Space, Nara, Japan (2003)
3. Huang, A., et al.: Interactive Visual Method for Motion and Model Reuse. In: Proceedings of the 1st International Conference on Computer Graphics and Interactive Techniques in Australasia and South East Asia, pp. 29–36. ACM Press, New York (2003)
4. Kofman, J., et al.: Teleoperation of a Robot Manipulator Using a Vision-Based Human-Robot Interface. IEEE Transactions on Industrial Electronics 52 (2005)
5. James, J., et al.: Graphical Simulation and High-Level Control of Humanoid Robots. In: Proceedings of the 2000 IEEE/RSJ International Conference on Intelligent Robots and Systems (2000)
6. Boulic, R., Mas, R.: Hierarchical Kinematic Behaviors for Complex Articulated Figures. In: Magnenat-Thalmann, Thalmann (eds.) Advanced Interactive Animation. Prentice Hall, Englewood Cliffs (1996)
7. Milgram, P., Ballantyne, J.: Real World Teleoperation via Virtual Environment Modeling. In: Proc. International Conference on Artificial Reality & Tele-existence ICAT 1997, Tokyo (1997)
8. Ballantyne, J., Greenspan, M., Lipsett, M.: Virtual Environments for Remote Operations. In: ANS 7th Topical Meeting on Robotics & Remote Systems, Augusta, Georgia (1997)
9. Barnes, B., et al.: Virtual Reality Extensions into Surgical Training and Teleoperation. In: Proc. of the 4th Annual IEEE Conf. on Information Technology Applications in Biomedicine, UK (2003)
10. Maya: Computer Program, http://www.alias.com
11. Gould, D., et al.: Complete Maya Programming: An Extensive Guide to MEL and the C++ API. Morgan Kaufmann, San Francisco (2003)
12. Zhang, L., Huang, Q., Zhang, W.: A Teleoperation System for a Humanoid Robot with Multiple Information Feedback and Operational Modes. In: Proceedings of the 2005 IEEE International Conference on Robotics and Biomimetics, pp. 290–294 (2005)
13. Liu, Q., Huang, Q., Zhang, W., et al.: Manipulation of a Humanoid Robot by Teleoperation. In: Proceedings of the 5th World Congress on Intelligent Control and Automation, June 15–19, 2004, pp. 4894–4898 (2004)
14. Zhang, L., Huang, Q., Lu, Y., Jiapeng, Y., Keerio, M.U.: A Visual Tele-operation System for the Humanoid Robot BHR-2. In: Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, Beijing, China (2006)
15. Keerio, M.U., Huang, Q., Gao, J., Lu, Y., Yang, J.: Virtual Reality Based Teleoperation Control of Humanoid Robot BHR-2. In: Proceedings of the 2007 IEEE International Conference on Mechatronics and Automation, Harbin, China (2007)
16. Craig, J.: Introduction to Robotics: Mechanics and Control. Addison-Wesley, Reading (1989)
17. Baraff, D.: Rigid Body Simulation I - Unconstrained Rigid Body Dynamics. SIGGRAPH 1997 Course Notes, D1–D31 (1997)
18. Rajpar, A., Huang, Q., Pang, Y.: Location and Tracking of Robot End-effector Based on Stereo Vision. In: Proceedings of the 2006 IEEE International Conference on Robotics and Biomimetics, Kunming, China (2006)
An Intelligent Monitor System for Gearbox Test

Guangbin Zhang1,2,3, Yunjian Ge1, Kai Fang1, and Qiaokang Liang1,2

1 Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei, Anhui, China
2 University of Science and Technology of China, Hefei, Anhui, China
3 Department of Computer Engineering, Anhui University of Architecture, Hefei, Anhui, China
[emailprotected], [emailprotected], [emailprotected], [emailprotected]
Abstract. We have developed an intelligent monitor system for gearbox testing for a known automobile enterprise, since on-line gearbox testing has become an important part of the auto industry production pipeline. A test of automobile gearboxes based on an expert system, a neural network and an alternating current motor was established. The design of the system can effectively improve the precision of control and the integrity of information, while at the same time reducing energy consumption compared to the regular approach. Firstly, the architecture of the test system and the user interface are presented in this paper. Then the work principles of the system are described, and finally the software structure is elaborated.

Keywords: gearbox, monitor, expert system, neural network.
1 Introduction
During the past ten years, increasing interest has been focused on the testing of automobile gearboxes, since the automobile industry has gained enormous popularity in China. The gearbox tests used by enterprises have usually been based on a single host and have needed the judgment of an experienced technician. However, a number of problems have thus far restricted the use of this kind of test. Four of the major problems are difficulties in precise control, energy economization, intelligent operation and information integrity. In this study, an intelligent test bed system for automobile gearboxes based on multiple hosts, an expert system, a neural network and an alternating current motor was established, and at the same time vector control was used to improve precision. This test bed was developed for Anhui Jianghuai Automobile Co., Ltd. (JAC). The bed received an award, being built through the joint efforts of the Anhui government and the Chinese Academy of Sciences, and it has been running well. The most important innovations of this bed are as follows: (1) Intelligent operation in shift action and fault diagnosis. (2) The energy consumption, which could be reduced to only 10% of that of the regular test, responds well to the policy of an economical society. (3) The errors of moment (torque) control and rotation speed, restricted to 0.5% and 0.05%, can provide accurate product performance parameters and decision-making support for the principals of the enterprise.
D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 252–259, 2008. © Springer-Verlag Berlin Heidelberg 2008
2 System Architecture

2.1 System Requirement
The requirement of the enterprise was that the test bed had to satisfy the need for on-line testing and had to be usable for the actual five types of gearbox and, further, for extended types. The system had to fulfill tests of integrated performance, drive efficiency and fatigue. For the reliability and extensibility of the test bed, the system was designed to run on any configuration software such as Kingview or WinCC. This kind of configuration software either provides many toolboxes, such as a neural network box, or provides an application program interface (API) for other programs. At the same time, communication could happen between a remote client and a host, but the messages were secured through encryption because the test information had business worth. In addition, a very important point was that the test could not bring impact on the gearbox, i.e. the shift action should be soft and the shift strategy should be intelligent.

2.2 System Structure
Based on the enterprise requirement and physical modeling, the test bed system was constituted mainly of four parts: (1) a host for monitoring, (2) controlling equipment, (3) mechanism and electric equipment, (4) remote clients. Figure 1 shows the architecture of the system with all components. The host for monitoring was an industrial computer that included digital I/O cards, analog I/O cards and 232-485 conversion cards. It was the control core of the system.
Fig. 1. Architecture of the system
G. Zhang et al.
Controlling equipment consisted of a PLC (programmable logic controller) and transducers for the control of the shift manipulator and of the input and load electric facilities. Mechanism and electric equipment included the motors for input and load, shelves involving driving settings, the clutch, the automatic shift manipulator device, and a series of sensors for rotational speed, pull, stress, torque and temperature. A client was any computer in the LAN (local area network) or the Internet that could monitor the test from a long distance. There was also some assistant equipment, such as the on-the-spot control box for debugging, the system cooler, the secondary meter, and the motor for gearing joint.
3 Work Principles
The system had three work modes: automatic test, manual test and fatigue test. In automatic test there were three sub-modes that could be selected by the user: full automatic test, step shift test and transmission efficiency test. In manual test, the user was provided with computer-controlled manual test and on-the-spot manual test. Of these six modes, full automatic test was the most common work type, and the principle of this test was the most complicated one.

3.1 Full Automatic Test
In the whole procedure of this test, the system was fully controlled by computer. At the beginning, the industrial computer controlled two alternating current motors by sending running parameters to two transducers. In this period, the tested gearbox worked in load mode: the input motor ran at a fixed rotational speed and the load motor ran at a fixed torque, so the system provided a steady work status to the gearbox and simulated a true load procedure. After a scheduled time, the system began a shift procedure. In a shift procedure, by communicating with the PLC through a communications card, the industrial computer controlled the automatic manipulator that executed the shifting operation. The increase or decrease of motor speed and torque followed a slope computed with the Newton interpolation method, to avoid impact on the gearbox during a shift procedure. Throughout a test, data on speed, pull, stress, torque and temperature were shown on instrument panels and on the display of the industrial computer. Indeed, an entitled user could browse each parameter from a remote office or shop via the network. For safety reasons, when speed or torque exceeded the initialized limit, or another danger signal occurred, the system would raise an alarm with a bell and light, and would take safeguards such as an urgent stop or a natural stop based on a preplan.

3.2 System Communication
The communication of the host with other devices was the main communication form.
There were four main kinds of communication in this system: (a) host with transducers, (b) host with PLC, (c) host with sensors, (d) host with remote clients. In this paper, the communication of the host with the transducers is introduced because its communication protocol is representative. The host controlled two Siemens 6SE70 transducers by using USS (universal serial interface protocol). The electrical level had to be translated to TTL because the serial port of the host used RS232, so the host had two RS485 communication cards. The structure of the USS bus was master-slave mode
Fig. 2. Data unit of USS protocol
with only one master station, and the communication between the host and the transducers was realized through polling. The protocol data unit was usually composed of 14 bytes, including a 3-byte header, a 1-byte checksum and a 10-byte data block. Figure 2 shows the structure of the protocol data unit. STX was the start character, and its value was fixed at 02Hex. LGE was the length character, and its value counted the bytes that followed it. ADR denoted the address of a slave station such as a transducer. PKW was the parameter domain, used to set values of transducer parameters. PZD was the data domain of process control: PZD1 was used to set or detect the work state of a transducer, such as run, stop and direction, while PZD2 could be used to set or detect the frequency of the motor, its sign showing positive or negative rotational direction.
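The telegram layout described above (STX | LGE | ADR | data block | BCC) can be sketched in a few lines. This Python sketch is illustrative, not the system's actual C/configuration-software code; the XOR block check is taken from the USS specification, and the function name and the PKW/PZD sizes are assumptions:

```python
def build_uss_telegram(adr, pkw, pzd):
    """Assemble a USS telegram: STX | LGE | ADR | data block | BCC.

    pkw and pzd are byte sequences; for the 14-byte frame described in the
    paper the data block is 10 bytes in total (the PKW/PZD split varies with
    the drive parameterization, so both are passed in here).
    """
    data = bytes(pkw) + bytes(pzd)
    lge = len(data) + 2                 # LGE counts every byte after itself:
                                        # ADR + data block + BCC
    frame = bytearray([0x02, lge, adr]) + data   # STX is fixed at 02Hex
    bcc = 0
    for b in frame:                     # USS block check: XOR of all bytes
        bcc ^= b
    frame.append(bcc)
    return bytes(frame)
```

For a 10-byte data block this yields exactly the 14-byte unit of Figure 2, and XOR-ing the whole received frame back to zero is a quick integrity check on the slave side.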
4 Software Structure
The software of the system was divided into two parts: the main control software of the host and the assistant control software. The running state of the system was mostly controlled by the main control software; since the main control software is very complex, it is the part presented here. The structure of the main control software is laid out in Figure 3. The main interface module was the application frame of the entire software, and the user could call other modules from it based on the selected function. The data processing module could also run independently and could be used in other systems.

4.1 Shift Control Module
The main function of the shift control module was to achieve every specific step of the shift procedure. Each sub-function would control hardware or judge signals; the design was, in effect, a concretion of the shift procedure. The two important sub-functions were clutch operation and shift manipulator operation.
Fig. 3. Structure of the main control software
4.1.1 Clutch Operation Sub-function
This sub-function took one of two parameters: open or close. If the parameter was open, the system executed the following operations. The clutch open signal was held for a spell, and then the system detected the value of the status signal. If the value was not open, the system would set the close signal, to shake off any jam of dust in the clutch. Usually, when the value of the signal was neither open nor close, a jam might have left the clutch half-open, and the repetition of close and open would eliminate this fault phenomenon. If the status of the clutch was still not open after three operations, the system would give an alarm; when this kind of alarm happened, the user needed to examine the digital servomechanism. If the parameter was close, the system executed the contrary operations of open.

4.1.2 Shift Manipulator Operation Sub-function
First of all, the status of the clutch had to be open, and then the system would set the value of the shift signal based on the expert system. The gearbox test should simulate actual working conditions as far as possible, so an expert system was used in the controller of the shift manipulator. An expert system is usually constituted of four parts: information acquisition, knowledge base, inference engine, and control decision. Figure 4 shows the composition of the expert system of this test bed.
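The toggle-and-retry logic of the clutch sub-function can be sketched as follows. This is a hedged Python illustration of the described behaviour, not the test bed's actual code; the callback names, the settle time and the return values are assumptions:

```python
import time

def operate_clutch(set_signal, read_status, target, settle=0.1, attempts=3):
    """Drive the clutch to `target` ('open' or 'close').

    Holds the command signal for a spell, re-checks the status, toggles to
    the opposite state to shake off a dust jam, and gives up with an alarm
    after three failed attempts, as described in the paper.  set_signal and
    read_status are hypothetical hardware callbacks.
    """
    opposite = "close" if target == "open" else "open"
    for _ in range(attempts):
        set_signal(target)
        time.sleep(settle)              # hold the signal "for a spell"
        if read_status() == target:
            return "ok"
        set_signal(opposite)            # toggle to clear a half-open jam
        time.sleep(settle)
    return "alarm"                      # user should examine the servomechanism
```

The close case is symmetric: the same routine is called with target="close".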
Fig. 4. Composition of the expert system
In this system, the information acquisition part included not only the actual input and output speeds, input and output torques, actual gear, clutch status, and manipulator status, but also many setting values. After feature extraction and information processing, these seven data values yield a compound representation of the current situation. Expert experience and knowledge were not only qualitative but also quantitative, and production rules were used to represent the knowledge base, such as: IF
impact on gearbox and test-bed. (2) The system should compute the ratio of input speed to output speed, then compare the computed ratio with the reference value of the actual gear ratio; if the percent error is within the allowable range, the system can return shift success. (3) If the shift operation was not successful, the system should recall the clutch manipulation sub-function to close the clutch, and when the clutch status is close, the system should shift again. In the event that the system tried three times unsuccessfully, it would give a shift alarm. This repetition could avoid a false gear and a fault of designated position. A matching-triggering mode was used by the inference engine, so in every shift manipulator operation the inference engine searched the knowledge base and got the manipulator operation sets; then the control decision part output these control signals according to the rules.

4.2 Data Processing Module
This module included four components: condition query, data analysis, data report, and performance curve. Among them, data analysis was the most complex part; it judged the quality of a gearbox automatically by analyzing the test data. At the same time, the amount of test data was very large for each shift procedure and load procedure, and a gearbox had at least ten shift procedures and five load procedures. So a BP neural network was used in data analysis.

4.2.1 Data Analysis
BP network means back-propagation network. The BP network of this system was a sort of three-layer feed-forward neural network. The sigmoid function was generally used as the transfer function, expressed as f(s) = 1/(1 + e^{-s}). The shift force, shift time, synchronization time, synchronization impulse, input speed, output torque, oil temperature and driving efficiency of the gearbox were used to estimate the quality status of the gearbox. These eight input nodes of the BP neural network came from the test signals of sensors and from compound calculated quantities, such as the synchronization impulse. This BP neural network was made up of one input layer, one output layer and one hidden layer of nodes. And it had been proved that the three
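The shift-success check of step (2), together with the three-attempt retry of step (3), can be sketched compactly. This Python sketch is illustrative; the 5% tolerance, the function names and the callback interface are assumptions (the paper does not give the allowable range):

```python
def shift_succeeded(input_speed, output_speed, expected_ratio, tol=0.05):
    """Compare the measured input/output speed ratio with the reference
    gear ratio; the shift counts as successful when the percent error is
    within the allowable range (tol = 5% is an assumed value)."""
    if output_speed == 0 or expected_ratio == 0:
        return False
    measured = input_speed / output_speed
    return abs(measured - expected_ratio) / expected_ratio <= tol

def try_shift(do_shift, check, attempts=3):
    """Shift, re-checking and retrying up to three times before raising a
    shift alarm (sketch of step (3) in the text; closing the clutch between
    attempts is folded into the do_shift callback here)."""
    for _ in range(attempts):
        do_shift()
        if check():
            return "ok"
    return "alarm"
```

For example, with a reference third-gear ratio of 3.0, input 3000 rpm and output 1000 rpm pass the check, while output 1500 rpm would not.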
Fig. 5. Structure of the neural network
Table 1. Sample data and forecasting results

samp | shift force | shift time | sync time | sync impulse | input speed | output torque | oil temp | driving efficiency | results
1    | 104         | 0.7        | 0.46      | 64           | 300         | 1000          | 60       | 0.72               | normal
2    | 101         | 0.7        | 0.45      | 62           | 2500        | 800           | 70       | 0.80               | normal
3    | 200         | 1.2        | 0.8       | 180          | 2500        | 600           | 70       | 0.62               | flaw
4    | 85          | 0.6        | 0.45      | 48           | 3000        | 400           | 75       | 0.82               | good
5    | 300         | 10         | 0         | 0            | 3000        | 400           | 78       | 0                  | bad
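The 8-10-4 feed-forward structure described around Fig. 5 can be sketched as below. This is a hedged Python illustration only: the paper's system was built in configuration software, the label order, the random initial weights and the omission of back-propagation training are assumptions, and real inputs would be the eight normalized features of Table 1:

```python
import math
import random

def sigmoid(s):
    """Transfer function f(s) = 1 / (1 + e^{-s})."""
    return 1.0 / (1.0 + math.exp(-s))

class BPNet:
    """8-10-4 feed-forward network matching the structure in Fig. 5.

    Only the forward pass is sketched; training by back-propagation of the
    sample data is omitted.
    """
    def __init__(self, n_in=8, n_hidden=10, n_out=4, seed=0):
        rnd = random.Random(seed)
        self.w1 = [[rnd.uniform(-1, 1) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rnd.uniform(-1, 1) for _ in range(n_hidden)]
                   for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return [sigmoid(sum(w * hi for w, hi in zip(row, h)) + b)
                for row, b in zip(self.w2, self.b2)]

    def judge(self, x, labels=("normal", "good", "flaw", "bad")):
        """Only one output node is taken as effective: the largest one."""
        y = self.forward(x)
        return labels[max(range(len(y)), key=y.__getitem__)]
```

After training on labelled samples such as those in Table 1, judge() would return the quality class for one shift; the judgment of a whole gearbox combines the results of all shifts, as the text notes.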
layer BP neural network model could make a correct judgment. Only one judgment result of the output layer is effective. This BP neural network structure is shown in Figure 5: there are eight inputs, four outputs, and one hidden layer with ten nodes. Table 1 shows part of the samples of non-normalized feature data; the forecasting results accord with the actual gear status. It should be emphasized, though, that a forecasting result concerns only one shift, and the judgment of a gearbox should include the results of all shifts.

4.2.2 Performance Curve
The performance curve mainly expressed the shift procedure performance and the load performance, and the curve was used by the user or the expert system to judge the performance of the gearbox. FFT (Fast Fourier Transform) was the basis of the curve fitting. Of course, the curve could also be used by skilled users to verify the correctness of the data analysis, by comparing the curve with the results of the data analysis procedure. Figure 6 shows one shift performance curve of one gearbox.
Fig. 6. One shift performance curve (shift force, N, versus time, s)
5 Conclusion
This system can fulfill on-line gearbox testing and suits five types of product. At the same time, if the enterprise needs it, the system can be extended to test new types with
few changes. The system can execute many test items, such as integrated performance, driving efficiency, fatigue test and step shift. After the enterprise had used the system for a period, the enterprise and we concluded that the system has the following merits:
① The system has high security, high stability and high intelligence, because it utilizes an advanced electric mechanism, an expert system, flexible test technology and a neural network.
② The system has a friendly and convenient user interface and provides a remote monitoring function; meanwhile, the system can be integrated into the CIMS (Computer Integrated Manufacturing System) of the enterprise through its data interface.
③ Based on the automobile test modes of the national standard, the system has every kind of test template, and can provide information support for product analysis, market forecasting and decision management.
Acknowledgments. This project is sponsored by State 863 Projects (2006AA04Z244) and Foundation for Young Teachers of Anhui Province in China (2008jql085).
Development of Simulation Software for Coal-Fired Power Units Based on Matlab/Simulink

Chang-liang Liu1, Lin Chen2, and Xiao-mei Wang3

1 North China Electric Power University, Control Theory & Control Engineering Dept., 071003 Baoding, China, [emailprotected]
2 North China Electric Power University, Systems Engineering Dept., 071003 Baoding, China, [emailprotected]
3 North China Electric Power University, Control Theory & Control Engineering Dept., 071003 Baoding, China, [emailprotected]
Abstract. Modelling and simulation is an important method in the study and design of power units. Because of the complexity of coal-fired power units, it is necessary to develop simulation software which can reflect their dynamic characteristics comprehensively. Generally, professional simulation software for coal-fired power units, such as simulators made by professional companies, is very expensive and complex. In recent years, Matlab/Simulink has been applied successfully to many research fields, such as control, communication and modelling. In this paper, a simulation algorithm library for a boiler system was built on Matlab/Simulink. The algorithms were developed with CMEX S-functions, mask interfaces, MS-functions, Memory blocks, masked blocks and so on. A simulation model of a 1025 t/h boiler system was constructed by the organic combination of algorithms in the library. It is shown by the simulation results that the algorithm library is comprehensive and universal, and that the dynamic characteristics of the coal-fired power unit model are consistent with the real objects. The model can easily be applied to study control systems of power units because of the advantages of Matlab/Simulink.
1 Introduction
The boiler system is a very important part of a coal-fired power plant, and the dynamic performance of the boiler largely determines the performance of the power unit. One of the most effective means of enhancing boiler efficiency is to improve the control system, and a valid boiler model is essential for such an improvement. A number of dynamic models used to predict the behaviour of boilers can be found in the literature [1-4]; these models were used for dynamic simulation or controller synthesis. Because of the complexity of coal-fired power units, it is necessary to develop simulation software which can reflect the dynamic characteristics comprehensively. Generally, professional simulation software for coal-fired power units, such as simulators made by professional companies, is very expensive and complex. It goes beyond the
D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 260–267, 2008. © Springer-Verlag Berlin Heidelberg 2008
common researcher's purchasing power. In addition, it is not convenient for studying new modelling methods, optimization algorithms and control strategies such as genetic algorithms, immune algorithms, fuzzy control, predictive control, neural network control, etc. Matlab/Simulink is popular simulation software which has been applied successfully to many research fields, such as control, communication and modelling. So it is necessary to develop simulation software on the Matlab/Simulink platform which is universal and convenient and has a friendly user interface. This paper provides a method of developing a simulation algorithm library for a boiler system based on Matlab/Simulink with CMEX S-functions. Matlab is an efficient and universal platform for developing simulation software, and its capability can be extended with Simulink. The simulation environment of Simulink provides interactive graphical modular modelling [5][6] and embeds excellent numerical integration algorithms. In addition, its control system toolbox, which is abundant and comprehensive, reaches every field of control systems. An algorithm library built on Matlab is easier to spread and avoids repetitive modelling. On the other hand, it is convenient to carry out genetic algorithms, immune algorithms and other complex algorithms, which can be found in Matlab, in the control system.
2 Design of Simulation Algorithms Library

2.1 System Compartmentalization
The coal-fired boiler of a power unit is a very complicated system composed of many pieces of equipment and subsystems, so it is essential to classify the boiler into many parts by function and principle before modelling. First, following the principle of modularization, the boiler system is decomposed into basic modules with independent functions and clear concepts, which can be designed, coded and debugged separately. Although the boiler system is very complicated, equipment or subsystems that share the same operating principle and operating process can be described by the same dynamic model, with different physical structure sizes and parameters captured by coefficients. In this way, a complicated boiler system can be represented by several kinds of equipment or subsystem, and the dynamic models are universal and can be used in other systems without modification [7]. Second, a dynamic model of each typical piece of equipment or process is established based on mechanism analysis and programmed in the Matlab language; the boiler simulation algorithm library is composed of all these dynamic models. Third, the whole system model can be built on the Simulink platform by joining the modules together according to the structure of the real system. At the same time, the modules are convenient for users to understand and apply by reason of their clear physical meaning. According to the characteristics of the object, the boiler can be divided into the air and gas system, drum system, superheater system and so on, and each system can be divided further into subsystems. The structure of the boiler model is shown in Figure 1.

2.2 Programming with Matlab CMEX
There are some disadvantages in writing programs in the Matlab language, such as slow execution speed and disclosure of the original code. A CMEX S-function is an S-function
C.-l. Liu, L. Chen, and X.-m. Wang
Fig. 1. Structure of boiler Model
Fig. 2. Boiler equipment modelling process
written in C and can be compiled into a dynamic link library (DLL file). It can also solve first-order differential equations directly, without the need to transform the differential equations into difference equations [2]. Simulink provides a template for CMEX S-functions named sfuntmpl_basic.c. Its format is similar to that of an MS-function in that it is constituted of subfunctions, and using this template we only need to modify the subfunctions to write a CMEX S-function, which improves the efficiency of software production. The C-language algorithm program should be debugged and modified according to the characteristics of the objects and validated against real operating data; the process is shown in Fig. 2. To explain the method of writing a CMEX S-function with the template, a simplified dynamical model of the drum is given as follows [4].
The drum pressure differential equation is

\frac{dP_d}{dt} = \frac{\rho_s W_e + (\rho_w - \rho_s) X_r W_r - \rho_w W_s}{\frac{d\rho_s}{dP_d}\rho_s V_w + \frac{d\rho_w}{dP_d}\rho_w V_s}    (1)

and the steam flow algebraic equation is

W_s = 0.06564\left(\sqrt{1 + 522.3\,P_d / u^2} - 1\right)u^2    (2)

Inputs:
u[0] = Wr, steam/water mixture flow at the drum inlet (kg/s);
u[1] = Xr, steam ratio of Wr (0~1);
u[2] = We, feedwater flow (kg/s);
u[3] = u, position of the turbine CV (0~100);
u[4] = ρw, saturation water density (kg/m3);
u[5] = ρs, saturation steam density (kg/m3).
State variable: x[0] = Pd, drum pressure (MPa).
Outputs: y[0] = Pd, drum pressure (MPa); y[2] = Ws, steam flow (kg/s).
Vs and Vw, the steam and water volumes in the drum, vary with the water level; to simplify the model they can be assumed constant, and for the 300 MW power unit in this paper Vs = 200 and Vw = 400. The derivatives dρw/dPd and dρs/dPd vary with Pd; they can be calculated from the
steam table in the program. mdlInitializeSizes is used to set the numbers of inputs, outputs, continuous states, discrete states and so on; the number of user parameters should also be declared in this subfunction. If the expression calculating an output contains an input, the macro ssSetInputPortDirectFeedThrough should be set to 1. The initial values of the state variables are assigned in mdlInitializeConditions. The derivatives of the continuous state variables are calculated in mdlDerivatives, and the outputs of the model are calculated in mdlOutputs. Finally, the file is compiled into a DLL file with the mex instruction at the command window. The calling method of the DLL file is the same as for an MS-function; if an MS-function file with the same name as the DLL file exists in the same path, the DLL file will be called prior to the MS-function.

2.3 The Packaging of Module
To provide a friendly user interface, we can customize a dialog box and an icon for an algorithm module by setting parameters in the Mask Editor. The parameter interface and algorithm of the drum are shown in Fig. 3.
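The steam-flow relation of Eq. (2) can be exercised on its own, outside Simulink, as a quick sanity check of the drum model at the rated operating point. The Python sketch below is illustrative only (the paper's implementation is a C CMEX S-function); treating u = 100 as a fully open control valve is an assumption:

```python
import math

def steam_flow(pd, u):
    """Eq. (2): steam flow Ws (kg/s) from drum pressure pd (MPa) and
    turbine control-valve position u (0-100)."""
    return 0.06564 * (math.sqrt(1.0 + 522.3 * pd / u**2) - 1.0) * u**2

# At the rated point of the 300 MW unit (drum pressure 18.2 MPa, valve
# assumed fully open) the formula gives roughly the rated main-steam flow:
ws = steam_flow(18.2, 100.0)          # kg/s
print(round(ws * 3.6, 1), "t/h")      # about 937 t/h, near the rated 918 t/h
```

The flow also grows with valve opening, which is the mechanism behind the valve-disturbance experiment in Section 4.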
Fig. 3. The parameter interface and algorithm of drum
Two continuous variables need to be initialized at the beginning of the simulation. For the user's convenience, they should be defined as user parameters in the masking interface of the algorithm; in this way, continuous simulation of different operating conditions can be realized by changing the initial values. Interaction between equipment may produce an algebraic loop in the model and cause instability in simulation. To solve this problem, the mathematical model should be transformed into the form of an explicit function before programming; otherwise, a Memory block can be set between input and output to avoid the algebraic loop.
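The Memory-block trick mentioned above breaks an algebraic loop by feeding each block the previous step's output instead of solving y = f(u, y) implicitly at every step. A minimal Python sketch of the idea (the one-sample delay is the technique; the toy feedback function is an invented example):

```python
def step_with_memory(f, u_seq, y0=0.0):
    """Break an algebraic loop the way a Simulink Memory block does:
    each step computes y[k] = f(u[k], y[k-1]) from the PREVIOUS output
    rather than solving y = f(u, y) implicitly."""
    y_prev = y0
    out = []
    for u in u_seq:
        y_prev = f(u, y_prev)   # delayed feedback: uses last step's y
        out.append(y_prev)
    return out

# Toy loop y = 0.5 * (u + y): with the one-step delay it converges to the
# algebraic fixed point y = u for a constant input.
ys = step_with_memory(lambda u, y: 0.5 * (u + y), [1.0] * 30)
```

The delayed value converges to the implicit solution when the loop gain is below one, which is why the Memory block stabilizes such models at the cost of a one-step lag.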
3 Simulation Algorithms Library of the Power Unit According to the structure of a coal-fired power unit, the algorithm library should include three parts: (1) special algorithms of the coal-fired power unit, such as the air heater, primary air fan (centrifugal fan), secondary air fan and gas draft fan (axial-flow fan), mill, coal feeder, furnace, water-wall riser, drum, economizer, superheater, reheater, desuperheater, turbine, steam heater, condenser, deaerator, and so on; (2) general algorithms of common devices such as water tanks, motors, steam or water pipes, valves, and so on; (3) algorithms for steam parameters, which can be calculated directly from the steam table or from the IAPWS-IF97 or IFC-67 formulations. By modifying the slblocks.m file, the algorithm library is loaded into the Simulink toolbox, as Fig. 4 displays. The steam-parameter algorithms are used to calculate saturation temperature, saturated water enthalpy, saturated steam enthalpy, and so on. In the simulation algorithms library, the time constant of the combustion process is far smaller than that of heat transfer, so the dynamics of the furnace combustion model can be neglected. The downcomer model is included in the water-wall riser algorithm. The milling system adopts a direct-fired mill model: coal is ground by the mill and then blown into the boiler furnace.
Fig. 4. Algorithms library of the boiler system
Development of Simulation Software for Coal-Fired Power Units
265
4 Simulation Research The simulation object is a 300 MW coal-fired power unit in the Yangquan power plant. The boiler is subcritical, with natural circulation and a single drum. Its nominal parameters are as follows: drum pressure 18.2 MPa, main steam pressure 17.26 MPa, main steam temperature 540 ℃, reheat steam temperature 540 ℃, main steam flow 918 t/h, and reheat steam flow 777.5 t/h. The simulation model of the coal-fired power unit is shown in Fig. 5. Sometimes the values of the simulation model's variables need to be loaded or saved; this can be realized with the Simulink "Data Import/Export" settings and the load/save-from-workspace parameters. With the save command at the command window, the final state of a simulation can be saved from the workspace into a MAT-file; the next time, the data can be loaded from the MAT-file back into the workspace at the command window. In this way, the final state of one run can be used as the initial state of another. The state at full load is loaded as the initial state and a variable-step solver is adopted, with a maximum step of 0.1 s. At 60 s, a 5% turbine valve disturbance is added; the results are shown in Figs. 6-9. As the turbine valve opens abruptly, the steam flow increases, which makes the drum pressure fall, and the drop in pressure in turn reduces the steam flow. Owing to this interaction between steam flow and drum pressure, both fall step by step and finally reach a steady state. The drop in drum pressure also increases the latent heat of vaporization of the water, but the heat absorbed by the riser does not change, so the amount of evaporation ends up smaller than before. The drum water level first rises, because the volume of the steam bubbles in the water expands rapidly as the pressure drops; it then falls as most of the steam bubbles escape from the water; finally, once these effects die away, the water level rises again because evaporation is smaller than the feedwater flow.
At the beginning, the steam temperature drops and then rises, ending higher than before. These figures show that the model's response to the disturbance accords with the real behavior of the boiler system, and indicate that the model reproduces the dynamic behavior of the system well.
Fig. 5. Simulation model of power unit system
Fig. 6. Main steam flow
Fig. 7. Drum pressure
Fig. 8. Drum pressure
Fig. 9. Drum water level
5 Conclusion For the boiler system, a rationally simplified dynamic model for system-level simulation was set up, based on the energy and mass conservation equations, using mechanism modeling and the principle of modularization. An algorithm library of the boiler system was then built with the Matlab/Simulink-based simulation software development method, the simulation model was constructed by combining algorithms from the library, and experimental research was carried out. The given instances show that the model performs well in both steady and dynamic states and that the algorithm library is comprehensive and general. The approach, which makes full use of the advantages of C and Matlab in developing simulation software, not only greatly increases development efficiency but also leaves ample room for connecting the simulation software with a control system. Acknowledgements. This paper is supported by The High-Tech Research and Development Program of China (The 863 Program): 2007AA041106.
Inconsistency Management

Sylvia Encheva1 and Sharil Tumin2

1 Stord/Haugesund University College, Bjørnsonsg. 45, 5528 Haugesund, Norway [emailprotected]
2 University of Bergen, IT-Dept., P.O. Box 7800, 5020 Bergen, Norway [emailprotected]
Abstract. This work is devoted to drawing conclusions based on a set of possibly inconsistent data. Particular attention is paid to distinguishing applications of inappropriate methods from inability to solve a problem combining several methods. Intermediate truth values are used to facilitate the process of comparing degrees of certainties among contexts. Three-level nested lattices are used to facilitate the process of distinguishing all possible outcomes of tests with pre-determined number of questions and pre-determined number of answer alternatives following each question. Keywords: Decision support services, uncertainty management.
1 Introduction
Boolean logic appears to be sufficient for most everyday reasoning. However, it is certainly unable to provide meaningful conclusions in the presence of inconsistent and/or incomplete input [13]. Solutions to such problems can be found by applying many-valued logics. This work is devoted to drawing conclusions based on a set of possibly inconsistent data. Particular attention is paid to distinguishing applications of inappropriate methods from the inability to solve a problem by combining several methods. Intermediate truth values are used to facilitate the process of comparing degrees of certainty among contexts. The rest of the paper is organized as follows. Related work, basic terms and concepts are presented in Section 2. The model is described in Section 3. The paper ends with a description of the system in Section 4 and a conclusion in Section 5.
2 Background

Let P be a non-empty ordered set. If sup{x, y} and inf{x, y} exist for all x, y ∈ P, then P is called a lattice [2]. In a lattice illustrating a partial ordering of knowledge values, logical conjunction is identified with the meet operation and logical disjunction with the join operation.

D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 268–275, 2008. © Springer-Verlag Berlin Heidelberg 2008
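As a concrete illustration (not from the paper), the meet and join of a finite lattice can be computed directly from its order relation; a minimal Python sketch on the truth ordering false ≤ unknown ≤ true of Kleene's three-valued logic:

```python
# Minimal sketch (illustrative, not the paper's implementation): meet (inf)
# and join (sup) computed from the order relation of a small finite lattice,
# here the truth chain of Kleene's three-valued logic.
elements = ["false", "unknown", "true"]
order = {(a, b) for i, a in enumerate(elements)
         for b in elements[i:]}            # reflexive-transitive chain order

def leq(a, b):
    return (a, b) in order

def join(a, b):
    """sup{a, b} -- identified with logical disjunction."""
    upper = [z for z in elements if leq(a, z) and leq(b, z)]
    return next(z for z in upper if all(leq(z, w) for w in upper))

def meet(a, b):
    """inf{a, b} -- identified with logical conjunction."""
    lower = [z for z in elements if leq(z, a) and leq(z, b)]
    return next(z for z in lower if all(leq(w, z) for w in lower))
```

The same two functions work for any finite lattice once `elements` and `order` are replaced; on this chain they coincide with Kleene's ∨ and ∧ restricted to the truth ordering.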
Fig. 1. Three-valued logic
Many-valued logics can be obtained from the generalized Lukasiewicz logic [16]. The set of truth values with cardinality n corresponds to the equidistant rational numbers

{0, 1/(n−1), 2/(n−1), ..., (n−2)/(n−1), 1}.

A three-valued logic, known as Kleene’s logic, is developed in [15]. The three truth values are truth, unknown and false, where unknown indicates a state of partial vagueness. These truth values represent the states of a world that does not change. The three truth values are arranged in a lattice in Fig. 1 according to degrees of truth (t) and knowledge (k). Three-valued logic is further discussed in [1], [4], [5], and [6]. A brief overview of a six-valued logic, which is a generalized Kleene’s logic, was first presented in [17]. The six-valued logic is described in more detail in [8]. In [4] this logic is further developed by assigning probability estimates to formulas instead of non-classical truth values. The six-valued logic distinguishes two types of unknown knowledge values: a permanently or eternally unknown value, and a value representing a current lack of knowledge about a state [7].

Fig. 2. Six-valued logic
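The equidistant truth-value set of the generalized Lukasiewicz logic is easy to generate exactly; a small sketch (illustrative, not from the paper):

```python
from fractions import Fraction

def lukasiewicz_truth_values(n):
    """Truth values {0, 1/(n-1), 2/(n-1), ..., (n-2)/(n-1), 1} of the
    n-valued generalized Lukasiewicz logic (n >= 2), as exact rationals."""
    return [Fraction(k, n - 1) for k in range(n)]
```

For n = 3 this yields 0, 1/2, 1, matching the three-valued case.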
270
S. Encheva and S. Tumin
Fig. 3. Two-level nested line diagram
The six truth values truth, unknown, unknown_f, unknown_t, contradiction, and false are arranged in a lattice in Fig. 2. Nested line diagrams are used for visualizing large concept lattices, emphasizing sub-structures and regularities, and combining conceptual scales [19]. A two-level nested line diagram consists of an outer line diagram which contains in each node an inner diagram; see, e.g., Fig. 3. Seven-valued logic has been employed in reliability measure theory [14], for verification of switch-level designs in [9], and for verifying circuit connectivity of MOS/LSI mask artwork in [18]. The seven-valued logic presented in [14], also known as seven-valued relevance logic, has the following truth values: truth (i.e. valid), false (i.e. invalid), true by default, false by default, unknown, contradiction, and contradiction by default. A seven-valued logic presented in [3] is based on the following truth values: uu - unknown or undefined, kk - possibly known but consistent, ff - false, tt - true, ii - inconsistent, it - non-false, and if - non-true. A truth table for the ontological operation ∨ on these truth values is presented in [3]. The seven truth values are arranged in a lattice in Fig. 4.

Fig. 4. Lattice of the seven-valued logic
3 Skills Evaluation
Students’ abilities to solve problems are evaluated based on the results of tests they take. A test consists of four problems requiring skillful application of techniques that combine two different methods. The importance of the two methods is graded, i.e. mastering the first method is considered to be of greater value than mastering the second method. The following stem responses to a single problem are suggested:

– the student’s response is correct,
– the student’s response is incorrect with respect to both methods,
– no response is provided,
– both methods are applied correctly but the student has not completed the solution due to an unskillful combination of these methods,
– only the first method is applied correctly, and
– only the second method is applied correctly.

The number of possible responses to such a test is calculated applying the formula

C(n + k − 1, k), with n = 6, k = 4.

All the 126 responses are arranged in a three-level lattice in Fig. 5. The first level is considered to be a lattice with three nodes. It is used for a detailed description of students’ abilities to solve a problem combining two different methods. No significant change in a student’s learning is assumed if the results of two consecutive tests appear in the same node of level one. The second level is based on six-valued logic. It is used for illustrating significant changes in students’ abilities to solve a problem combining two different methods. The third level is based on seven-valued logic. The seven-valued logic is both associative and commutative. This allows combining results of tests applying the rules for truth values presented in [3].
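The count of 126 follows directly from the multiset coefficient; a one-line check (illustrative):

```python
from math import comb

def response_count(n, k):
    """Number of multisets of size k drawn from n response types:
    C(n + k - 1, k)."""
    return comb(n + k - 1, k)

# n = 6 stem responses, k = 4 problems per test:
assert response_count(6, 4) == 126
```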
4 System Description
A Web application server framework implemented with the Apache HTTP server, a Python runtime environment attached to Apache via mod_python, and the SQLite relational database engine is proposed for the system. Students and instructors interact with the system through Web browsers, using the Web application interfaces the system provides. System administrators interact directly with the application server and the database. The system provides three independent modules, which can be implemented as sub-modules within the Web application framework or as independent sub-systems connected to the Web application using XML-RPC.
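The XML-RPC coupling between modules can be sketched with Python's standard library; the method name `evaluate` and its arguments below are hypothetical placeholders, not the paper's actual interface:

```python
# Hedged sketch: exposing an Evaluation module to the Web application over
# XML-RPC using only the standard library. The "evaluate" method is an
# illustrative placeholder for the lattice-based evaluation of Section 3;
# a real implementation would read/write the shared SQLite database.
from xmlrpc.server import SimpleXMLRPCServer

def evaluate(node_name, previous_score):
    return {"node": node_name, "previous": previous_score}

server = SimpleXMLRPCServer(("localhost", 0),   # port 0: pick any free port
                            allow_none=True, logRequests=False)
server.register_function(evaluate, "evaluate")
# server.serve_forever()   # blocking call; started by the sub-system itself
```

A client module would then reach it with `xmlrpc.client.ServerProxy`, matching the paper's picture of modules that interact only through the database and XML-RPC.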
Fig. 5. Three-level nested lattice
Fig. 6. The system (students, instructors and administrators reach the Web application server over HTTP and SSH; the Test, Evaluation and Report modules connect to it via XML-RPC and to the database via SQL)
The sub-system provides three functions implemented as independent modules:

– Test
– Evaluation
– Report

These modules can run in parallel and interact with each other only through the data they place in the database. The job of the Test module is to dynamically prepare a test Web form for students by:

– randomly selecting four problems from a pool of problems stored in the database,
– arranging the four problems randomly within the Web page,
– placing each of the six options randomly after the corresponding problem,
– saving a short test summary into the database in an encoded compact form, and
– presenting the test to the students.

The function of the Evaluation module is to evaluate students’ test responses by:

– calculating and placing the response into one of the nodes in Fig. 5, and saving the result into the database encoded as a node name, for example “fi”,
– calculating the current accumulative test score by applying the ontological operation ∨ to the previous accumulative test score and the current test score, and
– saving the accumulative test score into the database.

The Report module provides students and instructors with utilities for reporting students’ evaluation information, for example a particular student’s evaluation history.
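The Test module's preparation steps above can be sketched as follows; the function and option labels are hypothetical, not the paper's:

```python
import random

SIX_OPTIONS = ["correct", "wrong-both-methods", "no-response",
               "combination-incomplete", "first-method-only",
               "second-method-only"]   # illustrative labels only

def prepare_test(problem_pool, rng=random, n_problems=4):
    """Hedged sketch of the Test module: randomly select n_problems from
    the pool, arrange them in random order, and shuffle each problem's
    six response options."""
    problems = rng.sample(problem_pool, n_problems)   # random selection
    rng.shuffle(problems)                             # random page order
    return [(p, rng.sample(SIX_OPTIONS, len(SIX_OPTIONS))) for p in problems]
```

A short encoded summary of the generated form would then be stored in the database before the page is served.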
5 Conclusion
This work discusses automated evaluation of students’ abilities to solve problems by combining two different methods. Three-level nested lattices are used to facilitate distinguishing all possible outcomes of tests with four questions and six answer alternatives for each question.
References

1. Bruns, G., Godefroid, P.: Model Checking Partial State Spaces with 3-Valued Temporal Logics. In: Halbwachs, N., Peled, D. (eds.) CAV 1999. LNCS, vol. 1633, pp. 274–287. Springer, Heidelberg (1999)
2. Davey, B.A., Priestley, H.A.: Introduction to Lattices and Order. Cambridge University Press, Cambridge (2005)
3. Ferreira, U.: Uncertainty and a 7-Valued Logic. In: Dey, P.P., Amin, M.N., Gatton, T.M. (eds.) The 2nd International Conference on Computer Science and its Applications, National University, San Diego, CA, pp. 170–173 (2004)
4. Fitting, M.: Kleene’s Logic, Generalized. J. Log. Comp. 1(6), 797–810 (1991)
5. Fitting, M.: Kleene’s Three-Valued Logics and Their Children. Fundamenta Informaticae 20, 113–131 (1994)
6. Fitting, M.: Tableaus for Many-Valued Modal Logic. Studia Logica 55, 63–87 (1995)
7. Garcia, O.N., Moussavi, M.: A Six-Valued Logic for Representing Incomplete Knowledge. In: The 20th International Symposium on Multiple-Valued Logic, pp. 110–114. IEEE Computer Society Press, Charlotte (1990)
8. Garcia-Duque, J., Lopez-Nores, M., Pazos-Arias, J., Fernandez-Vilas, A., Diaz-Redondo, R., Gil-Solla, A., Blanco-Fernandez, Y., Ramos-Cabrer, M.: A Six-Valued Logic to Reason about Uncertainty and Inconsistency in Requirements Specifications. J. Log. Comp. 16(2), 227–255 (2006)
9. Hähnle, R., Kernig, W.: Verification of Switch-Level Designs with Many-Valued Logic. LNCS, vol. 698, pp. 158–169. Springer, Heidelberg (1993)
10. Apache HTTP Server Project, http://httpd.apache.org/
11. Python Programming Language, http://www.python.org/
12. SQLite, http://www.sqlite.org/
13. Immerman, N., Rabinovich, A., Reps, T., Sagiv, M., Yorsh, G.: The Boundary between Decidability and Undecidability of Transitive Closure Logics. In: CSL 2004 (2004)
14. Kim, M., Maida, A.S.: Reliability Measure Theory: A Nonmonotonic Semantics. IEEE Trans. on Knowledge and Data Engineering 5(1), 41–51
15. Kleene, S.: Introduction to Metamathematics. D. Van Nostrand Co., Inc., New York (1952)
16. Lukasiewicz, J.: On Three-Valued Logic. Ruch Filozoficzny 5 (1920). In: Borkowski, L. (ed.) Jan Lukasiewicz: Selected Works. North Holland, Amsterdam (1970)
17. Moussavi, M., Garcia, O.N.: A Six-Valued Logic and Its Application to Artificial Intelligence. In: The Fifth Southeastern Logic Symposium (1989)
18. Takashima, M., Mitsuhashi, T., Chiba, T., Yoshida, K.: Programs for Verifying Circuit Connectivity of MOS/LSI Mask Artwork. In: 19th Conference on Design Automation, pp. 544–550. IEEE Computer Society Press, Los Alamitos (1982)
19. Wille, R.: Concept Lattices and Conceptual Knowledge Systems. Comp. Math. Appl. 23(6-9), 493–515 (1992)
Neural Network-Based Adaptive Optimal Controller – A Continuous-Time Formulation*

Draguna Vrabie, Frank Lewis, and Daniel Levine

Automation and Robotics Research Institute, University of Texas at Arlington, 7300 Jack Newell Blvd. S., Fort Worth, TX 76118 USA
Department of Psychology, University of Texas at Arlington, Arlington, TX 76019-0528 USA
{dvrabie,lewis,levine}@uta.edu
Abstract. We present a new online adaptive control scheme, for partially unknown nonlinear systems, which converges to the optimal state-feedback control solution for nonlinear systems that are affine in the input. The main features of the algorithm map onto the characteristics of the reward-based decision-making process in the mammal brain. The derivation of the optimal adaptive control algorithm is presented in a continuous-time framework. The optimal control solution is obtained in a direct fashion, without system identification. The algorithm is an online approach to policy iteration, based on an adaptive critic structure, to find an approximate solution to the state-feedback, infinite-horizon optimal control problem. Keywords: Direct Adaptive Optimal Control, Reinforcement Learning, Policy Iteration, Adaptive Critics, Continuous-Time, Nonlinear Systems, Neural Networks.
1 Introduction

It is well known that solving the optimal control problem is generally difficult even in the presence of complete and correct knowledge of the system dynamics, as Bellman’s dynamic programming approach suffers from the so-called “curse of dimensionality” [16]. This motivated several advances in solving the optimal control problem using dual adaptive control techniques, surveyed in [9], [30], which simultaneously improve the estimated system model parameters and improve on the suboptimal controller. Nonetheless, another difficulty appeared, posed by dual control theory, known as the exploration-exploitation dilemma [25]. In order to adaptively solve optimal control problems a new methodology, namely Reinforcement Learning (RL), was developed in the computational intelligence community and then gradually adapted to fit control engineering requirements. Reinforcement learning means finding a control policy, i.e. learning the parameters of a controller mapping between the system states and the control signal, so as to
* This work was supported by NSF ECS-0501451, NSF ECCS-0801330 and ARO W91NF-051-0314.
D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 276–285, 2008. © Springer-Verlag Berlin Heidelberg 2008
maximize a numerical reward signal [25]. Reinforcement learning is defined by characterizing a learning problem, which is in fact the adaptive optimal control problem. Thus, from a control engineering perspective, RL algorithms can be viewed as a class of adaptive controllers which solve the optimal control problem based on reward information that characterizes the performance of a given controller. In this paper we focus our attention on a class of reinforcement learning algorithms, namely policy iteration. The goal of the paper is to present a new policy iteration algorithm which, without making use of complete knowledge of a system’s dynamics, learns to approximate, in an online fashion and with arbitrarily small accuracy, the optimal control solution for a general nonlinear, affine-in-the-input, continuous-time system. In order to solve the optimal control problem, instead of directly solving the Hamilton-Jacobi-Bellman (HJB) equation [16] for the optimal cost and then finding the optimal control policy (i.e. the feedback gain for linear systems), the policy iteration method starts with the evaluation of the cost associated with an initial stabilizing control policy and then uses this information to obtain a new policy which results in improved control performance. The algorithm can be viewed as a directed search for the optimal controller in the space of admissible control policies. The policy iteration algorithm was first formulated in [13]. For continuous-state linear systems, policy iteration algorithms were developed in [5], [19] and [26] and used to find the optimal Linear Quadratic Regulator (LQR) [16]. Convergence guarantees were given in [11] and [14]. In [5] policy iteration was formulated to solve the discrete-time LQR problem using Q-functions [27], [28]; thus the resulting algorithm is model free.
For continuous-time systems, in [19], the model-free quality of the approach was achieved either by evaluating online the infinite horizon cost associated with an admissible control policy or by using measurements of the state derivatives. The policy iteration algorithm in [26] is an online technique which solves the LQR problem along a single state trajectory, using only partial knowledge of the system dynamics and without requiring measurements of the state derivative. In the case of nonlinear systems, policy iteration is in fact the method of successive approximations developed in [21]. This method iterates on a sequence of Lyapunov equations which are somewhat easier to solve than the HJB equation. In [2], [3] the solution of these Lyapunov equations was obtained using the Galerkin spectral approximation method, and in [1] they were solved, in the presence of saturation restrictions on the control input, using neural network approximator structures. Neural network-based structures for learning the optimal control solution via the HJB equation, namely Adaptive Critics, were first proposed in [18]. Adaptive Critics and neural network training algorithms were presented in both the discrete-time [20] and continuous-time [10] frameworks. The policy iteration methods developed in [2], [3] and [1] are generally applied offline as they require complete knowledge of the dynamics of the system to be controlled. Stabilizing adaptive controllers that are inverse optimal, with respect to some relevant cost not specified by the designer, have also been derived [17]. Due to their offline character, imposed by the system-model requirement, these methods are not sensitive to changes in the system dynamics. The algorithm that we present in this paper is a policy iteration algorithm which uses the Bellman optimality equation as a consistency relation when solving for the value associated with a given policy, and
not the regular, Hamiltonian-based Lyapunov equation. This yields the model-free property of the proposed algorithm and enables its online implementation. In the next section the continuous-time optimal control problem for nonlinear systems is formulated. The new online policy iteration algorithm is then presented, followed by its neural network-based online implementation on an Actor-Critic structure. The relation of the algorithm to certain learning mechanisms in the mammal brain is then discussed, followed by concluding remarks.
2 The Optimal Control Problem

Consider the time-invariant dynamical system, affine in the input, given by

ẋ(t) = f(x(t)) + g(x(t))u(x(t)); x(0) = x_0 (1)

with x(t) ∈ R^n, f(x(t)) ∈ R^n, g(x(t)) ∈ R^{n×m} and the input u(t) ∈ U ⊂ R^m. We assume that f(x) + g(x)u is Lipschitz continuous on a set Ω ⊆ R^n that contains the origin and that the dynamical system is stabilizable on Ω, i.e. there exists a continuous control function u(t) ∈ U such that the system is asymptotically stable on Ω. Define the infinite horizon integral cost

V(x_0) = ∫_0^∞ r(x(τ), u(τ)) dτ (2)

where r(x, u) = Q(x) + u^T R u with Q(x) positive definite, i.e. ∀x ≠ 0, Q(x) > 0 and x = 0 ⇒ Q(x) = 0, and R ∈ R^{m×m} is a positive definite matrix.

Definition 1 (Admissible policy). A control policy μ(x) is defined as admissible with respect to (2) on Ω, denoted by μ ∈ Ψ(Ω), if μ(x) is continuous on Ω, μ(0) = 0, μ(x) stabilizes (1) on Ω and V(x_0) is finite ∀x_0 ∈ Ω.

For any admissible control policy μ ∈ Ψ(Ω), if the associated cost function

V^μ(x_0) = ∫_0^∞ r(x(τ), μ(x(τ))) dτ (3)

is C^1, then an infinitesimal version of (3) is

0 = r(x, μ(x)) + (V_x^μ)^T (f(x) + g(x)μ(x)), V^μ(0) = 0 (4)

where V_x^μ denotes the partial derivative of the value function V^μ with respect to x (the value function does not depend explicitly on time). Equation (4) is a Lyapunov equation for nonlinear systems which, given the controller μ(x) ∈ Ψ(Ω), can be solved for the value function V^μ(x) associated with it. Given that μ(x) is an
admissible control policy, if V^μ(x) satisfies (4), with r(x, μ(x)) ≥ 0, then V^μ(x) is a Lyapunov function for the system (1) with control policy μ(x).

The optimal control problem can now be formulated: given the continuous-time system (1), the set u ∈ Ψ(Ω) of admissible control policies and the infinite horizon cost functional (2), find an admissible control policy such that the cost index (2) associated with the system (1) is minimized.

Defining the Hamiltonian of the problem

H(x, u, V_x^*) = r(x(t), u(t)) + (V_x^*)^T (f(x(t)) + g(x(t))u(t)) (5)

the optimal cost function V^*(x) satisfies the HJB equation

0 = min_{u∈Ψ(Ω)} [H(x, u, V_x^*)]. (6)

Assuming that the minimum on the right-hand side of equation (6) exists and is unique, the optimal control function for the given problem is

u^*(x) = −(1/2) R^{−1} g^T(x) V_x^*(x). (7)

Inserting this optimal control in the Hamiltonian, we obtain the HJB equation in terms of V_x^*:

0 = Q(x) + (V_x^*(x))^T f(x) − (1/4) (V_x^*(x))^T g(x) R^{−1} g^T(x) V_x^*(x); V^*(0) = 0. (8)
This is a necessary and sufficient condition for the optimal value function [16]. For the linear system case, considering a quadratic cost functional, the equivalent of this HJB equation is the well known Riccati equation. In order to find the optimal control solution for the problem one only needs to solve the HJB equation (8) for the value function and then substitute the solution in (7) to obtain the optimal control. However, solving the HJB equation is generally difficult as it is a nonlinear differential equation, quadratic in the cost function, which also requires complete knowledge of the system dynamics (i.e. the system dynamics described by the functions f ( x), g ( x) need to be known).
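For the linear-quadratic case just mentioned, the HJB equation indeed collapses to the algebraic Riccati equation, which standard solvers handle directly; a sketch on an illustrative double-integrator system (the example system is mine, not the paper's):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative linear system dx/dt = Ax + Bu with cost integral of x'Qx + u'Ru.
# Here the HJB equation reduces to the Riccati equation
#   A'P + PA - P B R^{-1} B' P + Q = 0,  with V*(x) = x'Px.
A = np.array([[0.0, 1.0], [0.0, 0.0]])   # double integrator
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)          # u*(x) = -Kx, optimal feedback gain

# Sanity check: the Riccati residual vanishes.
residual = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q
assert np.allclose(residual, np.zeros((2, 2)), atol=1e-9)
```

For this system the closed-form solution is P = [[√3, 1], [1, √3]] and K = [1, √3].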
3 The Policy Iteration Algorithm

In order to solve the optimal control problem, instead of directly solving the HJB equation (8) for the optimal cost and then finding the optimal control policy given by (7), the policy iteration method starts by evaluating the cost of a given initial admissible policy and then makes use of this information to improve the control policy. The two steps are repeated until the policy improvement step no longer changes the actual policy. The following online reinforcement learning algorithm solves the infinite horizon optimal control problem without using knowledge of the system internal dynamics (i.e. the system function f(x)).
First note that, given an admissible policy μ(x) for (1) such that the closed loop system is asymptotically stable on Ω, the infinite horizon cost for any x(t) ∈ Ω is given by (3) and V^μ(x(t)) serves as a Lyapunov function for (1). The cost function (3) can thus be written as

V^μ(x(t)) = ∫_t^{t+T} r(x(τ), μ(x(τ))) dτ + V^μ(x(t+T)). (9)
Based on (9) and (6), considering an initial admissible control policy μ^(0)(x), the following policy iteration scheme can be derived:

1. Solve for V^{μ^(i)}(x) using

V^{μ^(i)}(x(t)) = ∫_t^{t+T} r(x(τ), μ^(i)(x(τ))) dτ + V^{μ^(i)}(x(t+T)), V^{μ^(i)}(0) = 0 (10)

2. Update the control policy using

μ^(i+1)(x) = arg min_μ {H(x, μ, V_x^{μ^(i)})} (11)

which in this case is

μ^(i+1)(x) = −(1/2) R^{−1} g^T(x) V_x^{μ^(i)}(x). (12)
Equations (10) and (12) formulate a new policy iteration algorithm that solves for the optimal control without making use of any knowledge of the system internal dynamics f(x). The online implementation of the algorithm is discussed in the next section. This algorithm is an online version of the offline algorithms proposed in [2], [3], [1], inspired by the online adaptive critic techniques proposed by computational intelligence researchers [4], [20], [29]. The convergence of the algorithm is now discussed.

Lemma 1. Solving for V^{μ^(i)} in equation (10) is equivalent to finding the solution of the Lyapunov equation

0 = r(x, μ^(i)(x)) + (V_x^{μ^(i)})^T (f(x) + g(x)μ^(i)(x)), V^{μ^(i)}(0) = 0. (13)

The proof is based on the fact that the solution V^{μ^(i)} of the Lyapunov equation (13) also satisfies equation (10), and that equation (10) has a unique solution.
Remark 1. Note that although the same solution is obtained whether solving equation (10) or (13), solving equation (10) does not require any knowledge of the system dynamics f(x). From Lemma 1 it follows that the algorithm (10) and (12) is equivalent to iterating between (13) and (12), without using knowledge of the system internal dynamics.
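For the LQR special case, iterating between (13) and (12) reduces to Kleinman's classical algorithm: policy evaluation solves a Lyapunov equation and policy improvement recomputes the feedback gain. A model-based sketch (unlike the paper's model-free scheme, this uses A explicitly; the double-integrator system is illustrative, not the paper's):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # illustrative double integrator
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.array([[1.0]])

K = np.array([[1.0, 1.0]])               # initial admissible (stabilizing) gain
for _ in range(15):
    Ac = A - B @ K
    # Policy evaluation -- eq. (13) for the LQR case (a Lyapunov equation):
    #   Ac' P + P Ac = -(Q + K' R K), with V(x) = x' P x.
    P = solve_continuous_lyapunov(Ac.T, -(Q + K.T @ R @ K))
    # Policy improvement -- eq. (12) for the LQR case: with V_x = 2Px,
    #   mu = -(1/2) R^{-1} B' V_x = -R^{-1} B' P x, i.e. K = R^{-1} B' P.
    K = np.linalg.solve(R, B.T @ P)
```

The iterates converge to the optimal LQR gain, here K* = [1, √3], the same solution obtained from the Riccati equation.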
Theorem 1 (convergence). The policy iteration algorithm (10) and (12) converges to the optimal control solution on the trajectories having initial state x_0 ∈ Ω.

Proof: In [2], [3], [1] it was shown that, using policy iteration conditioned by an initial admissible policy μ^(0)(x), all the subsequent control policies are admissible and the iteration (13) and (12) converges to the solution of the HJB equation. Based on the proven equivalence between equations (10) and (13), we can conclude that the proposed online adaptive optimal control algorithm converges to the solution of the optimal control problem (2) without using knowledge of the internal dynamics of the controlled system (1). ▐
4 Online Neural Network-Based Approximate Optimal Control Solution on an Actor-Critic Structure For the implementation of the iteration scheme given by (10) and (12) one only needs to have knowledge of the input to state dynamics, i.e. the function g ( x) , which is required for the policy update in equation (12); however no knowledge on the internal state dynamics, described by f ( x) , is required. In order to solve for the cost function V μ ( x) in equation (10) we will use a neural network, which is a universal approximator [12], to obtain an approximation of the (i )
value function for any given initial state $x \in \Omega$. The cost function $V^{\mu^{(i)}}(x(t))$ will be approximated by

$V^{\mu^{(i)}}(x) = \sum_{j=1}^{L} w_j^{\mu^{(i)}} \phi_j(x) = (w_L^{\mu^{(i)}})^T \phi_L(x),$   (14)
a neural network with L neurons in the hidden layer and activation functions $\phi_j(x) \in C^1(\Omega)$, $\phi_j(0) = 0$. $w_j^{\mu^{(i)}}$ denote the weights of the neural network, $\phi_L(x)$ is the vector of activation functions and $w_L^{\mu^{(i)}}$ is the weight vector. The issues related to the neural network approximation error will be addressed in a future paper; we continue the following derivations assuming that the neural network is an exact description of the cost function. Using the neural network description of the value function, equation (14), equation (10) can be written as

$(w_L^{\mu^{(i)}})^T \phi_L(x(t)) = \int_t^{t+T} r(x, \mu^{(i)}(x))\, d\tau + (w_L^{\mu^{(i)}})^T \phi_L(x(t+T)).$   (15)
As the cost function was replaced with the neural network approximation, equation (15) will have the residual error
D. Vrabie, F. Lewis, and D. Levine
$\delta_L^{(i)}(x(t)) = \int_t^{t+T} r(x, \mu^{(i)}(x))\, d\tau + (w_L^{\mu^{(i)}})^T \big[\phi_L(x(t+T)) - \phi_L(x(t))\big].$   (16)
From the perspective of temporal difference learning methods, e.g. [7], this error can be viewed as a temporal difference residual error. To determine the parameters of the neural network approximating the cost function in the least-squares sense, we use the method of weighted residuals. Thus we seek to minimize the objective
$S = \int_{\Omega_{\{x_0\}_n}^{\mu^{(i)}}} \delta_L^{(i)}(x)\, \delta_L^{(i)}(x)\, dx$   (17)

where $\Omega_{\{x_0\}_n}^{\mu^{(i)}}$ denotes a set of trajectories generated by the policy $\mu^{(i)}$ starting from the initial conditions $\{x_0\}_n \subset \Omega$.
Conditioned by $\Phi = \big\langle [\phi_L(x(t+T)) - \phi_L(x(t))],\, [\phi_L(x(t+T)) - \phi_L(x(t))]^T \big\rangle$ being invertible, we obtain the solution

$w_L^{\mu^{(i)}} = -\Phi^{-1} \Big\langle [\phi_L(x(t+T)) - \phi_L(x(t))],\, \int_t^{t+T} r(x(s), \mu^{(i)}(x(s)))\, ds \Big\rangle.$   (18)
To show that matrix Φ is invertible the following technical results are needed.
Lemma 2. If the set $\{\phi_j\}_1^N$ is linearly independent and $u \in \Psi(\Omega)$, then the set $\{\nabla\phi_j^T (f + gu)\}_1^N$ is also linearly independent.
For the proof see [2]. We now introduce a lemma proving that Φ can be inverted.
Lemma 3. Let $\mu(x) \in \Psi(\Omega)$ be such that $f(x) + g(x)\mu(x)$ is asymptotically stable. If the set $\{\phi_j\}_1^N$ is linearly independent, then $\exists T > 0$ such that $\forall x(t) \in \Omega$ the set $\{\bar\phi_j(x(t),T) = \phi_j(x(t+T)) - \phi_j(x(t))\}_1^N$ is also linearly independent.
The proof is by contradiction with the result in Lemma 2.

Based on the result of Lemma 3, conditioned by an excitation requirement related to the selection of the sample time T, the parameters $w_L$ of the cost function can be calculated using only online measurements of the state vector and the integrated reward over a finite time interval. The control policy is updated at time t+T, after observing the state x(t+T), and is used for controlling the system during the time interval [t+T, t+2T]; thus the algorithm is suitable for online implementation from the control theory point of view. Figure 1 presents the structure of the system with the optimal adaptive controller.
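The batch least-squares critic update in (15)–(18) can be sketched as follows. This is a minimal illustration in Python with NumPy; the quadratic basis `phi`, the sampled data, and all names are hypothetical, not from the paper. Equation (15) gives one linear equation in the critic weights per observed interval, $w^T[\phi_L(x(t)) - \phi_L(x(t+T))] = \int_t^{t+T} r\, d\tau$, which a least-squares solve inverts:

```python
import numpy as np

def phi(x):
    # illustrative quadratic basis for a 2-state system: x1^2, x1*x2, x2^2
    return np.array([x[0] ** 2, x[0] * x[1], x[1] ** 2])

def critic_weights(x_t, x_tT, r_int):
    """Least-squares solve of w^T [phi(x(t)) - phi(x(t+T))] = integrated reward,
    the batch form of eqs. (15)-(18)."""
    dphi = np.array([phi(a) - phi(b) for a, b in zip(x_t, x_tT)])
    w, *_ = np.linalg.lstsq(dphi, r_int, rcond=None)
    return w

# consistency check with a known value function V(x) = x1^2 + 2*x2^2:
# the integrated reward along each interval must equal V(x(t)) - V(x(t+T))
rng = np.random.default_rng(0)
x_t = rng.standard_normal((20, 2))     # states at times t (synthetic)
x_tT = rng.standard_normal((20, 2))    # states at times t+T (synthetic)
w_true = np.array([1.0, 0.0, 2.0])
r_int = np.array([phi(a) @ w_true - phi(b) @ w_true for a, b in zip(x_t, x_tT)])
w = critic_weights(x_t, x_tT, r_int)   # recovers w_true
```

With consistent data the least-squares solution reproduces the true weights exactly; with noisy integrated rewards it returns the minimizer of the residual (17).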
[Figure: Critic block computing the cost function V(x) from $\dot V = Q(x) + u^T R u$, sampled with a ZOH at period T; Actor/Controller block implementing $\mu(x)$; System block $\dot x = f(x) + g(x)u$, $x_0$, in closed loop.]

Fig. 1. Structure of the system with adaptive controller
It is observed that the updates of both the actor and the critic are performed at discrete moments in time. However, the control action is a full-fledged continuous-time control, with its constant gain updated at discrete moments in time, since the critic update is based on observations of the continuous-time cost over a finite sample interval. As a result, the algorithm converges to the solution of the continuous-time optimal control problem.
5 Relation of the Proposed Algorithm with Reward-Based Learning Mechanisms in the Mammal Brain

The adaptive algorithm based on policy iteration is implemented on an actor-critic structure [18], [29]. The way in which the actor-critic structure performs continuous-time closed-loop control while searching for optimal control policies points out the existence of two time scales for the mechanisms involved: a fast time scale that characterizes the continuous-time control process, and a slower time scale that characterizes the learning processes at the levels of the critic and the actor. Thus the actor and critic structures perform tasks at different operating frequencies, in relation to the nature of the task to be performed. This is not surprising, given that the actor-critic structure was inspired by the way in which reward-based learning takes place in the mammal brain. Different oscillation frequencies are connected with the way in which different areas of the brain perform their functions of processing the information received from the sensors [15]. Low-level control structures must quickly react to new information received from the environment, while higher-level structures slowly evaluate the results associated with the present behavior policy.

Another feature of the online policy iteration algorithm presented in this paper is related to the nature of the information, i.e. a computed temporal difference (TD) error signal, required for the learning process to take place at the critic level. In relation to this, a number of reports, e.g. [22], [23], argue that the dopamine signal produced by basal ganglia structures in the mammal brain encodes the TD error between the received and the expected rewards, and that this dopamine signal favors the learning process by increasing the synaptic plasticity of certain groups of neurons.
A third, and most distinctive, attribute of the adaptive optimal control algorithm concerns the value of the sample time used for obtaining the reward information for the critic learning process. Lemma 3 indicates that the learning process at the critic level is conditioned by certain values of the reward-signal sampling. Choosing the value of the sample time is generally considered a technical requirement of online algorithms and is related to the well-known persistency of excitation requirement, which grants asymptotic convergence of the learning process. It was thus even more surprising to learn that there exists in the brain a mechanism, described in [6] and verified against experimental data, which supports the existence of a variable sample time for the reward signal.

The connection between the learning mechanisms in the mammal brain and the learning structures and algorithms developed for control engineering purposes provides a strong argument in favor of a desired collaboration between the engineering fields of computational intelligence and control, and cognitive science.
6 Conclusion

In this paper we presented a new adaptive controller based on a reinforcement learning algorithm, namely policy iteration, which solves online the continuous-time optimal control problem without using knowledge about the system's internal dynamics. Several remarks relating the proposed algorithm to reinforcement learning mechanisms in the mammal brain have been included.
References

1. Abu-Khalaf, M., Lewis, F.L.: Nearly Optimal Control Laws for Nonlinear Systems with Saturating Actuators Using a Neural Network HJB Approach. Automatica 41(5), 779–791 (2005)
2. Beard, R., Saridis, G., Wen, J.: Galerkin Approximations of the Generalized Hamilton-Jacobi-Bellman Equation. Automatica 33(12), 2159–2177 (1997)
3. Beard, R., Saridis, G., Wen, J.: Approximate Solutions to the Time-Invariant Hamilton-Jacobi-Bellman Equation. Journal of Optimization Theory and Applications 96(3), 589–626 (1998)
4. Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Athena Scientific, MA (1996)
5. Bradtke, S.J., Ydstie, B.E., Barto, A.G.: Adaptive Linear Quadratic Control Using Policy Iteration. In: Proc. of ACC, pp. 3475–3476, Baltimore (June 1994)
6. Brown, J., Bullock, D., Grossberg, S.: How the basal ganglia use parallel excitatory and inhibitory learning pathways to selectively respond to unexpected rewarding cues. J. Neuroscience 19, 10502–10511 (1999)
7. Doya, K.: Reinforcement Learning in Continuous Time and Space. Neural Computation 12(1), 219–245 (2000)
8. Feldbaum, A.A.: Dual control theory I–II. Autom. Remote Control 21, 874–880, 1033–1039 (1960)
9. Filatov, N.M., Unbehauen, H.: Survey of adaptive dual control methods. IEE Proc. Control Theory and Applications 147(1), 118–128 (2000)
10. Hanselmann, T., Noakes, L., Zaknich, A.: Continuous-Time Adaptive Critics. IEEE Trans. on Neural Networks 18(3), 631–647 (2007)
11. Hewer, G.: An Iterative Technique for the Computation of the Steady State Gains for the Discrete Optimal Regulator. IEEE Trans. on Automatic Control 16, 382–384 (1971)
12. Hornik, K., Stinchcombe, M., White, H.: Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Networks 3, 551–560 (1990)
13. Howard, R.A.: Dynamic Programming and Markov Processes. MIT Press, Cambridge (1960)
14. Kleinman, D.: On an Iterative Technique for Riccati Equation Computations. IEEE Trans. on Automatic Control 13, 114–115 (1968)
15. Levine, D.S., Brown, V.R., Shirey, V.T. (eds.): Oscillations in Neural Systems. Lawrence Erlbaum Associates, Mahwah (2000)
16. Lewis, F., Syrmos, V.: Optimal Control. Wiley, New York (1995)
17. Li, Z.H., Krstic, M.: Optimal design of adaptive tracking controllers for nonlinear systems. In: Proc. of ACC, pp. 1191–1197 (1997)
18. Miller, W.T., Sutton, R., Werbos, P.: Neural Networks for Control. MIT Press, Cambridge (1990)
19. Murray, J.J., Cox, C.J., Lendaris, G.G., Saeks, R.: Adaptive Dynamic Programming. IEEE Trans. on Systems, Man and Cybernetics 32(2), 140–153 (2002)
20. Prokhorov, D., Wunsch, D.: Adaptive critic designs. IEEE Trans. on Neural Networks 8(5), 997–1007 (1997)
21. Saridis, G., Lee, C.S.: An Approximation Theory of Optimal Control for Trainable Manipulators. IEEE Trans. on Systems, Man and Cybernetics 9(3), 152–159 (1979)
22. Schultz, W., Dayan, P., Read Montague, P.: A Neural Substrate of Prediction and Reward. Science 275, 1593–1599 (1997)
23. Schultz, W.: Neural coding of basic reward terms of animal learning theory, game theory, microeconomics and behavioral ecology. Current Opinion in Neurobiology 14, 139–147 (2004)
24. Slotine, J.J., Li, W.: Applied Nonlinear Control. Prentice-Hall, Englewood Cliffs (1991)
25. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
26. Vrabie, D., Pastravanu, O., Lewis, F.L.: Policy Iteration for Continuous-time Systems with Unknown Internal Dynamics. In: Proc. of MED (2007)
27. Watkins, C.J.C.H.: Learning from Delayed Rewards. PhD Thesis, University of Cambridge, England (1989)
28. Werbos, P.: Neural networks for control and system identification. In: Proc. of IEEE CDC (1989)
29. Werbos, P.: Approximate dynamic programming for real-time control and neural modeling. In: White, D.A., Sofge, D.A. (eds.) Handbook of Intelligent Control. Van Nostrand Reinhold, New York (1992)
30. Wittenmark, B.: Adaptive dual control methods: An overview. In: 5th IFAC Symp. on Adaptive Systems in Control and Signal Processing, pp. 67–73 (1995)
On Improved Performance Index Function with Enhanced Generalization Ability and Simulation Research

Dongcai Qu¹, Rijie Yang², and Yulin Mi³

¹ Department of Control Engineering, Naval Aeronautical and Astronautical University, Yantai, 264001, P.R. China, [emailprotected]
² Department of Electronic and Information Engineering, Naval Aeronautical and Astronautical University, Yantai, 264001, P.R. China, [emailprotected]
³ Department of Training, Naval Aeronautical and Astronautical University, Yantai, 264001, P.R. China, [emailprotected]
Abstract. Generalization ability is one of the most important properties of an Artificial Neural Network (ANN) identification model, and it has been one of the key questions researched by experts at home and abroad in recent years. The generalization ability of an ANN identification model depends on many factors, and an appropriately designed performance index function is an important one. After analyzing the common performance index function based on the minimum-mean-error principle, a kind of improved performance index function is obtained by adding time-delay (decay) information to the weight values. Extensive simulation research shows that the improved performance index function is effective in enhancing the generalization ability of ANN models.
1 Introduction

The basic meaning of ANN generalization ability is this: an ANN identification model that has been trained still responds correctly to testing or working samples not contained in the training sample set (but drawn from the same distribution); that is, after being trained on a small quantity of training samples, the model can give correct outputs for untrained test or working sample data, or its outputs can satisfy the precision demands of the problems to be solved. As the sample data not studied and trained on is always abundant, the generalization ability of an ANN identification model is very important. How to improve this ability is therefore one of the key questions researched by experts at home and abroad, who have made some research achievements in recent years. Synthetically speaking, however, most of these achievements are limited to qualitative theoretical analysis [1]. In fact, as a very complex question, generalization ability depends on many factors.

D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 286–293, 2008. © Springer-Verlag Berlin Heidelberg 2008
This text mainly researches, with emphasis, the design of the performance index function's form; certain simulation results are given for the reader's reference.
2 Analysis of Relationship between Generalization Ability and Structure, Training Method and Performance Index Function

Training an ANN model is not meant to let it simply memorize the training samples it has learned, but mainly to make the ANN model discover and recover, through studying the training samples, the inner regularity of the environment implicit in them, so that it can then give correct outputs for testing or working samples. Below, a simple analysis is carried out of the relationship between generalization ability and network structure, training method, and the design of the performance index function [1-5].

The generalization ability of an ANN is closely related to the network structure. If the number of structure parameters of the ANN is far smaller than the size of the training sample set, the chance of the samples becoming over-fitted is small, which is beneficial to developing the generalization of the network; if not, the chance that the ANN learns the noise of the system is large. For example, if the network is trained with the minimum-mean-error index and the minimization of the error inside the training set is pursued excessively, the ANN will memorize noise or specific examples, with the phenomenon of over-fitting, and will fail to study the real regularity of the system, making the generalization ability of the network weak. But it is a pity that up to now there is no mature theory to guide how to design a network structure with a fitting scale of parameters; mostly, structures are gathered together by experiment according to the difficulty of the problem to be solved, or obtained by carrying out simulation and choosing the network structure with the smallest error under some designed performance index function [1-3], [5-6].

A proper training method for the ANN model is also effective in improving the generalization ability.
If the network is trained by the regularization method, the early-stopping method [7-9] (also called stopping training in advance), or others, the generalization ability of the ANN model can be improved. At present a performance function is usually designed on the basis of minimizing the mean error; this function embodies the distance between the desired response and the actual response. However, as a function recovered from a finite sample has countless solutions, the problem is usually ill-posed; the regularization method adds, to the standard error term, a regularization term limiting the complexity of the approximating function. On the premise of keeping the learning precision, the network structure is simplified reasonably by the regularization term, improving the network generalization ability. From the viewpoint of prior distributions, after the performance index function of the weight parameters is endowed with the meaning of a prior probability, its minimization equals the maximization of the posterior probability of the weight parameters through Bayesian analysis [10]. The correspondence between the forms of the regularization term in common use and the prior distribution of the ANN weights is usually described by Gaussian, Laplace and Cauchy distributions; however, the analysis and description of these prior distributions is complicated and computationally very heavy,
what's more, whether they satisfy the actual distribution rule depends on the specific analysis of the specific problem.

The basic idea of the early-stopping method is as follows: first of all, the sample set is divided into a training sample set and a validation sample set, or a testing set (a testing set may be chosen); then the ANN is trained on the training sample set, minimizing some performance index function, and while the ANN is trained, the validation sample set is used to watch and control it. At the beginning of training, the validation error usually diminishes along with the diminishing of the training error; but once the ANN model begins to be over-trained, the validation error increases gradually, and when the validation error has increased to a certain degree, the training of the ANN model is stopped in advance, the training function returning the ANN model at which the validation error was smallest. The early-stopping method is an implicit regularization method [11-12]; it is apt to produce networks of low complexity and can improve the generalization ability of the ANN. The key of this training method is to determine the proper stopping point, so the training sample set and the validation sample set should be divided reasonably, but this rationality is hard to control. Currently the sample set is usually divided via simulation, the rationality being evaluated by calculating the testing and validation errors.
3 On Improved Performance Index Function

An analysis of existing ANN training algorithms shows that when an ANN is trained, in order to make a performance index function based on some least-error criterion attain its minimum, most algorithms achieve this by seeking the best search direction and optimizing the step length, just as the BP algorithm does, which is applied widely in the BP Multilayer Feedforward Network (MFN) model. In fact, the function approximation realized by a trained BP MFN model is a mapping from the training sample set to the network weight space, $Z^P \to \hat\theta$, where $Z^P = \{[u(i), y(i)] \mid i = 1, \dots, P\}$ is the appointed training sample set and $\hat\theta$ is the estimate of the set of network weights $\theta$. That means the basic strategy of the BP algorithm is to make an error function of some form smallest, mostly based on the minimum-mean-error principle (1):

$J(\theta, Z^P) = \frac{1}{2N} \sum_{p=1}^{N} \sum_{i=1}^{N_L} \big(d_i^{(p)} - y_i^{(p)(L)}(t)\big)^2$   (1)
where $d_i^{(p)}$ and $y_i^{(p)(L)}(t)$ are, respectively, the desired output and the actual output of the ith output neuron of the ANN output layer under the pth training pattern, and $N_L$ is the number of neurons in the network output layer. Some kind of minimization rule is used, as in formula (2):
$\theta^{(i+1)} = \theta^{(i)} + \eta^{(i)} s^{(i)}$   (2)

where $\theta^{(i)}$ is the current point in weight space, $s^{(i)}$ is the search direction, and $\eta^{(i)}$ is the length of the search step.
The superior weight set of the network is acquired as in formula (3):

$\hat\theta = \arg\min_\theta J_P(\theta, Z^P)$   (3)
Under the minimum-mean-error principle, although the training sample data is fitted rather well by the ANN model, the fitting performance on the untrained testing or working sample set is not very ideal, and sometimes the error is very large; that means the generalization ability has to be developed. Therefore, after the original performance index function of the network is improved, an improved performance index function with weight time-delay (decay) information added is obtained, whose structure is formula (4):

$J_P(\theta, Z^P) = \frac{1}{2N} \sum_{p=1}^{N} \big(d_i^{(p)} - y_i^{(p)(L)}(t)\big)^T \big(d_i^{(p)} - y_i^{(p)(L)}(t)\big) + \frac{1}{2N} \theta^T \varepsilon \theta$   (4)
where ε is the time-delay (decay) matrix acting on the network's weight parameters; a diagonal matrix ε = αI can be adopted. Considering that the network structure parameters should be simplified as much as possible, appropriate weight time-delay (decay) information is joined onto the performance index function in existence; because large weight values are thereby penalized, some redundant weights will diminish and moreover be eliminated, consequently the network structure is retrenched and optimized and the generalization ability of the network model is inevitably developed. In fact, this can be regarded as indirectly adding a kind of noise to the input sample set; when the standard deviation of the input noise is small, it is equivalent to the regularization method of ANN structure design [13-14] (the regularization coefficient being related to the standard deviation of the noise). The additional penalty term produces a smoothing effect, so over-fitting can be avoided, producing a smooth input-output curve. It is thus clear that adding time-delay (decay) information to the network weights can develop the network's generalization ability [4-5], [15].
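As a hedged sketch of index (4), consider a linear-in-parameters model rather than the paper's BP network (all names and data here are illustrative): the squared-error term plus the weight penalty $\frac{1}{2N}\theta^T(\alpha I)\theta$, minimized by plain gradient descent as a stand-in for the update rule (2).

```python
import numpy as np

def j_improved(theta, X, d, alpha):
    """Improved index (4) for a linear model y = X @ theta:
    mean squared error plus the decay term (1/2N) * theta^T (alpha*I) theta."""
    N = len(d)
    err = d - X @ theta
    return (err @ err) / (2 * N) + alpha * (theta @ theta) / (2 * N)

def train(X, d, alpha, lr=0.1, steps=5000):
    """Minimize j_improved by gradient descent (a stand-in for BP, eq. (2))."""
    theta = np.zeros(X.shape[1])
    N = len(d)
    for _ in range(steps):
        grad = (X.T @ (X @ theta - d) + alpha * theta) / N
        theta -= lr * grad
    return theta

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
d = X @ np.array([1.0, 2.0, 3.0])
theta_decay = train(X, d, alpha=1.0)   # with the decay term
theta_plain = train(X, d, alpha=0.0)   # plain minimum-mean-error training
```

For this linear case the decay term reproduces ridge regression: `theta_decay` has a strictly smaller norm than `theta_plain`, which is exactly the shrinking of redundant weights described above.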
4 Simulation Researches

Acquiring the training and testing sample sets: first, a BP MFN network with a three-layer structure (one hidden layer) was designed to approximate the nonlinear function $f(x) = 0.7\sin(2\pi x/1.5) + 0.8\cos(3\pi x/2.5)$. The hidden-layer transfer function of the network adopts the hyperbolic tangent function, and the transfer function of the output layer is linear. In the meantime, in order to improve the anti-jamming character of the network model, Gaussian noise with mean 0 and variance 0.2 was added to the nonlinear function, and the simulation computation was carried through. Based on that function, 300 groups of training samples and 300 groups of testing samples were chosen, as shown in Fig. 1.

Simulation: for formula (4), ε = 0.02 I; namely, fixed delay information of 0.02 was added. After the simulation computation was carried through, the training error, testing error, generalized-error FPE estimate and generalized-error LOO estimate [16-17] were acquired and compared with the corresponding errors of the network without delay; the simulation data is shown in Table 1.

Fig. 1. Sample data for training and testing of the function f(x) (training samples and testing samples plotted against x)

Table 1. Simulation computation data table of ANN model generalization ability
Hidden dimension        15                    10                    6
                 No delay   Delay     No delay   Delay     No delay   Delay
Train error      0.016329   0.018657  0.017348   0.018993  0.018896   0.019096
Test error       0.024789   0.021853  0.024447   0.021764  0.02077    0.021037
FPE error        0.022244   0.021135  0.021347   0.021294  0.021451   0.021022
LOO error        0.021733   0.021141  0.021098   0.021276  0.021294   0.021034
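The simulation setup described above can be reproduced in outline as follows (a sketch; the sampling range of x is an assumption, since the paper does not state it):

```python
import numpy as np

def f(x):
    # the target nonlinear function used in the simulation study
    return 0.7 * np.sin(2 * np.pi * x / 1.5) + 0.8 * np.cos(3 * np.pi * x / 2.5)

rng = np.random.default_rng(1)
x_train = rng.uniform(-2.0, 2.0, 300)        # 300 training inputs (range assumed)
x_test = rng.uniform(-2.0, 2.0, 300)         # 300 testing inputs
noise = rng.normal(0.0, np.sqrt(0.2), 300)   # Gaussian noise: mean 0, variance 0.2
d_train = f(x_train) + noise                 # noisy training targets
```

These 300+300 samples play the roles of the training and testing sets of Fig. 1; any network trained on `(x_train, d_train)` would then be scored on the clean `f(x_test)`.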
As Table 1 shows, after the improved performance index function is adopted, although a few error values become bigger, most of the testing, FPE and LOO errors of the ANN identification model are reduced, and the generalization ability of the network model is improved.

If no testing set is available, the FPE estimate of the training set can be adopted directly, namely as the FPE estimate of the generalized error. If a testing set is available, its FPE estimate can also offer important information on the ANN's generalization ability. When the performance index function adopts the unregularized form of formula (1), formula (5) can be adopted to carry out the simulation calculation of the FPE estimate of the generalized error; if not, formula (6) is used instead of formula (5) [18]:

$J_{FPE} = \frac{N + \mu}{N - \mu} J(\hat W)$   (5)

$J_{FPE} = \frac{N + \eta_1}{N + \eta_1 - 2\eta_2} J(\hat W)$   (6)
where

$\eta_1 = \mathrm{tr}\Big[ R(\hat W)\big(R(\hat W) + \tfrac{1}{N}\varepsilon\big)^{-1} R(\hat W)\big(R(\hat W) + \tfrac{1}{N}\varepsilon\big)^{-1} \Big]$   (7)

$\eta_2 = \mathrm{tr}\Big[ R(\hat W)\big(R(\hat W) + \tfrac{1}{N}\varepsilon\big)^{-1} \Big]$   (8)
Here $\hat W$ is the weight vector of the network, μ is the total number of network weight parameters, N is the number of samples, $\eta_1$ and $\eta_2$ are the effective numbers of network parameters, with $\eta_1 \approx \eta_2$, $R(\hat W)$ is the approximate Hessian matrix, and tr is the trace of the matrix. Formula (9) is adopted to calculate $J(\hat W)$:

$J(\hat W) = \frac{1}{2N} \sum_{p=1}^{N} \big(d_i^{(p)} - y_i^{(p)(L)}(t)\big)^T \big(d_i^{(p)} - y_i^{(p)(L)}(t)\big)$   (9)
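Formulas (5) and (6) are simple to evaluate once the training error $J(\hat W)$ and the (effective) parameter counts are known. A direct transcription (the numeric example is illustrative, not from Table 1):

```python
def fpe_plain(j_train, n_samples, n_params):
    """FPE estimate, eq. (5): ((N + mu) / (N - mu)) * J(W_hat)."""
    return (n_samples + n_params) / (n_samples - n_params) * j_train

def fpe_regularized(j_train, n_samples, eta1, eta2):
    """FPE estimate for the regularized index, eq. (6):
    ((N + eta1) / (N + eta1 - 2*eta2)) * J(W_hat)."""
    return (n_samples + eta1) / (n_samples + eta1 - 2 * eta2) * j_train

# e.g. N = 300 samples and mu = 30 weights inflate the training error by 330/270
j_fpe = fpe_plain(0.0187, 300, 30)
```

Note that (6) reduces to (5) when $\eta_1 = \eta_2 = \mu$, i.e. when all parameters are fully effective (no regularization).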
Compared with the FPE estimate, the leave-one-out method based on the improved CV (cross-validation) algorithm, namely the LOO sample estimate, can provide a more accurate estimate of the ANN generalization ability. The LOO estimate adopts formula (10) for the simulation calculation [5], [17]:

$J_{LOO} = \frac{1}{2N} \sum_{t=1}^{N} \big(y(t) - \hat y(t \mid \hat W^{(t)})\big)^2, \qquad \hat W^{(t)} = \arg\min_W J_{N-1}\big(W, Z^N \setminus \{u(t), y(t)\}\big)$   (10)

Fig. 2. Simulation curve of the ε error based on training and testing sample data, n = 15 (○: training error; ×: testing error; standard SSE vs. delay time information)
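The LOO estimate in (10) retrains on N−1 samples and tests on the held-out one. For a linear-in-parameters model this can be sketched directly (an illustration, not the toolbox implementation of [16]; all names are hypothetical):

```python
import numpy as np

def loo_estimate(X, d):
    """Leave-one-out generalization-error estimate in the spirit of eq. (10):
    for each sample k, refit on the other N-1 samples, score the held-out error."""
    N = len(d)
    errs = []
    for k in range(N):
        mask = np.arange(N) != k
        w, *_ = np.linalg.lstsq(X[mask], d[mask], rcond=None)  # refit without sample k
        errs.append((d[k] - X[k] @ w) ** 2)
    return np.sum(errs) / (2 * N)   # 1/(2N) normalization, as in the text

rng = np.random.default_rng(2)
X = rng.standard_normal((30, 3))
d = X @ np.array([0.5, -1.0, 2.0])   # noiseless linear data
j_loo = loo_estimate(X, d)           # ~0 when the model class contains the truth
```

On noiseless data from the model class the LOO error is essentially zero; with noise it tracks the out-of-sample error rather than the optimistic training error.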
Fig. 3. Simulation curve of the ε error based on training and testing sample data, n = 10 (○: training error; ×: testing error; standard SSE vs. delay time information)

Fig. 4. Simulation curve of the ε error based on training and testing sample data, n = 6 (○: training error; ×: testing error; standard SSE vs. delay time information)
In order to observe further how the delay information ε among the ANN model's weights affects the network generalization ability, ANN structure models with different ε were simulated; in the meantime, the hidden-layer dimension was set to network structures of 15, 10 and 6 dimensions separately, to compare the identification effect of different hidden-layer dimensions on the nonlinear function. The simulation curves of the ANN model's training/testing error against the delay information ε among the weights are shown in Figs. 2–4.
5 Conclusion

Through the above analysis and simulation, after the improved performance index function is adopted, the generalization ability of the network is developed; that is to say, the function is effective for developing the generalization ability of the network. However, from the simulation curves it is clear that only choosing a proper ε parameter is effective in improving the generalization ability of the network. As for how to choose the proper ε parameter, it is at present confirmed by experience and related simulation means, which requires abundant experience and various simulations; so, as a method to improve the generalization ability of the ANN model, it needs to be discussed further in theory.
References

1. Wei, H.K.: Theory and Method of Framework Design of the Neural Networks. National Defense Industry Press, Beijing (2001)
2. Yuan, P.F.: Complexity of Capability and Learning and Computation of the Neural Networks. Tsinghua University, Beijing (2001)
3. Yuan, P.F., Zhang, N.Y.: Neural Networks and Analog Evolvement Computation. Tsinghua University, Beijing (2001)
4. Zhang, N.Y., Yuan, P.F.: Neural Networks and Fuzzy Control. Tsinghua University, Beijing (1998)
5. Larsen, J., Hansen, L.K.: Generalization Performance of Regularized Neural Network Models. In: Proc. of the IEEE Workshop on Neural Networks for Signal Processing IV, Piscataway, New Jersey, pp. 42–51 (1994)
6. Maass, W.: Neural Nets with Superlinear VC-Dimension. Neural Computation, 877–884 (1994)
7. Girosi, F., Jones, M., Poggio, T.: Regularization Theory and Neural Network Architectures. Neural Computation, 219–269 (1995)
8. Williams, P.M.: Bayesian Regularization and Pruning Using a Laplace Prior. Neural Computation, 117–143 (1995)
9. Sjoberg, J., Ljung, L.: Overtraining, Regularization, and Searching for Minimum in Neural Networks. In: Preprints of the IFAC Symp. on Adaptive Systems in Control and Signal Processing, Grenoble, France, pp. 669–674 (1992)
10. MacKay, D.: A Practical Bayesian Framework for Backpropagation Networks. Neural Computation, 448–472 (1992)
11. Yang, O., Lin, Q.: The Discussion on Improving Generalization Ability of a Feedforward Neural Network. Journal of Nanpin Teachers College, 60–63 (2007)
12. Sjoberg, J., Ljung, L.: Overtraining, Regularization, and Searching for a Minimum, with Application to Neural Networks. International Journal of Control, 1391–1407 (1995)
13. Bishop, C.M.: Training with Noise is Equivalent to Tikhonov Regularization. Neural Computation, 108–116 (1995)
14. An, G.: The Effect of Adding Noise During Backpropagation Training on a Generalization Performance. Neural Computation, 643–671 (1996)
15. Krogh, A., Hertz, J.: A Simple Weight Decay Can Improve Generalization. NIPS 4, 950–957 (1992)
16. Norgaard, M.: Neural Network Based System Identification Toolbox, Ver. 2, Tech. Report 00-E-891, Department of Automation, Technical University of Denmark (2000)
17. Norgaard, M., Ravn, O., Poulsen, N.K.: Neural Networks for Modelling and Control of Dynamic Systems. Springer, London (2000)
18. Ljung, L.: System Identification: Theory for the User. Prentice-Hall, Englewood Cliffs (1987)
A Fault Diagnosis Approach for Rolling Bearings Based on EMD Method and Eigenvector Algorithm

Jinyu Zhang and Xianxiang Huang

Xi'an Research Institute of High-tech, Xi'an, P.R. China, [emailprotected]
Abstract. Fault diagnosis of rolling bearings is still a very important and difficult research task in engineering. After analyzing the shortcomings of current bearing fault diagnosis technologies, a new approach based on Empirical Mode Decomposition (EMD) and the blind equalization eigenvector algorithm (EVA) for rolling bearing fault diagnosis is proposed. In this approach, the characteristic high-frequency signal with amplitude and channel modulation of a rolling bearing with local damage is first separated from the mechanical vibration signal as an Intrinsic Mode Function (IMF) by using EMD; then the source impact vibration signal yielded by the local damage is extracted by means of an EVA model and algorithm. Finally, the presented approach is used to analyze an impacting experiment and two real signals collected from rolling bearings with outer race damage or inner race damage. The results show that the EMD- and EVA-based approach can effectively detect rolling bearing faults.

Keywords: Empirical Mode Decomposition, Eigenvector Algorithm, Source Impact, Rolling Bearing, Fault Diagnosis.
1 Introduction

Rolling bearings are very important and easily damaged components in rotating machinery, and many diagnosis methods for rolling bearings [1]-[6] have been presented. Causes such as wear, fatigue, corrosion and overload may result in local damage faults of rolling bearings when machinery operates. An impact takes place, and a natural vibration with a special frequency is excited, when the load acts on a damage position. The natural vibrations are usually of high frequency and work in AM mode. At the same time, the periodic impact itself also propagates through the machinery in wave mode. Therefore, the vibration signals observed in engineering are usually very complicated; they may include impacts, modulation, channel characteristics and noise, and it is difficult to reach a correct diagnosis. Many research results indicate that the high-frequency modulation signals are correlated with the fault types of rolling bearings, and separating these signal components is the key to a successful diagnosis. For this purpose, the envelope analysis approach was proposed [1], [2]; it has achieved good results and has become the traditional fault diagnosis method for rolling bearings. But in order to get a better diagnosis, the envelope analysis approach first needs to choose a correct

D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 294–301, 2008. © Springer-Verlag Berlin Heidelberg 2008
A Fault Diagnosis Approach for Rolling Bearings Based on EMD and EVA
295
narrowband filter with the right center frequency and bandwidth to separate the high-frequency signal. This requirement is too demanding for field engineers and often prevents the technique from being applied. To solve this problem, many improved envelope approaches have been presented, e.g. the wavelet envelope approach [5] and EMD-based envelope approaches [4], [6]. In the wavelet envelope approach, a wavelet decomposition is used as a band-pass filter group to separate the high-frequency signal, then envelope analysis is used to extract the fault features, and finally the bearing faults are diagnosed. But wavelet analysis also has shortcomings and lacks the necessary flexibility. Unlike wavelets, EMD is a signal decomposition approach based on the local characteristics of the signal. It can decompose a complicated signal into a sum of a finite number of Intrinsic Mode Functions (IMFs). The frequency content of each IMF depends not only on the sampling frequency but also on the signal itself, so EMD is an adaptive signal processing approach well suited to preprocessing and further envelope analysis. However, this approach cannot eliminate the effect of the channel; in particular, the channel effect on an impact produced by low-speed, heavily loaded bearings with local damage cannot be ignored. Blind equalization (BE) is a good choice for equalizing the channel effect [7], [8]. Combining the advantages of the EMD and BE approaches, a new fault diagnosis approach based on EMD and the BE EVA is proposed in this paper. The organization of this paper is as follows. In Sections 2 and 3, the basic algorithms of EMD and EVA are reviewed. In Section 4, the new fault diagnosis approach based on EMD and the BE EVA is presented. In Section 5, an experiment and two real bearing fault signals are examined. Finally, conclusions are stated in Section 6.
2 Empirical Mode Decomposition
Empirical Mode Decomposition is a relatively new signal decomposition technique [9] that has often proved remarkably effective for processing bearing vibration signals [3]-[6]. The starting point of EMD is to consider oscillations in signals at a very local level. In fact, if we look at the evolution of a signal x(t) between two consecutive extrema (say, two minima occurring at times t− and t+), we can heuristically define a (local) high-frequency part {d(t), t− ≤ t ≤ t+}, or local detail, which corresponds to the oscillation terminating at the two minima and passing through the maximum that necessarily exists in between them. For the picture to be complete, one still has to identify the corresponding (local) low-frequency part m(t), or local trend, so that x(t) = m(t) + d(t) for t− ≤ t ≤ t+. Assuming that this is done in some proper way for all the oscillations composing the entire signal, the procedure can then be applied to the residual consisting of all local trends, and the constitutive components of a signal can therefore be iteratively extracted. Given a signal x(t), the effective EMD algorithm can be summarized as follows [9]: 1) identify all extrema of x(t); 2) interpolate between minima (resp. maxima), ending up with an envelope emin(t) (resp. emax(t)); 3) compute the mean m(t) = (emin(t) + emax(t))/2; 4) extract the detail d(t) = x(t) − m(t); and 5) iterate on the residual m(t).
296
J.Y. Zhang and X.X. Huang
In practice, the above procedure has to be refined by a sifting process [9], which amounts to first iterating steps 1 to 4 on the detail signal d(t) until the latter can be considered zero-mean according to some stopping criterion. Once this is achieved, the detail is referred to as an Intrinsic Mode Function (IMF); the corresponding residual is computed and step 5 applies. By construction, the number of extrema decreases when going from one residual to the next, and the whole decomposition is guaranteed to be completed with a finite number of modes. Modes and residuals have been heuristically introduced on "spectral" arguments, but this must not be considered from too narrow a perspective. First, it is worth stressing that, even in the case of harmonic oscillations, the high- vs. low-frequency discrimination mentioned above applies only locally and in no way corresponds to a pre-determined sub-band filtering. The selection of modes corresponds instead to an automatic and adaptive (signal-dependent) time-variant filtering.
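The sifting procedure above can be sketched in code. The following is a minimal, illustrative Python implementation, not the authors' own: the function names, the cubic-spline envelopes, and the crude fixed-count stopping rule are our simplifications of the criterion in [9].

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_once(x, t):
    """One sifting pass: subtract the mean of the two extremal envelopes."""
    maxima = argrelextrema(x, np.greater)[0]
    minima = argrelextrema(x, np.less)[0]
    if len(maxima) < 2 or len(minima) < 2:
        return None  # too few extrema: x is a residual trend, stop
    e_max = CubicSpline(t[maxima], x[maxima])(t)  # upper envelope emax(t)
    e_min = CubicSpline(t[minima], x[minima])(t)  # lower envelope emin(t)
    m = (e_max + e_min) / 2.0                     # local mean m(t)
    return x - m                                  # detail d(t)

def emd(x, t, max_imfs=6, sift_iters=10):
    """Decompose x into IMFs plus a residual (toy stopping criterion)."""
    imfs, residual = [], x.astype(float)
    for _ in range(max_imfs):
        d = residual.copy()
        for _ in range(sift_iters):   # fixed-count stand-in for zero-mean test
            d_new = sift_once(d, t)
            if d_new is None:
                return imfs, residual
            d = d_new
        imfs.append(d)
        residual = residual - d       # step 5: iterate on the residual
    return imfs, residual

# Toy two-tone signal: the first IMF should capture the faster oscillation.
t = np.linspace(0.0, 1.0, 1000)
x = np.sin(2 * np.pi * 50 * t) + np.sin(2 * np.pi * 5 * t)
imfs, res = emd(x, t)
```

In practice a proper stopping criterion (e.g. a standard-deviation test between successive siftings) replaces the fixed iteration count used here.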
3 EVA of Blind Equalization
The fundamental idea of BE is to derive the equalizer characteristics from the received signal without knowing the source or the incoming channel of the received signals. As we know, the vibration generated by a defective component is often overwhelmed by noise and interference. Even though a sensor is intentionally mounted on a particular component of a machine in order to collect the vibration generated by that component, the collected signal is an aggregation of a number of vibrations. These undesirable vibrations are generated by other components adjacent to the inspected component and are transmitted to the sensor via mechanical linkages or paths. The periodic impulses we are interested in can also propagate internally in the machine to the sensor. The characteristic of the propagation path, named the composite channel, is unknown. However, one can assume that the periodic impulses acting on the machine internally propagate as vibration or stress waves. The composite channel can be described by the causal, possibly mixed-phase, response h(k). According to common theories in digital communications, one can establish a block diagram as shown in Fig. 1.
Fig. 1. Block diagram of the blind equalization: the source d(k) passes through the composite channel h(k), noise n(k) is added to give the received sequence v(k), and the FIR-(l) equalizer e(k) produces the equalized sequence x(k)
The received signal sequence v(k) is the convolution of the original input signal d(k) with the system impulse response h(k), plus the disturbance or noise n(k):
v(k) = h(k) ∗ d(k) + n(k) = Σ_i h(i) d(k − i) + n(k) .  (1)
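The forward model of Eq. (1) is easy to simulate. The sketch below uses a hypothetical source (periodic unit impulses, standing in for the bearing impacts) and a hypothetical short channel response; none of these numbers come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical source d(k): periodic unit impulses (the impacts of interest).
d = np.zeros(512)
d[::64] = 1.0

# Hypothetical composite-channel impulse response h(k) and noise n(k).
h = np.array([1.0, 0.6, -0.3, 0.1])
n = 0.01 * rng.standard_normal(512 + len(h) - 1)

# Eq. (1): received sequence v(k) = h(k) * d(k) + n(k).
v = np.convolve(h, d) + n
```

An ideal equalizer e(k) would then invert h(k), per Eq. (2), so that x(k) = e(k) ∗ v(k) approximates a delayed copy of d(k); EVA estimates e(k) from v(k) alone.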
In order to recover the original signal, a blind equalizer is added to the block diagram as shown in Fig. 1. The equalizer e(k) works as an inverse filter of h(k). Unlike any ordinary equalizer, the blind equalizer performs without knowing the original input or any known training sequence or desired signal as in any common adaptive filter. The equalized sequence x(k) is the result of convolution of the received sequence v(k) with the equalizer e(k) as:
x(k) = e(k) ∗ v(k) .  (2)
The objective of this block diagram and model is to find the Finite Impulse Response (FIR) of the equalizer, or inverse filter, e(k) that makes x(k) as close as possible to the delayed original signal d(k − k0), as measured by the Mean Square Error (MSE):

MSE(e, k0) = E[ |x(k) − d(k − k0)|² ] ,  (3)

where k0 is the delay of the input signal.
A number of algorithms have been developed for blind equalization; the blind equalization generalized eigenvector algorithm (EVA) of [10] is adopted here. This algorithm uses a virtual equalizer f(k), similar to e(k) in Figure 1, as a reference system and estimates its output y(k). The EVA takes the two-dimensional fourth-order cross-cumulant as a "cross-kurtosis" quality function to be maximized. Optimizing this criterion yields a closed-form expression in the form of a generalized eigenvector problem:
C4^{yv} e_EVA = λ R_vv e_EVA ,  (4)
where C4^{yv} is the cross-cumulant matrix. The coefficient vector e_EVA = [e_EVA(0), …, e_EVA(l)]^T, obtained by choosing the eigenvector of R_vv^{-1} C4^{yv} associated with the maximum-magnitude eigenvalue λ, is called the "EVA-(l) solution", as commonly defined in the field of blind equalization. In addition, Jelonnek et al. [10] proved that if the magnitude of the combined impulse response w(k) = h(k) ∗ f(k) attains its maximum wm = max{|w(k)|} only once, then the EVA-(l) solution is unique. In practice, an iterative adjustment of the reference-system coefficients is required to guarantee this uniqueness condition. The EVA comprises the following steps:
1) Set the virtual equalizer (reference system) to f^(0)(k) = δ(k − ⌊l/2⌋) and the iteration counter to i = 0. From v(0), …, v(L−1), estimate the (l+1)×(l+1) matrix R_vv.
2) Determine y(k) = v(k) ∗ f^(i)(k) and estimate the (l+1)×(l+1) matrix C4^{yv}.
3) With the estimated R_vv and C4^{yv} substituted into (4), calculate e_EVA by choosing the most significant eigenvector of R_vv^{-1} C4^{yv}. Let e_EVA^(i)(k) denote the equalizer coefficients associated with e_EVA.
4) Load e_EVA^(i)(k) into the reference system, i.e., let f^(i+1)(k) = e_EVA^(i)(k), and increment the iteration counter, i = i + 1. Repeat steps 2) to 4) while i < I; otherwise stop the iteration.
4 Fault Diagnosis Approach Based on EMD and EVA
The Intrinsic Mode Functions IMF1, IMF2, …, IMFn from EMD represent intrinsic oscillation modes of the signal: they adaptively contain the components from high frequency to low frequency in turn, and they possess good orthogonality [9]. Therefore, the Intrinsic Mode Functions from EMD include at least one component dominated by the periodic impulsive impacting signal we are interested in. The EVA of blind equalization is a deconvolution algorithm: it can effectively equalize the convolved signal and extract the pure original impacting signal. The combination of these two algorithms can therefore exploit their respective advantages to reduce noise and interference of all kinds and extract a purer original impacting signal for an exact diagnosis. The basic steps of this approach are as follows: 1) use EMD to separate the observed bearing signal into several intrinsic mode functions (IMFs) and a residual R; 2) choose the IMFi that contains the high-frequency impulsive impacting signal, apply EVA to it, and extract the impacting signal; and 3) compute the fault features of the impacting signal by time- or frequency-domain analysis methods, and diagnose the fault.
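Two helpers from this pipeline can be sketched concretely: picking the impulsive IMF in step 2 and estimating the impact repetition frequency in step 3. The kurtosis criterion and the Hilbert-envelope spectrum below are common heuristics of our choosing; the paper does not specify how the IMF is selected.

```python
import numpy as np
from scipy.signal import hilbert

def select_impulsive_imf(imfs):
    """Step 2 helper: pick the IMF with the largest kurtosis, a common
    indicator of impulsive content (our heuristic, not the paper's)."""
    def kurtosis(x):
        x = x - np.mean(x)
        return np.mean(x ** 4) / (np.mean(x ** 2) ** 2 + 1e-12)
    return max(imfs, key=kurtosis)

def impact_frequency(x, fs):
    """Step 3 helper: dominant frequency (Hz) of the envelope spectrum."""
    env = np.abs(hilbert(x))
    env = env - np.mean(env)                      # remove the DC component
    spec = np.abs(np.fft.rfft(env))
    freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
    return freqs[1:][np.argmax(spec[1:])]         # skip the zero-frequency bin

# Toy check: a 3 kHz carrier gated at a 110 Hz impact rate.
fs = 12000
t = np.arange(0.0, 1.0, 1.0 / fs)
mod = (np.sin(2 * np.pi * 110 * t) > 0.95).astype(float)
x = mod * np.sin(2 * np.pi * 3000 * t)
f_est = impact_frequency(x, fs)
```

For the toy signal the envelope spectrum peaks near the 110 Hz gating rate, mirroring how the measured impacting frequency is compared with the bearing's fault characteristic frequency in Section 5.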
5 Results
5.1 Tests on Extracting Impulses
To demonstrate the performance of the approach in extracting impulses embedded in a collected vibration signal, an experiment was conducted. A series of impulses at nearly equal time intervals was generated by striking a large worktable with an impact-test hammer, and an accelerometer placed on the table collected the vibration. The vibration of a working plant on the worktable was introduced as background noise to the collected impulses. The observed signal of the impulse series plus noise is displayed in the top diagram of Fig. 2, and the extracted signal in the bottom diagram. It is obvious that the impulse series has been extracted with small residuals. The experiment demonstrates that the proposed approach is capable of extracting impulses from vibration signals with strong background noise.
5.2 An Inner Race Fault of a Rolling Bearing
A periodic impacting vibration is excited when the rolling elements of a bearing roll over an inner race with small local damage such as a crack,
Fig. 2. Experiment result: observed signal (top) and extracted impacting signal (bottom); amplitude in V versus time in ms
spalling, or indentation. To examine the approach, we chose two sets of real faulty-bearing data. The bearing was seeded with faults using electro-discharge machining; faults ranging from 0.18 mm to 0.5 mm in diameter were introduced separately at the inner raceway and the outer raceway. First, we take the signal of a rolling bearing whose fault is known in advance to be in the inner raceway.
Fig. 3. Result of an inner race fault: observed signal (top), extracted impacting signal (middle), and impulsive feature (bottom); amplitude A/V versus sampling series
The top and middle waveforms in Fig. 3 are, respectively, the observed signal and the impulsive impacting signal extracted by our approach. Because of the effect of the composite channel and strong background noise, the structure of the observed signal is very complicated and the impacting features are indistinct, so it is difficult to extract the features and diagnose the fault directly. After applying our approach, the impulsive impacting features become clearer and more periodic. Finally, a common wavelet soft-threshold denoising method is applied to the extracted signal; the bottom waveform of Fig. 3 shows the final result, in which the impacting features are clearly visible. A simple computation gives an average impacting frequency of 110.7 Hz, very close to the real fault characteristic frequency, 110.5 Hz, of the bearing with inner race damage. The impacting force is also clear.
5.3 An Outer Race Fault of a Rolling Bearing
The fault diagnosis of outer-raceway damage is similar to the inner-race case. The three subfigures of Fig. 4 (top, middle, bottom) are, respectively, a real observed signal of the bearing with outer race damage mentioned in the previous subsection, the extracted signal, and the final original impacting feature signal. Similarly, after applying our approach to the observed signal, the impacting fault signal distorted by other vibrations, noise, and structure is effectively extracted, the noise is greatly reduced, and the impacting period stands out. The average impacting frequency is 72.7 Hz, rather close to the fault characteristic frequency, 73.1 Hz, of this rolling bearing with outer race damage. The impacting energy is also clear.
Fig. 4. Result of an outer race fault: observed signal (top), extracted impacting signal (middle), and impulsive feature (bottom); amplitude A/V versus sampling series
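The fault characteristic frequencies against which the measured 110.7 Hz and 72.7 Hz are compared follow from the standard ball-pass formulas for inner- and outer-race defects. The sketch below uses hypothetical bearing geometry (the paper does not give the geometry of its test bearing).

```python
import math

def ball_pass_frequencies(n_balls, f_rot, d_ball, d_pitch, phi_deg=0.0):
    """Standard ball-pass frequencies for outer-race (BPFO) and inner-race
    (BPFI) defects; f_rot is the shaft rotation frequency in Hz."""
    ratio = (d_ball / d_pitch) * math.cos(math.radians(phi_deg))
    bpfo = 0.5 * n_balls * f_rot * (1.0 - ratio)
    bpfi = 0.5 * n_balls * f_rot * (1.0 + ratio)
    return bpfo, bpfi

# Hypothetical geometry (not from the paper): 9 balls, 30 Hz shaft speed,
# 7.9 mm ball diameter, 39 mm pitch diameter, 0 degree contact angle.
bpfo, bpfi = ball_pass_frequencies(9, 30.0, 7.9, 39.0)
```

Note that BPFI is always higher than BPFO for the same bearing, and the two sum to n_balls × f_rot, which is a quick sanity check on any computed pair.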
6 Conclusions
Based on the previous results, we find that the proposed approach is effective for impacting faults of rolling bearings. The original impacting characteristics correlated with local damage can be effectively extracted from complicated machinery vibration and strong background noise. Because the approach needs only one sensor and does not restrict the measurement position, it is valuable in practice.
References
1. Brown, D.N.: Envelope Analysis Detects Bearing Faults before Major Damage Occurs. Pulp and Paper 63, 113–117 (1989)
2. Radcliff, G.A.: Condition Monitoring of Rolling Element Bearings Using the Enveloping Technique. Machine Condition Monitoring 23, 55–67 (1990)
3. Rubini, R., Meneghetti, U.: Application of the Envelope and Wavelet Transform Analyses for the Diagnosis of Incipient Faults in Ball Bearings. Mechanical Systems and Signal Processing 15, 287–302 (2001)
4. Yonggang, X., Zhengjia, H., Taiyong, W.: Envelope Demodulation Method Based on Empirical Mode Decomposition with Application. Journal of Xi'an Jiaotong University 38, 1169–1172 (2004)
5. Qiang, G., Xiaoshan, D., Hong, F.: An Empirical Mode Decomposition Based Method for Rolling Bearing Fault Diagnosis. Journal of Vibration Engineering 20, 15–18 (2007)
6. Yong, L., You-rong, L., Zhi-gang, W.: Research on an Extraction Method for Weak Fault Signals and Its Application. Journal of Vibration Engineering 20, 24–28 (2007)
7. Tse, P.W., Zhang, J.Y., Wang, X.J.: Blind Source Separation and Blind Equalization Algorithms for Mechanical Signal Separation and Identification. Journal of Vibration and Control 12, 395–423 (2006)
8. Lee, J.-Y., Nandi, A.K.: Extraction of Impacting Signals Using Blind Deconvolution. Journal of Sound and Vibration 232, 945–962 (2000)
9. Huang, N.E., Shen, Z., Long, S.R., Wu, M.C., Shih, H.H., Zheng, Q., et al.: The Empirical Mode Decomposition and Hilbert Spectrum for Nonlinear and Non-stationary Time Series Analysis. Proc. Roy. Soc. London A 454, 903–995 (1998)
10. Jelonnek, B., Boss, D., Kammeyer, K.D.: Generalized Eigenvector Algorithm for Blind Equalization. Signal Processing 61, 237–264 (1997)
An Adaptive Fault-Tolerance Agent Running on Situation-Aware Environment SoonGohn Kim1 and EungNam Ko2 1 Division of Computer and Game Science, Joongbu University, 101 Daehakro, Chubu-Meon, GumsanGun, Chungnam, 312-702, Korea [emailprotected] 2 Division of Information & Communication, Baekseok University, 115, Anseo-Dong, Cheonan, Chungnam, 330-704, Korea [emailprotected]
Abstract. Interest in situation-aware ubiquitous computing has increased lately. An example of a situation-aware application is a multimedia education system. Since ubiquitous applications need situation-aware middleware services, and the computing environment keeps changing as the applications change, it is challenging to detect and recover errors in order to provide seamless services and avoid a single point of failure. This paper proposes an Adaptive Fault-Tolerance Agent (AFTA) in a situation-aware middleware framework and presents its simulation model of AFT-based agents. The strong point of this system is that it detects and recovers errors automatically when a session's process comes to an end through a software error.
1 Introduction
Ubiquitous computing can be described as the combination of mobile computing and intelligent environments, and it is a prerequisite to pervasive computing [1]. Context awareness is an application software system's ability to sense and analyze context from various sources; it lets application software take different actions adaptively in different contexts [2]. In a ubiquitous computing environment, the concept of situation-aware middleware has played a very important role in matching user needs with available computing resources in a transparent manner in dynamic environments [3, 4]. Although situation-aware middleware provides powerful analysis of dynamically changing situations in the ubiquitous computing environment by synthesizing multiple contexts and users' actions, which need to be analyzed over a period of time, it is difficult to detect and recover errors for seamless services and to avoid a single point of failure. Thus, there is a great need for a fault-tolerance algorithm in situation-aware middleware to provide dependable services in ubiquitous computing. This paper proposes an Adaptive Fault-Tolerance Agent (AFTA) model for situation-aware ubiquitous computing. The model aims at detecting, classifying, and recovering errors automatically. Section 2 describes situation-aware middleware as the context and fault tolerance. Section 3 presents the AFTA architecture and algorithm.
D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 302–309, 2008. © Springer-Verlag Berlin Heidelberg 2008
Section 4 describes simulation results of our proposed AFTA model. Section 5 presents conclusions.
2 The Context: Situation-Aware Middleware and Fault Tolerance A conceptual architecture of situation-aware middleware based on Reconfigurable Context-Sensitive Middleware (RCSM) is proposed in [2]. Ubiquitous applications require use of various contexts to adaptively communicate with each other across multiple network environments, such as mobile ad hoc networks, Internet, and mobile phone networks. However, existing context-aware techniques often become inadequate in these applications where combinations of multiple contexts and users’ actions need to be analyzed over a period of time. Situation-awareness in application software is considered as a desirable property to overcome this limitation. In addition to being context-sensitive, situation-aware applications can respond to both current and historical relationships of specific contexts and device-actions.
Fig. 1. RCSM's integrated components: situation-aware application objects; optional components such as the RCSM Ephemeral Group Communication Service and other services; core components, namely the Adaptive Object Containers (ADCs), providing awareness of situation, and the RCSM Object Request Broker (R-ORB), providing transparency over ad hoc communication; transport-layer protocols for ad hoc networks; the OS; and sensors
All of RCSM’s components are layered inside a device, as shown in Figure 1. The Object Request Broker of RCSM (R-ORB) assumes the availability of reliable transport protocols; one R-ORB per device is sufficient. The number of ADaptive object Containers (ADC)s depends on the number of context-sensitive objects in the device. ADCs periodically collect the necessary “raw context data” through the R-ORB, which in turn collects the data from sensors and the operating system. Initially, each ADC registers with the R-ORB to express its needs for contexts and to publish the corresponding context-sensitive interface. RCSM is called reconfigurable because it allows addition or deletion of individual ADCs during runtime (to manage
new or existing context-sensitive application objects) without affecting other runtime operations inside RCSM [2]. However, RCSM did not include fault-tolerance support in its architecture. In this paper, we propose a new fault-tolerance capability, called the "Adaptive Fault-Tolerance Agent (AFTA)", in situation-aware middleware. The field of fault-tolerant computing has evolved over the past twenty-five years. Generally, fault-tolerance systems can be classified into software techniques, hardware techniques, and composite techniques [5, 6]. Tolerating software faults is in most cases more difficult than dealing with hardware faults, since most software-fault mechanisms are not well understood and do not lend themselves readily to "nice" techniques such as error coding [7, 8]. Two different techniques for achieving fault tolerance in software have been discussed in the recent literature: the recovery block and N-version programming [8]. In the latter, a number (N ≥ 2) of independently coded programs for a given function are run simultaneously (or nearly so) on loosely coupled computers, the results are compared, and in case of disagreement a preferred result is identified by majority vote (for N > 2) or by a predetermined strategy [9]. The recovery block technique can be applied to a more general spectrum of computer configurations, including a single computer (which may also include hardware fault tolerance) [10]. The DRB (Distributed Recovery Block) scheme was originally proposed by Kim as a technique for unified treatment of both hardware and software faults and for efficient utilization of hardware and software resources. The basic concept of the DRB is a combination of distributed processing and the RB scheme, which enables concurrent execution of try blocks. In the DRB, both the primary and backup nodes consist of two try blocks, i.e., a primary try block and a backup try block.
These try blocks receive the same input data but are not identical. If an error takes place in a try block due to a residual design inadequacy, an identical copy of that primary block cannot be expected to produce a correct result [11]. In spite of this trend, however, the study of fault tolerance for application software has not been sufficient [9, 12, 13]. It is difficult to detect and recover errors for seamless services, and to avoid a single point of failure, using conventional methods; the proposed method makes this possible.
3 Adaptive Fault-Tolerance Agent (AFTA) In this section, we present an Adaptive Fault-Tolerance Agent (AFTA) model for situation-aware ubiquitous computing. The AFTA architecture is presented in Section 3.1 and its algorithm in Section 3.2. 3.1 The AFTA Architecture As shown in Figure 2, AFTA consists of AMA, UIA, SMA, ACA, MCA and FTA. AMA consists of various subclass modules. It includes creation/deletion of shared video window and creation/deletion of shared window. UIA is an agent which plays a
Fig. 2. AFTA architecture: AMA, UIA, SMA, ACA, MCA, and FTA sit above the situation-aware application objects and other services, on top of the ADCs/R-ORB and transport-layer protocols for ad hoc networks
role as an interface between the user and FTA. SMA is an agent which connects the other agents to FTA and manages the overall session information. ACA controls who can talk and who can change the information. MCA supports convenient applications using situation-aware ubiquitous computing; the supplied services are the creation and deletion of service objects for media use, and media sharing between remote users. This agent limits the services according to hardware constraints. FTA is an agent that detects errors and recovers from them.
3.2 The Algorithm of AFTA
SMA monitors access to the session and controls the session. It holds an object with various information for each session and supports multitasking with this information. SMA consists of a Global Session Manager (GSM), Daemon, Local Session Manager (LSM), Participant Session Manager (PSM), Session Monitor, and Traffic Monitor. GSM controls the whole set of sessions when a number of sessions are open simultaneously. An LSM manages only its own session; for example, an LSM may be a lecture class running on situation-aware middleware in a distributed multimedia environment. GSM can manage multiple LSMs. Daemon is an object with services to create sessions. Figure 3 shows the single-session relationship among FTA, GSM, LSM, PSM, and the application software on situation-aware middleware. Our proposed AFTA model aims at supporting fault-tolerance requirements by detecting and recovering errors in order to provide seamless services and avoid a single point of failure. An example of a situation-aware application is a multimedia education system: the development of multimedia computers and communication techniques has made it possible for knowledge to be transmitted from a teacher to a student in a distance environment.
Fig. 3. SMA and FTA architecture on a situation-aware environment: SMA (with its Participant Session Manager) and FTA above the other services of the DOORAE agent layer
To ensure the required reliability for situation-aware ubiquitous computing automatically, FTA comprises three steps: error detection, error classification, and error recovery, carried out by FDA, FCA, and FRA respectively. FDA performs error detection, FCA error classification, and FRA error recovery. We first need a method to detect an error for session recovery. One such method detects errors by using hooking techniques in the MS Windows API (Application Programming Interface). A hook is a point in the Microsoft Windows message-handling mechanism where an application can install a subroutine to monitor the message traffic in the system and process certain types of message before they reach the target window procedure; Windows contains many different types of hook. FCA is an agent that acts as an interface between FDA (detection) and FRA (recovery); it classifies the type of error by using learning rules. After an error is detected and classified, recovery is processed. First it is decided whether the error is a hardware error or a software error; software errors can be recoverable. The error-recovery scheme differs from case to case and can be classified into many cases. In the unrecoverable case, the system has to be restarted by
Fig. 4. Relationship architecture between FTA and SMA: GSM, local and remote Daemons, LSM, Participant SMA, FTA, the application, and the media server with its media server instance, connected by the numbered recovery messages (1)–(13)
manual intervention when the error occurred in hardware resources. Recoverable cases are further classified as state-insensitive and state-sensitive; this approach takes no account of domino effects between processes. After an error is detected and classified, recovery is processed as shown in Figure 4. If an error is recoverable, the following sequence takes place: FTA requests session information from GSM; GSM responds to FTA with the session information; FTA requests recovery from the Daemon; the Daemon announces the recovery to the Remote Daemon; the Remote Daemon announces the recovery to the Participant Session Manager; the Remote Daemon receives an acknowledgement packet for the recovery; the Daemon receives an acknowledgement packet for the recovery; the Daemon creates the Local Session Manager; the Local Session Manager creates the media server; the media server creates a media server instance; the media server instance acknowledges the Local Session Manager; the LSM creates the application; and the Daemon informs GSM of the recovery information. The strong point of this system is that it detects and recovers automatically when the session's process comes to an end from a software error.
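FTA's detect, classify, recover pipeline can be sketched as a small agent loop. This is an illustrative Python sketch under our own naming (`ErrorEvent`, `FaultToleranceAgent`), standing in for the Windows-hook mechanism and the GSM/Daemon recovery sequence the paper describes.

```python
from dataclasses import dataclass

@dataclass
class ErrorEvent:
    source: str          # e.g. the failed "media_service_instance"
    hardware: bool       # hardware errors are unrecoverable here
    state_sensitive: bool

class FaultToleranceAgent:
    """Sketch of FTA's detect -> classify -> recover pipeline (names ours)."""

    def classify(self, ev):
        # FCA's role: decide which recovery scheme applies.
        if ev.hardware:
            return "unrecoverable"                  # manual restart needed
        return "state_sensitive" if ev.state_sensitive else "state_insensitive"

    def recover(self, ev):
        # FRA's role: recoverable software errors trigger the recovery
        # sequence (here collapsed into recreating the failed service).
        kind = self.classify(ev)
        if kind == "unrecoverable":
            return "manual_restart_required"
        return f"recreated:{ev.source}"

fta = FaultToleranceAgent()
result = fta.recover(ErrorEvent("media_service_instance", False, False))
```

The real system would drive `recover` from hook-based detection events and route the recreation through GSM, the Daemons, and the LSM as in Figure 4.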
4 Simulating AFTA
The AFTA simulation model has been implemented in Visual C++. To evaluate the performance of the proposed system, an error-detection method was used to compare the proposed model against the conventional model, using the DEVS (Discrete Event System Specification) formalism. The DEVS formalism is a theoretically well-grounded means of expressing hierarchical, modular discrete-event models; in DEVS, a system has a time base, inputs, states, and outputs based on the current states and inputs, and the structure of the atomic model follows [15, 16, 17]. The variables used in this system are as follows: Poll_int stands for "polling interval"; App_cnt stands for "the number of application programs related to the FTA session"; App_cnt2 stands for "the number of application programs unrelated to the FTA session"; and Sm_t_a stands for "the accumulated time to register information in SM". We can observe the result value through a transducer. Conventional method: 2 × Poll_int × App_cnt. Proposed method: 1 × Poll_int. Therefore, the proposed method is more efficient than the conventional error-detection method when App_cnt > 1. We have compared the performance of the proposed method with the conventional method. The merit of AFTA is that it detects errors by using hook techniques. During an FTA session, a Media Service Instance sometimes comes to an end abnormally; the session's process can then terminate, but it is necessary to protect the user from the error by reactivating the Media Service Instance. We first need a method to detect errors for session recovery. AFTA is a multi-agent system implemented with object-oriented concepts. It detects errors by using hook techniques, classifies errors by periodically polling the processes related to sessions, and classifies the type of error automatically by using learning rules.
The characteristic of this system is that it restores a failed component by the same method used to create it when the session was set up. The strong point of this system is that it detects and recovers errors automatically when the session's process comes to an end through a software error.
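The detection-cost comparison above (conventional 2 × Poll_int × App_cnt versus proposed 1 × Poll_int) can be checked numerically; the variable names follow the paper, while the concrete interval and counts below are arbitrary example values.

```python
def conventional_cost(poll_int, app_cnt):
    # Conventional detection: two polls per session-related application.
    return 2 * poll_int * app_cnt

def proposed_cost(poll_int, app_cnt):
    # Proposed detection: a single poll, independent of App_cnt.
    return 1 * poll_int

# Example: a 100 ms polling interval and 5 session-related applications.
saving = conventional_cost(100, 5) - proposed_cost(100, 5)
```

The gap grows linearly with App_cnt, which is why the paper's advantage claim is stated for sessions with more than one related application.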
5 Conclusions
This paper proposes an Adaptive Fault Tolerance Agent (AFTA) algorithm in a situation-aware middleware framework and presents a simulation model of AFT-based agents. AFTA is a system suitable for detecting, classifying, and recovering from software errors in a distributed multimedia education environment, acting as an FTA by means of software techniques. It detects errors by using hooking techniques. The purpose of this research is to return the FTA session to a healthy, or at least acceptable, state: that is, to recover application software running on situation-aware ubiquitous computing automatically. In future work, the fault-tolerance system will be generalized so that it can be used in any environment, and we will pursue the study of
the domino effect in distributed multimedia environments as an example of situation-aware applications.
References
1. Hung, N.Q., Ngoc, N.C., Hung, L.X., Lei, S., Lee, S.Y.: A Survey on Middleware for Context-Awareness in Ubiquitous Computing Environments. Korea Information Processing Society Review, 97–121 (2003)
2. Yau, S.S.: Reconfigurable Context-Sensitive Middleware for Pervasive Computing. IEEE Pervasive Computing, 33–40 (July–September 2002)
3. Yau, S.S., Karim, F.: Adaptive Middleware for Ubiquitous Computing Environments. Design and Analysis of Distributed Embedded Systems. In: Proc. IFIP 17th WCC, vol. 219, pp. 131–140 (August 2002)
4. Yau, S.S., Karim, F.: Contention-Sensitive Middleware for Real-Time Software in Ubiquitous Computing Environments. In: Proc. 4th IEEE Int'l Symp. on Object-Oriented Real-Time Distributed Computing (ISORC 2001), pp. 163–170 (May 2001)
5. Nelson, V.P., Carroll, B.D.: Fault-Tolerant Computing, Ch. 1: Introduction to Fault-Tolerant Computing. IEEE Computer Society Order Number 677, Library of Congress Number 86-46205, IEEE Catalog Number EH0254-3, ISBN 0-8186-0677-0
6. Pradhan, D.K.: Fault-Tolerant Computer System Design. Prentice Hall, Englewood Cliffs (1996)
7. Nelson, V.P., Carroll, B.D.: Fault-Tolerant Computing, Ch. 5: Software Fault Tolerance. IEEE Computer Society Order Number 677, Library of Congress Number 86-46205, IEEE Catalog Number EH0254-3, ISBN 0-8186-0677-0
8. Krishna, C.M., Lee, Y.H.: Guest Editors' Introduction: Real-Time Systems. IEEE Computer (May 1991)
9. Elmendorf, W.R.: Fault-Tolerant Programming. In: Digest of the 1972 International Symposium on Fault-Tolerant Computing, pp. 79–83 (1972)
10. Randell, B.: System Structure for Software Fault Tolerance. IEEE Trans. Software Engineering SE-1, 220–232 (1975)
11. Randell, B.: System Structure for Software Fault Tolerance. IEEE Trans. Software Engineering SE-1(12), 116–1129 (1984)
12. Watabe, K., Sakata, S., Maeno, K., Fukuoka, H., Ohmori, T.: Distributed Desktop Conferencing System with Multi-user Multimedia Interface. IEEE JSAC 9(4), 531–539 (1991)
13. Hecht, H.: Fault-Tolerant Software for Real-Time Applications. ACM Computing Surveys 8, 391–407 (1976)
14. Hagan, M.T., Demuth, H.B., Beale, M.: Neural Network Design, pp. 4–3. PWS Publishing Company (1996)
15. Zeigler, B.P.: Object-Oriented Simulation with Hierarchical, Modular Models. Academic Press, San Diego (1990)
16. Cho, T.H., Zeigler, B.P.: Simulation of Intelligent Hierarchical Flexible Manufacturing: Batch Job Routing in Operation Overlapping. IEEE Trans. Syst., Man, Cybern. A 27, 116–126 (1997)
17. Zeigler, B.P., Cho, T.H., Rozenblit, J.W.: A Knowledge-Based Environment for Hierarchical Modeling of Flexible Manufacturing Systems. IEEE Trans. Syst., Man, Cybern. A 26, 81–90 (1996)
Dynamic Neural Network-Based Pulsed Plasma Thruster (PPT) Fault Detection and Isolation for Formation Flying of Satellites

A. Valdes and K. Khorasani

Department of Electrical and Computer Engineering, Concordia University, Montreal, QC H3G 1M8, Canada
{a_valde,kash}@ece.concordia.ca
Abstract. The main objective of this paper is to develop a dynamic neural network-based fault detection and isolation (FDI) scheme for the Pulsed Plasma Thrusters (PPTs) used in the Attitude Control Subsystem (ACS) of satellites tasked to perform a formation flying mission. Using data collected from the relative attitudes of the formation flying satellites, our proposed "High Level" FDI scheme can detect which pair of thrusters is faulty; fault isolation within the pair, however, cannot be accomplished. Based on the "High Level" FDI scheme and the DNN-based "Low Level" FDI scheme developed earlier by the authors, an "Integrated" DNN-based FDI scheme is then proposed. To demonstrate the FDI capabilities of the proposed schemes, various fault scenarios are simulated.

Keywords: fault detection and isolation, dynamic neural networks, formation flying, pulsed plasma thrusters.
1 Introduction
Development of a fault detection and isolation (FDI) scheme for unmanned space vehicles is a challenging problem. Traditionally, near-Earth unmanned spacecraft send periodic batches of data to ground stations, where the data are analyzed to determine the health status of the on-board subsystems. When a fault is detected, additional analyses must be performed to isolate it. This entire process is a time-consuming and very costly task. For these reasons, there is real interest in developing autonomous fault diagnostic approaches for on-board spacecraft subsystems, especially for the attitude control subsystem (ACS). The literature on FDI for spacecraft covers various fault scenarios for ACS components of a single spacecraft ([1]-[7]). However, there is practically no work on FDI for multiple-spacecraft missions such as formation flying. The performance of the formation flying system is determined by the precision of the maneuvers performed by each spacecraft in the formation. The attitude maneuvers are performed by the attitude control subsystem (ACS) of the spacecraft, which is composed of the formation control law, sensors and actuators (e.g. thrusters).

D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 310–321, 2008. © Springer-Verlag Berlin Heidelberg 2008
Malfunctions in any of these components can affect the performance of the formation. Therefore, early detection of faults and isolation of faulty components becomes extremely important. The Pulsed Plasma Thruster (PPT) is a type of actuator used for attitude control in various formation flying missions ([8]-[11]). In this paper, an FDI scheme based on dynamic neural networks (DNN) is developed. Based on the relative attitudes of the formation flying spacecraft, our proposed FDI scheme is capable of detecting which spacecraft is affected by a fault. An integrated FDI scheme composed of a "High Level" FDI scheme and the "Low Level" FDI scheme developed in [12] is proposed. The resulting "Integrated" FDI scheme can take advantage of the strengths of each scheme and at the same time reduce their weaknesses. In order to demonstrate the capabilities of our proposed FDI schemes, a formation flying mission under various fault scenarios is analyzed. The results demonstrate that the "Integrated" FDI scheme exhibits better fault detection and isolation capabilities than the individual "Low" and "High" level FDI schemes. The paper is organized as follows: In Section 2, the formation flying system using pulsed plasma thrusters (PPTs) considered in this paper is presented. In Section 3, the dynamic neural network-based FDI scheme which uses data collected from the ACS of the formation flying spacecraft is developed; simulated fault scenarios used to test this scheme are presented and discussed at the end of that section. In Section 4, the DNN-based FDI scheme developed in [12] is introduced, its strengths and weaknesses are discussed, and the motivations for proposing an integrated FDI scheme are established. The development of our "Integrated" FDI scheme is then presented and its performance is evaluated and compared to the scheme proposed in [12]. In Section 5, the conclusions and the contributions of the FDI schemes developed in this paper are presented.
2 Formation Flying Satellites
Formation flying is a type of mission in which each spacecraft must perform precise orbital and attitude maneuvers in order to fulfill the mission requirements. From the control point of view, a formation of spacecraft can be defined as "a set of more than one spacecraft in which any of the spacecraft dynamic states are coupled through a common control law". This definition is complemented with the following two conditions: at least one spacecraft of the formation must (i) track a desired state profile relative to another satellite, and (ii) the associated control law must at minimum depend upon the state of this other satellite. Formation flying control architectures can be categorized into: leader/follower, virtual structure, multi-input/multi-output (MIMO), cyclic and behavioral. In the leader/follower architecture, one of the spacecraft is designated as the leader and the rest are followers. The leader's control system uses absolute measurements to perform its maneuvers, while the followers use relative measurements. Each follower has individual controllers that are connected hierarchically with the leader, reducing the formation flying control problem to individual tracking controllers. Followers must change their positions and attitudes based on the position and the attitude of the leader. In this paper, we propose a near-Earth formation flying mission composed of three spacecraft with the leader/follower control architecture. To perform rotational
maneuvers, our proposed formation flying mission uses the so-called six-independent pulsed plasma thruster (PPT) configuration. In this configuration each PPT generates a torque about only a single axis of the spacecraft, so that independent control actuation is achieved. Specifically, the thrusters PPT1, PPT2, PPT3, PPT4, PPT5 and PPT6 generate torque in the +x-axis, -x-axis, +y-axis, -y-axis, +z-axis and -z-axis directions, respectively. The proposed formation flying mission and the six-independent PPT configuration are shown in Figure 1.
Fig. 1. Near-Earth formation flying mission composed of three six-independent PPT configuration spacecraft (S/Cl: leader spacecraft, S/Cf1: follower 1 spacecraft, and S/Cf2: follower 2 spacecraft)
Pulsed plasma thrusters (PPTs) are accurate, inexpensive and simple actuators that can be used for different purposes such as station-keeping, attitude control, orbit insertion and drag make-up. As shown in Figure 2, the main components of the PPT are the capacitor, the electrodes, the igniter and the spring. Once the igniter is discharged, the capacitor voltage that appears across the electrodes creates a current which ablates and ionizes the fuel bar into a plasma slug. Finally, the plasma is accelerated by the Lorentz force (J x B) due to the discharge current and the magnetic field.
Fig. 2. Main components of a pulsed plasma thruster (PPT)
During normal operations, only the electrical variables (i.e. capacitor voltage and discharge current) and the temperature of the PPT thrusters are measurable. As indicated in [12], PPT thrusters are typically grouped into pairs sharing the same capacitor, and therefore both PPTs of a pair cannot generate thrust pulses at the same time. By means of magnetometer and gyroscope sensors the leader spacecraft (S/Cl) can measure its absolute angular rotations and velocities. The follower spacecraft (i.e. S/Cf1 and S/Cf2) need to measure their relative attitudes with respect to the
leader; therefore the follower spacecraft must be equipped with Autonomous Formation Flying (AFF) sensors [13]. Besides these measurements, the number of pulses generated by each PPT and the instants (times) when the pulses are generated are also recorded by each spacecraft. This operational register is a three-state signal in which "+1" represents a pulse generated in the positive direction of the i-th axis, "-1" represents a pulse in the negative direction, and "0" represents no pulse. Table 1 shows the set of variables defined above (where l represents the leader spacecraft and f,j the j-th follower spacecraft, with j = 1 or 2).

Table 1. Attitude Variables and Sequence of Pulses of the j-th Follower Spacecraft

Variable               Description
q1^{f,j}_l             Angular rotation about the x-axis (S/Cf,j w.r.t. S/Cl)
q2^{f,j}_l             Angular rotation about the y-axis (S/Cf,j w.r.t. S/Cl)
q3^{f,j}_l             Angular rotation about the z-axis (S/Cf,j w.r.t. S/Cl)
Δωx^{f,j}_l            Angular velocity about the x-axis (S/Cf,j w.r.t. S/Cl)
Δωy^{f,j}_l            Angular velocity about the y-axis (S/Cf,j w.r.t. S/Cl)
Δωz^{f,j}_l            Angular velocity about the z-axis (S/Cf,j w.r.t. S/Cl)
T_PPT1/PPT2^{f,j}_l    Sequence of pulses about the x-axis (S/Cf,j w.r.t. S/Cl)
T_PPT3/PPT4^{f,j}_l    Sequence of pulses about the y-axis (S/Cf,j w.r.t. S/Cl)
T_PPT5/PPT6^{f,j}_l    Sequence of pulses about the z-axis (S/Cf,j w.r.t. S/Cl)
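The three-state operational register described above can be sketched as follows; the event list and number of time steps below are hypothetical, chosen only to illustrate the encoding:

```python
def pulse_register(events, n_steps):
    """Build the three-state operational register T_PPT(+)/PPT(-) for one axis.

    events: list of (time_step, direction) pairs, where direction is +1 for
    the positive-axis thruster of the pair (e.g. PPT1) and -1 for the
    negative-axis one (e.g. PPT2). Steps without a pulse stay at state 0.
    """
    register = [0] * n_steps
    for t, direction in events:
        register[t] = direction
    return register

# e.g. the positive thruster fires at steps 0 and 2, the negative one at step 4:
print(pulse_register([(0, +1), (2, +1), (4, -1)], 6))  # [1, 0, 1, 0, -1, 0]
```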
The fault diagnostic analysis, development and experiments that are presented in the following sections are performed for the formation flying satellites that are introduced in this section.
3 Formation Flying Fault Detection and Isolation System
In this section an FDI approach for the formation flying problem is developed. Dynamic neural networks are employed to model the relative attitude of the follower spacecraft with respect to the leader spacecraft in a formation flying mission. Using this neural network model, residual signals are generated for detecting the existence of faults in the actuators of the followers. An important advantage of this FDI scheme is that only data from the followers' ACS is used to detect faults in the actuators.

3.1 Design of the Neural Network FDI Scheme
The neural network considered in this paper is a multilayer perceptron network with dynamic neurons. As presented in [14]-[19], these special neurons allow the network to achieve dynamic properties. Figure 3 shows the general structure of the so-called Dynamic Neuron Model (DNM) [16]-[19]. The sets [u1(k), u2(k), ..., un(k)]T and W = [w1, w2, ..., wn]T are the input and weight vectors, respectively. An Infinite Impulse Response (IIR) filter is introduced
Fig. 3. Dynamic neuron model
to generate dynamics in the neuron, such that the activation of a neuron depends on its internal states [16]-[19]. The block g F(·) is the activation function of the neuron; the parameter g is the slope of the nonlinear activation function represented by F(·). The dynamic model of the above neuron is described by the following set of equations:

\[
x(k) = \sum_{i=1}^{n} w_i\, u_i(k), \qquad
\tilde{y}(k) = -\sum_{i=1}^{r} a_i\, \tilde{y}(k-i) + \sum_{i=0}^{r} b_i\, x(k-i), \qquad
y(k) = F\big(g \cdot \tilde{y}(k)\big) \tag{1}
\]
where the signal x(k) represents the input to the filter, the coefficients ai, i = 1,2,...,r and bi, i = 0,1,...,r are the feedback and feed-forward filter parameters, respectively, and r is the order of the filter. Finally, ỹ(k) represents the output of the filter, which is the input to the activation function.
In order to collect data for the training phase, different fault-free formation flying missions are simulated. Figure 4 shows the schematic representation of the proposed DNN that is used for the x-axis (i.e. roll angle). From this figure one can see that during the training phase the sequence of pulses about the x-axis generated by the pair of thrusters PPT1/PPT2 and the angular rotations about the three axes are presented to the DNNroll. The output of the network is the estimated angular velocity about the x-axis. By comparing the output of the DNNroll with the measured angular velocity about the x-axis, the estimation error is calculated and back-propagated through the various layers, updating the network parameters W. The training algorithm used here is the Extended Dynamic Back-Propagation (EDBP) algorithm [19]. The DNNroll is trained until a termination criterion is fulfilled; in this paper, the termination criterion (t.c.) used for the three DNNs is the mean square error (mse) criterion. The training process above is also used for the other two DNNs (i.e. DNNpitch and DNNyaw). Each DNN has a 4-10-1 structure (four neurons in the input layer, ten neurons in the hidden layer and one neuron in the output layer) with second-order Infinite Impulse Response (IIR) filters, and hyperbolic tangent sigmoid and linear activation functions for the neurons in the hidden and output layers, respectively. Once the training phase is completed, the parameters of the dynamic neural networks are fixed and the validation phase is initiated. Data obtained from missions different from those used for training is presented to the DNN.
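A minimal sketch of the dynamic neuron of Eq. (1), assuming a hyperbolic tangent activation F; the weights and IIR coefficients passed in below are placeholders, not EDBP-trained values:

```python
import math

class DynamicNeuron:
    """Dynamic Neuron Model (DNM) of Eq. (1): a weighted input sum passed
    through an r-th order IIR filter and then a tanh activation with slope g."""

    def __init__(self, weights, a, b, g=1.0):
        self.w = list(weights)              # input weights w_1..w_n
        self.a = list(a)                    # feedback coefficients a_1..a_r
        self.b = list(b)                    # feed-forward coefficients b_0..b_r
        self.g = g                          # activation slope
        self.x_hist = [0.0] * len(self.b)   # x(k), x(k-1), ..., x(k-r)
        self.y_hist = [0.0] * len(self.a)   # y~(k-1), ..., y~(k-r)

    def step(self, u):
        # x(k) = sum_i w_i u_i(k)
        x = sum(wi * ui for wi, ui in zip(self.w, u))
        self.x_hist = [x] + self.x_hist[:-1]
        # y~(k) = -sum a_i y~(k-i) + sum b_i x(k-i)
        y_tilde = (-sum(ai * yi for ai, yi in zip(self.a, self.y_hist))
                   + sum(bi * xi for bi, xi in zip(self.b, self.x_hist)))
        self.y_hist = [y_tilde] + self.y_hist[:-1]
        # y(k) = F(g * y~(k)), here F = tanh
        return math.tanh(self.g * y_tilde)

# Second-order filter (r = 2), as in the 4-10-1 networks described above:
neuron = DynamicNeuron([0.5, 0.5], a=[0.2, 0.1], b=[1.0, 0.0, 0.0])
```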
By comparing the angular velocity estimated by the DNN with the measured angular velocity, the representation capabilities of the network are analyzed. Figure 4 also shows the DNN architecture used for the roll angle during the validation phase.
Fig. 4. Identification model for the training and validation phases (the Leader and Follower 1 ACS levels are common to both phases; at the lowest level, the left-hand side graph represents the training phase architecture and the right-hand side the validation phase architecture)
The next phase deals with the calculation of a threshold function and an FDI evaluation criterion. This threshold will be used for determining the health status of each pair of PPT thrusters. The value of the threshold is calculated by using healthy data collected from simulated formation flying missions. The calculation of Threshold_roll is performed by using the mathematical expression given below:

\[
\mathrm{Threshold}_{roll} = \frac{\sum_{l=1}^{6} SAE_{roll}(l)}{6}
+ \sigma_{roll}\left( \max_{l=1,\dots,6} SAE_{roll}(l) - \frac{\sum_{l=1}^{6} SAE_{roll}(l)}{6} \right) \tag{2}
\]
where SAEroll(l) is the Sum of Absolute Errors of the data set l = 1,2,...,6 collected from the six different missions. The coefficient σ is a constant used to adjust the sensitivity of our FDI scheme. Equation (2) is also used for calculating the threshold values for the pitch and yaw angles. The simulated missions require the S/Cf1 and S/Cf2 spacecraft to rotate from an initial angular position (i.e. [0°, 0°, 0°]) until they reach the desired attitudes (i.e. reference attitudes). The reference attitudes for the six missions are as follows: mission (a) ([25°, 40°, 55°]), mission (b) ([20°, 45°, 60°]), mission (c) ([35°, 45°, 55°]), mission (d) ([30°, 40°, 50°]), mission (e) ([30°, 45°, 55°]), and mission (f)
([35°, 50°, 60°]), respectively. After calculating the SAE values and using the coefficients σroll = 1.208, σpitch = 2.450, and σyaw = 1.032, the thresholds obtained are: Thresholdroll = 135.00, Thresholdpitch = 50.00, and Thresholdyaw = 76.00, respectively. Finally, our proposed FDI scheme for a single axis is represented in Figure 5. In this scheme, the attitudes of the S/Cf1 are applied to the DNNs, the estimated angular velocities are compared with the actual measurements, and the corresponding SAE values are calculated. Next, the SAE values are compared with the corresponding thresholds and the health status of the pairs of thrusters is obtained.

Fig. 5. FDI scheme for the S/Cf1 satellite of the formation flying (subscript i denotes the i-th angle (roll, pitch or yaw), PPTi(+) denotes the PPT thruster that generates thrust in the positive direction of the i-th axis (i.e. PPT1, PPT3 and PPT5) and PPTi(-) denotes the PPT thruster that generates thrust in the negative direction of the i-th axis (i.e. PPT2, PPT4 and PPT6))
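Equation (2) and the threshold test of Figure 5 can be sketched as follows. The six healthy-mission SAE values passed to `sae_threshold` would come from the missions (a)-(f) above; since the paper reports only the resulting thresholds, the decision example below reuses the reported Threshold_roll = 135.00:

```python
def sae_threshold(sae_values, sigma):
    # Eq. (2): mean SAE over the healthy missions, plus sigma times the
    # spread between the worst-case SAE and that mean.
    mean = sum(sae_values) / len(sae_values)
    return mean + sigma * (max(sae_values) - mean)

def pair_status(sae, threshold):
    # Figure 5: a pair of thrusters is flagged faulty when the SAE of its
    # axis exceeds the corresponding threshold.
    return "faulty" if sae > threshold else "healthy"

# Roll axis with SAEroll = 382.12 against Threshold_roll = 135.00:
print(pair_status(382.12, 135.00))  # faulty
```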
3.2 Simulation Results of the FDI Schemes
In this section simulations are conducted for evaluating our proposed DNN-based FDI scheme for the formation flying mission. The fault types considered are as follows:
Fault Type 1: Loss of elasticity is a spring failure which affects the deflection of the spring, reducing the pressure applied to the propellant bar. This type of failure may change the amount of propellant mass consumed in each pulse.
Fault Type 2: The ablation process transforms the solid propellant into the exhaust plasma, but small portions of the propellant may not be transformed, resulting in particles which are added to the inner face of the electrodes. After several pulses, this situation may lead to degradation of the PPT performance.
Fault Type 3: Due to wear and tear, the conductivity of the wires, capacitor and electrodes may decrease. As a consequence, the amount of thrust produced may change in an unpredictable manner.
To evaluate the performance of our proposed FDI scheme, three formation flying missions affected by the above faults (i.e. faults affecting the PPTs of the S/Cf1) are simulated. For these cases, the reference attitudes are as follows: mission 1 ([25°, 30°, 40°]), mission 2 ([20°, 35°, 45°]), and mission 3 ([25°, 35°, 45°]). Table 2 shows the type of faults, the PPTs affected by the faults and their severity.
Table 2. General Description of the Simulated Faulty Cases

Mission   Fault type   Faulty PPT      Severity
1         Type 1       PPT2 of S/Cf1   The thrust generated by PPT2 is decreased by 15%
2         Type 2       PPT3 of S/Cf1   The thrust generated by PPT3 is increased by 15%
3         Type 3       PPT6 of S/Cf1   The thrust generated by PPT6 is decreased by 15%
The SAE values and the Health Status obtained for the follower S/Cf1 are presented in Table 3.

Table 3. Health Status Results

Mission      SAEroll   SAEpitch   SAEyaw   Health Status
1            382.12    41.53      45.16    PPT1/PPT2 is detected as the faulty pair
2            100.21    57.88      39.75    PPT3/PPT4 is detected as the faulty pair
3            132.84    38.40      279.98   PPT5/PPT6 is detected as the faulty pair
Threshold:   135.00    50.00      76.00
According to our simulation results, low severity faults do not significantly affect the performance of the formation flying spacecraft maneuvers. In other words, the problem caused by the faulty PPT is not observable in the attitudes of the spacecraft, because the ACS can fulfill the mission requirements by changing the sequence and number of pulses generated by the PPTs. Table 4 presents the number of pulses generated by S/Cf1 and S/Cf2 during mission 2.

Table 4. Amount of Pulses Generated by the Followers S/Cf1 and S/Cf2

Spacecraft   PPT1/PPT2 pulses   PPT3/PPT4 pulses   PPT5/PPT6 pulses
S/Cf1        78/510             232/238            228/248
S/Cf2        19/21              88/74              68/78
The six PPTs of the S/Cf1 generated more pulses than the PPTs of the S/Cf2 to perform the same rotational maneuver. Because the operational lifetime of the PPT thrusters is determined by the number of generated pulses (i.e. the number of capacitor discharges), this unplanned extra generation of pulses can reduce the lifetime of the formation flying mission.
4 Integrated Fault Detection and Isolation Scheme
In the previous section we developed a "High Level" FDI scheme that, by using the relative attitude variables, can detect abnormal spacecraft behavior and identify the pair of thrusters in which the fault is injected. Unfortunately, the "High Level" FDI scheme cannot isolate the faulty actuator. On the other hand, by utilizing the DNN-based "Low Level" FDI scheme for the PPT thrusters proposed in [12], we can analyze the health status of the six thrusters pulse by pulse. Experimental results demonstrate that for these three
Fig. 6. Our proposed "Integrated" FDI scheme for the S/Cf1 satellite in the formation flying (subscript i in the "Low Level" section of the scheme represents the i-th axis; capacitor voltage and discharge current are represented by v and c, respectively)
types of faults, the utilization of a single fixed threshold value affects the reliability of this approach.
The integrated FDI scheme uses the "High Level" approach for detecting which pair of thrusters is healthy and which one is faulty. Once the faulty thruster pair is identified, and based on the cause-effect relationships derived in the previous section, one can identify the possible effect of the fault on both PPTs. With this information, different threshold values (i.e. a "lower threshold" and an "upper threshold") can be determined. Applying these thresholds in the "Low Level" approach, one can determine which thruster is faulty and, more specifically, which of the generated pulses are faulty. Figure 6 shows the schematic representation of our "Integrated" FDI scheme. The "High Level" FDI scheme detects the faulty pair of thrusters, and the Logic Threshold Selection block counts the number of pulses generated by each PPT of the faulty pair and determines which threshold must be applied to each PPT. Finally, the "Low Level" FDI scheme analyzes pulse by pulse the health status of both PPTs and detects the faulty pulses generated by them.
In order to demonstrate the performance of our proposed "Integrated" FDI scheme, three formation flying missions are simulated. The specifications of these missions are presented in Table 5.

Table 5. Specifications of the Simulated Faulty Missions for Evaluating the Performance of Our Proposed "Integrated" FDI Scheme

Mission   Reference attitude   Fault type (severity)   Faulty PPT      Occurrence time
4         [35°, 50°, 60°]      Type 1 (incremental)    PPT1 of S/Cf1   t = 400 sec
5         [30°, 40°, 50°]      Type 3 (incremental)    PPT6 of S/Cf1   t = 400 sec
6         [20°, 45°, 60°]      Type 2 (incremental)    PPT3 of S/Cf1   t = 300 sec
Table 6. Results of the "High Level" FDI Scheme for the Three Simulated Faulty Missions

Mission      SAEroll   SAEpitch   SAEyaw   Health Status
4            384.73    45.43      69.30    PPT1/PPT2 of S/Cf1 is detected as the faulty pair
5            127.19    42.63      160.73   PPT5/PPT6 of S/Cf1 is detected as the faulty pair
6            123.31    82.60      45.60    PPT3/PPT4 of S/Cf1 is detected as the faulty pair
Threshold:   135.00    50.00      76.00
Table 6 shows the "High Level" FDI results for the three missions. According to these results, we have positively detected the faulty pair of thrusters in all the simulated missions. Based on the results in [12] and the simulations performed here, the optimal values for the lower and upper thresholds are determined as 0.0300 and 0.0370, respectively. Table 7 gives the Logic Threshold Selection block results for the three simulated missions.

Table 7. Threshold Determination for the Three Simulated Faulty Missions

Mission   PPTi(+) (Number of Pulses; Threshold value)   PPTi(-) (Number of Pulses; Threshold value)
4         PPT1 (739; Upper Threshold)                   PPT2 (157; Lower Threshold)
5         PPT5 (256; Lower Threshold)                   PPT6 (861; Upper Threshold)
6         PPT3 (235; Lower Threshold)                   PPT4 (849; Upper Threshold)
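A sketch of the Logic Threshold Selection step. The pulse-count rule used below (the PPT of the faulty pair that fired more pulses receives the upper threshold) is an inference from the pattern in Table 7, not a rule stated explicitly in the text:

```python
LOWER_THRESHOLD = 0.0300
UPPER_THRESHOLD = 0.0370

def select_thresholds(pulses_pos, pulses_neg):
    """Assign a threshold to each PPT of the faulty pair.

    Returns (threshold for PPTi(+), threshold for PPTi(-)). Consistent with
    Table 7, the thruster that generated more pulses gets the upper threshold."""
    if pulses_pos > pulses_neg:
        return UPPER_THRESHOLD, LOWER_THRESHOLD
    return LOWER_THRESHOLD, UPPER_THRESHOLD

# Mission 4: PPT1 generated 739 pulses, PPT2 only 157:
print(select_thresholds(739, 157))  # (0.037, 0.03)
```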
The "Low Level" FDI scheme uses the threshold values determined by the Logic Threshold Selection block to detect the faulty pulses of the faulty actuator. Table 8 shows the results. Finally, to compare the performance of the "Integrated" FDI scheme with the "Low Level" FDI scheme in [12], the results presented in Table 8 are evaluated by using the Confusion Matrix approach [20], as shown in Table 9.

Table 8. Results of the "Low Level" FDI Scheme for the Three Simulated Faulty Missions

Mission   PPT    Actual/detected healthy pulses   Actual/detected faulty pulses
4         PPT1   25/26                            714/713
4         PPT2   157/157                          0/0
5         PPT5   256/256                          0/0
5         PPT6   42/40                            819/821
6         PPT3   114/112                          121/123
6         PPT4   849/849                          0/0
Table 9. Performance Results for the "Low Level" and Our Proposed "Integrated" FDI Schemes

                "Low Level" FDI Scheme   "Integrated" FDI Scheme
Accuracy        88.89%                   99.82%
True Healthy    96.09%                   99.72%
False Healthy   42.37%                   00.00%
True Faulty     57.63%                   99.94%
False Faulty    03.90%                   00.00%
Precision       66.67%                   99.93%
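The confusion-matrix evaluation of [20] reduces to simple ratios over classified pulses. The counts below are hypothetical, chosen only to illustrate the computation; they do not reproduce the exact figures of Table 9:

```python
def confusion_metrics(true_healthy, false_healthy, true_faulty, false_faulty):
    """Compute overall accuracy and faulty-class precision from raw pulse
    counts: true_healthy/true_faulty are correctly classified pulses,
    false_healthy are faulty pulses classified healthy, and false_faulty
    are healthy pulses classified faulty."""
    total = true_healthy + false_healthy + true_faulty + false_faulty
    accuracy = (true_healthy + true_faulty) / total
    precision = true_faulty / (true_faulty + false_faulty)
    return accuracy, precision

# Hypothetical counts for a 1000-pulse run:
acc, prec = confusion_metrics(600, 2, 395, 3)
print(f"accuracy={acc:.2%}, precision={prec:.2%}")
```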
5 Conclusions
A novel Fault Detection and Isolation (FDI) scheme for the Pulsed Plasma Thrusters (PPTs) of the Attitude Control Subsystem (ACS) of satellites in formation flying missions has been proposed and investigated. By means of four Dynamic Neural Networks (DNNs) in each satellite, the proposed FDI scheme is capable of detecting and isolating faults in the actuators (i.e. PPTs) of all the satellites that affect the precision and mission requirements of the formation flying attitudes. Since the force generated by this type of actuator cannot be measured, and due to the lack of precise mathematical models, the development of a fault diagnostic system for PPTs is not a trivial effort. In this paper, we have demonstrated that our proposed FDI scheme is not computationally intensive and is a reliable tool for detecting and isolating faulty PPTs. The results obtained show a high level of accuracy (99.79%) and precision (99.94%), and the misclassification rates of the False Healthy (0.03%) and False Faulty (0.61%) parameters are quite negligible. Therefore, the applicability of the DNN technique for solving fault diagnosis problems in a highly complex nonlinear system such as a formation flying system has been demonstrated.
Formation flying missions are beginning to gain popularity due to the number of advantages that they provide. A significant reduction in the number of hours spent by ground station personnel can be achieved by implementing our proposed DNN-based FDI scheme. Therefore, the cost of the mission can be significantly reduced.
References
1. Wilson, E., Lages, C., Mah, R.: Gyro-based Maximum-Likelihood Thruster Fault Detection and Identification. In: Proceedings of the 2002 American Control Conference (2002)
2. Wilson, E., Sutter, D.W., Berkovitz, D., Betts, B.J., del Mundo, R., Kong, E., Lages, C.R., Mah, R., Papasin, R.: Motion-based System Identification and Fault Detection and Isolation Technologies for Thruster Controlled Spacecraft. In: Proceedings of the JANNAF 3rd Modeling and Simulation Joint Subcommittee Meeting (2005)
3. Piromoradi, F., Sassini, F., da Silva, C.W.: An Efficient Algorithm for Health Monitoring and Fault Diagnosis in a Spacecraft Attitude Determination System. In: IEEE International Conference on Systems, Man and Cybernetics, SMC (2007)
4. Larson, E.C., Parker Jr., B.E., Clark, B.R.: Model-Based Sensor and Actuator Fault Detection and Isolation. In: Proceedings of the American Control Conference (2002)
5. Joshi, A., Gavriloiu, V., Barua, A., Garabedian, A., Sinha, P., Khorasani, K.: Intelligent and Learning-based Approaches for Health Monitoring and Fault Diagnosis of RADARSAT-1 Attitude Control System. In: IEEE International Conference on Systems, Man and Cybernetics, SMC (2007)
6. Guiotto, A., Martelli, A., Paccagnini, C.: SMART-FDIR: Use of Artificial Intelligence in the Implementation of a Satellite FDIR. In: Data Systems in Aerospace, DASIA 2003 (2003)
7. Holsti, N., Paakko, M.: Towards Advanced FDIR Components. In: Proc. DASIA (2001)
Dynamic Neural Network-Based PPT Fault Detection and Isolation
Model-Based Neural Network and Wavelet Packets Decomposition on Damage Detecting of Composites

Zhi Wei 1, Huisen Wang 2, and Ying Qiu 1

1 School of Mechanical Engineering, Hebei University of Technology, Tianjin 300130, China
[emailprotected]
2 Tianjin Navigation Instruments Research Institute, Tianjin 300131, China
Abstract. A model-based neural network (MBNN) is used along with wavelet packets decomposition to detect internal or hidden damage in composites. To account for internal delaminations of different sizes and locations, a typical finite element model is used to acquire training data for the neural networks. Delamination-induced energy variations are decomposed by wavelet packets to enhance damage features. The predicted delamination size and location are selected as the outputs of the neural networks. A forced vibration test is conducted to acquire target signals. Based on the experimental results, the damage-induced energy variation of the response signal is analyzed and the relationship between damage and physical performance is established. The test results show that the proposed method is effective for investigating the internal damage state of composites.

Keywords: MBNN, composites, delamination, detecting, wavelet packets.
1 Introduction

Delamination, whether introduced during manufacturing or induced by external loads in operation or service, is a major concern for in-service composites and is most dangerous in its hidden or initial state. Researchers have studied various aspects of the delamination process, including changes in dynamic response, but nondestructive testing of this damage remains difficult because of the inherently complex structure, especially for internal delamination in its early stage. Many applications of neural networks (NNs) deal with pattern recognition problems [1]. Candidate models for structures with various types of damage are designated as patterns for damage identification, and these patterns can be organized into pattern classes according to the location and severity of damage. To effectively detect different damages in composites, the method of feature extraction is an important element [2]. However, according to our investigations, it is difficult to extract features directly from the magnitude spectrum of the tested signals of specimens. Wavelet packets (WPs), as a library of orthogonal bases for square-integrable real functions, are applied in signal processing for their good time-frequency

D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 322–328, 2008. © Springer-Verlag Berlin Heidelberg 2008
localization. Therefore, in this paper, the energy spectrum of the signal decomposed by WPs is used as the feature input of the NNs. To identify the status of a structure, detect damage, or monitor structural health, NNs must be trained using a large number of input-output data from intact and variously damaged samples. Owing to the limited availability of measured dynamic-response signals from real-world structures under different episodes of damage, the model-based NN (MBNN) has been proposed [3], [4], [5]. Because a large amount of training data is computed, the effectiveness of the MBNN system depends on the accuracy of the structural model. To enhance the modeling accuracy, this paper proposes a three-dimensional finite element model for multi-layer composite plates.
2 Computation Model

To compute the modal parameters, such as the natural frequency, modal shape and modal strain of each mode, a finite element model is used to acquire the input data needed to train the NN, taking into account the effects of material anisotropy and damage. To model multi-layer composites composed of plates of moderate thickness, an eight-node rectangular thin plate element is used, which accounts for the effect of transverse shear stress on plate performance. To ensure material continuity, the displacements and their variations of each pair of coincident nodes on the upper and lower adjacent laminae must be equal throughout the computation for the intact plate. When the plate is delaminated, the nodes within the delamination region are left unconnected.

WP decomposition of a signal has better localization than the wavelet transform and is therefore used to adaptively choose the frequency bandwidth according to the characteristics of the detection signal, enhancing the resolution in both the frequency and time domains for damage identification of composites. Since the differences between signals from intact and damaged structures are generally insignificant in the early state of damage, it is still difficult to extract a damage index directly from the measured signal (even after WP decomposition). Therefore, we use energy spectrum analysis to enhance the damage features. The second order norm of an original signal f(t) is

\| f \|_2^2 = \int_R |f(x)|^2 \, dx  (1)

Then, for an admissible wavelet ψ, there is

\iint_R |W_\psi f(a, b)|^2 / a \, db \, da = \| f \|_2^2  (2)
Thus, there exists an equivalent relationship between the energy of the wavelet transform and that of the original signal, so it is reliable to express the energy variation of the original signal by the energy spectrum of the WPs. Hence, in the energy spectrum of the WPs, the sum of squares of the decomposed signal is selected as the energy feature within every subspace. In subspace V_{2^j}^{i} (the ith frequency span of the jth layer), the result of the WP decomposition is expressed by {S_i(k), k = 1, 2, …, M}, and its energy is

E_{2^j}^{i} = \sum_{k=1}^{M} |S_i(k)|^2  (3)

where M is the sample length in the subspace. Let E_{2^j}^{0,i} and E_{2^j}^{d,i} represent the energy spectra of signals measured from intact and damaged samples, respectively. The dimensionless index

\xi_i = \left( E_{2^j}^{d,i} - E_{2^j}^{0,i} \right) / E_{2^j}^{0,i}  (4)

demonstrates the damage-induced WP energy variation of the signal in subspace V_{2^j}^{i}. As the frequency bands of all the subspaces obtained by WP decomposition of the signal are equal, a series of columns can be plotted using ξ_i as the spectrum in every frequency span. If the sum of all the columns in a particular layer of decomposition is normalized to 1, each column represents the percentage of the energy variation in its subspace relative to the total of the considered layer. In this case, the height of every column, ξ_i, is substituted by

h_i = \xi_i \Big/ \sum_{k=1}^{N} \xi_k  (5)

where N = 2^j is the total number of subspaces in the jth layer.
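As a hedged illustration of Eqs. (3)–(5) (our own sketch, not the authors' code), the energy index can be computed with a minimal orthonormal Haar wavelet-packet decomposition; the paper's implementation would use its chosen wavelet, decomposition depth and frequency ordering:

```python
import numpy as np

def haar_wp_subbands(signal, levels):
    """Orthonormal Haar wavelet-packet decomposition: returns the
    2**levels subband signals of the deepest layer (natural ordering)."""
    bands = [np.asarray(signal, dtype=float)]
    for _ in range(levels):
        nxt = []
        for s in bands:
            nxt.append((s[0::2] + s[1::2]) / np.sqrt(2.0))  # low-pass half
            nxt.append((s[0::2] - s[1::2]) / np.sqrt(2.0))  # high-pass half
        bands = nxt
    return bands

def energy_index(intact, damaged, levels=3):
    """Eqs. (3)-(5): subband energies E, dimensionless indices xi, and
    normalized column heights h (assumes every intact-subband energy > 0
    and a nonzero sum of the xi)."""
    e0 = np.array([np.sum(b ** 2) for b in haar_wp_subbands(intact, levels)])
    ed = np.array([np.sum(b ** 2) for b in haar_wp_subbands(damaged, levels)])
    xi = (ed - e0) / e0          # Eq. (4)
    h = xi / np.sum(xi)          # Eq. (5)
    return e0, ed, xi, h
```

Because each Haar split is orthonormal, the subband energies of each layer sum to the energy of the original signal, which is exactly the Parseval-type relationship of Eq. (2).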
3 NN Model

A three-layer network with Levenberg-Marquardt backpropagation is used to assess delamination size and location in composites. The input layer receives the computed or measured features in the form of the WP energy spectrum, the hidden layer performs the data processing, and the output layer gives the result of the analysis. The input layer has N neurons presented with the percentage variations of the WP energy spectrum of the response signal. The hidden layer has N_h neurons, and the output layer has a single neuron which expresses the assessment of delamination size or location as a nondimensionalized area or distance, respectively. The input-output training data sets are FE computation results for selected samples. For each training sample, say the ith sample, the learning algorithm recursively minimizes the error function

E_i = \frac{1}{2} (y_i - o_i)^2  (6)

where y_i and o_i are the target output and the actual output of the ith training sample, respectively. The nonlinear sigmoid function is adopted as the transfer function of the hidden and output layers. For training sample i, the output value of the output layer is
o_i = f\left( \sum_{m=1}^{N_h} w_m \, f\left( \sum_{n=1}^{N} w_{nm} \zeta_n \right) \right)  (7)
where w_m is the weight between the mth node in the hidden layer and the output, and w_{nm} is that between node n in the input layer and node m in the hidden layer. In order to minimize the prediction error E_i on the ith training sample, the weight updates between the interconnected layers are given by

\Delta_i w_{mn} = \xi \delta_{im} o_{in}  (8)

where 0 < ξ < 1 is the learning rate coefficient, and Δ_i w_{mn} and δ_{im} are the actual change in the weight and the error at node m, respectively. In the output layer the node error is

\delta_i = (y_i - o_i)(1 - o_i) o_i  (9)

but in the hidden layer

\delta_{im} = o_{im} (1 - o_{im}) \sum_{n=1}^{N_h} \delta_{in} w_{nm}  (10)

In the training procedure, a momentum factor 0 < α < 1 is introduced to control oscillation, i.e., the difference of the weights between the (k+1)th and kth training steps is computed from

\Delta_i w_{mn}(k+1) = \xi \delta_{im} o_{in} + \alpha \Delta_i w_{mn}(k)  (11)

and the new weight is then adjusted according to

w_{mn}(k+1) = w_{mn}(k) + \Delta w_{mn}(k+1)  (12)
The network propagates the input through each layer until an output is generated. Then, the error between the actual output and the target output is computed by Eq. (6). The calculated error is transmitted backwards from the output layer and the weights are adjusted according to Eqs. (11) and (12) in order to minimize the error. The training process is terminated when the error is sufficiently small for each training sample.
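The update loop of Eqs. (8)–(12) can be sketched as follows. This is our own minimal illustration using plain per-sample gradient descent with momentum on a toy task; the paper trains a 32-input network with Levenberg-Marquardt, and all names and sizes here are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(X, y, n_hidden=6, lr=0.3, momentum=0.5, epochs=5000, seed=0):
    """Three-layer sigmoid network trained per Eqs. (8)-(12):
    per-sample weight updates with a momentum term."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], n_hidden))  # w_nm: input -> hidden
    W2 = rng.normal(0.0, 0.5, n_hidden)                # w_m: hidden -> output
    dW1 = np.zeros_like(W1)
    dW2 = np.zeros_like(W2)
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            h = sigmoid(x_i @ W1)                 # hidden activations
            o = sigmoid(h @ W2)                   # network output, Eq. (7)
            d_out = (y_i - o) * (1.0 - o) * o     # output-node error, Eq. (9)
            d_hid = h * (1.0 - h) * d_out * W2    # hidden-node errors, Eq. (10)
            dW2 = lr * d_out * h + momentum * dW2             # Eq. (11)
            dW1 = lr * np.outer(x_i, d_hid) + momentum * dW1
            W2 = W2 + dW2                         # Eq. (12)
            W1 = W1 + dW1
    return W1, W2

def predict(W1, W2, X):
    return sigmoid(sigmoid(X @ W1) @ W2)
```

A constant input column can stand in for the bias terms the sketch omits; Levenberg-Marquardt would replace the momentum update with a damped Gauss-Newton step.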
4 Experiment and Result

4.1 Samples and Experimental Setup

Delamination is considered as the damage in specimens of rectangular multi-layer carbon fiber-reinforced epoxy composites made of TC12K33/S-1 prepreg tapes. Each damaged sample has only one rectangular delamination. Nine specimens are prepared and divided into two groups. In the first group, every damaged plate has a delamination centered at the same position, and all the plates are simply supported on two sides. Each sample plate in the other group is fixed on one side, and each damaged plate has a delamination at an individual position. Thirty computational samples corresponding to the two groups are adopted to obtain the training data for the NNs.
The experimental setup is shown in Fig. 1. Two piezoelectric patches are bonded opposite each other on the top and bottom surfaces of each plate as actuators, and an accelerometer is mounted on the top surface. The average of eight repeated measurements for each case is taken as the result to reduce the influence of noise. The acceleration response is measured when an impulse excitation is applied to the plate. After the experiment, the recorded response data for each specimen are processed by WP decomposition to obtain the damage feature index.
Fig. 1. Schematic of experiment setup of vibration test for delamination detecting in composites
4.2 Training Performance of NN and Its Test
The NN structures for the two groups are slightly different and are trained separately using the computed responses to pulse excitation for the samples. The neuron numbers of the input and output layers are 32 and 1, respectively. The NN gives the best results when the neuron numbers of the hidden layers are 12 and 20, respectively. The error is less than 10^-10 after 200 learning epochs. The input data of a vibration system unavoidably contain some errors or uncertainties due to measurement error, transducers, human mistakes, etc. To appraise the damage-tolerant ability of the proposed method, a set of testing data is created by adding ±1% to ±10% uniformly distributed random noise to the training data. The test results using the noised data show that the error-containing data indeed influence the recognition accuracy, but the relative errors are below 10%, i.e., the proposed method has an accuracy of over 90%.

4.3 Test of Specimens
After successful training and verification, the two networks are tested using experimental data from the prepared specimens. The measured data obtained from the vibration experiment as stated earlier are fed into the NNs. The output of the networks is shown in Fig. 2. Although the NN output from the computed data is better than that from the measured data, the latter is satisfactory enough, as the absolute value of the worst relative error is less than 8%.
Fig. 2. Plot of NN results of delamination assessment for selected samples (upper panel: actual target vs. NN output from computed and from measured data; lower panel: relative errors; horizontal axis: sample number)
5 Conclusion

This paper presents a vibration-based damage assessment method using an MBNN for multi-layer composites. As ample training data can be computed using the corresponding FE model, the NN structure is simple and learns quickly. As the relative WP energy change induced by damage is selected as the input of the NNs, it is convenient to obtain the majority of the responses as training data for samples with different damages and conditions using the non-damping FE model. The delamination-induced variation of WP energy is consistent with that of the natural frequency, and is shown to be effective for recognizing delamination area and location using the proposed MBNN. The results of this study show that the presented method for determining the damage state is quite promising. The method has attractive applications in damage assessment of composites, especially for active detection, because of the ability of smart materials to excite the structure without requiring much additional equipment.
References

1. Bishop, C.M.: Neural Networks for Pattern Recognition. Oxford University Press, New York (1998)
2. Cios, K.J., Tjia, R.E.: Application of Neural Networks in the Acousto-ultrasonic Evaluation of Metal-Matrix Composite Specimens. In: International Joint Conference on Neural Networks, Singapore, vol. 2, pp. 993–998 (1992)
3. Perlovsky, L.I.: Model-based Neural Network for Target Detection in SAR Images. IEEE Transactions on Image Processing 6(1), 203–216 (1997)
4. Cai, N., Hu, K., Xiong, H., Li, S., Su, W.: Image Segmentation of G Bands of Triticum Monococcum Chromosomes Based on the Model-based Neural Network. Pattern Recognition Letters 25(3), 319–329 (2004)
5. Cai, N., Yang, J., Hu, K., Xiong, H.: MRF-MBNN: A Novel Neural Network Architecture for Image Processing. In: Wang, J., Liao, X.-F., Yi, Z. (eds.) ISNN 2005. LNCS, vol. 3497, pp. 673–678. Springer, Heidelberg (2005)
6. Daubechies, I.: Orthonormal Bases of Compactly Supported Wavelets. Communic. Pure Appl. Math. 41(7), 909–996 (1988)
A High Speed Mobile Courier Data Access System That Processes Database Queries in Real-Time

Barnabas Ndlovu Gatsheni and Zwelakhe Mabizela

Vaal University of Technology, P Bag X021, Vanderbijlpark 1900, South Africa
[emailprotected], [emailprotected]
Abstract. A secure high-speed query-processing mobile courier data access (MCDA) system for a courier company has been developed. The system combines wireless and wired networks so that an offsite worker (the Courier) can update a live database at the courier centre in real-time. It is protected by a VPN based on IPsec. To our knowledge, no existing system performs the task for the courier proposed in this paper.
1 Introduction

This paper addresses challenges faced by the Meal-Trans Courier Company (also called the courier centre), which receives credit cards from banks for forward transmission to the banks' clients. A Courier is a messenger who delivers credit cards to recipients in a remote location. Currently, the courier centre receives over 20 000 credit cards at a time from 3 banks (shown in Figure 1) for distribution to the banks' clients. Once the credit cards have been sorted by destination and read into the database, the recipients are contacted by telephone to arrange delivery, and the deliverable cards are assigned to a Courier. On his way to deliver the credit cards specified in his trip sheet, the Courier takes with him a Consolidation Manifest and a Delivery Note. Upon delivery, the credit card recipient is requested to produce a certified copy of a valid identification document and proof of residence in the form of an electricity bill. At the courier centre these documents are scanned, as shown in Figure 1, into a Tracking Database System for verification with the Financial Intelligence Centre, a process that takes place at least 3 days after a successful credit card delivery. In addition, it takes the Courier 7 hours after a delivery to report back to the courier centre with copies of the recipient documents. In those 3 days, or even 7 hours, a lot of damage can be done if a credit card lands in the wrong hands. What is needed is a system that automatically exchanges data in real-time between the courier centre and a Courier when the recipient signs the delivery note on receiving the credit card. This transaction model must not allow unauthorized access, destruction or alteration of data, and it must cope with mobility issues. The system will provide a seamless connection between the Courier and the courier centre.
In other words, this model will assist the Courier to relay data to the courier centre in real-time. Unlike current courier services that are based on expensive GPRS, a mix of wireless technologies is used to achieve a lower cost model.

D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 329–336, 2008. © Springer-Verlag Berlin Heidelberg 2008
Fig. 1. The mobile courier data access system (MCDA). ccards means credit cards. (Credit cards from the three banks are fast-sorted into the courier database, delivered to recipients, and the recipient documents are verified with FICA.)
Section 2 is on relevant wireless technologies; section 3 outlines relevant computer security technologies. Section 4 is on experimentation. Section 5 is the Conclusion.
2 Wireless Technologies

Portable wireless computing devices, combined with the fixed wireless broadband access to LANs that Wireless Fidelity (Wi-Fi) offers for the "last mile" [1] (Table 1), can deliver Courier data at high rates in real-time. Wi-Fi, also called IEEE802.11, offers low deployment costs, ease of use, superior network scalability, reliability in harsh environments, "anytime - anywhere computing", "always on" connectivity and a high signal to noise ratio (SNR) - key recipes for the MCDA.

Table 1. Selected wireless technologies

Standard         Freq/GHz   Data rate     Range
IEEE802.11g      2.4        54 Mb/s       100 m
IEEE802.11b      2.4        1-11 Mb/s     100 m
IEEE802.11n      2.4        600 Mb/s      50 m
WiMAX 802.16a    10-66      1.5-70 Mb/s   30 000 m
WiMAX 802.16e    2-6        1.5-70 Mb/s   30 000 m
With the Wi-Fi system's access points (APs), wireless network traffic can be transmitted over a fixed network. Wireless local area networks (WLANs) can deploy APs interconnected either by wired Ethernet or by high-bandwidth point-to-point wireless links. Although Wi-Fi hotspots can extend the reach of the Internet in
remote areas, they tie users to the presence of an AP. Furthermore, Wi-Fi signals can propagate through walls from APs, making the network vulnerable to hackers. Despite these setbacks, Wi-Fi is potentially a key driver for the MCDA.

2.1 Wireless Systems in Combination

The Courier requires a mobile, high-bandwidth untethered Internet connection as he moves from point to point. Wi-Fi and 3G [6] integrated with a GSM or GPRS module give seamless coverage, mobility and uninterrupted connectivity. 3G provides mobility. GPRS provides Internet access "anywhere - anytime", but transferring data over GPRS networks costs about 70 times as much as over wireless LANs. In addition, the provision of a seamless service between Wi-Fi and GPRS/3G networks suffers from authentication problems. Thus GPRS will be used only where both GSM and Wi-Fi fail. Wi-Fi systems and GSM in combination give seamless coverage in the short and wide area, respectively. This seamless coverage has been exploited for roaming between LANs, by Nokia in its phone card, and in mobile phones for delivering access-control passwords. The transition from GSM to WLAN avails more bandwidth; however, bandwidth drops from 54 Mbps to 9.6 Kbps when the situation is reversed, and this is a weakness of the configuration. Related work exists in telemedicine, where clinical data is captured in a remote location and sent to a hospital data collection centre. GSM's key is kept in the SIM card, so there is no need for key updating. However, GSM offers no protection between the visitor and home location registers (VLR and HLR), and a mobile personal digital assistant (PDA) cannot authenticate the VLR, which exposes the system to hackers. In the MCDA, data will be encrypted in the PDA to overcome this. GSM's lack of a non-repudiation feature is not critical, as the PDAs used by Couriers have a unique identification number.

2.2 Mobile PDAs and Access Points (APs)

A Courier will carry a mobile PDA from point to point.
The PDAs and the AP are equipped with a NIC (IEEE802.11/GSM adaptor) and an IEEE802.11/GSM card, respectively. They have a 6-byte MAC address in their adapters; this address is not encrypted, so a hacker can change a MAC address to get anonymous Internet access. The MAC is the physical layer of the wireless connection between fixed and mobile devices. For a mobile PDA to connect to the backbone, when it is within range of an AP it engages the scanning protocol and its operating system automatically connects to the AP. This AP notifies the old AP of the change via a distribution system. To avoid collisions, a PDA can send a probe frame to all APs and then select one of them. Despite the forward and backward communications of the network protocols, which deplete the PDA's power supply, this configuration will be used for the MCDA in section 4. APs can provide high speed connectivity over wireless links to the mobile PDAs. The PDA supports one wireless interface that switches between the cellular and Wi-Fi systems. When the WLAN signal to noise ratio (SNR) within an AP coverage area falls below a threshold, the system switches to GSM or GPRS. In section 4 an API is included to inform applications about the bandwidth capabilities of the active wireless interface.
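The switching policy just described can be sketched as a simple decision rule. This is a hedged illustration: the function name, the SNR threshold and the bandwidth figures are our assumptions, not part of the deployed system:

```python
def select_interface(wifi_snr_db, gsm_available, gprs_available,
                     snr_threshold_db=20.0):
    """Choose the PDA's active interface: prefer Wi-Fi while its SNR is
    above the threshold, fall back to GSM, and use GPRS only when both
    Wi-Fi and GSM fail (GPRS is the most expensive option)."""
    if wifi_snr_db is not None and wifi_snr_db >= snr_threshold_db:
        return "wifi"
    if gsm_available:
        return "gsm"
    if gprs_available:
        return "gprs"
    return "offline"

# Bandwidth the API could report to applications, in bits per second
# (illustrative figures loosely based on Table 1 and section 2.1):
BANDWIDTH_BPS = {"wifi": 54_000_000, "gsm": 9_600, "gprs": 40_000, "offline": 0}
```

The sharp drop from 54 Mbps Wi-Fi to 9.6 Kbps GSM is why the API must tell applications which interface is active before they choose how much data to push.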
2.3 Mobile Worldwide Interoperability for Microwave Access (WiMAX)

Mobile WiMAX [6] (IEEE 802.16e-2005) provides broadband services. Unlike DSL and cable, WiMAX can be deployed rapidly and cost effectively. It supports wide area mobility via scalable orthogonal frequency division multiple access (OFDMA), and thus simultaneously supports fixed, portable and mobile models. Scalable OFDMA enables operators to offer "on the go" broadband Internet connectivity beyond Wi-Fi hotspots. WiMAX has good QoS, mesh networking, and smart antennas that maximise use of the spectrum. In mesh networking, data hops from point to point, circumventing obstacles and thus improving the coverage of a single base station. IEEE802.16a enables non-line-of-sight connections and supports high data rates (Table 1) on channels of 1.5 to 20 MHz, which allows WiMAX to adapt to the available spectrum and channel widths in different regions. IEEE802.16a operates like Wi-Fi but at higher speed, over greater distances (Table 1) and for far more users, and is thus cost effective. WiMAX in combination with Wi-Fi, here called Wi-FiMAX, supports IEEE802.11a/b/g networks and WLANs based on IEEE802.11n [3], which delivers up to twice the range and 8 times the performance of IEEE802.11a and g. IEEE802.11n enables bandwidth-intensive applications to be transferred across the Wi-Fi and WiMAX networks. In addition, Wi-FiMAX users remain connected to the metropolitan area when they leave a hotspot. Wi-FiMAX offers transparency of service between Wi-Fi in hotspots and WiMAX in metropolitan areas. Multiple input multiple output (MIMO) is incorporated in the IEEE 802.11n and IEEE 802.16e-2005 standards. The MIMO antennas are shared by Wi-Fi and WiMAX in Wi-FiMAX to reduce cost and noise. Through MIMO, Wi-FiMAX exploits multipath propagation to increase data throughput and range and to reduce bit error rates using the bandwidth and transmission power of single input single output (SISO).
WiMAX supports VLAN, IPv4, IPv6, Ethernet and ATM services, and thus serves both data and voice. It can provide a better wireless backhaul to connect IEEE802.11 WLANs and hotspots to the Internet. However, WiMAX towers (coverage area of 8 000 km²) are not available locally, and a standard for IEEE802.16 mobile clients is still under development. The MCDA will benefit from this standard, as it enables hand-off between base stations so that the Courier can roam between service areas. The PDA can then switch connection from 802.11b to 802.16, from wired to IEEE802.11, etc. WiMAX will be used in the MCDA once these problems have been overcome. In this work, the MCDA is implemented using Wi-Fi, GSM, 3G and GPRS.
3 Security Technologies

The wired equivalent privacy protocol (WEP) [4] offers security between a host and a wireless AP. With WEP, all users in an organisation share one symmetric key; if the key is compromised, the security of all users is at risk. However, since the Courier does not transmit enough traffic over a WEP-encrypted link for an intruder to piece together enough data to crack the security system, WEP can be used for the MCDA. Wi-Fi protected access (WPA) [4], which provides confidentiality and key distribution (an improvement on WEP), offers user security through extensible
authentication protocol (EAP) and IEEE802.11x port-based access control. EAP provides the architecture for upper layer authentication (ULA) [4] protocols. In the MCDA, ULA will facilitate a mutual authentication exchange between a PDA and an AP, and will also generate keys for use on this wireless link. EAP messages can be encapsulated over the wireless link and decapsulated at the AP, where they are re-encapsulated using the Remote Authentication Dial-in User Service (RADIUS) protocol for transmission over UDP/IP to the courier centre. What makes WPA attractive is its temporal key integrity protocol (TKIP), which defends against replay attacks, protects data integrity, resists weak-key attacks and avoids key reuse. WEP and WPA encrypt data only on the wireless link. In contrast, a virtual private network (VPN) [4] can encrypt data all the way from the PDA to the VPN server (through wired and wireless links). A VPN is ideal when a PDA communicates with only one server, which is the case with the MCDA; otherwise a VPN tunnel must be established to each server, which is expensive. Thus the data between the Courier and the courier centre is secured by a VPN. The public key infrastructure (PKI) [8], through the CA, generates, distributes and controls public keys and assures the ownership of a public key. PKI's certificate revocation list (CRL) verifies the validity of a certificate. If the PDA had to retrieve a CRL with each certificate from the CA, the real-time constraint would be affected; however, PKI-based authentication information can be stored in smart cards inserted into a reader. IPsec's Internet Security Association and Key Management Protocol (ISAKMP) for automatic key management can establish a bidirectional secure channel between the PDA and the courier centre when negotiation is used in aggressive mode (AM). AM facilitates internet key exchange (IKE) [9] deployment in UMTS, which speeds up the IKE transaction at the cost of less security.
With a VPN, the IKE protocol operates in a mobile UMTS environment. In SA negotiation, the PDA generates a cookie to prevent flooding attacks. What makes the UMTS network attractive for the MCDA is its use of dynamic IP addresses and its seamless access by mobile users to high data rate transmission for internet/intranet applications. On-demand VPN deployment over UMTS suits a Courier as he moves from point to point. However, UMTS increases the vulnerability of the network to attacks by intruders from mobile devices. IKE provides secure key determination via a public key system. This facilitates authentication, selection of the encryption to be used, protection against replay and protection against flooding attacks. IKE has built-in timeouts and re-try mechanisms which promote recovery, and it clears collisions when the lifetimes of the tunnels expire. All these attributes are key to the success of the MCDA. IPsec supports fast symmetric cryptography and one-way hash functions. However, the IPsec headers (ESP, AH, IP) for tunneling and encapsulation increase the packet size and the ratio of header size to payload size; they reduce the effective bandwidth and network performance and increase router internal delays and queuing delay. In addition, the time needed to build the IPsec headers and apply encryption to the payload delays packet transmission, which is bad for the real-time constraint of the MCDA. However, the tunnel ESP mode has less overhead for both packet authentication and encryption. In section 4 the mobile PDA establishes an IPsec tunnel and generates requests which are forwarded to the Windows 2003 server at the courier centre; thus the PDA traverses firewalls, accesses the courier LAN and conveys sensitive data securely. Key management and the periodic updating of the public key are difficult on mobile PDAs due to hardware limitations. In addition, a PDA has small memory
334
B.N. Gatsheni and Z. Mabizela
capacity, uses low-bandwidth wireless networks, and thus cannot support complex encryption/decryption, which depletes battery power. A large cipher kernel size, a large key size and long iterations executed by the cipher kernel result in a strong cipher at the expense of a decrease in encryption/decryption speed, in throughput and in system capacity. In the MCDA a trade-off will be struck between security (computational load) and throughput in order to optimise the MCDA system's real-time requirements. Captive portals allow access after user authentication and thus, in the MCDA, can let the PDA receive IP packets from the DHCP server via a Wi-Fi link; however, displaying terms of use on a web page and a login screen for access can affect the MCDA's real-time constraint. A router allowed the PDA to change its IP address or make use of mobile IP. Firewalls cannot block attacks originating from mobile subscribers. In addition, since roaming involves changing the source address, the static configuration of firewalls results in discontinuity of service for the mobile PDA.
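The bandwidth penalty of per-packet encapsulation overhead discussed above can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: the 58-byte figure is a typical tunnel-mode ESP overhead (outer IP header, ESP header, IV, padding/trailer, and authentication data), not a measurement from this study, and the 11 Mbps link rate is an assumed 802.11b value.

```python
def effective_bandwidth(link_bps, payload_bytes, overhead_bytes):
    """Share of link capacity left for payload after per-packet overhead."""
    total = payload_bytes + overhead_bytes
    return link_bps * payload_bytes / total

# Typical tunnel-mode ESP overhead (illustrative): outer IP (20) + ESP header (8)
# + IV (8) + padding/trailer (~10) + authentication data (12) ≈ 58 bytes/packet.
OVERHEAD = 58

for payload in (64, 256, 1400):
    bw = effective_bandwidth(11_000_000, payload, OVERHEAD)  # assumed 11 Mbps Wi-Fi link
    print(f"payload {payload:4d} B -> effective {bw / 1e6:.2f} Mbps")
```

Small packets suffer most, which is consistent with the observation that the header-to-payload ratio, not just the absolute header size, erodes effective bandwidth.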
4 The MCDA System Design

The system comprises the Courier; the Windows 2003 server; Wi-Fi hotspots [2]; 3G, integrated Wi-Fi, GPRS and GSM; a mobile PDA; a database developed using MySQL; code in Java for exploiting the capabilities of mobile devices; and IEEE 802.11i in Table 1, which supports roaming, offers stronger encryption, and has an extensible set of authentication mechanisms and a key distribution mechanism.

4.1 Experimentation

Two students, each equipped with a mobile PDA, acted as Couriers distributing radio frequency identification (RFID) tags (based on ISO 14443) [7] which represented credit cards from 3 banks. Each PDA had a unique number that represented the Courier's identity (ID) number (so no separate database entry was needed per Courier for cards delivered). Each Courier had 45 passive RFID-tagged cards [5] (15 cards from each bank); the 90 RFID tags were distributed equally among the 3 banks. A range of UIDs (labels) [5] was allocated to each bank. A database was created using MySQL to receive data from the Courier. 90 students representing credit card recipients were positioned in different parts of the campus; on receiving these cards from the Couriers, they signed on the PDA. Of the 90, 10 were rogue recipients. The other inputs into the PDA were the UID number of the RFID tag (representing a credit card) and 2 further RFID tags (different from those for credit cards) representing a copy of the recipient's electricity bill and ID respectively. The data was encrypted from the PDA in a remote location through to the Windows 2003 server at the courier centre. A firewall with a built-in VPN (secured with IPsec tunnelling) provided security between the PDA, through the ISP, and the courier centre. Security policy guidelines were developed that prescribe suitable protection for information flow. The PDA (with Wi-Fi and GPRS/GSM) used 2 clusters of APs; each cluster of 3 APs created a large hotspot.
The hotspots A and B, which were 300 metres apart, were linked via Ethernet and then to the courier database via an internet service provider (ISP). The PDA in a remote location linked to these hotspots wirelessly. MTN, a mobile phone company using GSM/GPRS, provided another link
A High Speed MCDA System That Processes Database Queries in Real-Time
335
between the courier centre and the remote Courier. GSM and GPRS acted as a failover for Wi-Fi. At the host, APs and IEEE 802.11 replaced the Ethernet switch and the Ethernet adapter (NIC card) respectively. Java code was written to perform failover: where the signal from the AP was very weak or absent, data was sent through GSM. The database, loosely linked to a 24-hour service of the 3 banks, held a specimen signature and a hashed ID number of each recipient.

4.2 Results

On inspection of the database at the courier centre, the following results were observed: 80 UIDs, 80 different signatures, 80 ID numbers, 80 copies of bills, 80 copies of IDs, and 10 alarms (highlighted in red in the database). When a wrong signature or ID number was sent, an alarm was raised and a signal was sent to the bank with the hashed value of the card to block it. When the Courier moved from hotspot A to hotspot B, there was a momentary shift to GSM and then back to Wi-Fi. This momentary shift from Wi-Fi to GSM was expected in the region between the two hotspots where the signal from both was very weak.

4.3 Discussion of Results

The MCDA achieves 100% performance as verification of the transaction is instant, whereas with the current system the courier centre had to send information to the banks 3 days after the credit cards had been delivered. What is also new is the use of wireless systems in combination, with failover, which lets a mobile courier deliver real data in real time to the courier centre and provides cheap wireless communication. These savings are passed on to clients, thus making the banks competitive. The MCDA automated the verification of the copy of the recipient's bills and ID: on receiving a wrong ID number or signature, the courier centre database raises an alarm and a signal is sent to the respective bank, which automatically blocks the credit card.
Thus, there is a 100% saving on potential losses due to unauthorised transactions by rogue recipients of credit cards. In addition, Courier turnaround time for delivering parcels is reduced by more than 80%. We used IEEE 802.11g instead of 802.11n; 802.11n has higher data transfer rates but could not be delivered on time.
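The Wi-Fi-to-GSM failover described in Section 4.1 amounts to a simple signal-strength policy. The sketch below is a hypothetical reconstruction (the original implementation was in Java and is not reproduced in the paper); the −85 dBm cutoff and the function name are assumptions, not values from the experiment.

```python
WEAK_SIGNAL_DBM = -85  # hypothetical cutoff for a "very weak" AP signal

def choose_link(wifi_rssi_dbm):
    """Prefer Wi-Fi; fall back to GSM/GPRS when the AP signal is weak or absent."""
    if wifi_rssi_dbm is None or wifi_rssi_dbm < WEAK_SIGNAL_DBM:
        return "GSM"
    return "WIFI"

assert choose_link(-60) == "WIFI"   # inside hotspot A or B
assert choose_link(-95) == "GSM"    # between hotspots: both signals very weak
assert choose_link(None) == "GSM"   # no AP detected at all
```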
5 Conclusions

A high-speed query and transaction processing model has been developed. The model is secure, gives reliable access and, for the very first time, allows updates of a live database in real time by an offsite worker (Courier) in a remote location, while coping with mobility issues. The likelihood of credit card fraud due to a rogue recipient has been reduced to zero percent. A live database is a collection of records that can be remotely accessed and updated to return the desired results in real time. Functions from mobile computing have been adapted for the MCDA. The MCDA system offers customized security services to data traffic and guarantees interworking with existing network infrastructure. Thus PDAs and APs execute the same security protocols as servers in the wired internet, achieving end-to-end security.
5.1 Further Work

The handwritten signature of a recipient might be replaced with a biometric in the MCDA system, which can counter complex attacks. Biometrics are attractive because of the one-to-one mapping they construct with an individual and because biometric data cannot be forged.
References

1. Haidong, X., Brustoloni, J.C.: Secure and Flexible Support for Visitors in Enterprise Wi-Fi Networks. In: IEEE GLOBECOM 2005 Proceedings, pp. 2647–2652 (2005)
2. Efstathiou, E.C., Polyzos, G.C.: A Self-Managed Scheme for Free Citywide Wi-Fi. In: Proceedings of the Sixth IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM 2005) (2005)
3. Vaughan-Nichols, S.J.: Will the New Wi-Fi Fly? Technology News, pp. 16–18
4. Hole, K.J., Dyrnes, E., Thorsheim, P.: Securing Wi-Fi Networks. IEEE Computer Society Magazine, pp. 28–34
5. Gatsheni, B.N., Rengith, B.K., Aghdasi, F.: Automating a Student Class Attendance Register Using Radio Frequency Identification in South Africa. In: IEEE International Conference on Mechatronics (ICM), Kumamoto (2007)
6. Jindal, S., Jindal, A., Gupta, N.: Grouping Wi-MAX, 3G and Wi-Fi for Wireless Broadband. IEEE, Los Alamitos (2005)
7. Finkenzeller, K.: RFID Handbook, 2nd edn. (2003)
8. Hunt, R.: PKI and Digital Certification Infrastructure, pp. 234–239. IEEE, Los Alamitos (2001)
9. Perlman, R., Kaufman, C.: Analysis of the IPsec Key Exchange Standard, pp. 150–156. IEEE, Los Alamitos (2001)
A Scalable QoS-Aware VoD Resource Sharing Scheme for Next Generation Networks

Chenn-Jung Huang, Yun-Cheng Luo, Chun-Hua Chen, and Kai-Wen Hu

Department of Computer & Information Science, College of Science, National Hualien University of Education
Abstract. In the network-aware concept, applications are aware of network conditions and adapt to the varying environment to achieve acceptable and predictable performance. In this work, a solution for video-on-demand service that integrates wireless and wired networks using network-aware concepts is proposed to reduce the blocking probability and dropping probability of mobile requests. A fuzzy logic inference system is employed to select appropriate cache relay nodes to cache published video streams and distribute them to different peers through a service oriented architecture (SOA). A SIP-based control protocol and the IMS standard are adopted to enable heterogeneous communication and to provide a framework for delivering real-time multimedia services over an IP-based network, ensuring interoperability, roaming, and end-to-end session management. The experimental results demonstrate the effectiveness and practicability of the proposed work.

Keywords: video-on-demand, next generation network, quality of service, fuzzy logic
1 Introduction

In recent years, server load and network bandwidth have been major performance issues in streaming video over the Internet. VoD resource sharing strategies such as broadcasting [1], patching [2], etc. can significantly improve the performance of VoD servers. The majority of existing VoD systems, which follow the client-server model, are not scalable, since the servers become the bottleneck as requests increase. To ease the servers' traffic load, several multimedia distribution techniques, such as mirroring, caching and content distribution [3], have been developed and deployed in the literature. Recent research and experiments reveal that there are enough resources in the Internet to support large-scale media streaming in a peer-to-peer fashion [4]. It has been reported that two major problems need to be addressed when published videos are split into segments and distributed to different peers in a heterogeneous peer-to-peer network: (1) how to distribute and cache segments, taking into consideration that peers offer different resources and may leave at any time; and (2) how to efficiently find the desired segments [4].

D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 337–344, 2008. © Springer-Verlag Berlin Heidelberg 2008
The focus of this work is choosing appropriate cache relay nodes to cache published video streams, distributing the streams to different peers through a service oriented architecture (SOA), and using the session initiation protocol (SIP) to enable heterogeneous communication. The IP Multimedia Subsystem (IMS) [5] standard is adopted here because IMS employs SIP for access-agnostic networking, so a framework for the delivery of real-time multimedia services over an IP-based network can be established to ensure interoperability, roaming, and end-to-end session management. It is well known that real-time streaming multimedia applications, such as voice over IP, online games and VoD, often require a fixed bit rate and are delay sensitive. QoS guarantees for multimedia applications are especially important when network resources are limited. The transmission overhead between a request node and a cache relay node is therefore adopted as an essential parameter of the proposed VoD resource sharing scheme, because it is an important factor in ensuring the QoS guarantees of the scheme. Notably, a fuzzy logic inference system is employed in this work to select appropriate cache relay nodes to cache published video streams and distribute them to different peers through SOA. The reason for using the fuzzy logic technique is that it has been used to solve several resource assignment problems efficiently in ATM and wireless networks in the literature [6]. It is expected that the application of the fuzzy logic technique in this work can assist in building a reliable and robust communication environment among the heterogeneous networks and in steadily transmitting video streams in unicast mode. A series of experiments was conducted, and the experimental results exhibit the feasibility of the proposed work. The remainder of this paper is organized as follows.
Section 2 presents the architecture of the scalable QoS-aware VoD resource sharing scheme in next generation networks. The simulation results and analysis are given in Section 3. Conclusions are drawn in Section 4.
2 Architecture of the Scalable QoS-Aware VoD Resource Sharing Scheme

Figure 1 shows the architecture of the QoS-aware VoD resource sharing scheme using IMS and SOA. The IMS core control signaling is managed by the call session control function (CSCF) module. The CSCF module handles all signaling messages of the IMS terminals and lets the IMS terminals connect to the SIP server. The SIP server then communicates with the CSCF using SIP to manage all SIP transactions, register users and provide services to them. The cache relay nodes are chosen when a VoD service request arrives. When the user requests the video watching service, the system uses SIP to send querying packets and to ask the SIP server for the list of cache relay nodes. The request node then also uses SIP to send query messages to all the cache relay nodes in the list. After receiving the confirmation messages from the cache relay nodes, the request node selects the most appropriate cache relay node using a fuzzy-logic based cache relay node selection module and asks the chosen cache relay node to forward video streams. If no appropriate cache relay node is found, the request node receives the video streams from the multimedia proxy in
the wired environment instead. In case the stream channels of the multimedia proxy are all occupied, the request will wait for service until its preset timeout expires. After the request node starts receiving the video streams from the cache relay node or the multimedia proxy, it is asked to evaluate whether it is suitable to serve as a cache relay node and to notify its administrating SIP cluster proxy server.

Fig. 1. Architecture of the QoS-aware VoD resource sharing scheme
2.1 Fuzzy Cache Relay Node Selection Module In order to reduce the control overhead, a two-stage fuzzy cache relay node selection algorithm is employed to find the most appropriate relay nodes for forwarding VoD streams. Fig. 2 shows the input-output mapping for the fuzzy cache relay node selection module. The inputs to the fuzzy logic inference system include the cache relay node’s stability, lifetime, idle level, and the transmission overhead. The output of the fuzzy logic system is the estimated appropriateness level of the cache relay node. The appropriateness level of each candidate cache relay node is collected by the request node to determine the most appropriate cache relay node to forward the video streams.
Fig. 2. Architecture of fuzzy cache relay node selection module
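The paper does not specify the membership functions or the rule base of the selection module, so the following is only a minimal illustrative sketch: triangular memberships over inputs normalized to [0, 1], with a weighted aggregation standing in for the full two-stage inference engine. The weights and breakpoints are assumptions, not values from the paper.

```python
def tri(x, a, b, c):
    """Triangular membership function rising from a, peaking at b, falling to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def appropriateness(stability, lifetime, idle, overhead):
    """Toy appropriateness score for a candidate cache relay node.

    All four inputs are normalized to [0, 1]; for 'overhead' lower is better.
    The weights below are illustrative only.
    """
    high = lambda x: tri(x, 0.3, 1.0, 1.7)   # degree to which x is "high"
    low = lambda x: tri(x, -0.7, 0.0, 0.7)   # degree to which x is "low"
    return (0.3 * high(stability) + 0.2 * high(lifetime)
            + 0.2 * high(idle) + 0.3 * low(overhead))

# The request node picks the candidate with the highest estimated appropriateness.
candidates = {"A": (0.9, 0.8, 0.5, 0.2), "B": (0.4, 0.9, 0.9, 0.7)}
best = max(candidates, key=lambda k: appropriateness(*candidates[k]))
```

Candidate A wins here because its high stability and low transmission overhead outweigh B's longer lifetime and idle capacity under the assumed weights.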
The derivation of the cache relay node's stability will be discussed in Section 2.3. The lifetime of the cache relay node represents the longest time period for which the cache relay node has stayed on-line in the historical records. The distribution of cache relay
node lifetimes in a peer-to-peer environment can be expressed by the Pareto-type cumulative distribution function given in [7]:

F(x) = 1 − (1 + α·(x − 0.5)/β)^(−1/α),   α > 1    (1)

where x represents the lifetime, α is the shape parameter that captures heavy-tailed user lifetimes, and β stands for the scale parameter that changes the mean of the distribution. The idle level of each cache relay node is determined by
P(x) = (1/(N_max − N_used)) · (1/√(2π)) · e^(−x²/2),   x = N_used/N_max    (2)

where N_used denotes the number of request nodes that the mobile entity currently serves as a cache relay node, whereas N_max represents the upper bound on the number it can serve. N_max is determined by

N_max = B_max/B_avg    (3)
where B_max denotes the maximum bandwidth that the cache relay node can offer, and B_avg is the average bandwidth that a request node needs for video watching.

2.2 Estimation of the Request Node's Suitability to Serve as a Cache Relay Node

A fuzzy logic inference system is adopted to determine the suitability of the request node to serve as a cache relay node. The buffer size of the request node, its computing capacity specified in terms of CPU clock speed, and its stability are used as the three inputs to the fuzzy logic system. The output parameter of the inference engine is the estimated suitability of the request node. Notably, the prediction of the request node's stability will be explained in Section 2.3.

2.3 Prediction of Mobile Entities' Stability

The predictions of the stability of the cache relay node and of the request node are required as essential inputs to the two fuzzy logic systems discussed in Sections 2.1 and 2.2, respectively. The derivation of the prediction of the mobile entities' stability is discussed as follows.

2.3.1 The Probability Distribution

This work assumes that each mobile entity is equipped with a built-in GPS, which helps us to easily obtain the speed and location information of the mobile entity. Meanwhile, the direction of the mobile entity is assumed to be influenced mainly by its current speed v_n and its current acceleration a. Three cases are considered in order to forecast the direction of the mobile entity at the next time period.

Case 1: When a > 0, the mobile entity speeds up its movement and has a higher probability of keeping its current direction than of changing direction. The probability of moving in each direction is determined by the normal distribution. The probability distribution function (PDF) is given by:

P_QA(θ) = ∫_{−π/2}^{θ} (1/(√(2π)·σ)) · e^(−(x − π/2)²/(2σ²)) dx,   if −π/2 ≤ θ ≤ π/2
P_QA(θ) = ∫_{θ}^{3π/2} (1/(√(2π)·σ)) · e^(−(x − π/2)²/(2σ²)) dx,   if π/2 ≤ θ ≤ 3π/2    (4)
where θ is the complementary angle between the MH's current direction and its direction at the next time period, and σ is the variance of θ.

Case 2: When a < 0, the mobile entity slows down and has a higher probability of changing direction than of keeping the same direction during the next period of time. The probability of each moving direction is determined by a skewed distribution, and the PDF is given by:

P_QA(θ) = ∫_{−π/2}^{θ} (1/(√(2π)·σ)) · e^(−(x − (π/2)·|1/a|)²/(2σ²)) dx,       if θ < (π/2)·|1/a|
P_QA(θ) = 1 − ∫_{−π/2}^{θ} (1/(√(2π)·σ)) · e^(−(x − (π/2)·|1/a|)²/(2σ²)) dx,    if θ > (π/2)·|1/a|
P_QA(θ) = ∫_{−π/2}^{θ} (1/(√(2π)·σ)) · e^(−(x − (π/2)·|1+1/a|)²/(2σ²)) dx,      if θ < (π/2)·|1+1/a|
P_QA(θ) = 1 − ∫_{−π/2}^{θ} (1/(√(2π)·σ)) · e^(−(x − (π/2)·|1+1/a|)²/(2σ²)) dx,   if θ > (π/2)·|1+1/a|    (5)

where θ is the complementary angle between the MH's current direction and its direction at the next time period, σ denotes the variance of θ, and a is the acceleration rate.

Case 3: When v_n = 0, the mobile entity stops moving and the probability of each direction should be identical. The probability of the mobile entity's moving direction is determined by the uniform distribution, and the PDF is given by:

P_QA(θ) = 1/360    (6)
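Because the Case 1 probability in Eq. (4) is an integral of a Gaussian density, it can be evaluated in closed form with the error function rather than by numerical integration. The sketch below follows one plausible reconstruction of Eq. (4) (Gaussian centered at π/2 with deviation σ), so treat the exact parameterization as provisional; σ = 0.5 is an assumed value.

```python
import math

def gauss_cdf(x, mu, sigma):
    """CDF of the normal distribution N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def p_qa_case1(theta, sigma=0.5):
    """Eq. (4): direction probability when the mobile entity accelerates (a > 0)."""
    mu = math.pi / 2
    if -math.pi / 2 <= theta <= math.pi / 2:
        # integral of the Gaussian density from -pi/2 up to theta
        return gauss_cdf(theta, mu, sigma) - gauss_cdf(-math.pi / 2, mu, sigma)
    # integral of the Gaussian density from theta up to 3*pi/2
    return gauss_cdf(3 * math.pi / 2, mu, sigma) - gauss_cdf(theta, mu, sigma)
```

The two branches agree at θ = π/2, where each evaluates to roughly half the total probability mass.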
2.3.2 Position Forecast

The forecast of the mobile entity's position at the next time period is used to derive the disconnection probability in this work. The location of the mobile entity is first predicted using the mobile entity's current speed v_n and the probability distribution derived in the preceding subsection:

d(θ) = (v_n·t + (1/2)·a·t²) × ((1 + sin θ)/2) × P(θ)    (7)

where v_n is the mobile entity's current speed, t denotes the spanned time intervals, a is the acceleration, and θ is the angle between the normal of v_n and the mobile entity's next moving direction.
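Equation (7) is straightforward to evaluate once a direction probability P(θ) is available. The sketch below treats P(θ) as a caller-supplied value; the numbers in the example (v_n = 16 m/s, a = 1 m/s², t = 1 s) mirror the Fig. 3 scenario, while P = 0.4 is an arbitrary illustrative value.

```python
import math

def reachable_distance(theta, v_n, a, t, p_theta):
    """Eq. (7): expected reachable distance in direction theta at the next period."""
    kinematic = v_n * t + 0.5 * a * t * t        # distance from current speed plus acceleration
    directional = (1.0 + math.sin(theta)) / 2.0  # biases distance toward the heading
    return kinematic * directional * p_theta

# Example: v_n = 16 m/s, a = 1 m/s^2, t = 1 s, straight ahead (theta = pi/2), P = 0.4
d = reachable_distance(math.pi / 2, 16.0, 1.0, 1.0, 0.4)  # 16.5 m * 1.0 * 0.4 = 6.6 m
```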
2.3.3 Disconnection Forecast

The disconnection probability is derived from the location of the mobile entity at the next time period. Take Fig. 3 as an example. The total area encircled by the black line in the figure represents the possible locations that the mobile entity can reach at the next time period. Notably, the mobile entity currently located at dot Q will disconnect from the three WLANs shown in Fig. 3 when it reaches the area that is not covered by them. To simplify the computation, we approximate the areas BCD and EFG in Fig. 3 by two triangles. Accordingly, the disconnection probability of the mobile entity at the next time period can be derived by
P(Area_no_signal) = Area_no_signal / Area_total
                 = [ (∫_{θ₁}^{θ₁+θ₂} d(θ) dθ − ∫_{θ₁}^{θ₁+θ₂} r dθ) + (1/2)·Σ_{i=1}^{2} bᵢ × hᵢ ] / ∫_{0}^{2π} d(θ) dθ    (8)

where r is the WLAN coverage radius, and bᵢ and hᵢ are the base and height of the i-th approximating triangle.
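Equation (8) is a ratio of areas and lends itself to simple numerical integration. The sketch below approximates it with a trapezoidal rule, omits the small triangle correction, and clamps the integrand at zero wherever the reachable distance d(θ) stays inside the coverage radius; the function names and the constant test inputs are assumptions, not part of the paper.

```python
import math

def trapz(f, lo, hi, n=1000):
    """Trapezoidal integration of f over [lo, hi] with n subintervals."""
    h = (hi - lo) / n
    s = 0.5 * (f(lo) + f(hi)) + sum(f(lo + i * h) for i in range(1, n))
    return s * h

def disconnection_probability(d, r, theta1, theta2):
    """Approximate Eq. (8): the portion of the reachable area lying beyond WLAN
    radius r within the uncovered sector [theta1, theta1 + theta2], divided by
    the total reachable area (triangle correction omitted)."""
    no_signal = trapz(lambda th: max(d(th) - r, 0.0), theta1, theta1 + theta2)
    total = trapz(d, 0.0, 2 * math.pi)
    return no_signal / total if total > 0 else 0.0
```

For a constant reachable distance d(θ) = 2 and radius r = 1 over a quarter-circle uncovered sector, the ratio is (π/2)/(4π) = 0.125.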
Fig. 3. Maximal distance that the mobile entity can move during the next time period in an example of Case 1 in Section 2.3.1, in which v_n = 16 m/s and a = 1 m/s²
3 Simulation Results and Analysis

We ran a series of experiments wherein the arrival rate was varied from 50 to 250 requests per minute with the server capacity fixed at 500 streams. It is assumed that the system contains 200 videos, each 120 minutes long, and that the relative frequency of the individual videos follows a stretched exponential distribution [8]. In accordance with the mobility model [9], the users can move without limitation inside the whole scenario according to a random walk model, with the velocity and acceleration of each node ranging from 0 m/s to 15 m/s and from 0 m/s² to 2 m/s², respectively. In the analysis of our resource sharing and scheduling policies, the following performance measures are identical to the ones adopted in [10]: blocking probability, dropping probability and average latency time. The compared schemes include the proposed VoD resource sharing scheme (SSVoD), the pure unicast VoD resource sharing scheme (UVoD) and a state-of-the-art multicast scheme in the literature, the LAGRANGE-based multicast VoD resource sharing scheme (LMVoD) [11].

3.1 Simulation Result

Figures 4 to 6 show the comparisons of blocking probability, dropping probability, and average latency time, respectively. It can be observed that the proposed SSVoD scheme achieves a lower blocking probability than the other two schemes. The multimedia proxy server becomes the bottleneck in the pure unicast and LAGRANGE-based multicast VoD resource sharing schemes because the video streams are delivered solely by the multimedia proxy server in these two schemes. Later requesters tend to leave the VoD system out of impatience when the bandwidth resources of the multimedia proxy server in these two schemes are all occupied by early arrivals. Fig. 5 shows that the dropping probabilities of the proposed SSVoD and the pure unicast scheme are close because more newly arriving clients under the UVoD scheme are
Fig. 4. The blocking probability for the three compared schemes under varied client arrival rates

Fig. 5. The dropping probability for the three compared schemes under varied client arrival rates
blocked, as exhibited in Fig. 4, and more bandwidth of each base station in the UVoD scheme can be assigned to the clients that hand off from neighboring base stations. Figure 6 shows that the average latency of our proposed SSVoD scheme is lower than those of the other two schemes. The average latency of the pure unicast VoD resource sharing scheme is much higher than that of the other two schemes because the unicast scheme has to provide a channel to serve each individual requester, and its limited bandwidth is exhausted more easily than in the other two schemes.

Fig. 6. The average latency for the three compared schemes under varied client arrival rates
Based on the experimental results given in Figs. 4–6, it can be seen that the proposed SSVoD scheme achieves better performance than LMVoD and UVoD in terms of the three performance metrics: blocking probability of new requests, dropping probability of handoffs, and average latency. Nevertheless, the computation overhead of predicting the mobile entities' stability and employing the fuzzy logic systems is much higher than in the other two schemes. Reducing this computation overhead is thus the focus of future work.
4 Conclusions

In this work, a solution for video-on-demand service that integrates wireless and wired networks using network-aware concepts has been proposed to reduce the blocking probability and dropping probability of mobile requests. A fuzzy logic inference system is employed to select appropriate cache relay nodes to cache published video streams and distribute them to different peers through a service oriented architecture (SOA). Notably, the prediction of the mobile entities' stability, which is required as one essential input to the fuzzy logic system, is also derived. A series of simulations was run to compare the proposed VoD resource sharing schemes. The experimental results demonstrate the effectiveness and practicability of the proposed work in terms of the performance metrics, including blocking probability, dropping probability and average latency.

Acknowledgements. The authors would like to thank the National Science Council of the Republic of China, Taiwan for financially supporting this research under Contract No. NSC 96-2628-E-026-001-MY3.
References

1. Azad, S.A., Murshed, M.: An Efficient Transmission Scheme for Minimizing User Waiting Time in Video-On-Demand Systems. IEEE Communications Letters 11(3), 285–287 (2007)
2. Kong, C.W., Lee, J.Y.B., Hamdi, M., Li, V.O.K.: Turbo-Slice-and-Patch: An Algorithm for Metropolitan Scale VBR Video Streaming. IEEE Transactions on Circuits and Systems for Video Technology 16(3), 338–353 (2006)
3. Ho, K.M., Poon, W.F., Lo, K.T.: Performance Study of Large-Scale Video Streaming Services in Highly Heterogeneous Environment. IEEE Transactions on Broadcasting 53(4), 763–773 (2007)
4. Liu, J.C., Rao, S.G., Li, B., Zhang, H.: Opportunities and Challenges of Peer-to-Peer Internet Video Broadcast. Proceedings of the IEEE 96(1), 11–24 (2008)
5. 3GPP: Technical Specification Group Services and System Aspects. IP Multimedia Subsystem (IMS), Stage 2, TS 23.228
6. Hirota, K.: Industrial Applications of Fuzzy Technology. Springer, Heidelberg (1993)
7. Leonard, D., Zhongmei, Y., Rai, V., Loguinov, D.: On Lifetime-Based Node Failure and Stochastic Resilience of Decentralized Peer-to-Peer Networks. IEEE/ACM Transactions on Networking 15(3), 644–656 (2007)
8. Yu, H., et al.: Understanding User Behavior in Large Scale Video-on-Demand Systems. In: Proceedings of EuroSys (2006)
9. Verdone, R., Zanella, A.: On the Effect of User Mobility in Mobile Radio Systems With Distributed DCA. IEEE Transactions on Vehicular Technology 56(2), 874–887 (2007)
10. Aggarwal, C.C., Wolf, J.L., Yu, P.S.: The Maximum Factor Queue Length Batching Scheme for Video-on-Demand Systems. IEEE Trans. on Computers 50(2), 97–110, 789–800 (2007)
11. Yang, D.N., Chen, M.S.: Efficient Resource Allocation for Wireless Multicast. IEEE Transactions on Mobile Computing 7(4), 387–400 (2008)
Brain Mechanisms for Making, Breaking, and Changing Rules

Daniel S. Levine

Department of Psychology, University of Texas at Arlington, Arlington, TX 76019-0528
Abstract. Individuals differ widely, and the same person varies over time, in their tendency to seek maximum information versus their tendency to follow the simplest heuristics. Neuroimaging studies suggest which brain regions might mediate the balance between knowledge maximization and heuristic simplification. The amygdala is more activated in individuals who use primitive heuristics, whereas two areas of the frontal lobes are more activated in individuals with a strong knowledge drive: one area involved in detecting risk or conflict, and another involved in choosing task-appropriate responses. Both of these motivations have engineering uses. There is benefit to understanding a situation at a high enough level to respond in a flexible manner when the context is complex and time allows detailed consideration. Yet simplifying heuristics can yield benefits when the context is routine or when time is limited.
1 The Need for Rules How is the behavior of a human being, or intelligent machine, organized? In order to respond quickly and efficiently to complex and variable situations, it is necessary not only to plan our actions but to encode some flexible and context-dependent response patterns. When there is sufficient regularity to these behavioral patterns, we refer to them as rules, even though they may not be rules in the sense of being invariant and absolute like digital computer commands. As Levine and Perlovsky [1] noted, there are two competing sets of biological imperatives behind rule formation. On the one hand we possess a drive to understand our environment as fully as possible and create realistic internal models of the world; this has been called a knowledge instinct ([2], [3]). On the other hand, we possess a drive to simplify our cognitive processing and rely on rules of thumb, or heuristics, which can be accessed quickly and do not require careful thought ([4, 5]). The author has been engaged for several years in a research program to understand theoretically the trade-off between the knowledge instinct and effort minimization ([1], [6-8]), and more broadly, the process by which the brain decides between simpler and more complex rules of action [9]. Some tentative neural network hypotheses, involving adaptive resonance, have emerged about mechanisms for choices between levels of rule complexity. We will return to network models after a discussion of relevant results from cognitive neuroscience. D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 345 – 355, 2008. © Springer-Verlag Berlin Heidelberg 2008
2 Brain Regions

There is evidence from recent brain imaging studies [10, 11] that different brain processes can be involved in individuals who follow heuristics versus those who violate them. For instance, De Martino et al. [10] looked at participants who did and did not follow the heuristics Tversky and Kahneman [4, 5] had discovered for risk taking or risk avoidance. For example, subjects asked to consider two programs to combat an Asian disease expected to kill 600 people tend to prefer the certain saving of 200 people to a 1/3 probability of saving all 600 with a 2/3 probability of saving none. However, subjects also tend to prefer a 1/3 probability of nobody dying with a 2/3 probability of 600 dying to the certainty of 400 dying. The choices are identical in actual effect, but are perceived differently because of differences in frame of reference (comparing hypothetical states in one case with the state of all being alive, in the other case with the state of all dying). Tversky and Kahneman explain their data by noting that "choices involving gains are often risk averse while choices involving losses are often risk taking" ([5], p. 453). The fMRI study of De Martino et al. [10] showed significant differences between individuals who were and were not susceptible to framing effects. This study used a monetary analog of Tversky and Kahneman's "Asian disease" problem. Subjects had to choose between a sure option and a gamble option, where the sure option was expressed either in terms of gains (keep £20 out of the £50 they initially received) or in terms of losses (lose £30 out of the initial £50). The majority of subjects chose the sure option with a gain frame and the gamble option with a loss frame, yet significant minorities chose the gamble with a gain frame or the sure option with a loss frame, in violation of the usual heuristics.
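The frame equivalence is easy to verify with expected values: every option in both frames saves 200 lives (or, in the monetary analog, keeps £20) on average, so a purely outcome-based chooser would be indifferent. A quick check:

```python
def expected_value(outcomes):
    """Expected value of a list of (probability, value) pairs."""
    return sum(p * v for p, v in outcomes)

# "Asian disease" problem: 600 lives at stake, values = lives saved.
sure_gain = expected_value([(1.0, 200)])              # 200 saved for certain
gamble_gain = expected_value([(1/3, 600), (2/3, 0)])  # 1/3 chance all 600 saved
sure_loss = expected_value([(1.0, 600 - 400)])        # 400 die for certain
gamble_loss = expected_value([(1/3, 600), (2/3, 0)])  # 1/3 chance nobody dies

assert sure_gain == sure_loss == 200.0
assert abs(gamble_gain - 200.0) < 1e-9 and gamble_gain == gamble_loss
```

The framing effect is thus entirely a matter of description, not of expected outcome.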
fMRI measurements showed that heuristics-violators had more activation than heuristics-followers in two areas of cortex: the orbitofrontal cortex (OFC) and anterior cingulate (ACC). Conversely, those subjects whose choices were consistent with the framing heuristic had more activation in the amygdala, the area below the cortex that is most involved with primary emotional experience. A more demanding task involving processing base rates of category membership [11] showed selective activation of another prefrontal region, the dorsolateral prefrontal cortex (DLPFC), in participants who utilized probability information correctly.

To understand all these fMRI results, we note that the OFC, ACC, and DLPFC are all parts of what is generally called the brain's executive system. OFC damage often leads to decision making deficits and socially inappropriate behavior, as in the famous 19th century patient Phineas Gage [12]. These clinical observations and animal lesion studies suggest that the OFC forms and sustains mental linkages between specific sensory events, or motor actions, and positive or negative emotional states. Long-term storage of emotional valences is likely to be at connections between the OFC and amygdala [13]. The DLPFC is a working memory region, and is involved in information processing at a higher level of abstraction than the OFC. For example, OFC lesions in monkeys were found to impair learning of changes in reward value within a stimulus dimension, whereas DLPFC lesions impaired learning of changes in which dimension was relevant [14].
Brain Mechanisms for Making, Breaking, and Changing Rules
The ACC is activated when a subject must select or switch among different interpretations or aspects of a stimulus [15]. Recent theories of ACC function have emphasized its role in detection either of potential response error or of conflict between signals promoting competing responses [16, 17]. Hence, executive regions of prefrontal cortex are more readily activated when knowledge motivations are engaged than when simplifying heuristics are employed. This is closely related to the long-established distinction in cognitive psychology between controlled and automatic processing (e.g., [18]). What is the variable that changes between these two modes? The data suggest that the interplay between the two modes can be studied by means of at least one neural network parameter that varies both between individuals and between domains in the life of the same individual. Such a parameter is vigilance, used in adaptive resonance theory [19].
3 Network Theory

The adaptive resonance theory (ART) networks developed by Gail Carpenter and Stephen Grossberg are a versatile family of networks for representing interactions between multiple levels of processing. In particular, ART has been widely used as a network for linking together representations of categories and of their attributes. A very brief review of ART follows; much more detail appears in [19].

In its simplest form (Fig. 1), the ART network consists of two interconnected layers of nodes, called F1 and F2. F1 is assumed to consist of nodes that respond to input features. F2 is assumed to consist of nodes that respond to categories of F1 node activity patterns. Synaptic connections between the two layers are modifiable in both directions, according to two different learning laws. The F1 nodes do not directly interact with each other, but the F2 nodes are connected in a recurrent competitive on-center off-surround network, a common device for making choices in short-term memory. In this version, the simplest form of choice (winner-take-all) is made: only the F2 node receiving the largest signal from F1 becomes active. To compute the signal received by a given F2 node, the activity of each F1 node in response to the input pattern is weighted by the strength of the bottom-up synapses from that F1 node to the given F2 node, and all these weighted activities are added.

Inhibition from the F2 layer to the F1 layer shuts off most neural activity at F1 if there is a mismatch between the input pattern and the active category's prototype. Only with a sufficiently large match are enough of the same F1 nodes excited by both the input and the active F2 category node, which is needed to overcome nonspecific inhibition from F2. If match occurs, then F1 activity is large because many nodes are simultaneously excited by input and prototype. If mismatch occurs, F2 reset shuts off the active category node.
The criterion for matching is that some function representing the overlap between top-down and bottom-up patterns must be greater than some positive constant r, which is called the vigilance of the network. This allows for control of the degree of abstractness or generalization: low vigilance makes the network learn broad categories, whereas high vigilance makes the network learn more specific categories.
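The winner-take-all choice at F2 and the vigilance test just described can be sketched in a few lines of Python. This is a minimal illustration, not the full Carpenter-Grossberg dynamics; the function and variable names are ours:

```python
import numpy as np

def art1_choose_and_match(input_pattern, bottom_up_w, top_down_w, vigilance):
    """One ART 1 search step (minimal sketch, not the full model).

    input_pattern: binary feature vector at F1.
    bottom_up_w:   (categories x features) bottom-up weights.
    top_down_w:    (categories x features) binary top-down prototypes.
    vigilance:     match threshold r in [0, 1].
    Returns (winning F2 node, True for resonance / False for reset).
    """
    # Winner-take-all at F2: each F2 node's signal is the sum of F1
    # activities weighted by its bottom-up synapses; the largest wins.
    signals = bottom_up_w @ input_pattern
    winner = int(np.argmax(signals))
    # Matching criterion: overlap of the input with the winner's top-down
    # prototype, relative to total input activity, must reach the vigilance r.
    overlap = np.sum(np.minimum(input_pattern, top_down_w[winner]))
    match = overlap / np.sum(input_pattern)
    return winner, match >= vigilance
```

Raising the vigilance argument makes more inputs fail the match test and trigger reset, which in a full ART search would recruit a new, more specific category.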
Fig. 1. ART 1 network, with an attentional subsystem (levels F1 and F2) and an orienting subsystem. Short-term memory is encoded at the feature level F1 and category level F2, and learning at interlevel synapses. The orienting subsystem generates reset of F2 when bottom-up and top-down patterns mismatch at F1, that is, when the ratio of F1 activity to input activity is less than the vigilance r. Arrows denote excitation, filled circles inhibition, and semicircles learning. (Adapted from [19] with the permission of Academic Press.)
The ART principle is versatile and applicable to a range of brain systems. In particular, this general network principle can be adapted to the interactions among the prefrontal executive regions and amygdala discussed in the last section. Fig. 2 shows a three-layer hierarchical ART network for knowledge encoding and processing. Simple heuristics involve feedback between amygdala and OFC, and do not engage the other two prefrontal executive areas (the error detector at ACC and the complex working memory analyzer at DLPFC). The individual with higher vigilance in the pursuit of knowledge, whether this vigilance is general or domain-specific, is sensitive to mismatches between the results of those heuristics and logical truth. This leads in turn to engagement of the other executive regions. The more “vigilant” subjects in [10] had increased neural activity in OFC and ACC, but not in DLPFC as Fig. 2 might suggest. The framing-influenced monetary decision task, however, may not be complex or abstract enough to engage the DLPFC. One partial test of my theory, in progress, is an fMRI study of better versus worse decision makers on a more complex decision task, one involving competition between different abstract principles, in this case between frequency and probability [20]. We now discuss some results from decision psychology on confusion between high probability and high frequency, and our ongoing computer simulations [8] and fMRI studies [20] of these results.
4 Probability Versus Frequency

A variety of decision making tasks tend to evoke two or more competing rules, one of which is normatively superior to the others. One example is a task that involves
choosing a larger probability versus a larger (absolute) frequency of either a gain or a loss (e.g., [21-23]). Yamagishi [21] found that the majority of participants judging the riskiness of various causes of death were more influenced by the described numerosity of deaths than by the probability of death. For example, they rated cancer as riskier when it was described as killing 1,286 out of 10,000 people than when it was described as killing 24.14 out of 100 people. The phenomenon whereby the same probability is experienced as larger if it comes as a ratio of two larger numbers has been called ratio bias (e.g., [23]). Pacini and Epstein [23] found that many of their participants seemed to be aware of their ratio biases, but were conflicted between emotional and rational influences on their choices.

To limit our theoretical domain, we simulated a particular version of the frequency/probability decision task due to Denes-Raj and Epstein [22]. The aim of our work is to generalize from modeling this restricted case to a more general model of rule selection; in particular, selection between a simple, readily available, but nonoptimal rule and a more complex but more accurate rule.
Fig. 2. Network that can either minimize effort or maximize knowledge, consisting of an ART module for heuristics (F1, amygdala; F2, OFC), an error module (ACC), and an ART module for complex rules (F3, DLPFC). With low vigilance, the ART module combining F1 and F2 makes decisions based on simple heuristics. With high vigilance, discontent with outcomes of simple decisions generates activity in the orienting (error) module (ACC). ACC activity in turn may generate a search for more complex decision rules at F3 (DLPFC). (Reprinted from [7], with the permission of IEEE.)
Participants in the Denes-Raj and Epstein experiment were assigned randomly either to a win condition or a loss condition. In the win condition, they were shown two bowls containing red and white jellybeans, told they would win a certain amount of
money if they randomly selected a red jellybean, and instructed to choose which bowl gave them the best chance of winning money. In one of the bowls, there were a total of 10 jellybeans, of which 1 was red. In the other bowl, there were a total of 100 jellybeans, of which some number greater than 1 but less than 10 were red. Hence, choice of the bowl with a larger frequency of red jellybeans was always nonoptimal, because the probability of drawing red from that bowl was less than 1/10. The loss condition used the same two bowls, but the participants were told they would lose money if they selected a red jellybean, so the bowl with more jellybeans was the optimal choice.

Fig. 3 shows percentages of nonoptimal responses in both win and loss conditions. "Nonoptimal response size" in that graph means the difference between the chosen option and 10 out of 100, which was equivalent to 1 out of 10; that is, 1 represents the choice of 9 out of 100 over 1 out of 10, 2 represents the choice of 8 out of 100, etcetera. In the win condition, the majority of participants made the nonoptimal choice when the choice was 9 out of 100 (nonoptimal response size 1) versus 1 out of 10, and about a quarter still chose 5 out of 100 (nonoptimal response size 5) over 1 out of 10. Larger response sizes are not shown in the graph, but no participant chose 2 out of 100 over 1 out of 10. In the loss condition, the pattern of drop-off was similar but there were significantly fewer nonoptimal choices. The authors explained the difference between win and loss conditions by noting that the loss condition involves negative affect, which leads to more careful (and therefore, at least sometimes, rational) consideration of alternatives.

Our model is based on assumed functions of different areas of the prefrontal executive system, notably the anterior cingulate cortex (ACC) and dorsolateral prefrontal cortex (DLPFC).
First let us describe three basic types of decision makers (DMs) on this task (with the caveat that these characterizations may be personality-dependent, task-dependent, or both): (a) DMs who choose, say, 8-in-100 over 1-in-10 and are not aware of any reason to do otherwise; (b) DMs who choose 8-in-100 over 1-in-10 but verbalize a numerical reason for making the opposite choice; (c) DMs who correctly choose 1-in-10 over 8-in-100. Our hypothesis is that types (b) and (c) will show more ACC activation than type (a), and type (c) will show more DLPFC activation than either type (a) or (b). The hypothesis about ACC is based on that region's role both in detection of potential errors and in response conflicts [16, 17]. The hypothesis about DLPFC is based on recent fMRI studies showing that DLPFC activity correlates with accurate stimulus-response contingencies and rule-based response selection [e.g., 24]. Our neural network for simulation of Denes-Raj and Epstein's data is based on Fig. 2 but much simplified from that figure. It does not include detailed neural connections of the ACC and DLPFC, but incorporates differences between individuals in two
Fig. 3. Experimental results [22] on percentages of time the higher-frequency, lower-probability alternative was chosen, as a function of nonoptimal response size (9/100 through 5/100). Graph with dark circles: win condition; graph with gray squares: loss condition. (Adapted from [22] with permission of one of the authors.)
parameters representing ACC and DLPFC function. One or another of these parameters could correlate with the psychological construct of need for cognition, defined as "intrinsic motivation to engage and enjoy effortful cognitive activities" ([25], pp. 142-143).

Decisions between two alternative gambles are based on one of two rules: a heuristic rule based on frequencies and a ratio rule based on probabilities. The "ACC" parameter, called α, determines the likelihood of choosing the ratio rule for a given pair of gambles. If the ratio rule is chosen, the "DLPFC" parameter, called δ, determines the probability that the optimal response is made. The heuristic rule is defined by frequencies of alternatives and the fuzzy concept of "much larger than one" [26]. If ACC activity is too small, the decision is controlled by the amygdala using the rule "choose k out of 100 over 1 out of 10 if k is much larger than one." The fuzzy membership function of k in the "much larger" category, called ψ(k), is a ramp function that is linear between the values 0 at k = 3 and 1 at k = 13:

    ψ(k) = 0,             k < 3
    ψ(k) = 0.1(k − 3),    3 ≤ k ≤ 13     (1)
    ψ(k) = 1,             k > 13
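As a sketch, the ramp function of Eq. (1) translates directly into a small helper function (the name psi is ours):

```python
def psi(k):
    """Eq. (1): fuzzy membership of k in the "much larger than one" category.
    Zero below k = 3, rising linearly, saturating at 1 for k > 13."""
    if k < 3:
        return 0.0
    if k <= 13:
        return 0.1 * (k - 3)
    return 1.0
```

For instance, ψ(8) = 0.5, so the 8-out-of-100 option belongs to the "much larger than one" category to degree one-half.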
The ACC parameter α, across trials (representing all choices made by all participants), varies uniformly over the interval [0, 1]. If α is less than or equal to the function ψ(k) of (1), the heuristic "much larger" rule is chosen. Otherwise, a rule of "largest ratio of numerator to denominator" is chosen, that is

    heuristic rule chosen if α ≤ ψ(k)
    ratio rule chosen if α > ψ(k)     (2)
But the ratio rule does not guarantee that the higher probability alternative (in this case, 1 out of 10) will be chosen. This is because of the tuning curves of numerosity detectors in the parietal cortex [27], a possible neural basis for numerical imprecision. Our algorithm assumes that the numerators and denominators of both alternatives (k, 100, 1, and 10) each activate a Gaussian distribution of parietal numerosity detectors. Hence, before ratios are computed and compared, each of those numbers is multiplied by a normally distributed quantity with mean 1. To obtain the standard deviation of this variable multiplier, we assume that DLPFC inputs to parietal cortex sharpen the tuning of numerosity detectors, and set the standard deviation of each normal quantity equal to 0.1(1 − δ). Across trials we assumed δ is normally distributed with mean 0.5 and standard deviation 0.25: the wide deviation mimics the wide range in human need for cognition [25]. Hence if the ratio rule is chosen, the nonoptimal choice of k out of 100 over 1 out of 10 is made if the perceived ratio of red jellybeans to total jellybeans is higher in the first alternative than in the second alternative. Based on the Gaussian perturbations of numerators and denominators described above, this means that a nonoptimal choice is made if and only if

    k(1 + φr1) / [100(1 + φr2)] > (1 + φr3) / [10(1 + φr4)]     (3)

where φ = 0.1(1 − δ) and the ri, i = 1, 2, 3, 4, are unit normal random variables.
The ratios in Eq. (3) can be interpreted as steady states of a shunting on-center off-surround network, as follows. Present the two alternatives as inputs to the network shown in Fig. 4. Assuming perfect accuracy of numerical perceptions (otherwise the values k, 100, 10, and 1 in the circles of Fig. 4 are replaced by their normally perturbed values), the activity of the node u1, representing the reward value of Bowl 1, can be described by a nonlinear differential equation with excitatory input k and inhibitory input 100 − k:

    du1/dt = −λu1 + (1 − u1)k − u1(100 − k)     (4)
where λ is a decay rate. If we assume time is short enough that λ = 0, set the derivative in (4) to 0, and solve for u1, we obtain a steady state value u1 = k/100, exactly the probability of drawing red from Bowl 1. Similarly, the steady state value of u2 comes out to be 1/10, the probability of drawing red from Bowl 2. Mutual nonrecurrent inhibition between those nodes leads to choice of the bowl with the larger ui. By Eq. (2), since α is uniformly distributed across [0, 1], the probability of the ratio rule being chosen for a given value of k is 1 − ψ(k) as defined by Eq. (1). Assuming that the heuristic rule does not engage the ACC and thereby always leads to a nonoptimal choice, the probability of a nonoptimal choice becomes

    ψ(k) + (1 − ψ(k))r(k)     (5)
where r(k) is the probability that the inequality (3) holds, that is, the probability of a nonoptimal choice if the ratio rule is chosen.
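The steady-state claim for Eq. (4) can also be checked numerically. The sketch below integrates the equation by forward Euler with the decay λ set to 0, as in the text; the function name and step sizes are our own choices:

```python
def bowl_value(k, total=100, dt=1e-3, steps=20000):
    """Integrate Eq. (4) with decay lambda = 0 by forward Euler.
    The node activity should settle at k/total, the probability of
    drawing red from a bowl with k red beans out of `total`."""
    u = 0.0
    for _ in range(steps):
        # du/dt = (1 - u) * excitatory_input - u * inhibitory_input
        du = (1.0 - u) * k - u * (total - k)
        u += dt * du
    return u
```

The activity settles at k/100 for Bowl 1 and, with total = 10 and k = 1, at 1/10 for Bowl 2, reproducing the two drawing probabilities derived in the text.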
We graphed (5) as a function of nonoptimal response size (which equals 10 − k) in order to simulate the data curves in Fig. 3. This was done via Monte Carlo simulations in MATLAB R2006a, the program being run 1000 times with δ varying normally about a mean of 0.5 with standard deviation 0.25. Fig. 5 shows the results of this simulation of the win and loss conditions in the experiment of [22]. The simulation fits the data of Fig. 3 fairly closely, going from over 60% nonoptimal responses at nonoptimal response size 1 (9 out of 100) to slightly above 20% at size 5 (5 out of 100). For the loss condition, the same program was run except that the probability ψ(k) of staying with the heuristic rule was cut in half, and again there was a good fit to Fig. 3.
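The Monte Carlo procedure just described can be sketched as follows. This is a reimplementation in Python under stated assumptions, not the original MATLAB code; the treatment of out-of-range δ values and the seeding are our own choices:

```python
import random

def psi(k):
    """Eq. (1): fuzzy membership of k in "much larger than one"."""
    return 0.0 if k < 3 else (0.1 * (k - 3) if k <= 13 else 1.0)

def nonoptimal_fraction(k, trials=10000, loss_condition=False, seed=0):
    """Fraction of trials on which k-out-of-100 is chosen over 1-out-of-10."""
    rng = random.Random(seed)
    p_heuristic = psi(k)
    if loss_condition:
        p_heuristic *= 0.5              # loss condition: heuristic halved
    nonopt = 0
    for _ in range(trials):
        # alpha ~ U(0, 1); heuristic rule chosen if alpha <= psi(k) (Eq. 2),
        # and the heuristic always yields the nonoptimal choice.
        if rng.random() <= p_heuristic:
            nonopt += 1
            continue
        # Ratio rule with Gaussian-blurred numerosities (Eq. 3).
        delta = rng.gauss(0.5, 0.25)    # "DLPFC" parameter, varies by trial
        phi = 0.1 * (1.0 - delta)
        lhs = k * (1 + phi * rng.gauss(0, 1)) / (100 * (1 + phi * rng.gauss(0, 1)))
        rhs = 1 * (1 + phi * rng.gauss(0, 1)) / (10 * (1 + phi * rng.gauss(0, 1)))
        if lhs > rhs:                   # perceived ratio favors Bowl 1
            nonopt += 1
    return nonopt / trials
```

With these assumptions the nonoptimal fraction is largest near response size 1 (k = 9), falls off sharply by k = 5, and is lower throughout in the loss condition, matching the qualitative shape of Figs. 3 and 5.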
Fig. 4. Network representing choice between k-out-of-100 and 1-out-of-10 assuming ratio rule. k out of 100 is interpreted as k good and 100-k bad; 1 out of 10 is 1 good and 9 bad. Probabilities of drawing red in each bowl are steady state values of Eq. (4) for node activity at u1 and its analog at u2, representing reward values of the two bowls.
Fig. 5. Results of our simulation of Denes-Raj and Epstein [22], plotted as percentage of responses against nonoptimal response size (9/100 through 5/100) (win: black circles; lose: gray circles)
5 Discussion

Despite the errors and information losses that heuristic rules can lead to, most psychologists believe heuristics have evolutionary value. Heuristic simplification is particularly useful when a decision must be made rapidly on incomplete information, or when the stakes of the decision are not high enough to justify the effort of thorough deliberation. Our frequency/probability network is a step toward modeling the more general process of deciding appropriate rules for decision tasks, across several levels of rule complexity. fMRI studies and neural network theories suggest that the ACC is sensitive to the level of complexity of tasks, or in other words, to the potential for error if the wrong rule is chosen. If the task is determined to be relatively effortful, the ACC then recruits other brain regions, such as DLPFC, required for processing task details. We hope our modeling can ultimately lead to data-driven hypotheses about those connections.
References

1. Levine, D.S., Perlovsky, L.I.: Simplifying Heuristics versus Careful Thinking: Scientific Analysis of Millennial Spiritual Issues. Zygon (in press)
2. Perlovsky, L.I.: Toward Physics of the Mind: Concepts, Emotions, Consciousness, and Symbols. Phys. Life Rev. 3, 23–55 (2006)
3. Perlovsky, L.I.: Neural Networks and Intellect: Using Model Based Concepts. Oxford University Press, New York (2001)
4. Tversky, A., Kahneman, D.: Judgment Under Uncertainty: Heuristics and Biases. Science 185, 1124–1131 (1974)
5. Tversky, A., Kahneman, D.: The Framing of Decisions and the Rationality of Choice. Science 211, 453–458 (1981)
6. Levine, D.S.: How Does the Brain Create, Change, and Selectively Override Its Rules of Conduct? In: Kozma, R.F., Perlovsky, L.I. (eds.) Neurodynamics of Higher-level Cognition and Consciousness, pp. 163–181. Springer, Heidelberg (2007)
7. Levine, D.S.: Seek Simplicity and Distrust It: Knowledge Maximization versus Effort Minimization. In: Proceedings of KIMAS (2007)
8. Levine, D.S., Perlovsky, L.I.: A Network Model of Rational versus Irrational Choices on a Probability Maximization Task. In: Proceedings of the World Congress on Computational Intelligence (2008)
9. Levine, D.S.: Angels, Devils, and Censors in the Brain. ComPlexus 2, 35–59 (2005)
10. DeMartino, B., Kumaran, D., Seymour, B., Dolan, R.: Frames, Biases, and Rational Decision-making in the Human Brain. Science 313, 684–687 (2006)
11. DeNeys, W., Vartanian, O., Goel, V.: Smarter Than We Think: When Our Brain Detects We're Biased. Psychological Science (in press)
12. Damasio, A.R.: Descartes' Error. Grosset/Putnam, New York (1994)
13. Schoenbaum, G., Setlow, B., Saddoris, M., Gallagher, M.: Encoding Predicted Outcome and Acquired Value in Orbitofrontal Cortex During Cue Sampling Depends Upon Input from Basolateral Amygdala. Neuron 39, 855–867 (2003)
14. Dias, R., Robbins, T., Roberts, A.: Dissociation in Prefrontal Cortex of Affective and Attentional Shifts. Nature 380, 69–72 (1996)
15. Posner, M., Petersen, S.: The Attention System of the Human Brain. Annual Review of Neuroscience 13, 25–42 (1990)
16. Botvinick, M., Braver, T., Barch, D., Carter, C., Cohen, J.: Conflict Monitoring and Cognitive Control. Psych. Rev. 108, 624–652 (2001)
17. Brown, J., Braver, T.: Learned Predictions of Error Likelihood in the Anterior Cingulate Cortex. Science 307, 1118–1121 (2005)
18. Shallice, T.: From Neuropsychology to Mental Structure. Cambridge University Press, New York (1988)
19. Carpenter, G.A., Grossberg, S.: A Massively Parallel Architecture for a Self-Organizing Neural Pattern Recognition Machine. Comp. Vis., Graph., & Image Proc. 37, 54–115 (1987)
20. Krawczyk, D., Levine, D.S., Ramirez, P.A., Togun, I., Robinson, R.: fMRI Study of Rational versus Irrational Choices on a Ratio Bias Task. Poster submitted to Annual Meeting of the Society for Judgment and Decision Making (2008)
21. Yamagishi, K.: When a 12.86% Mortality Is More Dangerous Than 24.14%: Implications for Risk Communication. Appl. Cog. Psych. 11, 495–506 (1997)
22. Denes-Raj, V., Epstein, S.: Conflict Between Intuitive and Rational Processing: When People Behave Against Their Better Judgment. J. Pers. Soc. Psych. 66, 819–829 (1994)
23. Pacini, R., Epstein, S.: The Interaction of Three Facets of Concrete Thinking in a Game of Chance. Think. & Reas. 5, 303–325 (1999)
24. Bunge, S.A.: How We Use Rules to Select Actions: A Review of Evidence from Cognitive Neuroscience. Cog., Aff., & Behav. Neurosci. 4, 564–579 (2004)
25. Curşeu, P.L.: Need for Cognition and Rationality in Decision-making. Stud. Psych. 48, 141–156 (2006)
26. Zadeh, L.: Fuzzy Sets. Inf. & Control 8, 338–353 (1965)
27. Piazza, M., Izard, V., Pinel, P., LeBihan, D., Dehaene, S.: Tuning Curves for Approximate Numerosity in the Human Intraparietal Sulcus. Neuron 44, 547–555 (2004)
Implementation of a Landscape Lighting System to Display Images

Gi-Ju Sun, Sung-Jae Cho, Chang-Beom Kim, and Cheol-Hong Moon

Gwangju University, Gwangju, Korea
[emailprotected], [emailprotected], [emailprotected], [emailprotected]
http://www.gwangju.ac.kr
Abstract. The system implemented in this study consists of a PC, a MASTER, SLAVEs, and MODULEs. The PC sets the various landscape lighting displays, and the image files can be sent to the MASTER through a virtual serial port connected to the USB (Universal Serial Bus). The MASTER sends a sync signal to the SLAVE. The SLAVE uses the signal received from the MASTER together with the landscape lighting display pattern. The video file is saved in the NAND Flash memory, and the R, G, B signals are separated using the self-made display signal and sent to the MODULE so that it can display the image.

Keywords: Implementation of a Landscape Lighting System to Display Images.
1 Introduction
With the recent developments that have taken place in industry, qualitative improvements have been made in our exterior spaces. This has led to the creation of cultural space from living space. This study describes an outdoor landscape lighting system suitable for various spaces depending on the environment, and aims to implement an image landscape lighting system with large-scale and low-resolution characteristics. This system is based on the existing DMX512 landscape lights and full color billboards. A nighttime landscape lighting system visible from both short and long distances and capable of displaying various information and images like a billboard is described.
2 Image Display Method
This study used the method of directly sending non-compressed image files to implement videos. Compressed files have the advantage of reduced size at the risk of a slight distortion of the image. However, high-end hardware is necessary to decode compressed images. Therefore, the small images used in

D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 356–363, 2008. © Springer-Verlag Berlin Heidelberg 2008
Fig. 1. Image Storing Method
this study were bitmap images that were split up into sizes appropriate for each SLAVE and MODULE. They are transmitted through the UART (Universal Asynchronous Receiver Transmitter) and saved in the NAND Flash memory. The images in the NAND Flash memory are displayed according to the sync signal of the MASTER.
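The splitting step can be sketched as follows. This is an illustrative Python fragment; the 10 × 10 tile size follows the MODULE of Fig. 7, and the function name is ours:

```python
def split_into_module_tiles(pixels, tile=10):
    """Split a frame (list of rows of (R, G, B) tuples) into tile x tile
    blocks, one per MODULE, in row-major order. Frame dimensions are
    assumed here to be exact multiples of the tile size."""
    height, width = len(pixels), len(pixels[0])
    tiles = []
    for ty in range(0, height, tile):          # walk tile rows
        for tx in range(0, width, tile):       # walk tile columns
            tiles.append([row[tx:tx + tile] for row in pixels[ty:ty + tile]])
    return tiles
```

Each resulting tile would then be serialized and stored in the NAND Flash block belonging to the SLAVE that drives the corresponding MODULE.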
3 Hardware Design
The system implemented in this study consists of a PC, a MASTER, and multiple SLAVEs and MODULEs. The PC sets the various landscape lighting displays, and the image files can be sent to the MASTER through a virtual serial port connected to the USB. The MASTER sends a sync signal to the SLAVE, and provides the SLAVE with the landscape lighting display pattern saved in the memory. Therefore, the image file is saved in the NAND Flash memory and the self-made display signal is divided into the R, G, B signals and sent to the MODULEs so that they can display the information.
4 Software

4.1 PC Program
The PC software was compiled using VISUAL C++ 6.0, and allows the landscape lighting and image display functions to be set and previews to be seen. Image files are used for the videos, and the playing frame speed between the image files can be adjusted to display them in the form of a video. The image file format used is BMP. BMP is a non-compressed format, and its header has the simplest structure of the many different formats. The protocol used to send the data from the PC to the MASTER has the structure shown in Table 1. In the PC, various steps have to be taken to display image files; these are structured as shown in Table 1, with a maximum of 128 steps. The protocol of Table 1 is sent to the MASTER as a continuous sequence of characters.
Table 1. The MASTER protocol sent from the PC to MASTER

Variable | Function | Bytes used
Start (F0) | Start of protocol | 1
Function (00-60) | Distinguishes the various functions | 1
Address (00-01) | Sets whether or not each SLAVE is used | 64
Step number (00-FF) | Step number of current data | 1
Display function (00-FF) | Lighting display function distribution number | 1
Number of colors (00-30) | How many colors are used | 1
Palette (00-FF) | What colors are used | 30
Display speed (00-FF) | Time for each step (minimum 100 ms) | 1
Repetition (00-30) | How many time steps are repeated | 1
Color maintenance (00-01) | Sets whether it will start off or covered | 1
Other functions (00) | Other function settings | 10
End (F3) | End of protocol | 1
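One step of the Table 1 protocol could be serialized as in the following sketch (Python for illustration; the exact byte semantics beyond the listed field widths are assumptions, and the default values are placeholders):

```python
def build_master_step(address=b"\x01" * 64, step=0, display_fn=0,
                      n_colors=3, palette=b"\x00" * 30, speed=1,
                      repetition=1, maintain=0, other=b"\x00" * 10):
    """Serialize one step of the PC-to-MASTER protocol using the field
    order and byte widths of Table 1. Field contents are placeholders."""
    packet = bytes([0xF0])                        # Start
    packet += bytes([0x00])                       # Function code (assumed)
    packet += address                             # 64 bytes: SLAVE enable flags
    packet += bytes([step, display_fn, n_colors]) # step / display fn / colors
    packet += palette                             # 30 bytes: colors used
    packet += bytes([speed, repetition, maintain])
    packet += other                               # 10 bytes: other settings
    packet += bytes([0xF3])                       # End
    return packet
```

The byte widths of Table 1 sum to 113 bytes per step, so with 128 steps a full configuration is under 15 KB, comfortably small for serial transfer to the MASTER.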
4.2 MASTER Program
The overall software structure of the MASTER program is shown in Figure 2. The internal Timer0 generates time Ticks regularly. When a time Tick is generated, the MASTER reads the Flash memory, generates a protocol, and increases the counter. The generated protocol is sent to the UART1 port, and the MASTER then waits for another time Tick to occur. If another time Tick occurs, the counter value is increased, the next sector of the Flash memory is read, and a protocol is generated; when it comes to the last sector, the counter is set to 0 and the display function repeats itself from the beginning. Separately from this, the MASTER always waits for data from the UART0 port connected to the PC. If data is received from UART0, the MASTER stops Timer0 and saves the data in the internal Flash memory.
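The tick-driven loop just described can be sketched as follows (an illustrative model in Python; the class and callback names are ours):

```python
class MasterScheduler:
    """Sketch of the MASTER's Timer0 loop: each tick reads the next Flash
    sector, emits a protocol frame, and wraps back to sector 0 at the end."""

    def __init__(self, num_sectors):
        self.num_sectors = num_sectors
        self.counter = 0

    def on_tick(self, read_sector, send_uart1):
        data = read_sector(self.counter)       # read the current Flash sector
        send_uart1(data)                       # forward the protocol to SLAVEs
        self.counter += 1
        if self.counter >= self.num_sectors:   # last sector reached:
            self.counter = 0                   # restart the display loop
```

In the real firmware, receiving data on UART0 from the PC would stop the timer (suspending on_tick) while the new configuration is written to Flash.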
Fig. 2. MASTER software structure
Table 2. The SLAVE protocol sent from the MASTER to SLAVE

Variable | Function | Bytes used
Start (F0) | Start of protocol | 1
Function (00-60) | Distinguishes the various functions | 1
Address (00-01) | Sets whether or not each SLAVE is used | 64
Step number (00-FF) | Step number of current data | 1
Display function (00-FF) | Lighting display function distribution number | 1
Number of colors (00-30) | How many colors are used | 1
Palette (00-FF) | What colors are used | 30
Display speed (00-FF) | Time for each step (minimum 100 ms) | 1
Color maintenance (00-01) | Sets whether it will start off or covered | 1
Other functions (00) | Other function settings | 10
End (F3) | End of protocol | 1
The protocol that is sent from the MASTER to the SLAVE has the structure shown in Table 2. Many steps for the image display functions are saved in the MASTER, and each saved step is structured as shown in Table 2.

4.3 SLAVE Program
Figure 3 shows the structure of the SLAVE program. The data transmitted from the MASTER enters through the UART0 port. Data is transmitted regularly and the MASTER protocol is analyzed. If it is an image display signal, the NAND Flash memory plugged into the device is read to create an image signal. If it is a landscape lighting display signal, the SLAVE algorithm generates an image video signal. The image video signal is then placed in a protocol that fits the SLAVE format, which is sent to the MODULE through the UART1 port.
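The dispatch logic can be sketched as follows (Python for illustration; the function codes and callback names are placeholders, since the text does not give the actual byte values):

```python
# Placeholder function codes -- the real values are not given in the text.
IMAGE_DISPLAY, PATTERN_DISPLAY = 0x10, 0x20

def slave_handle_frame(frame, read_nand, run_pattern, send_uart1):
    """Sketch of the SLAVE dispatch: frames arriving on UART0 are either
    image-display signals (image data is read back from NAND Flash) or
    landscape-lighting signals (the SLAVE algorithm generates the pattern
    locally); the resulting signal is forwarded to the MODULEs on UART1."""
    function = frame[1]                  # function byte after the 0xF0 start
    if function == IMAGE_DISPLAY:
        signal = read_nand(frame[2])     # e.g. step number selects a block
    elif function == PATTERN_DISPLAY:
        signal = run_pattern(frame)      # SLAVE algorithm builds the signal
    else:
        return None                      # unknown function: ignore the frame
    send_uart1(signal)
    return signal
```

The two callbacks stand in for the NAND Flash read and the pattern-generation algorithm that the firmware implements in hardware-specific code.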
Fig. 3. SLAVE software structure
5 Test and Results

5.1 PC Program
The virtual serial port connected to the USB is opened and connected to the MASTER. Figure 4 shows the user interface which allows the landscape lighting
Fig. 4. PC program
and image files to be set. It is structured in the form of individual steps and up to 128 steps can be saved. The desired color, display method, display time, and number of repetitions can be set and saved. This process is carried out sequentially from the first step to the final step, and when the last step has been carried out, it repeats itself from the first step in a continuous loop.

5.2 MASTER Control
Figure 5 shows the MASTER produced in this study. The MASTER saves the landscape lighting display function received from the PC. A sync signal is also generated, so that the system operates overall as one organism. The MASTER has a 20*4-line LCD connected to it, so that the current landscape lighting display order or pattern can be checked directly. The pattern can be changed or the environment can be set through the switch and LCD. A multi-USB port was used to expand the USB port, and an RS485 interface was used to obtain a communication speed of 1.8432 Mbps. The RS232 port of the PC only has a speed of 115.2 Kbps. Therefore, the USB-232 converter chip, FT232R, was used to achieve a communication speed of 1.8432 Mbps.
Fig. 5. MASTER
Fig. 6. SLAVE
5.3 SLAVE Control
The SLAVE uses the signal sent from the MASTER to directly create various colors and algorithms and send data to the MODULE. According to the sync signal sent from the MASTER, R, G, B data are generated in the 128 MODULEs at a speed of 115.2 Kbps. The UART port is used to send the data according to the physical standard of RS485. RS-485 was selected because the gap between the SLAVE and MODULE ranges from 5 to 100 m and can be up to 1 km. The micro-controller used in this study has 2 independent UART ports. UART0 is connected to the MASTER and UART1 is connected to the operational MODULE. The image signals are divided into blocks, divided again, and saved in the NAND Flash memories of the SLAVE. All of the pattern signals are generated by the SLAVE. Therefore, high-speed operation is necessary and an internal PLL circuit was used to achieve an operation speed of 100 MHz.

5.4 MODULE
The image and landscape lighting pattern signals generated in the SLAVE are divided into R, G, B signals and transmitted. The gap between the MODULEs is quite far, around 5-100 m depending on the MODULE installation environment and location; therefore, the RS485 interface and a micro-controller were used. Since many MODULEs are needed, an inexpensive micro-controller must be used. Figure 7 shows the part of the MODULE that is used to display the image or landscape lighting pattern, which consists of 100 cells.
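To get a feel for the SLAVE-to-MODULE bus budget, the sketch below combines the figures given in the text (128 MODULEs per SLAVE, 115.2 Kbps). The 3-byte R, G, B payload per MODULE and the 10-bit UART framing are our assumptions, not values from the paper:

```python
# Back-of-the-envelope refresh budget for the SLAVE -> MODULE RS485 bus.
BAUD = 115_200               # from the text
BITS_PER_BYTE_ON_WIRE = 10   # assumed: 1 start + 8 data + 1 stop
MODULES = 128                # from the text
BYTES_PER_MODULE = 3         # assumed: one byte each for R, G, B

frame_bytes = MODULES * BYTES_PER_MODULE              # bytes per full update
frame_time = frame_bytes * BITS_PER_BYTE_ON_WIRE / BAUD
print(f"Full-bus update: {frame_bytes} bytes, "
      f"{frame_time * 1000:.1f} ms -> {1 / frame_time:.0f} updates/s")
```

Under these assumptions a full refresh of all 128 MODULEs takes about 33 ms, i.e. roughly 30 colour updates per second, which is consistent with smooth lighting animation.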
Fig. 7. 10x10-pixel MODULE
362
G.-J. Sun et al.
Fig. 8. MODULE lighting creation
Fig. 9. Landscape lighting system that actually adopts the plan of this study
5.5 Test Results
The test used the USB port of the PC to download data. A sync signal was then generated by the MASTER, and the SLAVEs operated organically. Figure 8 is a photograph showing an image displaying various colors. This is an example of an actual product built from the design developed in this study: 1 MASTER, 8 SLAVEs and 16 MODULEs were made to commercially create 8 rows for the 8 chimneys of the Busan Complex Heating Base. Figure 9 shows them operating together.
6 Conclusion
Since the invention of electric lights, people have been able to live fuller lives through nighttime activities. Lights are no longer just functional devices that light up a dark area; lighting has been developed to take into consideration the emotions of people, harmonize with the surrounding environment and generate a new nighttime landscape. The system made in this study was divided into a PC, a MASTER, SLAVEs and MODULEs, which all act together as one organic system. The PC sets the various landscape lighting displays. A software program was coded in Visual C++ 6.0 to send data to the MASTER through the virtual serial port connected to the USB device. The MASTER uses an internal timer to create ticks at regular intervals and sends a sync signal to the SLAVEs. It also creates a pattern in a pre-determined order, according to the
landscape lighting display pattern value received from the PC. The SLAVEs, of which there are a maximum of 64, recognize the sync signal sent by the MASTER and the landscape lighting display pattern set by the protocol. From the recognized signal they generate the R, G, B landscape lighting display signals and send them to the MODULEs. The MODULE, which is made up of a maximum of 128 cells, extracts the R, G, B signals sent from the SLAVE according to their addresses. One internal high-speed timer is used, and the colors are renewed simultaneously through 3 permanent R, G, B variables and 3 temporary R, G, B variables. In addition, the R, G, B LEDs are operated in 256 stages through 8-bit PWM control, so that the desired color can be displayed. The system generated in this study can display beautiful colors on large-scale architecture and floor lighting. It is fully recognizable from a great distance, and so can be used to maximum effect as an advertisement facility with promotional effects.
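The double-buffered colour latch described above (3 temporary plus 3 permanent R, G, B variables, renewed simultaneously) can be illustrated with a small simulation. The class layout and the software PWM loop below are our sketch, not the actual firmware:

```python
# Sketch of the MODULE's colour update scheme: incoming R, G, B values land in
# temporary variables and are copied into the "permanent" (active) variables
# only at the PWM period boundary, so all three channels change at once and
# no half-updated colour is ever displayed.
class RgbPwmChannel:
    def __init__(self):
        self.active = [0, 0, 0]   # permanent R, G, B actually driving the LEDs
        self.pending = [0, 0, 0]  # temporary R, G, B filled by the receiver

    def receive(self, r: int, g: int, b: int) -> None:
        """Store a new colour without disturbing the ongoing PWM period."""
        self.pending = [r & 0xFF, g & 0xFF, b & 0xFF]

    def pwm_period(self):
        """Run one 256-step (8-bit) PWM period, then latch the pending colour."""
        frames = []
        for tick in range(256):                    # 256 brightness stages
            frames.append(tuple(tick < duty for duty in self.active))
        self.active = list(self.pending)           # latch at period boundary
        return frames

ch = RgbPwmChannel()
ch.receive(128, 0, 255)                            # arrives mid-period
on_counts = [sum(col) for col in zip(*ch.pwm_period())]
print(on_counts)   # duty counts of the period *before* the latch: [0, 0, 0]
print(ch.active)   # new colour now active: [128, 0, 255]
```

The latch at the period boundary is the key design point: the UART receiver and the PWM generator never race on the same variables.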
Acknowledgements. This research was financially supported by the Ministry of Commerce, Industry and Energy (MOCIE) and the Korea Industrial Technology Foundation (KOTEF) through the Human Resource Training Project for Regional Innovation, and by the Korea Institute of Industrial Technology Evaluation and Planning (ITEP) through the Technical Development of Regional Industry.
References
1. Chang, K.: A Study on Nighttime Landscape Display Methods for Outdoor Spaces. Master's thesis, pp. 1-2, 48-54, Sangmyung University (2005)
2. FTDI: FT232RL Datasheet, pp. 2-5 (2002)
3. Toshiba: NAND Flash Applications Design Guide, p. 6 (2003)
4. Samsung: Flash Memory K9F1G08X0A Datasheet, pp. 10-30 (2006)
5. Motorola: MC34063A Datasheet, pp. 1-9 (2002)
6. Cha, Y.: One Chip Micro Computer 8051, Dada Media, pp. 149-180 (1997)
7. SiLabs: C8051F12x-13x Datasheet (2006)
8. SiLabs: C8051F30x Datasheet (2006)
A Hybrid CARV Architecture for Pervasive Computing Environments SoonGohn Kim1 and Eung Nam Ko2 1 Division of Computer and Game Science, Joongbu University, 101 Daehakro, Chubu-Meon, GumsanGun, Chungnam, 312-702, Korea [emailprotected] 2 Division of Information & Communication, Baekseok University, 115, Anseo-Dong, Cheonan, Chungnam, 330-704, Korea [emailprotected]
Abstract. This paper describes a hybrid CARV software architecture running on situation-aware middleware for a web-based distance education system, which has an object with various information for each session. There are two approaches to software architecture on which distributed, collaborative applications are based: CACV (Centralized-Abstraction and Centralized-View) and RARV (Replicated-Abstraction and Replicated-View). We propose an adaptive concurrency control QoS agent based on a hybrid software architecture that adopts the advantages of CACV and RARV for situation-aware middleware.
1 Introduction
Multimedia distance education has attracted interest as a new education method, joining education engineering with information and communication technology [1, 2, 3]. A general web-based distance education system uses video and audio data to provide synchronization between teacher and student. In a ubiquitous computing environment (computing anytime, anywhere, on any device), the concept of situation-awareness has played a very important role in matching user needs with available computing resources in a transparent manner in dynamic environments [4]. It is difficult to avoid the problem of seams in the ubiquitous computing environment when providing seamless services. Thus, there is a great need for a concurrency control algorithm in situation-aware middleware to provide dependable services in ubiquitous computing. A system for web-based multimedia distance education includes several features, such as audio, video and whiteboard, running on situation-aware middleware in an Internet environment able to share the HTML format. This paper describes a hybrid software architecture running on situation-aware middleware for a web-based distance education system, which has an object with various information for each session. There are two approaches to software architecture on which distributed, collaborative applications are based: CACV (Centralized-Abstraction and Centralized-View) and RARV (Replicated-Abstraction and Replicated-View). We D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 380-387, 2008. © Springer-Verlag Berlin Heidelberg 2008
A Hybrid CARV Architecture for Pervasive Computing Environments
381
propose an adaptive hybrid software architecture, CARV, which adopts the advantages of CACV and RARV for situation-aware middleware.
2 QoS Layered Model
Traditional QoS (ISO standards) was provided by the network layer of the communication system. An enhancement of QoS was achieved by introducing QoS transport services. For a multimedia communication system, the QoS notion must be extended, because many other services contribute to the end-to-end service quality. The multimedia communication system consists of three layers: application, system (including communication services and operating system services), and devices (network and multimedia devices). As shown in Figure 1, the QoS-layered model for the multimedia communication system is organized into four layers: a user QoS layer, an application QoS layer, a system QoS layer and a network QoS layer [5].
Fig. 1. QoS layering in our approach: user (user QoS), application (application QoS), system and MM devices (system QoS), networks (device and network QoS)
2.1 RCSM (Reconfigurable Context-Sensitive Middleware)
In the Context Toolkit, a predefined context is acquired and processed in context widgets and then reported to the application through application-initiated queries and callback functions. In their Reconfigurable Context-Sensitive Middleware (RCSM), Stephen S. Yau et al. [6] proposed a new approach of designing the middleware to directly trigger the appropriate actions in an application, rather than have the
382
S. Kim and E.N. Ko
Fig. 2. RCSM's integrated components (situation-aware application objects; RCSM ephemeral group communication; SASMA, FTA, SAACCA, SAUIA, AMA and MCA over the OS; transport layer protocols for ad hoc networks; sensors)
application itself decide which method (or action) to activate based on context. RCSM provides an object-based framework for supporting context-sensitive applications. Figure 2 shows how all of RCSM's components are layered inside a device. The Object Request Broker of RCSM (R-ORB) assumes the availability of reliable transport protocols; one R-ORB per device is sufficient. The number of ADaptive object Containers (ADCs) depends on the number of context-sensitive objects in the device. ADCs periodically collect the necessary "raw context data" through the R-ORB, which in turn collects the data from sensors and the operating system. Initially, each ADC
registers with the R-ORB to express its needs for contexts and to publish the corresponding context-sensitive interface. RCSM is called reconfigurable because it allows addition or deletion of individual ADCs during runtime (to manage new or existing context-sensitive application objects) without affecting other runtime operations inside RCSM. Ubiquitous applications require the use of various contexts to adaptively communicate with each other across multiple network environments, such as mobile ad hoc networks, the Internet, and mobile phone networks. However, RCSM did not include concurrency control QoS support in its architecture.
2.2 QoS Layered Model for the Multimedia Distance Education System
Our proposed model aims at supporting a concurrency control mechanism running on RCSM in order to provide ubiquitous, seamless services. An example of situation-aware applications is a multimedia distance education system. As shown in Figure 3, multimedia distance education systems include advanced services, coordination services, cooperation services, and media services. Advanced services consist of various subclass modules: the subclass modules provide the basic services, while the advanced services layer supports mixtures of various basic services. Advanced services include creation/deletion of shared video windows and creation/deletion of shared windows. The shared window object provides free-hand lines, straight lines, boxes and text to collaboration participants, and the participants can work on, for example, the same file in these shared windows. Coordination services include a session control module and a floor control module. The session control module controls access to the whole session; a session can be a meeting, distance learning, a game or the development of any software. Session control also facilitates and limits access to the whole session. The session control module monitors session starts, terminations, joins and invitations, and it also permits sub-sessions.
The session control module has an object with various information for each session, and it also supports multicasting with this information. Floor control determines who can talk and who can change the information. Floor control mechanisms include brainstorming, priority, mediated, token-passing and time-out; the floor control module provides explicit floor control and brainstorming. Cooperation services include a window overlay module and a window sharing module. The window overlay module lays a simple sketching tool over a copied window; it provides all users with a transparent background and tele-pointers, so all users can point and gesture. The window sharing module is a combination of window copying, window overlays, floor control and session control. All users are able to interact through applications shared by them: one user runs a single-user application, and the other users see exactly what this user sees. The application can allow different users to interact with it by selecting one user's keyboard and mouse as the source of input. Media services support convenient services for applications using the DOORAE environment. The supplied services are the creation and deletion of service objects for media use, and media sharing between remote users. The media service modules limit the service according to hardware constraints. We assume throughout this paper the model shown in Figure 3. This model consists of 3 QoS layers: application QoS (including the application layer and DOORAE layer), system QoS (including the system layer), and network QoS (including the communication
layer). In this paper, we concentrate on the application QoS layer. Several constraints must be satisfied to provide guarantees during multimedia transmission: time, space, device, frequency, and reliability constraints. Time constraints include delays; space constraints include system buffers; device constraints include frame grabber allocation; frequency constraints include network bandwidth and system bandwidth for data transmission. In this paper, we discuss concurrency control constraints.
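Of the floor-control mechanisms listed above, token passing is the simplest to sketch. The class below is a minimal illustration with assumed names and a FIFO queueing policy; it is not the DOORAE floor control module:

```python
# Minimal token-passing floor control: at most one participant holds the
# "token" (the floor) at a time; others queue and receive it in FIFO order.
from collections import deque

class FloorControl:
    def __init__(self):
        self.holder = None        # participant currently allowed to talk/edit
        self.waiting = deque()    # FIFO queue of pending floor requests

    def request(self, user: str) -> bool:
        """Grant the floor immediately if it is free, else queue the request."""
        if self.holder is None:
            self.holder = user
            return True
        self.waiting.append(user)
        return False

    def release(self, user: str) -> None:
        """Pass the token to the next waiting participant, if any."""
        if self.holder == user:
            self.holder = self.waiting.popleft() if self.waiting else None

fc = FloorControl()
assert fc.request("teacher")          # floor was free
assert not fc.request("student1")     # queued behind the teacher
fc.release("teacher")
print(fc.holder)                      # prints: student1
```

A time-out variant, also mentioned in the text, would simply call release() on the holder's behalf when a timer expires.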
Fig. 3. QoS Layered Model for Multimedia Distance Education System (application services/application QoS in the application layer; cooperation and media services; operating services/system QoS in the system layer)
2.3 Web-Based Multimedia Distance Education System
This paper proposes a URL synchronization function used in WebNote, a remote collaborative education system based on situation-aware middleware for CBM (Computer-Based Multimedia). As shown in Figure 4, this paper describes an integrated model which supports object drawing, application sharing, and web synchronization methods of sharing information through a common view between concurrently collaborating users. The proposed model consists of a multiple view layout with per-layout control and a unified user interface, and defines the attributes of a shared object.
Fig. 4. An Integrated Model with Web Synchronization (user interface; multiple view manager; whiteboard and error control module; HTML layout module; image layout engine; web synchronization agent; application sharing agent; network transport module)
2.4 Hybrid Software Architecture for Concurrency Control and URL Synchronization
Figure 5 shows the relationship between the WebNote Instance and the WebNote SM (Session Manager).
Fig. 5. The relationship between WebNote Instance & WebNote SM (WebNote Instances and Daemons connect to WebNote Session Managers over the Internet; a Web URL/Hook/Application Synchronization Server with GSM, session monitor and traffic monitor)
This system is one of the services implemented in the Remote Education System. The Remote Education System includes several features, such as audio, video, whiteboard and WebNote, running in an Internet environment able to share HTML (Hyper Text Mark-up Language). We have implemented the WebNote function accordingly. While a session is ongoing, all participants are able to exchange HTML documents; for this reason, we need URL synchronization. To overcome the dilemma between the centralized and replicated architectures, a combined approach, the CARV (Centralized-Abstraction and Replicated-View) architecture, is used to realize the application sharing agent.
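The CARV idea behind URL synchronization, a centralized abstraction (the current URL) combined with replicated views (each participant fetches and renders the page locally), can be sketched as follows. All class and method names are illustrative, not the WebNote API:

```python
# CARV-style URL synchronization sketch: only the URL (the abstraction) is
# broadcast by the central server; the view is replicated, i.e. every
# participant renders the page itself, keeping network traffic low.
class UrlSyncServer:
    def __init__(self):
        self.current_url = None   # centralized abstraction
        self.participants = []    # holders of replicated views

    def join(self, participant) -> None:
        self.participants.append(participant)
        if self.current_url:
            participant.render(self.current_url)  # late joiner catches up

    def navigate(self, url: str) -> None:
        """Broadcast only the URL, never the rendered view."""
        self.current_url = url
        for p in self.participants:
            p.render(url)

class Participant:
    def __init__(self, name: str):
        self.name, self.page = name, None

    def render(self, url: str) -> None:
        self.page = url           # stands in for a local fetch-and-render

server = UrlSyncServer()
alice, bob = Participant("alice"), Participant("bob")
server.join(alice); server.join(bob)
server.navigate("http://example.com/lecture1.html")
print(alice.page == bob.page == "http://example.com/lecture1.html")  # True
```

Compared with a fully centralized design, only a short URL string crosses the network per navigation, which is exactly the traffic reduction argued for in Section 3.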
3 Simulation Results
As shown in Table 1, conventional multimedia distance education systems include Shastra, MERMAID, MMconf, and CECED.

Table 1. Analysis of conventional multimedia distance education systems

Function                       ShaStra            MERMAID         MMconf           CECED
OS                             UNIX               UNIX            UNIX             UNIX
Development location           Purdue Univ., USA  NEC, Japan      Cambridge, USA   SRI International
Development year               1994               1990            1990             1993
Structure                      Server/client      Server/client   Replicated       Centralized or replicated
Protocol                       TCP/IP             TCP/IP          TCP/IP           TCP/IP multicast
CARV running on RCSM           No                 No              No               No
Web-based, running on RCSM     No                 No              No               No
The table shows the characteristic functions of each multimedia distance education system. The proposed main structure is a distributed architecture, but for application program sharing a centralized architecture is used. The problem of a rapid increase in communication load due to growth in the number of participants was solved by allowing only one transmission even in the presence of many users, using simultaneous broadcasting. Basically, there are two architectures for implementing such collaborative applications: the centralized architecture and the replicated architecture, which lie at opposite ends of the performance spectrum. Because the centralized architecture has to transmit a huge amount of view traffic over the network, its performance suffers, which offsets the benefit of its simple way of sharing a copy of a conventional application program. On the other hand, the replicated architecture guarantees better performance by virtue of its reduced communication costs. However, because the replicated architecture is based on replicating a copy of the application program, it is not well suited to realizing application sharing.
4 Conclusions
The focus on situation-aware ubiquitous computing has increased lately. An example of situation-aware applications is a multimedia education system. We proposed an adaptive concurrency control QoS agent based on a hybrid software architecture that adopts the advantages of CACV and RARV for situation-aware middleware. We described a hybrid software architecture running on situation-aware ubiquitous computing for a web-based distance education system, which has an object with various information for each session and also supports multicasting with this information. This paper proposed a new model of concurrency control by analyzing the window and the attributes of the shared object, and based on this, a mechanism that offers a seamless view without interfering with concurrency control was also suggested. We leave these QoS resolution strategies as future work.
References
1. Holfelder, W.: Interactive Remote Recording and Playback of Multicast Videoconferences. In: Steinmetz, R. (ed.) IDMS 1997. LNCS, vol. 1309. Springer, Heidelberg (1997)
2. Fortino, G., Nigro, L.: A Cooperative Playback System for On-Demand Multimedia Sessions over Internet. In: Proceedings of the IEEE International Conference on Multimedia and Expo (ICME 2000), NY, USA, July 30 - August 2, vol. I, pp. 41-44 (2000)
3. Boyle, T.: Design for Multimedia Learning. Prentice Hall Europe, Englewood Cliffs (1997)
4. Yau, S.S., Karim, F.: Contention-Sensitive Middleware for Real-time Software in Ubiquitous Computing Environments. In: Proc. 4th IEEE Int'l Symp. Object-Oriented Real-time Distributed Computing (ISORC 2001), pp. 163-170 (May 2001)
5. Steinmetz, R., Nahrstedt, K.: Multimedia: Computing, Communications & Applications. Prentice Hall, Inc., Englewood Cliffs (1995)
6. Yau, S.S., Wang, Y., Huang, D.: A Middleware Situation-Aware Contract Specification Language for Ubiquitous Computing. In: FTDCS (2003)
7. Saha, D., Mukherjee, A.: Pervasive Computing: a Paradigm for the 21st Century. IEEE Computer 36, 25-31 (2003)
Probability-Based Coverage Algorithm for 3D Wireless Sensor Networks Feng Chen, Peng Jiang , and Anke Xue Institute of Information and Control, Hangzhou Dianzi University, Zhejiang, China [emailprotected]
Abstract. We propose a probability-based K-coverage control approach (PKCCA) to address the contradiction between the inherent uncertainty of sensor readings and the strong fault-tolerance and robustness required of wireless sensor networks monitoring three-dimensional space. The entire monitoring region is to be covered by at least K sensors with probability T. A grid distribution and a greedy heuristic are introduced to determine the best placement of sensors. Our approach terminates when a preset upper limit on the number of sensors is reached, or when the coverage task is completed. We implement our approach and compare it against traditional random and uniform deployment. The results show that PKCCA uses fewer sensors to complete the same coverage task, or the same number of sensors to reach a higher coverage degree. We also analyze the case of preferential coverage for sub-regions. PKCCA is a good solution, with high reliability and robustness, for monitoring special environments with weak signal propagation. Keywords: Wireless sensor networks, three-dimensional coverage, coverage control, K-coverage.
1 Introduction
Wireless Sensor Networks (WSNs) are a new information acquisition technology emerging with the development of wireless communication technology, embedded computing, sensor technology and MEMS technology. They have great application prospects in the military, automotive electronics, industrial control, environment monitoring, medicine and intelligent households (see [1], [2], [3] and [4]). The performance of WSNs depends largely on the deployment of the sensors and on the life cycle associated with power consumption. At present, more and more attention is being paid to the efficient use of power. The deployment algorithms for WSNs proposed recently focus on coverage and connectivity, which are the most important factors affecting the performance of a sensor network and the efficient utilization of energy [5]. The coverage control issue for WSNs is to optimize the distribution of all resources through the placement of sensors and the determination of routings, under the constraints of sensor node energy, communication bandwidth and the calculating and processing ability of the network, thereby improving service qualities such as perception, sensing, communication and monitoring [6].
Corresponding author.
D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 364–371, 2008. c Springer-Verlag Berlin Heidelberg 2008
Probability-Based Coverage Algorithm for 3D Wireless Sensor Networks
365
Objective coverage based on a grid is a universal coverage control approach for WSNs, as in [7] and [8]. This approach models the network with a two- or three-dimensional grid and completes regional/objective coverage by placing sensors on appropriate grid points when a pre-determined geographical environment is given [6]. [9] presents a grid-based coverage control algorithm to solve the sensor deployment problem under the constraints of price and full coverage. However, that approach relies on "perfect" sensor detection, i.e. a sensor is expected to yield a binary yes/no detection outcome in every case. Considering the possibility of sensor failure, the relationship between coverage and connectivity for unreliable sensor networks is discussed in [10]. [11] proposes a probability-based coverage control algorithm for the inherent uncertainty associated with sensor readings. The algorithms above can only achieve 1-coverage, that is, each grid point in the monitoring region is detected by at least one sensor. But strong fault-tolerance and robustness are required in some applications, such as military applications, where K-coverage is desired [12]. K-coverage means each grid point in the monitoring region should be detected by at least K sensors, where K > 1. Therefore, a probability-based K-coverage control approach for three-dimensional WSNs (PKCCA for short) is proposed in this paper. We model the three-dimensional monitoring space as a grid network and ensure each grid point is detected with probability T by at least K sensors. Our approach provides very high reliability and simplifies calculation by developing a sensor threshold model (STM for short). The case of preferential coverage (preferential coverage degree and preferential monitoring precision) for some grid points is also analyzed and simulated. PKCCA can be adopted for the detection of special environments with weak signal propagation and high reliability requirements.
The remainder of the paper is organized as follows. In Section 2, the STM is developed. Section 3 describes the theory and realization of PKCCA. PKCCA and the random and uniform deployments are compared in Section 4, which indicates that PKCCA can reach the same coverage with fewer sensors or achieve higher coverage with the same number of sensors. The paper is concluded in Section 5.
2 Sensor Threshold Model (STM)
Note that the choice of a sensor detection model does not limit the applicability of the coverage control algorithm in any way; the detection model is simply an input parameter to the placement algorithm. We adopt a sensor threshold model (STM) for the monitoring of environments with weak signal propagation. We assume that the three-dimensional monitoring region is made up of grid points, and the granularity of the grid is determined by the accuracy with which the sensor placement is desired. We assume the probability of detection of a target by a sensor varies exponentially with the distance between the target and the sensor: if a target is at a distance d from a sensor, then it is detected by that sensor with probability e^{-αd}, where the detection parameter α (α > 0) indicates the rate at which the detection probability diminishes with distance. For every two grid points i and j in the monitoring region, we define the detection probability p_ij to denote the probability that a target at grid point j is detected by a sensor at grid point i (see [8]). The presence of obstacles in the monitoring region is not taken into consideration, which means p_ij = p_ji. Considering
366
F. Chen, P. Jiang, and A. Xue
the rapid signal attenuation in some special applications, such as water environment detection, the detection probability between sensor and target is set to 0 when it is less than β. Thus, the perception range of a sensor can be regarded as a sphere whose radius is associated with the perceiving ability of the sensor. The parameter β, known as the monitoring range threshold, determines the size of the perception sphere. The STM can be described by

    p_ij = e^{-αd},  if e^{-αd} > β,
    p_ij = 0,        if e^{-αd} ≤ β.        (1)
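Eq. (1) translates directly into code; the sketch below also derives the perception-sphere radius implied by the threshold β:

```python
# Sensor threshold model (STM) of Eq. (1): detection probability decays as
# e^(-alpha*d) and is clipped to 0 once it falls below the threshold beta.
import math

def detection_probability(d: float, alpha: float, beta: float) -> float:
    """STM: p_ij = e^{-alpha*d} if above the threshold beta, else 0."""
    p = math.exp(-alpha * d)
    return p if p > beta else 0.0

# With alpha = 0.3 and beta = 0.2 (values used in the simulations of
# Section 4), the perception sphere radius is d_max = -ln(beta)/alpha.
r = -math.log(0.2) / 0.3
print(f"perception radius ≈ {r:.2f} grid units")
print(detection_probability(2.0, 0.3, 0.2))   # inside the sphere
print(detection_probability(8.0, 0.3, 0.2))   # outside the sphere -> 0
```

Clipping at β is what turns the smooth exponential model into a bounded perception sphere, which is what makes the miss-probability matrix in Section 3 sparse in practice.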
3 PKCCA Description
We divide the three-dimensional monitoring region into an n by n by n grid with a total of N = n³ grid points. The goal of PKCCA is to determine the minimum number of sensors and their locations such that every grid point is K-covered with probability T. The detection precision T and coverage degree K are the two inputs to the algorithm. We define a sensor detection matrix D = [p_ij]_{N×N} over all pairs of grid points in the monitoring region, with n³ rows and n³ columns, where p_ij is calculated by the STM of Section 2. From the sensor detection matrix D, we determine the miss probability matrix M = [m_ij]_{N×N}, where m_ij = 1 - p_ij. PKCCA uses the greedy heuristic of [11] to determine the best placement of one sensor at a time. It is iterative: it places one sensor in the monitoring region during each iteration until a preset upper limit on the number of sensors is reached, or sufficient coverage degree with probability T is achieved for all grid points. We define a vector L = (L_1, L_2, ..., L_N) to denote the coverage degrees of the grid points, where L_i is the coverage degree of grid point i during the deployment. L is initialized to the all-0 vector, i.e. L = (0, 0, ..., 0). A sensor is placed on the grid point for which the sum of the miss probabilities of the sensor for the other grid points in the monitoring region is minimal. When the coverage degree of a grid point reaches the appointed value, we update the miss probability matrix and delete the corresponding row and column to decrease its order.
The steps of PKCCA are outlined as follows:
1) Initialize the number of sensors to 0;
2) Place a sensor on grid point k such that Σ_k is minimal, where Σ_k = m_k1 + m_k2 + ... + m_kN, k = 1, 2, ..., N;
3) If m_ki < M_max, i = 1, 2, ..., N, then add 1 to L_i and update the vector L = (L_1, L_2, ..., L_N);
4) Add 1 to the number of sensors;
5) If L_i has reached the specified coverage degree, delete the i-th row and column from the matrix M and let N be N-1;
6) Return to Step 2) until L_i ≥ Cov for all i = 1, 2, ..., N, or a preset upper limit on the number of sensors deployed is reached.
For the case of preferential coverage (preferential coverage degree and preferential detection precision) of some sub-regions, the following modifications are made to PKCCA, respectively:
1) The case of preferential coverage degree: set a distinct coverage degree Cov_i for each grid point. In Step 6) of PKCCA, the loop terminates either when the preset upper limit on the number of sensors is reached, or when the coverage degree of each grid point is achieved.
2) The case of preferential detection precision: set a distinct detection precision T_i for each grid point, so the maximum miss probability permitted for grid point i is M_max^i = 1 - T_i. Thus, each miss probability m_ki is compared with the M_max^i of the corresponding grid point in Step 3) of PKCCA.
Here, M_max = 1 - T denotes the maximum miss probability permitted for any grid point, and Cov is the coverage degree each grid point is expected to achieve. The number of grid points traversed for each placement is N, so the computational complexity of PKCCA is O(mN), where m is the number of sensors required for a given coverage of the entire three-dimensional monitoring region. Since m is not known in advance, we use N as an upper bound on m and obtain a computational complexity of O(N²).
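The steps above can be sketched compactly. The following is an illustrative transcription on a small grid (candidate positions are restricted to not-yet-covered grid points, mirroring the row/column deletion in Step 5); it is not the authors' implementation:

```python
# Compact sketch of the PKCCA greedy loop (Steps 1-6) on an n x n x n grid:
# each iteration places a sensor at the grid point minimizing the summed miss
# probabilities, then retires points that reach coverage degree K.
import itertools
import math

def pkcca(n, alpha, beta, m_max, k_cov, max_sensors):
    pts = list(itertools.product(range(n), repeat=3))

    def miss(i, j):
        """Miss probability m_ij = 1 - p_ij, with p_ij from the STM, Eq. (1)."""
        p = math.exp(-alpha * math.dist(pts[i], pts[j]))
        return 1.0 - (p if p > beta else 0.0)

    active = set(range(len(pts)))          # grid points not yet K-covered
    level = {i: 0 for i in active}         # coverage degrees L_i, Step 1
    placed = []
    while active and len(placed) < max_sensors:
        # Step 2: candidate minimizing the sum of miss probabilities
        best = min(active, key=lambda c: sum(miss(c, i) for i in active))
        placed.append(pts[best])
        for i in list(active):             # Step 3: update L_i
            if miss(best, i) < m_max:
                level[i] += 1
                if level[i] >= k_cov:      # Step 5: retire covered point
                    active.discard(i)
    return placed                          # Step 6 handled by the while test

sensors = pkcca(n=3, alpha=0.3, beta=0.2, m_max=0.4, k_cov=1, max_sensors=27)
print(len(sensors), "sensors for 1-coverage of a 3x3x3 grid")
```

Because the inner sum scans all remaining points for every candidate, the cost per placement is O(N²) in this naive form; maintaining the column sums incrementally recovers the O(mN) bound stated above.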
4 Simulation Results
Our aim is to optimize the number of sensors and determine their placement for a given detection precision T and coverage degree K. We divide the three-dimensional monitoring region into a 5 by 5 by 5 grid with a total of 125 grid points. Each grid point should be detected by at least 3 sensors with probability T. The detection parameter α denotes the rate at which a sensor's probability of detecting a target diminishes with the distance between them. The monitoring range threshold β indicates the perception range of a sensor. The maximum miss probability permitted for each grid point, M_max, reflects the detection precision of the coverage control approach. The selection of these parameters directly influences the number of sensors required to accomplish the coverage task. For different α, β and M_max, we compare the number of sensors deployed and the sensitivity to parameters of PKCCA, RDKCCA (Randomly Deployed K-Coverage Control Algorithm for three-dimensional WSNs, which randomly places one sensor during each iteration until a preset upper limit on the number of sensors is reached or sufficient coverage degree with probability T is achieved) and UDKCCA (Uniformly Deployed K-Coverage Control Algorithm for three-dimensional WSNs, which uniformly places a fixed number of sensors in the monitoring region and observes the coverage degree reached with probability T). Fig. 1 compares the trend of the number of sensors required by PKCCA and RDKCCA as the detection parameter α increases. To reach 3-coverage of the same monitoring region, the number of sensors required by PKCCA and RDKCCA increases with α: for larger α, the sensing ability of the sensors is weaker under the STM, so more sensors are required to complete the coverage task. In addition, for the same α, the sensors required by PKCCA are significantly fewer than by RDKCCA, which is more obvious when α is very large, that is, when the sensing ability of the sensors is very weak.
Therefore, PKCCA can be applied to monitoring environments with weak signal transmission capacity and provides high reliability. Correspondingly, the trend of the number of sensors required in PKCCA and RDKCCA with increasing monitoring range threshold β and maximum miss probability
F. Chen, P. Jiang, and A. Xue

Fig. 1. Trend of the number of sensors required in PKCCA and RDKCCA with variation of α, and β = 0.2, Mmax = 0.4 (x-axis: detection parameter α, 0.2–0.65; y-axis: number of sensors, 0–1200)
Fig. 2. a. The trend of the number of sensors required in PKCCA and RDKCCA with increase of β, and α = 0.3, Mmax = 0.4 (x-axis: monitoring range threshold β, 0.1–0.5; y-axis: number of sensors, 20–220); b. The trend of the number of sensors required in PKCCA and RDKCCA with increase of Mmax, and α = 0.3, β = 0.2 (x-axis: maximum miss probability permitted Mmax, 0.1–0.5; y-axis: number of sensors, 0–1400)
permitted Mmax is compared in Fig. 2. We can see from Fig. 2.a that, to reach 3-coverage for the same monitoring region, the number of sensors required in PKCCA and RDKCCA increases with larger β. For larger β, the volume of a sensor's sensing sphere in STM is smaller, that is, the sensing range of each sensor is narrower, so more sensors are required to complete the coverage task. In addition, for the same monitoring range threshold β, PKCCA requires significantly fewer sensors than RDKCCA, which is most obvious when β is very large, that is, when the sensing range of sensors is very small. The change of β has much influence on RDKCCA, while PKCCA shows high robustness to the change of β. From Fig. 2.b, we can see that, to reach 3-coverage for the same monitoring region, the number of sensors in PKCCA and RDKCCA decreases with larger Mmax. For larger Mmax, fewer sensors are required to accomplish the coverage task with lower requirements. In addition, for the same Mmax, PKCCA requires significantly fewer sensors than RDKCCA. Especially when Mmax is very small, the sensors required in PKCCA are far fewer than in RDKCCA to achieve the same coverage with higher requirements. The case of preferential coverage of a sub-region in the whole monitoring region is shown in Fig. 3. The cases of preferential coverage degree and preferential detection precision of the sub-region are analyzed in Fig. 4, respectively.
Probability-Based Coverage Algorithm for 3D Wireless Sensor Networks
Fig. 3. Sub-region with preferential coverage in the monitoring region (the preferential coverage sub-region lies inside the larger monitoring region)
Fig. 4. a. The trend of the number of sensors required in PKCCA and RDKCCA with α in the case of preferential coverage degree, and β = 0.2, Mmax = 0.4; b. The trend of the number of sensors required in PKCCA and RDKCCA with Mmax in the case of preferential detection precision, and α = 0.3, β = 0.2.
We assume the grid points in the preferential coverage region in Fig. 3 are required to reach 3-coverage, and other grid points only reach 2-coverage. The trend of the number of sensors required in PKCCA and RDKCCA with increasing detection parameter α is compared. We can see from Fig. 4.a that the number of sensors in PKCCA and RDKCCA increases with α and that PKCCA requires significantly fewer sensors than RDKCCA, for the same reason given for Fig. 1. Comparing Fig. 1 with Fig. 4.a, we can see that, for the same α, the sensors required by PKCCA in the case of preferential coverage degree (grid points in the preferential coverage region reach 3-coverage, other grid points reach 2-coverage) are far fewer than in the non-preferential case (all grid points reach 3-coverage), because the former monitoring requirement is less strict than the latter. Due to the randomness of sensor deployment, for the same α, the same qualitative conclusion cannot be obtained for RDKCCA by comparing the number of sensors in both cases. Correspondingly, for the case of preferential detection precision, we assume the grid points in the preferential coverage region in Fig. 3 are required to be detected with probability 95%, which means the maximum miss probability permitted for these grid points is 0.05, while the maximum miss probabilities Mmax permitted for other grid points
Fig. 5. The coverage degree obtained in PKCCA, RDKCCA and UDKCCA with variation of the number of sensors, and α = 0.3, β = 0.2, Mmax = 0.4 (x-axis: number of sensors, 20–100; y-axis: coverage degree, 0–10)
are set to varied values. The trend of the number of sensors required in PKCCA and RDKCCA with increasing Mmax is observed. From Fig. 4.b, with increasing Mmax, PKCCA requires significantly fewer sensors than RDKCCA to achieve 3-coverage for the same region, which is most obvious when Mmax is very small, that is, when the detection precision is very high. Comparing Fig. 2.b with Fig. 4.b, the sensors required by PKCCA in the case of preferential detection precision (detection precision in the preferential coverage region is 95%, and detection precision of other grid points is varied but less than 95%) are more than in the non-preferential case (detection precision of all grid points is varied but less than 95%), because the former monitoring requirement is stricter than the latter. Due to the randomness of sensor deployment, for the same Mmax, the same qualitative conclusion cannot be obtained for RDKCCA by comparing the number of sensors required in both cases. In Fig. 4.b, the curve of the number of sensors required in RDKCCA fluctuates because of the randomness of sensor deployment. A fixed number of sensors are placed in the three-dimensional monitoring region to achieve a sufficiently high integral coverage. The coverage degree obtained in PKCCA, RDKCCA and UDKCCA with varied numbers of sensors is compared in Fig. 5. The integral coverage of the three approaches is larger with more sensors. For the same number of sensors, PKCCA performs better than UDKCCA, which in turn outperforms RDKCCA. For example, placing 100 sensors in the monitoring region, RDKCCA reaches 3-coverage, UDKCCA obtains 6-coverage, while PKCCA achieves 10-coverage.
5 Conclusions

WSNs have wide application in environment monitoring, where the monitoring region is usually a three-dimensional space and there is inherent uncertainty in sensor readings, so strong fault tolerance and robustness are required of the sensor monitoring system. A probability-based K-coverage control approach for three-dimensional WSNs, PKCCA, is proposed in this paper. For variation of the detection parameter α, the monitoring range threshold β and the maximum miss probability permitted for each grid point Mmax, the simulation results indicate that PKCCA uses fewer sensors than RDKCCA to complete the same three-dimensional coverage task and that PKCCA can reach a higher
coverage degree than RDKCCA and UDKCCA for the same number of sensors. The case of preferential coverage (preferential coverage degree and preferential monitoring precision) for some grid points is also analyzed and simulated, and shows that PKCCA outperforms RDKCCA. PKCCA is thus a good solution, with high reliability and robustness, for monitoring special environments with weak signal propagation.

Acknowledgment. This paper is supported by the National Natural Science Foundation of China (NSFC-60604024), the Important Project of the Ministry of Education (207044), the Key Science and Technology Plan Program of the Science and Technology Department of Zhejiang Province (2008C23097), the Scientific Research Plan Program of the Education Department of Zhejiang Province (20060246) and the Sustentation Plan Program of Youth Teachers in Universities of Zhejiang Province (ZX060221).
Simulating an Adaptive Fault Tolerance for Situation-Aware Ubiquitous Computing EungNam Ko1 and SoonGohn Kim2 1
Division of Information & Communication, Baekseok University, 115, Anseo-Dong, Cheonan, Chungnam, 330-704, Korea [emailprotected] 2 Division of Computer and Game Science, Joongbu University, 101 Daehakro, Chubu-Meon, GumsanGun, Chungnam, 312-702, Korea [emailprotected]
Abstract. This paper presents a performance analysis of an error sharing system running on multimedia collaboration in situation-aware middleware, using the rule-based SES (System Entity Structure) and DEVS (Discrete Event System Specification) modeling and simulation techniques. In DEVS, a system has a time base, inputs, states, outputs, and functions. This paper proposes an adaptive fault tolerance algorithm and its simulation model in a situation-aware middleware framework by using DEVS. An example of a situation-aware environment is illustrated in multimedia collaboration.
1 Introduction

In 1991, Mark Weiser, in his vision for 21st-century computing, described ubiquitous computing, or pervasive computing, as the process of removing the computer from user awareness and seamlessly integrating it into everyday life. The combination of mobile computing and intelligent environments can be seen as a prerequisite to pervasive computing [1]. Context awareness (or context sensitivity) is an application software system's ability to sense and analyze context from various sources; it lets application software take different actions adaptively in different contexts [2]. In a ubiquitous computing environment of computing anytime, anywhere, on any device, the concept of situation-aware middleware has played a very important role in matching user needs with available computing resources in a transparent manner in dynamic environments [3, 4]. Although situation-aware middleware provides powerful analysis of dynamically changing situations in the ubiquitous computing environment by synthesizing multiple contexts and users' actions, which need to be analyzed over a period of time, it is difficult to detect errors and recover from them for seamless services and to avoid a single point of failure. Thus, there is a great need for simulating fault-tolerance algorithms in situation-aware middleware to provide dependable services in ubiquitous computing. This paper proposes simulating an adaptive fault-tolerance model for situation-aware ubiquitous computing. The model aims at detecting, classifying, and recovering errors automatically; in particular, it aims at simulating the classification of errors. Section 2 describes the DEVS formalism. Section 3 describes situation-aware middleware as the context and
D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 372–379, 2008. © Springer-Verlag Berlin Heidelberg 2008
the adaptive fault-tolerance architecture. Section 4 describes simulation results of our proposed adaptive fault-tolerance model. Section 5 presents conclusions.
2 Related Works

The DEVS-Scheme environment is based on two formalisms: the discrete event system specification (DEVS) and the system entity structure (SES) [5,6]. In this section, DEVS and SES are reviewed. The DEVS formalism introduced by Zeigler provides a means of specifying a mathematical object called a system. It is a theoretically well-grounded means of expressing hierarchical, modular discrete event models. Basically, a system has a time base, inputs, states, and outputs, together with functions for determining next states and outputs given current states and inputs. In the DEVS formalism, an atomic model is defined by the structure [7-10]:

M = < X, S, Y, δint, δext, λ, ta >

where X: a set of input events; S: a set of sequential states; Y: a set of output events; δint: S -> S, the internal transition function; δext: Q x X -> S, the external transition function; λ: S -> Y, the output function; ta: the time advance function.

Basic models may be coupled in the DEVS formalism to form a multi-component model, which is defined by the structure [7-10]:

DN = < D, {Mi}, {Ii}, {Zij}, select >

where DN: digraph network; D: a set of component names; {Mi}: the component basic models; {Ii}: the set of influencees of i, and for each j in Ii; {Zij}: the i-to-j output translation function; select: a tie-breaking selector function.

The system entity structure (SES) directs the synthesis of models from components in a model base. The SES is a knowledge representation scheme that combines decomposition, taxonomic, and coupling relationships. The SES is completely characterized by its axioms; however, the interpretation of the axioms is not fixed and thus is open to the user. When constructing an SES, it may seem difficult to decide how to represent concepts of the real world. An entity represents a real world object that either can be independently identified or postulated as a component of a decomposition of another real world object.
An aspect represents one decomposition, out of many possible ones, of an entity. The children of an aspect are entities representing components in a decomposition of its parent. A specialization is a node for classifying entities and is used to express alternative choices for components in the system being modeled. The children of a specialization are entities representing variants of its parent [7].
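As a concrete illustration of the atomic structure M = < X, S, Y, δint, δext, λ, ta > reviewed above, the sketch below renders it as a plain Python class. The particular model (a generator that alternates between an 'active' and an 'idle' phase and counts external wake-ups) and all names are assumptions made for this sketch, not part of the works cited.

```python
class AtomicDEVS:
    def __init__(self):
        self.state = ("active", 0)          # S: (phase, wake-up count)

    def ta(self, s):                        # time advance function
        phase, _ = s
        return 1.0 if phase == "active" else 5.0

    def delta_int(self, s):                 # internal transition: S -> S
        phase, n = s
        return ("idle", n) if phase == "active" else ("active", n)

    def delta_ext(self, s, e, x):           # external transition: Q x X -> S
        phase, n = s                        # e = elapsed time, x = input event
        return ("active", n + 1)            # any input re-activates and counts

    def out(self, s):                       # output function lambda: S -> Y
        phase, n = s
        return ("tick", n)

# drive the model through one internal cycle and one external event
m = AtomicDEVS()
y = m.out(m.state)                          # output emitted before delta_int
m.state = m.delta_int(m.state)              # ('active', 0) -> ('idle', 0)
m.state = m.delta_ext(m.state, 0.5, "wake") # ('idle', 0) -> ('active', 1)
```

In a full DEVS simulator a coordinator would schedule these calls from the time advance values; here they are invoked by hand only to show the role of each function.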
3 Adaptive Fault-Tolerance: Our Proposed Approach 3.1 RCSM A conceptual architecture of situation-aware middleware based on Reconfigurable Context-Sensitive Middleware (RCSM) is proposed in [2]. Ubiquitous applications require use of various contexts to adaptively communicate with each other across multiple network environments, such as mobile ad hoc networks, Internet, and mobile phone networks. However, existing context-aware techniques often become inadequate in these applications where combinations of multiple contexts and users’ actions need to be analyzed over a period of time. Situation-awareness in application software is considered as a desirable property to overcome this limitation. In addition to being context-sensitive, situation-aware applications can respond to both current and historical relationships of specific contexts and device-actions.
Fig. 1. Overview of Situation-Aware Middleware (from top to bottom: situation-aware application objects; RCSM optional components, including the RCSM ephemeral group communication service and other services; RCSM core components, namely the Adaptive Object Containers (ADCs), providing awareness of situation, and the RCSM Object Request Broker (R-ORB), providing transparency over ad hoc communication, alongside the OS; transport layer protocols for ad hoc networks; sensors)
All of RCSM’s components are layered inside a device, as shown in Figure 1. The Object Request Broker of RCSM (R-ORB) assumes the availability of reliable transport protocols; one R-ORB per device is sufficient. The number of ADaptive object Containers (ADC)s depends on the number of context-sensitive objects in the device. ADCs periodically collect the necessary “raw context data” through the R-ORB, which in turn collects the data from sensors and the operating system. Initially, each ADC registers with the R-ORB to express its needs for contexts and to publish the corresponding context-sensitive interface. RCSM is called reconfigurable because it allows addition or deletion of individual ADCs during runtime (to manage
new or existing context-sensitive application objects) without affecting other runtime operations inside RCSM [2]. However, RCSM did not include fault-tolerance support in its architecture. In this paper, we propose simulating a new fault-tolerance capability in situation-aware middleware.

3.2 The Adaptive Fault Tolerance Architecture

As shown in Figure 2, Other Services comprise many agents. The adaptive fault tolerance architecture is one of the agents included in Other Services. FTE (Fault Tolerance Environment) provides several functions and features for developing a multimedia distant education system among students and teachers during lectures.
Fig. 2. FTE in Situation-Aware Middleware (situation-aware application objects such as the multimedia collaboration home study system; Other Services agents SEMA, AMA, COPA, CRPA, ACCA, MECA, APSA, INA and EDRA, with media control and communication control; Adaptive Object Containers (ADCs); transport layer protocols for ad hoc networks)
Our proposed adaptive fault tolerance model aims at supporting fault-tolerance requirements by detecting, classifying and recovering errors in order to provide seamless services and avoid a single point of failure. An example of a situation-aware application is a multimedia collaboration education system. The development of multimedia computers and communication techniques has made it possible for instruction to be transmitted from a teacher to a student in a distance environment.
Other Services agents include AMA (Application Management Agent), which handles application requests; SEMA (SEssion Management Agent), which appropriately controls and manages sessions and their opening/closing, even when several sessions are generated at the same instant; COPA (Coupling Agent), which provides participants with the same view; CRPA (CRoss Platform communication Agent), which manages formation control of the DOORAE communication protocol; ACCA (Access and Concurrency Control Agent), which manages access control and concurrency control; APSA (Application Program Sharing Agent); INA (Intelligent Agent), which manages convertible media data between IBM-compatible PCs and Macs; and MECA (MEdia Control Agent), which supports user access and convenient applications. The multimedia application layer includes general application software such as word processors, presentation tools and so on, and it supplements various functions such as video conferencing and voice conferencing for the multimedia collaboration home study system. EDRA consists of EDA (Error Detection Agent), ECA (Error Classification Agent), and ERA (Error Recovery Agent). ECA consists of ES (Error Sharing) and EC (Error Classification). EDA detects an error by using hooking methods in the MS-Windows API (Application Program Interface). A hook is a point in the Microsoft Windows message-handling mechanism where an application can install a subroutine to monitor the message traffic in the system and process certain types of messages before they reach the target window procedure. Windows contains many different types of hooks. As shown in Fig. 3, the roles of ES (error and application program sharing) are divided into two main parts: abstraction and sharing of view generation.
Fig. 3. Error and Application Sharing Process (an ES hook table with filter functions captures views/events from the application and distributes them through the event distributer over the network to virtual applications, each with its own filter function)
Error and application program sharing must be handled differently according to the number of replicated application programs and the event command. The proposed structure is distributed, but for error and application program sharing a centralized architecture is used. Error and application program sharing
windows perform interprocess communication in message form. In the middle of this process, there are a couple of ways for the error and application sharing agent to intercept messages. ESA informs the SM (Session Manager) of the results of detected errors. ESA also handles an error for the application software automatically and informs the SM of the result again. That is, ESA becomes aware of an error occurrence after it receives a requirement from UIA and transmits it. The organization of ECA is shown in Fig. 4. EDRA consists of EDA, ECA, and ERA: EDA performs error detection, ECA error classification, and ERA error recovery. ECA consists of a frontend, backend, analyzer, coordinator, filter, and learner. The frontend plays the role of receiving error detection information from EDA. The backend plays the role of receiving error recovery information from ERA. The coordinator informs SMA of the result. The analyzer classifies the error information received from the frontend. The learner classifies the type of errors by using learning rules with consideration of information from the analyzer. The filter stores an error's history information in the KB, drawing on the error information classified by the learner.
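The division of labor just described can be summarized in a short sketch. This is purely illustrative: the class name, method bodies, and the toy classification rules are assumptions, not the paper's implementation (which was written in Visual C++).

```python
class ECA:
    """Illustrative Error Classification Agent pipeline:
    frontend -> analyzer -> learner -> filter -> coordinator."""

    def __init__(self):
        self.kb = []                         # error-history knowledge base

    def frontend(self, detection):           # receives detection info from EDA
        return {"raw": detection}

    def analyzer(self, info):                # classify the raw information
        info["category"] = "app" if "app" in info["raw"] else "system"
        return info

    def learner(self, info):                 # stand-in for the rule-based step
        info["type"] = "recoverable" if info["category"] == "app" else "fatal"
        return info

    def filter(self, info):                  # persist classified error in KB
        self.kb.append(info)
        return info

    def coordinator(self, info):             # report the result to SMA
        return f"SMA <- {info['type']} {info['category']} error"

    def classify(self, detection):
        return self.coordinator(self.filter(self.learner(
            self.analyzer(self.frontend(detection)))))

eca = ECA()
msg = eca.classify("app crash in word processor")
```

The pipeline order mirrors the prose: detection information enters at the frontend, is analyzed and learned over, stored in the KB, and reported to SMA.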
Fig. 4. The organization of EDRA (EDA feeds detection information to the ECA frontend; the analyzer, learner and filter classify errors using the PDB and KB; the backend receives recovery information from ERA; the coordinator reports to SMA)
4 Simulating AFT

The AFT simulation model has been implemented using Visual C++. To evaluate the performance of the proposed system, an error detection method was used to compare the proposed model against the conventional model by using the DEVS (Discrete Event System Specification) formalism. Before the system analysis, the variables used in this system are as follows. Poll_int stands for "polling interval". App_cnt stands for "the number of application programs related to the FTE session". App_cnt2 stands for "the number of application programs not related to the FTE session". Sm_t_a stands for "the accumulated time to register information in the SM".
(Simulation 1) The atomic models are EF, RA1, UA1, and ED1. The combination of atomic models makes a new coupled model. First, it receives an input event, i.e., the polling interval. This value is an input to RA1 and UA1, respectively. An output value is determined by the time-related simulation processes of RA1 and UA1, respectively. That output value becomes an input to ED1, whose time-related simulation process determines the final output. We can observe the result value through the transducer. (Simulation 2) The atomic models are EF, RA2, and ED2. The combination of atomic models makes a new coupled model. First, it receives an input event, i.e., the polling interval. This value is an input to RA2. An output value is determined by the time-related simulation process of RA2. That output value becomes an input to ED2, whose time-related simulation process determines the final output. We can observe the result value through the transducer.

Conventional method: Poll_int*(App_cnt + App_cnt2)
Proposed method: Poll_int*(App_cnt) + Sm_t_a

Therefore, in the case App_cnt2 > App_cnt (and assuming the registration overhead Sm_t_a is smaller than the saved polling cost Poll_int*App_cnt2),

Poll_int*(App_cnt + App_cnt2) > Poll_int*(App_cnt) + Sm_t_a

That is, the proposed method is more efficient than the conventional method for error detection when App_cnt2 > App_cnt. We have compared the performance of the proposed method with the conventional method.
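The two cost expressions above can be checked with a few illustrative numbers; the function names and parameter values below are assumptions for this sketch, not measurements from the paper.

```python
def conventional_cost(poll_int, app_cnt, app_cnt2):
    # poll every application, whether or not it belongs to the FTE session
    return poll_int * (app_cnt + app_cnt2)

def proposed_cost(poll_int, app_cnt, sm_t_a):
    # poll only session applications; pay the accumulated registration time
    return poll_int * app_cnt + sm_t_a

# illustrative values with App_cnt2 > App_cnt and a small Sm_t_a
poll_int, app_cnt, app_cnt2, sm_t_a = 10.0, 3, 8, 25.0
c = conventional_cost(poll_int, app_cnt, app_cnt2)   # 10 * (3 + 8) = 110
p = proposed_cost(poll_int, app_cnt, sm_t_a)         # 10 * 3 + 25 = 55
```

As the paper's inequality predicts, the proposed cost is lower here because the many non-session applications (App_cnt2) no longer need to be polled.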
5 Conclusions

An example of a situation-aware application is a multimedia education system. EDRA consists of EDA, ECA, and ERA: EDA performs error detection, ECA error classification, and ERA error recovery. ECA consists of a frontend, backend, analyzer, coordinator, filter, and learner. The frontend plays the role of receiving error detection information from EDA. The backend plays the role of receiving error recovery information from ERA. The coordinator informs SMA of the result. The analyzer classifies the error information received from the frontend. The learner classifies the type of errors by using learning rules with consideration of information from the analyzer. The filter stores an error's history information in the KB, drawing on the error information classified by the learner. In future work, the fault-tolerance system will be generalized for use in any environment, and we will study the domino effect in distributed multimedia environments as an example of situation-aware applications.
References 1. Ko, E., Hwang, D.: Implementation of a Fault-Tolerant System Running on DOORAE: FTSD. In: Proceedings of IEEE ISCE 1998, Taipei, Taiwan, pp. 19–21 (October 1998) 2. Steinmetz, R., Nahrstedt, K.: Multimedia: Computing, Communications & Applications. Prentice Hall Inc., Englewood Cliffs (1995)
3. Moore, M.G., Kearsley, G.: Distant Education: A Systems View. Wadsworth Publishing Company, Belmont (1996) 4. Ahn, J.Y., Lee, G., Park, G.C., Hwang, D.J.: An implementation of Multimedia Distance Education System Based on Advanced Multi-point Communication Service Infrastructure: DOORAE. In: Proceedings of the IASTED International Conference Parallel and Distributed Computing and Systems, Chicago, Illinois, USA, October 16-19 (1996) 5. Zeigler, B.P., Cho, T.H., Rozenblit, J.W.: A Knowledge-based Environment for Hierarchical Modeling of Flexible Manufacturing System. IEEE Trans. Syst. Man, Cybern. A 26, 81–90 (1996) 6. Cho, T.H., Zeigler, B.P.: Simulation of Intelligent Hierarchical Flexible Manufacturing: Batch Job Routing in Operation Overlapping. IEEE Transaction on Systems, Man, and Cybernetics-Part A: System and Humans 27, 116–126 (1997) 7. Zeigler, B.P.: Object-Oriented Simulation with hierarchical, Modular Models. Academic Press, London (1990) 8. Zeigler, B.P.: Multifacetted Modeling and Discrete Event Simulation. Academic Press Professional, Inc., Orlando (1984) 9. Zeigler, B.P.: Theory of Modeling and Simulation. John Wiley, NY (1976); reissued by Krieger, Malabar, FL, USA (1985) 10. Conception, A.I., Zeigler, B.P.: The DEVS formalism: Hierarchical model development. IEEE Trans. Software Engineer. 14, 228–241 (1988)
Color Image Watermarking Scheme Based on Efficient Preprocessing and Support Vector Machines Oğuz Fındık1, Mehmet Bayrak2, İsmail Babaoğlu1, and Emre Çomak1 1
Selcuk University, Department of Computer Engineering, Konya, Turkey 2 Selcuk University, Department of Electrical and Electronics Engineering, Konya, Turkey {oguzf,mbayrak,ibabaoglu,ecomak,Selcuk}@selcuk.edu.tr
Abstract. This paper suggests a new block-based watermarking technique utilizing preprocessing and a support vector machine (PPSVMW) to protect a color image's intellectual property rights. A binary test set is employed to train the support vector machine (SVM). Before adding binary data into the original image, the blocks are separated into two parts to train the SVM for better accuracy. The watermark's 1-valued bits are randomly added into the first block part and its 0-valued bits into the second block part. The watermark is embedded by modifying the blue-channel pixel value in the middle of each block to compose the watermarked image. The SVM is trained with the set bits and three other features, which are averages of the differences of pixels in three distinct shapes extracted from each block, so that the watermark can be extracted without the need for the original image. The results of the PPSVMW technique proposed in this study were compared with those of Tsai's technique, and our technique proved to be more efficient. Keywords: Color image watermarking, block based digital robust watermarking, support vector machines, intellectual property rights.
1 Introduction

With the rapid development of shared network resources, internet usage, bandwidth, and digital recording and media, the importance of intellectual property rights for these media has grown [1]. The intellectual property rights of these media can be protected with digital watermarking. Digital watermarking is the imperceptible embedding of watermark bits into multimedia data, where the watermark remains detectable as long as the quality of the content itself is not rendered useless [1, 2, 3]. Here, the research work is concentrated on color image watermarking for intellectual property rights. Different methods have been produced to protect the intellectual property rights of digital media, among them the discrete Fourier (DFT), wavelet (DWT) and cosine (DCT) transforms and the spatial domain. In the spatial domain, the digital watermark is obtained by modifying brightness pixels, whereas in the DFT, DWT and DCT domains, the coefficients of the digital media are obtained by calculation.
D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 398–406, 2008. © Springer-Verlag Berlin Heidelberg 2008
Two important criteria exist to evaluate the performance of a watermarking system [2]: robustness (bit correct ratio, BCR) and imperceptibility (peak signal-to-noise ratio, PSNR). Robustness relates to the correction rate of the extracted embedded watermark after a watermarked image is exposed to various attacks [2]. Tsai et al. introduced a color image watermarking technique using a nonlinear support vector machine (SVM) [4]. Li et al. proposed a new technique for embedding and extracting a digital watermark from an image using support vector regression (SVR) [5]. Shieh et al. proposed a robust, innovative watermarking scheme in the transform domain based on genetic algorithms that also considered the watermarked image quality [6]. Yu et al. proposed a novel digital image watermarking technique based on a neural network for color images: the watermark is embedded into the color image, an artificial neural network (ANN) is trained with the watermark and the watermark-embedded image, and the watermark is extracted using this trained ANN [7]. Khan et al. proposed an innovative scheme for blindly extracting message bits when a watermarked image is distorted in the DCT domain; watermark extraction is considered a binary classification problem, and the presented method uses a machine learning technique to extract the watermark from features obtained from the DCT coefficients [8]. Li and Guo proposed a watermarking scheme resistant to geometric attacks through local invariant regions, which can be generated using scale normalization and image feature points [9]. The proposed watermarking technique is based on efficient preprocessing and SVM, and is intended to improve Tsai's method.
2 SVM

The Support Vector Machine (SVM) is a classification and regression method introduced by Vapnik [10]. Because it is built on a strong statistical learning theory, SVM has been used in many application fields such as [11, 12, 13]. Detailed information about SVM can be found elsewhere [10, 14]. In the experimental studies, an RBF kernel is used, and σ (the kernel parameter) and C (the regularization parameter) are set to 0.008 and 880000 respectively to obtain the best classification performance.
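The RBF kernel mentioned above can be written out explicitly. The sketch below shows one common parameterisation, K(x, y) = exp(−‖x − y‖² / (2σ²)), with the paper's σ = 0.008; the exact form of the exponent is our assumption, since the paper does not spell out its kernel formula.

```python
import math

def rbf_kernel(x, y, sigma=0.008):
    """RBF kernel K(x, y) = exp(-||x - y||^2 / (2 * sigma^2)).

    sigma = 0.008 is the kernel parameter reported in the paper; the
    parameterisation of the exponent is an assumption on our part.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))
```

Note that such a small σ makes the kernel extremely local: K(x, x) = 1, but the kernel value decays to essentially zero for even modestly separated inputs, which is why the feature values fed to the SVM must be on a comparable scale.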
O. Fındık et al.

3 The Proposed PPSVMW Method

3.1 Representation of Images

In this study RGB color images are used. An RGB color image can be represented as follows:

O = (O_p(i,j))_{I×J}    (1)

Here, i ∈ {0, 1, 2, ..., I−1} and j ∈ {0, 1, 2, ..., J−1}; I is the row size and J the column size of the image, while i and j are the row and column coordinates of a pixel. O_p = (R_p, G_p, B_p) gives the red, green and blue values of the color at coordinate (i, j) of the image, with R_p, G_p, B_p ∈ {0, 1, 2, ..., 255}. To convert the RGB color image to a gray image, the following equation is employed:

L_p = 0.299 R_p + 0.587 G_p + 0.114 B_p    (2)
As equation (2) shows, it is the blue channel that affects the luminance the least. The watermark, which is composed of binary values W with dimensions p×q, is therefore embedded into the blue channel for minimal degradation.
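Equation (2) and the choice of the blue channel can be illustrated with a minimal sketch:

```python
def luminance(r, g, b):
    """Equation (2): luminance as a weighted sum of the RGB channels.

    The blue weight (0.114) is the smallest of the three, which is why
    modifying the blue channel degrades the perceived image the least.
    """
    return 0.299 * r + 0.587 * g + 0.114 * b
```

For example, a unit change in blue shifts the luminance by only 0.114, versus 0.587 for the same change in green.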
W = (w_{i,j})_{p×q}    (3)

Here, i ∈ {0, 1, 2, ..., p−1}, j ∈ {0, 1, 2, ..., q−1} and w_{i,j} ∈ {0, 1}.

To train the SVM, the following array is employed:

T = (t_k)_b    (4)

Here, t_k ∈ {0, 1} and b is the size of the array needed to train the SVM.
3.2 Tsai's SVM-Based Color Image Watermarking Method

Tsai proposes an SVM-based color image watermarking (SCIW) method that recovers the watermark using an SVM. The SCIW method separates the image into non-overlapping blocks and uses the blue channel of the color image to embed the watermark into the original image. It embeds the watermark bits into the separated blocks using the following formula:
O′_p = O_p + α (2W_t − 1) L_p    (5)

Here, α is a constant denoting the watermark strength [4], O_p is the mid term of the block, W_t is the embedded watermark bit, L_p is the luminance value of the mid term of the block, and O′_p is the mid term of the block after watermark embedding. In Tsai's study, the SCIW method constructs a set of training patterns with binary labels by employing three different image features: the differences between a local image statistic and the luminance value of the center pixel in a sliding window of three distinct shapes. The watermark is then extracted by an SVM trained on these three features and the watermark bit.

3.3 Watermark Embedding

In this study, the color image is segmented into non-overlapping blocks and the average value of each block is computed. The blocks are then separated into two groups according to their average and mid term values, to train the SVM more effectively.
O_{I×J} → B_i  if μ > O_{I×J} and O_{I×J} < δ    (6)

O_{I×J} → S_k  if μ < O_{I×J} and O_{I×J} < δ    (7)
Here, O_{I×J} is the mid term of the block and μ is the average of the block. B_i and S_k are arrays large enough to hold the blocks selected for embedding 1's and 0's respectively. δ is one of the required parameters in the formation of the blocks used in preprocessing; keeping this parameter small ensures only slight degradation of the original image. To embed the watermark W into the color image O, the following procedure is adopted:

Step 1: Segment the color image into blocks.
Step 2: Prepare the B_i and S_k arrays according to the average and threshold values of the blocks.
Step 3: Select a random block from the B_i or S_k array according to the value of the bit.
Step 4: Calculate the selected pixel's luminance value according to equation (2).
Step 5: Replace the selected pixel's blue channel value according to equation (5).
Step 6: Repeat steps 3 through 5 until all bits of T and W are processed.

3.4 Watermark Extraction

The training parameters for PPSVMW are composed according to Tsai's method [4]. The shapes used in the composition of the training parameters are shown in Fig. 1.
Fig. 1. Composition of the three training parameters used in this study. (a) Square-shaped region, used to compute D^{-1}_{p_i}; (b) cross-shaped region, used to compute D^{-2}_{p_i}; (c) X-shaped (diagonal cross) region, used to compute D^{-3}_{p_i}. The center of each region is at p_i = (x_i, y_i).
The formulas used to compose these three parameters employed in training the SVM are given in equations (8), (9) and (10) below.
D^{-1}_{p_i} = B_{p_i} − (1/4c_1) ( Σ_{l=−c_1}^{c_1} Σ_{r=−c_1}^{c_1} B_{p_i+(l,r)} − B_{p_i} )    (8)

D^{-2}_{p_i} = B_{p_i} − (1/4c_1) ( Σ_{l=−c_2}^{c_2} B_{p_i+(l,0)} + Σ_{r=−c_2}^{c_2} B_{p_i+(0,r)} − 2 B_{p_i} )    (9)

D^{-3}_{p_i} = B_{p_i} − (1/4c_1) ( Σ_{l=−c_3}^{c_3} B_{p_i+(l,l)} + Σ_{r=−c_3}^{c_3} B_{p_i+(r,−r)} − 2 B_{p_i} )    (10)
The training set can be formulated as follows:

G_i = (D^{-1}_{p_i}, D^{-2}_{p_i}, D^{-3}_{p_i}, d_i)    (11)

Here, D^{-1}_{p_i}, D^{-2}_{p_i} and D^{-3}_{p_i} in the training set G_i are the input variables and d_i is the output variable. The SVM is trained on these input and output variables; after training is finished, the watermark is extracted using the trained SVM. The SVM training procedure consists of the following steps:

Step 1: Determine the pixel block using a random number generator.
Step 2: Prepare the input and output variables for training the SVM.
Step 3: Train the SVM using these prepared input and output vectors.

The watermark extraction procedure consists of the following steps:

Step 1: Find the watermark-embedded block using the same random number generator that was used to embed the watermark.
Step 2: Calculate the input vectors.
Step 3: Obtain the embedded watermark bit using the trained SVM.
Step 4: Repeat steps 1 through 3 until all watermark bits are obtained.
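As a sketch of preparing the input vectors, the three window features of equations (8)–(10) might be computed as below. The shared 1/(4c₁) normalisation mirrors the printed formulas, and the exact window bookkeeping is our reading of the garbled originals, so treat this as an illustration rather than a definitive implementation.

```python
import numpy as np

def window_features(B, x, y, c1=1, c2=2, c3=2):
    """Sketch of features D^-1, D^-2, D^-3 of equations (8)-(10) for the
    pixel p = (x, y) of the blue-channel image B (a 2-D array).

    c1, c2, c3 default to the shape sizes reported in the experiments.
    """
    center = float(B[x, y])
    # (8) square-shaped region of half-width c1 around p, center excluded
    square = sum(float(B[x + l, y + r])
                 for l in range(-c1, c1 + 1)
                 for r in range(-c1, c1 + 1)) - center
    d1 = center - square / (4.0 * c1)
    # (9) cross-shaped region: horizontal and vertical arms of half-length c2
    cross = (sum(float(B[x + l, y]) for l in range(-c2, c2 + 1))
             + sum(float(B[x, y + r]) for r in range(-c2, c2 + 1))
             - 2.0 * center)
    d2 = center - cross / (4.0 * c1)
    # (10) X-shaped region: the two diagonals of half-length c3
    diag = (sum(float(B[x + l, y + l]) for l in range(-c3, c3 + 1))
            + sum(float(B[x + r, y - r]) for r in range(-c3, c3 + 1))
            - 2.0 * center)
    d3 = center - diag / (4.0 * c1)
    return d1, d2, d3
```

Each feature compares the center luminance against an average over its window, so the triple (D^{-1}, D^{-2}, D^{-3}) together with the embedded bit d_i forms one training pattern G_i of equation (11).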
4 Experimental Results

PSNR and BCR values are used to assess the watermarked image and the extracted watermark. The value of PSNR is calculated as follows:

PSNR(O, O′) ≡ 10 × log( 255² / MSE(O, O′) )    (12)
The mean square error (MSE) is calculated as follows:

MSE(O, O′) ≡ (1/(M×N)) Σ_{i=0}^{M−1} Σ_{j=0}^{N−1} ( p(i, j) − p′(i, j) )²    (13)
For a digital watermark, lower MSE and higher PSNR values are desirable. BCR is calculated as follows:

BCR(w, w′) ≡ ( 1 − ( Σ_{i=1}^{Lm} w_i ⊕ w′_i ) / Lm ) × 100%    (14)

Fig. 2. Color images and watermark used in the study: (a) watermark image, (b) Lena, (c) Baboon

Table 1. PSNR results

PSNR      Our method PPSVMW    Tsai's SCIW
Lena      43.37                41.52
Baboon    42.25                41.49

Table 2. BCR results

BCR                              Lena                         Baboon
                                 PPSVMW    Tsai's SCIW        PPSVMW    Tsai's SCIW
Attack free case                 99.02     97.81              93.09     90.36
5% noising attack case           96.17     95.51              90.43     87.82
10% noising attack case          90.70     88.75              87.87     83.84
Blurring attack case             95.85     91.29              92.92     86.14
Blurring twice attack case       95.68     88.24              92.87     83.21
Sharpening attack case           99.15     98.29              93.21     91.46
Sharpening twice attack case     99.44     98.24              93.02     92.07
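The quality measures of equations (12)–(14) are straightforward to implement; a minimal sketch over plain nested lists (images) and bit lists (watermarks):

```python
import math

def mse(o, o2):
    """Equation (13): mean squared error between two equally sized images."""
    m, n = len(o), len(o[0])
    return sum((o[i][j] - o2[i][j]) ** 2
               for i in range(m) for j in range(n)) / (m * n)

def psnr(o, o2):
    """Equation (12): 10 * log10(255^2 / MSE). Undefined for identical images."""
    return 10.0 * math.log10(255.0 ** 2 / mse(o, o2))

def bcr(w, w2):
    """Equation (14): bit correct ratio in percent; XOR counts wrong bits."""
    lm = len(w)
    wrong = sum(a ^ b for a, b in zip(w, w2))
    return (1.0 - wrong / lm) * 100.0
```

So a BCR of 100% means every extracted bit matches the embedded watermark, and the maximally different image pair (all 0 vs. all 255) yields a PSNR of 0 dB.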
Table 3. Images of attack-free-case and attack-test-case results. [The table shows, for each of the seven cases — attack free, 5% noising, 10% noising, blurring, blurring twice, sharpening, sharpening twice — the Lena and Baboon images together with the watermark extracted from each.]
In this study, the Matlab platform is employed to implement all algorithms and tests. The results are compared specifically with Tsai's SVM-based color image watermarking (SCIW) method. For this comparison we took the watermark strength α = 0.2 and the shape sizes c_1 = 1, c_2 = 2 and c_3 = 2. The size of the binary array T used in training the SVM is 1024, of which 512 entries are 0 and 512 are 1. The binary logo of 64×64 pixels shown in Fig. 2(a) is used as the watermark. Lena and Baboon images of 512×512 pixels, shown in Fig. 2(b) and Fig. 2(c) and also used in Tsai's study, are employed as the color images. δ, one of the required parameters in the formation of the blocks used in preprocessing, is set to 121 and 144 for the B_i and S_k arrays respectively. 5% and 10% noising, blurring and sharpening attacks are applied to test whether the watermark can still be extracted after various image processing attacks. The PSNR and BCR results are given in Table 1 and Table 2 respectively, and the attack-free-case and attack-test-case images are given in Table 3. All experimental results are compared with those of Tsai's method in the same tables.
5 Conclusion

The PPSVMW method reported in this work is a new robust and blind watermarking technique. Simulation results obtained after applying various image processing attacks show that the embedded watermark can be extracted more reliably than with Tsai's method. Moreover, the degradation between the original and the watermarked images obtained in this work was comparatively less than that reported elsewhere [4, 7].
References

1. Pan, J.S., Huang, H.C., Jain, L.C.: Intelligent Watermarking Techniques. Series on Innovative Intelligence, vol. 7 (2004)
2. Pan, J.S., Huang, H.C., Wang, F.H.: Genetic Watermarking Techniques. In: Proceed. Fifth Int. Conf. Inform. Engineer. Syst. Allied Technol., pp. 1032–1036 (2001)
3. Podilchuk, C.I., Delp, E.J.: Digital Watermarking: Algorithms and Applications. IEEE Signal Processing Magazine, 33–46 (2001)
4. Tsai, H.H., Sun, D.W.: Color Image Watermark Extraction Based on Support Vector Machines. Information Sciences 177, 550–569 (2007)
5. Li, C.H., Lu, Z.D.: Application Research on Support Vector Machine in Image Watermarking. Neural Networks and Brain, 1129–1134 (2005)
6. Shieh, C.S., Huang, H.C., Wang, F.H.: Genetic Watermarking Based on Transform-Domain Techniques. Pattern Recognition, 555–565 (2003)
7. Yu, P.T., Tsai, H.H., Lin, J.S.: Digital Watermarking Based on Neural Networks for Color Images. Signal Processing 81, 663–671 (2001)
8. Khan, A., Tahir, S.F., Majid, A., Choi, T.S.: Machine Learning Based Adaptive Watermark Decoding in View of Anticipated Attack. Pattern Recognition, 2594–2610 (2008)
9. Li, L.D., Guo, L.: Localized Image Watermarking in Spatial Domain Resistant to Geometric Attacks. International Journal of Electronics and Communications (article in press)
10. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (1995)
11. Kulkarni, A., Jayaraman, V.K., Kulkarni, B.D.: Support Vector Classification with Parameter Tuning Assisted by Agent-Based Technique. Computers and Chemical Engineering 28, 311–318 (2004)
12. Takeuchi, K., Collier, N.: Bio-medical Entity Extraction Using Support Vector Machines. Artificial Intelligence in Medicine 33(2), 125–137 (2003)
13. Chen, K.Y., Wang, C.H.: A Hybrid SARIMA and Support Vector Machines in Forecasting the Production Values of the Machinery Industry in Taiwan. Expert Systems with Applications 32(1), 254–264 (2007)
14. Çomak, E., Arslan, A., Türkoğlu, İ.: A Decision Support System Based on Support Vector Machines for Diagnosis of the Heart Valve Diseases. Computers in Biology and Medicine 37(1), 21–27 (2007)
Image and Its Semantic Role in Search Problem

Nasir Touheed¹, Muhammad Saeed², M. Atif Qureshi², and Arjumand Younus²

¹ Institute of Business Administration, Karachi, Pakistan
² Department of Computer Science, University of Karachi, Pakistan
[emailprotected], [emailprotected], [emailprotected], [emailprotected]

Abstract. The world has now shrunk, and information today exists in many forms ranging from text to videos. An overloaded World Wide Web, full of data, makes it difficult to extract information from that data, and this has given birth to a new phenomenon in the computer industry: search engine technology. The image, which contains dense information, has not yet found its real interpretation in search engines. In this paper we apply Semantic Web concepts and propose a standard dimension in image structures in order to improve the searching ability of image search engines. An image annotation tool, called "SemImage", was developed which allows users to annotate an image with various ontologies; JPEG was taken as a case study. This work describes our initial research efforts in semantics-based searching driven by ontological standards for images, which we refer to as Image SemSearch.

Keywords: Search problem redefinition, semantics, ontological standard, search seed.
1 Introduction

The World Wide Web is a huge repository of information, and this information is varied and diverse in nature. Multimedia content has become an integral part of the Web and adds to the appeal of web sites; among these varied forms of media, images are the most primitive and widely used. They have a unique appeal of their own, for as the saying goes, "a picture is worth a thousand words." In primitive ages, before man knew any language, he learnt through the language of images. The same phenomenon still holds, and evidence of this is the huge amount of visual information on the Internet. In fact, it would not be wrong to say that without visual information the Web would not have attained the popularity it enjoys today. We use our visual ability to see and understand visual information more than any other medium to communicate and collect information. Visual information can be found in Web documents or even as stand-alone objects; it includes images, graphics, bitmaps, animations and videos [1]. With such a large volume of unstructured digital media there is a need for effective and additional techniques for image

D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 388–397, 2008. © Springer-Verlag Berlin Heidelberg 2008
retrieval [2]. Furthermore, people quite often need such visual information, which finds a variety of applications in education and training, criminal tracking, law enforcement, and so on. Image search engines do exist, such as Google Image Search, Lycos and the AltaVista photo finder, but their search relies on text to look for images, which yields a great percentage of irrelevant results [3]. Neither type of visual information (still or moving) can be left unorganized or unexplored. This led us to explore and dive deep into the world of images. Reflecting the significance of the task, extensive research is being conducted on the topic, yet there is still a need for effective tools that can search for images and videos [4]. In this paper we propose a technique for modifying existing image formats that can give new directions to the field of image searching; furthermore, we have devised a new technique for image search engines based on underlying image information and its semantics. Image retrieval systems are now an active area of research, especially after the explosive growth of the World Wide Web. However, although much work has been done in content-based image retrieval (CBIR), techniques to retrieve images on the basis of semantic content are still in their infancy [5]. As pointed out by Eakins [5], achieving the goal of intelligent image retrieval requires a paradigm shift. In this paper we follow this approach by borrowing ideas from the Semantic Web and ontology. This paper describes the first phase of our research and lays the groundwork for image searching on the basis of ontology and semantic annotations. It explores the role that images themselves can play in making themselves known to search engines: a mechanism which we refer to as the "Semantic Image." SemSearch is a type of search engine based on ontology-driven principles, leading to a semantics-based searching mechanism.
It has at its foundations a modification of image formats to include information about image taxonomy, which will assist the concept of the "Semantic Image" and eventually lead to a redefinition of the image search problem. The idea presented in this paper is to standardize the image structure so as to introduce a strong association, lacking in current implementations, between the content to be searched and the search engine itself.

1.1 Motivation

On analysis of image retrieval systems we find that there are commercial engines such as Google [6] and AltaVista [7], while the research prototypes are mostly CBIR-based systems. Even though there have been major improvements in text search, especially after the famous PageRank algorithm [8], image search has not yet found its ground. We explored this problem with a paradigm shift, borrowing a majority of ideas from the Semantic Web concept [9], and considered the enhancements that could be achieved by giving ontological descriptions of images, thereby laying the foundation for ontology-driven semantic search [10]. The rest of this paper is organized as follows. In Section 2 we explore existing image searching techniques and their underlying architecture, so that our research can refine existing ideas; we also discuss drawbacks of the previous approaches. In Section 3 we describe the proposed architecture and standardizations for image formats in detail. In Section 4 we present a summary of our case study, which at this point is the JPEG image format. Section 5 concludes the paper.
2 Existing Technologies for Image Searching

Traditional techniques for image searching have been keyword indexing and browsing; these are employed in popular commercial engines like Google Image Search and Yahoo! Photo Finder. According to Kherfi et al. [3], the services users require from an image retrieval system can be of three types: query-based retrieval, browsing, or summarization. Each of these is explored briefly. In query-based retrieval systems, users specify queries and the system retrieves images corresponding to them. Queries can be text-based, image-based, or a combination of the two. A text-based query may contain the names of objects within the image to be sought, the name of the image, a description of the image, or a citation that can be associated with it. In an image-based query, the user may be given a set of sample images or may be asked to provide his or her own image. After query specification, the system goes through its index to place the image in the class that most closely relates to the query. Certain problems may then arise in the retrieved images, but Kherfi [3] proposes that these problems can be removed with relevance feedback. The browsing service is provided by text search engines like Lycos, AltaVista and Yahoo! These systems surf the web on a regular basis, record textual information on each page, and through automated or semi-automated analysis create hierarchical catalogs of the information on the Web. We propose a somewhat similar mechanism but extend the ideas to make images self-descriptive, as described in Section 3. Another service that may be required of a Web image search engine is the provision of summaries; the user may require the system to give a title to a set of images collected from the Web in order to present the categorized images under a sub-title.
Summaries can complement the catalog if each category of images in the catalog is represented by some representative words or images provided by the summarization service.

2.1 Existing Systems' Review

Many prototypes have been proposed for image retrieval from the Web; these include ImageRover [11], Atlas WISE [12], WebSeek [1], ImageScape [13], etc. We now explore these systems in comparison with the Image SemSearch mechanism. Image retrieval has been a very active research area since the 1970s, with thrust mainly from two major research communities: computer vision and database management [14]. Text-based image retrieval borrows ideas from the database management
Fig. 1. Current Image Search Techniques
field, in which images are first annotated with text and text-based DBMSs are then used for image retrieval. In the 1990s, content-based image retrieval was proposed, in which images are indexed by their own visual content such as color and texture. With the size and diversity of digital image collections growing at an exponential rate, efficient image retrieval is becoming increasingly important. In a general sense, current image search systems and engines can be categorized into two categories, text-based and image content-based, as shown in Figure 1.

2.2 Limitations of Existing Approaches

Although these approaches yield effective results, they suffer from some limitations, and this work attempts to remove them by redefining the "image search problem." Text-based retrieval systems have the problem that, accompanying the relevant search results, there can be a large number of irrelevant ones, i.e. their precision can be low [15]. Content-based image retrieval systems have been on the research scene for quite some time now, but there are still impediments to their widespread user acceptance, as also pointed out by Eakins [5]. This is not because the need for such systems is lacking: there is ample evidence of user demand for better image data management in fields as diverse as crime prevention, photo-journalism, fashion design, trademark registration, and medical diagnosis. It is because there is a mismatch between the capabilities of the technology and the needs of users. The vast majority of users do not want to retrieve images simply on the basis of similarity of appearance; they need to be able to locate pictures of a particular type (or individual instance) of object, phenomenon, or event. A successful solution lies in semantic image retrieval, which intelligently retrieves images on the basis of users' specific, focused needs and which requires a significant paradigm shift, as suggested by Eakins [5].
The information that these systems derive about images is in a loosely connected form; it can be made more valuable by embedding it into the image structure itself in the form of tags and metadata. In short, an important limitation of current research directions is the lack of a standard foundation upon which to build the searching framework, be it manual annotation of an image or automatic annotation through content-based image processing techniques. What we propose is a modification of the image formats in order to standardize the image retrieval process through "Semantic Encapsulation and Annotation of Images."
3 Proposed Strategy

We now explain our strategy for image searching. We have sought to collect the afore-mentioned techniques on a single platform (see Fig. 2) and to treat images as important entities in themselves. By this we mean making an image more and more self-sustaining, not heavily reliant on the text around it and/or on HTML-tagged information. The image search problem needs a proper redefinition, and in such an attempt we suggest modifications to image formats which can prove helpful in standardizing the image search problem.
Fig. 2. Proposed Framework for a Layered Approach towards Semantic Searching of Images
3.1 The Search Problem Redefined

We feel that one fundamental aspect has to be an important part of all image retrieval systems, and it lies in the famous saying: "a picture is worth a thousand words." Unfortunately, current image retrieval systems do not realize this phenomenon in essence; to a computer, a picture is worth almost zero words. What has made the task immensely difficult is the computer's inability to understand images. This limitation could be overcome by redefining an image as a self-descriptive entity through "semantics," so that the thousand words are contained within it and it can tell by itself what it is. Images already have visual appeal and tell a lot, but this information is conveyed only to humans who look at them; image search engines cannot grasp the thousand words that a picture wants to reveal about itself. This led us to a new idea: what if the image formats used on the Web were enhanced and revised with an add-on to the standard? Information can be embedded into the image formats in such a way that the image itself becomes the teller of its compressed knowledge, with those thousand words compacted into the limited words that create actual sense. What we are proposing is a modification of the structure of images such that properties like class (taxonomy), type and description are hidden in the structure. So, alongside the display information (e.g. RGB) of all pixels, there can be an image header which tells what exactly a particular image represents. This information can also be added elsewhere in the image structure, for instance after the end-of-file marker or in bytes reserved for future use. This naturally leads to "a redefinition of the image searching problem." Unlike previous approaches, where images were searched as an integral part of web pages and categorized on the basis of the surrounding text and tags, this structure will enable the image to have a separate entity and existence of its own.

3.2 Nth Dimensional Image Structure

Data is not just a simple term today; it takes multi-varied forms and can be considered to possess a multi-dimensional nature. This is what has given rise to new data processing and mining techniques like online analytical processing [16]. Similar concepts can be applied to images by extending them to n dimensions, where n is an arbitrary number: the greater the amount of detail to be added to the image, the greater the value of n.
Now we discuss the concept of image dimensionality in detail. An image is physically a two-dimensional viewport with a certain width and height; existing search engines do give some image parameters such as width, height and pixel depth, but for image retrieval systems to become more intelligent, image formats should include additional information. It is this information which will add new dimensions to the image, and search results can then be refined by virtue of it. We suggest embedding the image description into the standards for image formats, thereby taking the "Semantic Web" idea ahead. The Semantic Web is a philosophy, a set of design principles, a scheme to make data on the Web defined and linked in a way that enables efficient discovery, automation and integration of information. The proponents of the Semantic Web philosophy argue that the vision of making the Web function as an intelligent agent can only flourish when standards are well established; the need for ontological standards has led to the development of independent data representation schemes like XML [9] [17]. Likewise, what we are putting forward is an "ontological standard for images," so as to make image search on the World Wide Web easy and efficient. We believe that the searching seed should be within the image rather than chiefly within the context, as in today's engines. Hence the idea is to combine content-based image retrieval with context-based image retrieval. The following is a representation of the modified image format hierarchy:
Fig. 3. Image Format Hierarchy
Moreover, the proposed additions to image formats would contain seven bits for TYPE in the structure:

Table 1. Description of the TYPE bits of the semantic image

Bit 1: RGB   Bit 2: G   Bit 3: P   Bit 4: L   Bit 5: 3D   Bit 6: A   Bit 7: C
Here, RGB represents whether the image is palette-based or RGB. G and P tell whether it is a graphic or a picture. L represents whether the image is layered or not. 3D tells whether it is a three-dimensional view of an object or a simple 2D image. A represents whether the image is animated or still. C tells whether a class definition exists or not. Among these seven parameters, class is of the greatest significance, because on the basis of class we can classify the image, and it is this attribute which will enhance the search process. Class is further categorized into descriptions or subclasses. For example, for family photographs the names of all people in the picture can be given as the description of the image, and the class can be "personal family." Classes could include landscape, sports, personal family, etc., and even these can be further classified, e.g. sports into cricket, football, hockey, etc.
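A minimal sketch of packing the seven TYPE flags of Table 1 into a single byte follows. The physical bit ordering (Bit 1 as the most significant of the seven) is our assumption, since the paper only lists the flags and not their layout.

```python
# Flag names in the order of Table 1 (Bit 1 ... Bit 7).
FLAGS = ("RGB", "G", "P", "L", "3D", "A", "C")

def pack_type(flags):
    """Pack a dict of flag names -> bool into one 7-bit TYPE value.

    Assumed layout: Bit 1 (RGB) is the most significant of the seven bits.
    """
    byte = 0
    for name in FLAGS:
        byte = (byte << 1) | (1 if flags.get(name) else 0)
    return byte

def unpack_type(byte):
    """Recover the flag dictionary from a packed TYPE value."""
    return {name: bool((byte >> (len(FLAGS) - 1 - i)) & 1)
            for i, name in enumerate(FLAGS)}
```

For example, a JPEG photograph (an RGB picture, not layered, animated or 3D) would set the RGB and P bits; whether C is also set depends on whether a class definition has been attached.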
This is what motivated us to take an initiative for supporting available computer vision algorithms in order to enable them to share their processed information by utilizing the capabilities and features of “Image SemSearch.” Hence the trigger point of our research starts with an Image Labeler “SemImage” which can then help realize the semantic role of an image in search engines. 3.4 Towards Intelligent Image Retrieval As pointed out earlier in this paper though automated retrieval systems have been around for quite some time now yet their widespread user acceptance still has to be achieved. We feel that this has been due to the fact that automation (through image analysis techniques) should be the next step in the process of image searching; first
there is a need of standardization in placing image labels so that image semantics can be defined. One fundamental problem with existing systems is the lack of a standardized approach for image retrieval; once this is addressed we can move towards intelligent image retrieval. Techniques like Google Image Labeler [18] and Facebook’s image tags utilize the manual element in a better sense but the lack arises due to absence of a standardized approach. The image searching process can then be standardized by making the proposed image format technology open for standard image search engines already available and then through their content-based retrieval procedures like edge detection, convolution, power spectrum analysis etc. [5], the image annotation process can be automated. Then to insert that automatically deduced information into the images already available on the World Wide Web the owners of Websites that have those images can be asked to go through some authentication mechanism to verify that they own those retrieved and annotated images. After the authentication process is over, the information about the image that is deduced by the engine can be given to the owner of that image and he can then be asked to insert that information into the image. Of course website owners would want traffic to be redirected to their site and they would surely want to make the digital content of their sites easily searchable through insertion of semantic content into them.
4 Experiments with the JPEG File Format

In this section we discuss our image annotation tool "SemImage," which allows users to create a Semantic Image by annotating it with various ontologies in the form of image type, classes, and description. At this point we have experimented with the JPEG file format, hence the type field will already have some bits filled out; that is, the picture and RGB bits will be set to 1 because JPEG contains both these features. Further, the bits for the layered, animated, and 3D fields will be set to 0, since the JPEG file format does not support these features. The following shows a screenshot of the image viewer of our prototype model, in which the class description and type attribute of a particular image can be viewed. The image is that of a brick wall; its name is "khi.jpg". Four classes have been defined for this JPEG image, namely Location: Karachi, Pakistan; Building: Karachi Port Trust; Bridge: Kemari over hear Bridge; and Link: http://www.pakistanpage.net/gallery/main/cities/karachi.html. All this information is now a part of the image structure. The approach presented allows a maximum of 255 classes to be specified for the image; further, one can set the seven bits of the TYPE parameter.

We have employed a technique similar to steganography, in which information is hidden in an image. As we mentioned in earlier sections, the image has a hidden (semantic) part and a display part; steganography utilizes the concept of hiding information, so it served the purpose. Data about the image is inserted after the EOI (End of Image) marker. This enables the image to be viewed in any JPEG-compliant Internet browser or picture viewer. Work of a similar nature has been performed in the EXIF file format [19]. However, EXIF inserts camera-specific information into the JPEG image, unlike content that can give semantic meaning to the image, which is the approach that we have followed.
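The embedding described above can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the `SEMIMG` tag and the key=value payload format are assumptions made for the example. It relies only on the documented JPEG behavior that decoders stop at the EOI marker (0xFF 0xD9), so appended bytes do not affect display.

```python
# Sketch of hiding semantic annotations after the JPEG EOI marker.
# Standard decoders stop at EOI (0xFF 0xD9), so bytes appended after it
# are ignored by browsers but recoverable by the annotation tool.

EOI = b"\xff\xd9"
TAG = b"SEMIMG"  # hypothetical marker so the reader can locate the payload


def embed_semantics(jpeg_bytes: bytes, classes: dict) -> bytes:
    """Append a semantic payload after the EOI marker of a JPEG byte stream."""
    if EOI not in jpeg_bytes:
        raise ValueError("not a complete JPEG stream (no EOI marker)")
    payload = ";".join(f"{k}={v}" for k, v in classes.items()).encode("utf-8")
    return jpeg_bytes + TAG + payload


def extract_semantics(jpeg_bytes: bytes) -> dict:
    """Recover the payload appended after EOI, if present."""
    end = jpeg_bytes.rfind(EOI) + len(EOI)
    trailer = jpeg_bytes[end:]
    if not trailer.startswith(TAG):
        return {}
    items = trailer[len(TAG):].decode("utf-8").split(";")
    return dict(item.split("=", 1) for item in items if "=" in item)


# A minimal stand-in for a real JPEG stream (SOI ... EOI):
image = b"\xff\xd8" + b"\x00" * 16 + EOI
annotated = embed_semantics(image, {"Location": "Karachi, Pakistan",
                                    "Building": "Karachi Port Trust"})
```

Because the payload sits outside the decoded image data, any JPEG-compliant viewer still renders the annotated file unchanged.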
N. Touheed et al.
Fig. 4. Sample JPEG for Application of Semantics and Semantic Information about it
One can now easily see that semantics have given a new self-descriptive nature to this image: the search seed is now located within the image itself. This approach can give a major boost to existing techniques by giving them a platform through which semantics can be associated with images. Moreover the idea of custom and personalized search can be realized through the proposed technique.
5 Conclusion

The answer to the "Image Search" problem lies in building a strong association between the image and the search engine. The shortcoming is not in the database management schemes used by search engines but rather in the image's self-description. This is where image semantics shares the load and distributes it in a better way, leading to a semantic and effective solution to the image search problem. In this paper we have laid the foundations for semantic search in the field of image searching and have attempted to implement Semantic Web technologies within images. These concepts need not be limited to the Internet; searching for images on local PCs would also become much easier, and the "Semantic Web" can be taken a step further with the introduction of concepts such as "Semantic Multimedia."
6 Future Directions

As mentioned earlier, this work describes the initial stages of our approach, which are geared towards the standardization of image formats so as to make their semantic information and their visual content tightly integrated with each other. The future phases of "Image SemSearch" include the introduction of an ontological standard language, "Image Semantic Language," which will assist the process of querying the engine, thereby making image retrieval efficient and effective through Semantic Web technologies.
References

1. Smith, J.R., Chang, S.-F.: An Image and Video Search Engine for the World-Wide Web. In: Proceedings of the SPIE Conference on Storage and Retrieval for Image and Video Databases V (IS&T/SPIE, San Jose, CA), pp. 84–95 (1997)
2. Halaschek-Wiener, C., Schain, A., Golbeck, J., Grove, M., Parsia, B., Hendler, J.: A Flexible Approach for Managing Digital Images on the Semantic Web. In: Proceedings of the Fifth International Workshop on Knowledge Markup and Semantic Annotation (SemAnnot) (2005)
3. Kherfi, M.L., Ziou, D., Bernadi, A.: Image Retrieval from the World Wide Web: Issues, Techniques and Systems. ACM Comput. Surv. 36(1), 35–67 (2004)
4. Sclaroff, S.: World Wide Web Image Search Engines. In: NSF Workshop on Visual Information Management, Cambridge, MA (June 1995)
5. Eakins, J.P.: Towards Intelligent Image Retrieval. Pattern Recogn. 35(1), 3–14 (2002)
6. Parbury, B.: Google Empire – Technology Behind the Giants. In: 7th Annual Multimedia Conference (2006)
7. Silverstein, C., Henzinger, M., Marais, H., Moricz, M.: Analysis of a Very Large AltaVista Query Log. SRC Technical Note 1998-014 (1998)
8. Brin, S., Page, L.: The Anatomy of a Large-scale Hypertextual Web Search Engine. In: Proceedings of the Seventh International World Wide Web Conference, Brisbane, Australia, pp. 107–117 (April 1998)
9. Shadbolt, N., Hall, W., Berners-Lee, T.: The Semantic Web Revisited. IEEE Intell. Syst. 21(3), 96–101 (2006)
10. Bonino, D., Corno, F., Farinetti, L., Bosca, A.: Ontology Driven Semantic Search. WSEAS Trans. Inform. Sci. Appl. 1(6), 1597–1605 (2004)
11. Taycher, L., La Cascia, M.: ImageRover: A Content-Based Image Browser for the World Wide Web. In: Proceedings of the IEEE Workshop on Content-Based Access of Image and Video Libraries, San Juan, Puerto Rico, pp. 2–9 (1997)
12. Kherfi, M.L., Ziou, D., Bernardi, A.: AtlasWISE: A Web-based Image Retrieval Engine. In: Proceedings of the International Conference on Image and Signal Processing (ICISP), Agadir, Morocco, pp. 69–77 (2003)
13. Lew, M.S.: Next Generation Web Searches for Visual Content. IEEE Comput. 33(11), 46–53 (2000)
14. Rui, Y., Huang, T.S., Chang, S.F.: Image Retrieval: Current Techniques, Promising Directions and Open Issues. J. Vis. Commun. Image R. 10, 39–62 (1999)
15. Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. ACM Press, New York (1999)
16. Somani, A., Choy, D., Kleewein, J.C.: Bringing Together Content and Data Management Systems: Challenges and Opportunities. IBM Syst. J. 41(4), 686–696 (2002)
17. http://www.w3.org/XML/
18. http://images.google.com/imagelabeler/
19. Exchangeable Image File Format for Digital Still Cameras, Exif Version 2.2, Standard of Japan Electronics and Information Technology Industries Association (2002)
Multiple Classification of Plant Leaves Based on Gabor Transform and LBP Operator

Feng-Yan Lin 1,2, Chun-Hou Zheng 2,3,*, Xiao-Feng Wang 2,4, and Qing-Kui Man 1,2

1 Institute of Automation, Qufu Normal University, Rizhao, Shandong 276826, China [emailprotected]
2 Intelligent Computing Lab, Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei, Anhui 230031, China [emailprotected]
3 College of Information and Communication Technology, Qufu Normal University
4 Department of Computer Science and Technology, Hefei University, Hefei 230022, China
Abstract. In this paper, a multiple classification method for plant species based on Gabor filters and the local binary patterns (LBP) operator is proposed. We classify plant species by extracting global texture features with Gabor filters and local features with the LBP operator. Simply speaking, the LBP operator is applied to the magnitude maps of the multi-scale, multi-orientation Gabor-filtered image rather than to the original image. Thus, a plant leaf is represented as a set of spatial histograms over varying scales and orientations. The method impressively improves performance compared with using the Gabor transform or LBP alone.

Keywords: Gabor transform, Local binary patterns, SVM.
1 Introduction

Plants are a familiar living form with the largest population and the widest distribution on the earth, and they play an important role in improving the environment of human beings. It is important to correctly and quickly recognize plant species in collecting and preserving genetic resources, the discovery of new species, plant resource surveys, plant species database management, etc. Currently, it is mainly botanists who carry out this time-consuming and troublesome task. With the development of computer science, it has become feasible to address the problem using image processing and pattern recognition techniques.

For plants, the leaves are stable and easy to collect for most of the year. In addition, plant leaves are approximately two-dimensional in nature. For example, Du et al. [1] used shape features to classify plant species. Color is an important character, but it is not stable during the growing period. Besides shape and color features, texture is another efficient means of characterizing plant species. In this paper, the Gabor wavelet transform and the local binary pattern operator are used to extract texture features of plant leaf images. *
Corresponding author.
D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 432–439, 2008. © Springer-Verlag Berlin Heidelberg 2008
2-D Gabor filters have been shown to be particularly useful for analyzing textured images containing highly specific frequency or orientation characteristics [9, 10]. They are appropriate for texture analysis in several senses: they have tunable orientation and radial frequency bandwidths, tunable center frequencies, and they optimally achieve joint resolution in space and spatial frequency. Gabor filters have now been widely adopted to extract texture features from gray images in many fields, such as face recognition [2, 3], palmprint recognition [5, 6], fingerprint identification [4], handwriting identification [7], etc.

The Local Binary Pattern operator was introduced by Ojala et al. in 1996 as a means of summarizing local gray-level structure [12]. In 2002, Ojala proposed a method based on recognizing certain local binary patterns, termed "uniform" [13]. After LBP was introduced, it was widely used in face recognition because of its insensitivity to illumination and rotation [8, 11]. In recent years, many researchers have studied the combination of the Gabor transform and the LBP operator [11, 14].

In this paper, we adopt Gabor filters and the LBP operator to achieve multi-scale feature extraction in a new field. First, we employ Gabor filters to obtain 20 magnitude maps of the coefficients, for five scales and four orientations. Second, the LBP operator is used to compute the histograms of each magnitude map at three scales. Thus, a plant leaf is represented as a set of spatial histograms.

The rest of the paper is organized as follows. In Section 2, we describe the Gabor wavelet features. In Section 3, the local binary pattern operator is given. The experimental method of classification and the results are shown in Section 4. Section 5 gives the conclusions.
2 Texture Feature Extraction Using the Gabor Transform

2.1 Gabor Wavelet Transform

For a given image I(x, y) with size P × Q, its 2-D discrete Gabor wavelet transform is given by a convolution:

G_{mn}(x, y) = \sum_s \sum_t I(x - s, y - t)\, g_{mn}^*(s, t) \qquad (1)
where s and t are the filter mask size variables, and g_{mn}^* is the complex conjugate of g_{mn}, a self-similar function generated from dilation and rotation of the following mother wavelet:

g(x, y) = \frac{1}{2\pi \sigma_x \sigma_y} \exp\left[ -\frac{1}{2}\left( \frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} \right) + 2\pi jWx \right] \qquad (2)

Its Fourier transform can be written as:

G(u, v) = \exp\left\{ -\frac{1}{2}\left[ \frac{(u - W)^2}{\sigma_u^2} + \frac{v^2}{\sigma_v^2} \right] \right\} \qquad (3)
where j = \sqrt{-1}, \sigma_u = \frac{1}{2\pi\sigma_x}, \sigma_v = \frac{1}{2\pi\sigma_y}, and W is the modulation frequency.
The self-similar Gabor wavelets are obtained through the generating function:

g_{mn}(x, y) = a^{-m} g(x', y') \qquad (4)

where m and n specify the scale and orientation of the wavelet respectively, with m = 0, 1, \ldots, M-1 and n = 0, 1, \ldots, N-1, and

x' = a^{-m}(x \cos\theta + y \sin\theta), \qquad y' = a^{-m}(-x \sin\theta + y \cos\theta)

Here a > 1, \theta = n\pi / N, and M and N specify the total number of scales and orientations respectively.
2.2 Gabor Filter Dictionary Design

The nonorthogonality of the Gabor wavelets implies that there is redundant information in the filtered images, and the following strategy is used to reduce this redundancy. Let U_l and U_h denote the lower and upper center frequencies of interest. The variables in the above equations are defined as below:

a = \left( U_h / U_l \right)^{1/(M-1)} \qquad (5)

\sigma_u = \frac{(a - 1)\, U_h}{(a + 1)\sqrt{2 \ln 2}} \qquad (6)

\sigma_v = \tan\left( \frac{\pi}{2N} \right) \left[ U_h - 2\ln\left( \frac{\sigma_u^2}{U_h} \right) \right] \left[ 2\ln 2 - \frac{(2\ln 2)^2 \sigma_u^2}{U_h^2} \right]^{-1/2} \qquad (7)

In our experiment, the filter parameters used are U_h = 0.4 and U_l = 0.05.
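As a rough numeric sketch of Eqs. (5)–(7), the filter-bank parameters can be computed from the paper's settings (M = 5 scales, N = 4 orientations, U_h = 0.4, U_l = 0.05). The function name is ours, and the code simply transcribes the formulas as reconstructed here:

```python
# Numeric sketch of the Gabor filter dictionary parameters, Eqs. (5)-(7).
import math

def gabor_bank_params(U_h, U_l, M, N):
    # Eq. (5): scale factor between adjacent center frequencies (a > 1).
    a = (U_h / U_l) ** (1.0 / (M - 1))
    # Eq. (6): radial bandwidth parameter.
    sigma_u = (a - 1) * U_h / ((a + 1) * math.sqrt(2 * math.log(2)))
    # Eq. (7): angular bandwidth parameter.
    sigma_v = (math.tan(math.pi / (2 * N))
               * (U_h - 2 * math.log(sigma_u ** 2 / U_h))
               / math.sqrt(2 * math.log(2)
                           - (2 * math.log(2)) ** 2 * sigma_u ** 2 / U_h ** 2))
    return a, sigma_u, sigma_v

a, su, sv = gabor_bank_params(0.4, 0.05, M=5, N=4)
# M * N = 20 filters in the bank, matching the 20 magnitude maps in the paper.
```

With these settings, a = 8^{1/4} ≈ 1.68, so each successive scale covers a frequency band about 1.68 times the previous one.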
3 Local Binary Patterns

3.1 Gray-Scale Invariant Local Binary Pattern

The Local Binary Pattern operator was introduced by Ojala et al. in 1996 as a means of summarizing local gray-level structure [12]. The operator takes a local neighborhood around each pixel, thresholds the pixels of the neighborhood at the value of the central pixel, and uses the resulting binary-valued image patch as a local image descriptor. It was originally defined for 3×3 neighborhoods, giving 8-bit codes based on the 8 pixels around the central one. Formally, the LBP operator takes the form:

LBP(x_c, y_c) = \sum_{n=0}^{7} 2^n s(i_n - i_c) \qquad (8)
where i_c denotes the gray value of the central pixel c, and i_n denotes the gray values of the neighborhood around the central pixel c; s(u) is 1 if u ≥ 0 and 0 otherwise. The LBP encoding process is illustrated in Fig. 1.
Fig. 1. Illustration of the basic LBP operator
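Eq. (8) can be sketched in a few lines; the neighborhood values below are made up for illustration.

```python
# Sketch of the basic LBP code of Eq. (8): the 8 neighbors i_0..i_7 are
# thresholded at the center value i_c and weighted by powers of two.
def lbp_code(center, neighbors):
    """neighbors: the 8 gray values i_0..i_7 around the central pixel."""
    assert len(neighbors) == 8
    return sum(2 ** n * (1 if i_n >= center else 0)
               for n, i_n in enumerate(neighbors))

# Example neighborhood: center 5, neighbors ordered i_0..i_7.
code = lbp_code(5, [6, 2, 7, 9, 3, 1, 5, 8])  # 1 + 4 + 8 + 64 + 128 = 205
```

Note that s(0) = 1 by the definition above, so a neighbor equal to the center contributes its bit.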
A circular model with different sizes is defined, which makes it feasible to deal with textures at different scales. The circular model of the local binary pattern is shown in Fig. 2. If the coordinates of i_c are (0, 0), then the coordinates of i_n are given by (-R \sin(2\pi n / P), R \cos(2\pi n / P)). The gray values of neighbors which do not fall exactly in the center of a pixel are estimated by interpolation.
Fig. 2. The circular model of the local binary pattern
3.2 Rotation Invariant Local Binary Pattern

When the image is rotated, the gray values correspondingly move along the perimeter of the circle around i_c. To remove the effect of rotation, i.e., to assign a unique identifier to each rotation-invariant local binary pattern, Ojala et al. defined:

LBP_{P,R}^{ri} = \min\{ ROR(LBP_{P,R}, k) \mid k = 0, 1, \ldots, P-1 \} \qquad (9)
where ROR(x, k) performs a circular bit-wise right shift on the P-bit number x, k times; the minimum value over all shifts is chosen as the code of the pattern. For example, in Fig. 3, 00001111 is taken as the code. Thus the binary patterns not only become rotation invariant, but the number of distinct patterns is also greatly reduced.
[Fig. 3 shows the eight rotations of one pattern: 00001111 (15), 00011110 (30), 00111100 (60), 01111000 (120), 11110000 (240), 11100001 (225), 11000011 (195), 10000111 (135); all are assigned the minimum code 00001111 (15).]
Fig. 3. Rotation invariant coding
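A minimal sketch of Eq. (9): ROR is implemented here with ordinary shifts and a P-bit mask, and the example pattern is the one from Fig. 3.

```python
# Sketch of the rotation-invariant LBP code of Eq. (9): the minimum over
# all circular bit-wise right shifts (ROR) of the P-bit pattern.
def ror(x, k, P=8):
    """Circular bit-wise right shift of the P-bit number x by k positions."""
    k %= P
    mask = (1 << P) - 1
    return ((x >> k) | (x << (P - k))) & mask

def lbp_rotation_invariant(x, P=8):
    return min(ror(x, k, P) for k in range(P))

# All eight rotations of 00001111 in Fig. 3 collapse to the same code:
ri = lbp_rotation_invariant(0b11110000)  # 15, i.e. 00001111
```

Every pattern in the orbit of Fig. 3 maps to 15, so the code is invariant to image rotation in steps of 2π/P.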
3.3 Uniform Local Binary Pattern
An LBP is "uniform" if it contains at most one 0-1 and one 1-0 transition when viewed as a circular bit string. For example, the code 11000011 in Fig. 1 is uniform. Uniformity is an important concept in the LBP methodology, representing primitive structural information such as edges and corners. Taking P = 8 for example, Ojala et al. observed that although only 58 of the 256 8-bit patterns are uniform, nearly 90 percent of all observed image neighborhoods are uniform. By this means, the original 2^P patterns can be reduced to P + 1 uniform patterns and one non-uniform pattern. The number of bins can thus be significantly reduced by assigning all non-uniform patterns to a single bin, often without losing too much information.
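The count of 58 uniform 8-bit patterns can be checked by brute force; "at most one 0-1 and one 1-0 transition" is equivalent to at most two bit transitions around the circular string.

```python
# Verifying the "uniform" claim: among all 256 8-bit patterns, exactly 58
# have at most two 0/1 transitions when read as a circular bit string.
def transitions(x, P=8):
    bits = [(x >> n) & 1 for n in range(P)]
    return sum(bits[n] != bits[(n + 1) % P] for n in range(P))

uniform = [x for x in range(256) if transitions(x) <= 2]
# len(uniform) == 58: the all-0 and all-1 patterns (0 transitions) plus
# the 8 * 7 = 56 patterns made of one contiguous run of ones.
```

The circular transition count is always even, so the uniform patterns are exactly those with 0 or 2 transitions.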
4 Experiments

In our experiments, we use a database of about 500 leaf images corresponding to 27 categories to show the efficiency of our method. A subset of the database is shown in Fig. 4.

Fig. 4. Subset of the database

We divided every category of plant leaves into a training set and a testing set; the number of leaves in the testing set is about 2/3 of that in the training set. We adopted the software LIBSVM 2.86 as the classifier, which is provided by Chih-Jen Lin of National Taiwan University [11]. During the experiments, we selected the radial basis function (RBF) as the kernel function of the SVM, defined as K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2), \gamma > 0, and optimized it by cross-validating the parameters c and \gamma, where c > 0 is the penalty parameter of the error term.

We carried out the experiments in three stages. For the first stage, after resizing the input images to 256 × 256, we extracted features with Gabor filters only. Five scales and four orientations were taken to analyze the texture of the plant leaves. For each scale and orientation, the mean \mu_{mn} and standard deviation \sigma_{mn} of the magnitude of the Gabor transform coefficients were obtained and used to represent the region for classification:

\mu_{mn} = \frac{\sum_x \sum_y |G_{mn}(x, y)|}{P \times Q}, \qquad \sigma_{mn} = \sqrt{\frac{\sum_x \sum_y (|G_{mn}(x, y)| - \mu_{mn})^2}{P \times Q}}

Thus we obtained a 40-dimensional vector f = [\mu_{00}, \sigma_{00}, \mu_{01}, \sigma_{01}, \ldots, \mu_{43}, \sigma_{43}]. After normalizing the data, with c = 20 and \gamma = 1, we obtained an accuracy of 85.44%.

For the second stage, we extracted the histogram features with P = 8, R = 1; P = 16, R = 2; and P = 24, R = 3, respectively. As mentioned in Section 3, we obtained 10-D, 18-D and 26-D vectors, respectively, for each image. The experimental results are shown in Table 1.

For the last stage, the leaf images were first filtered by the Gabor filters and the Gabor magnitude map (GMM) for each scale and orientation was obtained. Then we used the LBP operator on every GMM to compute the histograms. After dimension reduction, we classified the leaves with LIBSVM. These experimental results are also listed in Table 1. From Table 1, we can find that the larger P is, the better the result is. On the whole, the result is promising.

Table 1. Accuracy Comparison
Method               Feature Dimension    Accuracy
Gabor                2 × 5 × 4 = 40       85.44%
LBP (P=8, R=1)       8 + 2 = 10           78.73%
LBP (P=16, R=2)      16 + 2 = 18          82.26%
LBP (P=24, R=3)      24 + 2 = 26          86.64%
Gabor + LBP (8, 1)   10 × 5 × 4           86.59%
Gabor + LBP (16, 2)  18 × 5 × 4           89.28%
Gabor + LBP (24, 3)  26 × 5 × 4           90.97%
Table 1 illustrates that adopting the LBP operator on top of the Gabor transform can efficiently improve the accuracy. The classification results are more satisfying than those obtained using the Gabor transform or the LBP operator only.
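The RBF kernel used with the SVM above can be sketched directly from its definition, K(x_i, x_j) = exp(−γ‖x_i − x_j‖²). The feature vectors below are made-up stand-ins, not actual Gabor/LBP features, and this is not the LIBSVM implementation itself.

```python
# Sketch of the RBF kernel K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)
# used as the SVM kernel in the experiments.
import math

def rbf_kernel(xi, xj, gamma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq_dist)

k_same = rbf_kernel([0.2, 0.5, 0.1], [0.2, 0.5, 0.1])  # identical vectors
k_far = rbf_kernel([0.2, 0.5, 0.1], [0.9, 0.0, 0.7])   # distant vectors
```

The kernel value is 1 for identical feature vectors and decays toward 0 as the vectors move apart; γ controls how fast the similarity falls off, which is why it is cross-validated together with the penalty parameter c.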
5 Conclusions

In this paper, a multiple classification method based on Gabor filters and the LBP operator was proposed to classify plant leaves. We first obtained the magnitude maps with Gabor filters for different scales and orientations. Next, we employed LBP to extract local features based on the magnitude maps. The experimental results demonstrated that the proposed method is more efficient for plant leaf classification than using Gabor filters or LBP alone. In the future, we need to study how to combine other features, such as shape and color, to improve the accuracy.

Acknowledgements. This work was supported by the grants of the National Science Foundation of China, No. 10771120, the Scientific Research Startup Foundation of Qufu Normal University, No. Bsqd2007036, the grant of the Graduate Students' Scientific Innovative Project Foundation of CAS (Xiao-Feng Wang), the grant of the Scientific Research Foundation of the Education Department of Anhui Province, No. KJ2007B233, and the grant of the Young Teachers' Scientific Research Foundation of the Education Department of Anhui Province, No. 2007JQ1152.
References

1. Du, J.X., Huang, D.S., Wang, X.F., Gu, X.: Computer-Aided Plant Species Identification (CAPSI) Based on Leaf Shape Matching Technique. Transactions of the Institute of Measurement and Control 28(3), 275–284 (2006)
2. Olshausen, B., Field, D.: Emergence of Simple-cell Receptive Field Properties by Learning a Sparse Code for Natural Images. Nature 381, 607–609 (1996)
3. Rao, R., Ballard, D.: An Active Vision Architecture Based on Iconic Representations. Artif. Intel. 78, 461–505 (1995)
4. Xu, Y., Zhang, X.D.: Gabor Filterbank and Its Application in the Fingerprint Texture Analysis. In: Parallel and Distributed Computing, Applications and Technologies, December 5-8, pp. 829–831 (2005)
5. Kong, W.K., Zhang, D., Li, W.X.: Palmprint Texture Features Extraction Using 2-D Gabor Filters. Pattern Recogn. 36(10), 2339–2347 (2003)
6. Zhang, D., Kong, W.K., You, J., Wong, M.: Online Palmprint Identification. IEEE Trans. Pattern Anal. Mach. Intel. 25(9), 1041–1050 (2003)
7. Said, H.E.S., Tan, T.N., Baker, K.D.: Personal Identification Based on Handwriting. Pattern Recogn. 33(1), 149–160 (2000)
8. Ahonen, T., Hadid, A., Pietikäinen, M.: Face Description with Local Binary Patterns: Application to Face Recognition. IEEE Trans. Pattern Anal. Mach. Intel. 28(12), 2037–2041 (2006)
9. Turner, M.R.: Texture Discrimination by Gabor Functions. Biol. Cybern. 55, 71–82 (1986)
10. Clark, M., Bovik, A.C., Geisler, W.S.: Texture Segmentation Using a Class of Narrowband Filters. In: Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, Dallas, TX (1987)
11. Hsu, C.W., Chang, C.C., Lin, C.J.: A Practical Guide to Support Vector Classification. Technical report (2003)
12. Ojala, T., Pietikainen, M., Harwood, D.: A Comparative Study of Texture Measures with Classification Based on Feature Distributions. Pattern Recogn. 29, 51–59 (1996)
13. Ojala, T., Pietikainen, M., Maenpaa, T.: Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns. IEEE Trans. Pattern Anal. Mach. Intel. 24(7), 971–987 (2002)
14. Lian, Q.S.: Hierarchical Palmprint Identification Based on Gabor Filter and LBP. Comput. Engin. Appl. 43(6), 212–215 (2007)
Multiple Ranker Method in Document Retrieval*

Dong Li 1, Maoqiang Xie 2, Yang Wang 1, Yalou Huang 2, and Weijian Ni 1

1 College of Information Technology Science, Nankai University, Tianjin, China
2 College of Software, Nankai University, Tianjin, China
[emailprotected], [emailprotected], [emailprotected], [emailprotected], [emailprotected]
Abstract. In this paper, we propose a multiple-ranker approach to make learning to rank methods more effective for document retrieval applications. In traditional learning to rank methods, a ranker is learned from a set of queries together with their corresponding document rankings labeled by experts, and it is then used to predict the document rankings for new queries. But in practice, user queries vary with large diversity, which makes a single ranker learned from a closed set of data unrepresentative. The single ranker cannot be guaranteed to give the best ranking result for every single query, and this becomes the bottleneck of traditional learning to rank approaches. To address this problem, we propose a multiple-ranker approach: instead of an isolated ranker, we train multiple diverse rankers which can cover diverse categories of queries, and take an ensemble of these rankers for the final prediction. We verify the proposed multiple-ranker approach on real-world datasets. The experimental results indicate that the proposed approach can outperform existing learning to rank methods significantly.

Keywords: Learning to rank, document retrieval, ensemble learning.
1 Introduction

In recent years, learning to rank has been a focus in machine learning because many applications can be formulated as ranking problems, such as document retrieval, web search, expert finding, product rating, and spam email filtering. A ranking system calculates a score for each instance and sorts the instances by these scores. Learning to rank thus aims to create a proper ranker that predicts the score of every instance, using training data labeled by experts and machine learning techniques.

Many machine learning techniques have been applied to the ranking problem. Ranking SVM (RSVM) [1] transforms the task from ranking instances into classifying instance pairs, and utilizes an SVM to solve the transformed classification problem. PRank [2] uses a perceptron to predict the rank label of instances. Besides, many

* This work is supported by National Science Foundation of China under the grant 60673009, Tianjin Science and Technology Research Foundation under the grant 05YFGZGX24000 and Microsoft Research Asia Foundation.

D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 407–414, 2008. © Springer-Verlag Berlin Heidelberg 2008
machine learning methods, such as neural networks in RankNet [3] and ListNet [4], and boosting in RankBoost [5] and AdaRank [6], have been proposed in recent years.

But in document retrieval the ranking problem has some differences from the general setting. The training data in document retrieval consist of multiple queries, their corresponding retrieved documents, and relevance levels given by experts. When predicting, given new queries, their corresponding retrieved documents are sorted by the trained ranker. The training data contain multiple ranking orders (one per query) in document retrieval, while only one in the general setting. Moreover, the trained ranker needs to predict the ranking orders of various queries in document retrieval, rather than just one. Therefore, a single ranker is not enough to represent all the queries, since the queries are so various and diverse in the real world.

To deal with this problem, we propose to use a multiple-ranker method to make learning to rank more effective for document retrieval. Considering the different ranking patterns of diverse queries, we try to construct a group of rankers to cover them. We choose a distribution divergence measure, the Kullback–Leibler divergence [7], to describe the similarity of ranking patterns between two queries, and partition the queries by clustering methods. Finally, an ensemble of the rankers trained on these query clusters is applied; because of the accuracy and diversity of the base rankers, a lower generalization ranking error is obtained than with a single ranker. Another advantage of the multiple-ranker method is that most existing learning to rank methods can be applied without much adaptation. For validation, we apply our approach to two real-world document retrieval datasets and use two classic ranking algorithms, PRank and RSVM, as the base algorithms. The experimental results indicate that our approach improves on the original single-ranker methods effectively.
The rest of this paper is organized as follows: Section 2 describes learning to rank and the query diversity problem in document retrieval. Accordingly, we describe our multiple-ranker method in Section 3. In Section 4, we give the experimental results and analysis. Section 5 concludes this paper and gives future work.
2 Problem for Learning to Rank in Document Retrieval

Learning to rank has been a popular machine learning approach in recent years and has a close relation with classification and regression. The model of learning to rank can be described as follows. Given a training set which consists of the input vector set X = {x_1, ..., x_m} ⊆ ℜ^n and the corresponding ranking level set Y = {y_1, ..., y_m}, learning to rank tries to find a ranking function f(x), trained on the training set, that ranks the input instances correctly; f(x) should satisfy y_i > y_j ⇔ f(x_i) > f(x_j) for most instances. Specially, the values in the output set Y determine the ranking orders: if y_i > y_j, we say that x_i is ranked ahead of x_j, denoted as x_i ≻ x_j. The ranking function generated by a learning to rank method is called a ranker; in the majority of ranking methods it is usually a linear function. The result f(x) is the ranking score, from which the predicted ranking order is taken.
Many methods have been developed to find a proper ranker, but some problems arise when they are applied in document retrieval. In document retrieval, the training data consist of multiple queries, their corresponding retrieved documents, and relevance levels given by experts, and the instances to rank are query-document pairs. Traditional learning to rank methods assume that all the instances in the training and testing sets have the same distribution and the same ranking pattern. But in document retrieval, query-document pair instances belonging to different queries do not meet this assumption, because the queries are various and diverse in the real world.

We use OHSUMED [8], a real-world document retrieval dataset, to demonstrate the problem. The OHSUMED dataset has 106 queries and their associated documents. Experts give every query-document pair a relevance label at one of three levels: "definitely relevant", "partially relevant", and "irrelevant". Figure 1 contains two groups of instances belonging to different queries. To visualize them, we applied Principal Component Analysis (PCA) to the feature vectors and chose the top two principal coordinates, which carry more than 90 percent of the information in the original features.
30
30
20
20
10
10 0
0 -10
-10 -20
-20 -30
-30 -40
-40 -50 -140
-120
-100
-80
-60
-40
-20
(a) Query No 5, 11 ,12
20
-50 -140
-120
-100
-80
-60
-40
-20
20
(b) Query No 10,41
Fig. 1. PCA analysis on OHSUMED dataset
In Fig. 1, the circles, crosses and points denote the three relevance levels, and the arrows denote the proper ranking functions. From the figure, we can see a clear difference between the two groups of queries, both in the distribution of instances and in their proper ranking functions. Furthermore, these diversity problems occur among all the queries. This conflicts with the assumption that all queries and all instances share the same distribution and the same ranking pattern, and traditional single-ranker methods cannot handle this situation. Therefore, we propose the multiple-ranker method.
3 Multiple Ranker Method

3.1 Framework

We propose to improve learning to rank by using multiple rankers. The framework of the multiple-ranker method can be described as follows. First, we create multiple training subsets from which to generate the base rankers. The selection of the subsets must ensure the predictive accuracy of the base rankers, and each base ranker must have its own ranking pattern so as to represent the diversity of the queries. After using an existing ranking learning algorithm to learn the base rankers on the training subsets, we employ an average ranking aggregation function to create an ensemble of the base rankers to predict the queries' ranking results.

3.2 Constructing Training Subsets

We employ two strategies to construct the training subsets: bootstrap re-sampling and clustering by ranking distance.

Bootstrap re-sampling is the classic multiple-predictor method in classification and regression [9]. By sampling the queries from the training set with the bootstrap method (drawing randomly with replacement) and taking the collection of query-document pairs affiliated with the sampled queries, multiple training subsets can be generated which have the same distribution as the original and are diverse because of the random drawing. Bootstrap re-sampling is used as the baseline to demonstrate the effect of the multiple-ranker method.

The other strategy focuses on the diversity of queries. We try to gather the queries with similar ranking patterns into the same training subset, and disperse diverse queries into different subsets. To implement this, we must measure the similarity or diversity between two queries. We define the ranking distance of queries by using the Kullback–Leibler divergence [7], a measure that describes the difference between two probability distributions. For two discrete random variables P and Q, the (symmetrized) KL divergence is defined to be
1 P (i ) Q (i ) + ∑ Q (i ) log ( ∑ P (i ) log ) 2 i Q (i ) i P (i )
(1)
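Equation (1), the symmetrized KL divergence, can be sketched numerically; the `eps` smoothing against zero probabilities is our assumption, not specified in the paper:

```python
import numpy as np

def symmetric_kl(p, q, eps=1e-12):
    """Symmetrized KL divergence of Eq. (1):
    0.5 * (sum_i P(i) log P(i)/Q(i) + sum_i Q(i) log Q(i)/P(i)).
    eps guards against zero probabilities (an implementation choice)."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p = p / p.sum()          # renormalize after smoothing
    q = q / q.sum()
    return 0.5 * (np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))
```

As the paper notes for the induced distance, this quantity is non-negative, symmetric in P and Q, and zero when the two distributions coincide.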
To describe the diversity of queries in ranking pattern, we estimate the distribution of each query's instances at every rank level and compute the divergence between them. The ranking distance of two queries can then be defined as follows:

D_ranking(P, Q) = Σ_{y=1}^{k} D_KL( P(x|y), Q(x|y) )    (2)
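Equation (2) can be sketched as follows. The paper does not fix how P(x|y) is estimated; here we assume a smoothed histogram of a per-document score at each rank level, so both the estimator and the score are illustrative assumptions:

```python
import numpy as np

def _sym_kl(p, q):
    """Symmetrized KL divergence of Eq. (1) for strictly positive vectors."""
    return 0.5 * (np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def ranking_distance(scores1, labels1, scores2, labels2,
                     levels=(0, 1, 2), bins=8, rng=(0.0, 1.0)):
    """Eq. (2): sum over the k rank levels of the symmetric KL divergence
    between the two queries' per-level distributions P(x|y) and Q(x|y),
    each estimated as a smoothed histogram of document scores."""
    total = 0.0
    for y in levels:
        h1, _ = np.histogram(scores1[labels1 == y], bins=bins, range=rng)
        h2, _ = np.histogram(scores2[labels2 == y], bins=bins, range=rng)
        p = (h1 + 1e-6) / (h1 + 1e-6).sum()   # smooth empty bins
        q = (h2 + 1e-6) / (h2 + 1e-6).sum()
        total += _sym_kl(p, q)
    return total
```

With this distance in hand, any distance-based clustering (such as the K-means or hierarchical clustering the paper cites) can partition the queries into training subsets.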
where P and Q are two queries in the training set and k is the number of rank levels. The ranking distance D_ranking(P, Q) is non-negative and symmetric, and D_ranking(P, Q) is zero only if P = Q. Based on the ranking distance, we can apply many clustering methods, such as K-means [10] and hierarchical clustering [11], to the training set and partition it into several subsets. 3.3 Ensemble of Rankers Using the training subsets, we can learn multiple base rankers with any existing ranking algorithm. Then the ensemble of these rankers can give improved accuracy and a reliable estimate of the generalization error, which can be shown as follows. Assume there are N base rankers in the ensemble. Each ranker α is denoted as a ranking function h_α(x) from the instance feature vector space to a ranking score in R. Using
Multiple Ranker Method in Document Retrieval
411
the average aggregating function to construct the ensemble, the result of the ensemble can be calculated as

h(x) = (1/N) Σ_α h_α(x)    (3)

For a ranking instance x with target ranking level y, the quadratic ranking error of base ranker α is e_α(x) = (y − h_α(x))^2, and that of the ensemble is e(x) = (y − h(x))^2. Substituting (3), the error of the ensemble becomes

e(x) = (1/N) Σ_α e_α(x) − (1/N) Σ_α (h_α(x) − h(x))^2    (4)

Denote the average accuracy ē(x) = (1/N) Σ_α e_α(x) and the average diversity of the base rankers d(x) = (1/N) Σ_α (h_α(x) − h(x))^2. So (4) becomes e(x) = ē(x) − d(x). Assume the predicted instances x are drawn randomly from the distribution p(x). All these quantities can be averaged over that distribution; denote E, Ē and D as the averaged versions of e, ē and d. The generalization error of ensemble ranking then becomes

E = Ē − D    (5)

According to (5), the generalization error of the ensemble E will be lower than the average error of the base rankers Ē, because the diversity of the base rankers D is always non-negative. Moreover, a lower average error Ē and a higher diversity D will make the generalization error of the ensemble lower. The performance of the ensemble thus depends on both the accuracy and the diversity of the base rankers.
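The decomposition behind Eqs. (4) and (5) can be checked numerically; the scores below are arbitrary illustrative values:

```python
import numpy as np

# Numerical check of e(x) = e_bar(x) - d(x) from Eq. (4): the ensemble's
# squared error equals the average base-ranker error minus the average
# diversity.  This holds because the ensemble score is the mean of the
# base scores, so the cross terms average out.
y = 2.0                                    # target ranking level for x
h_alpha = np.array([1.6, 2.3, 2.1, 1.8])   # base rankers' scores for x
h = h_alpha.mean()                         # Eq. (3): average aggregation

e_ensemble = (y - h) ** 2                  # ensemble error e(x)
e_bar = np.mean((y - h_alpha) ** 2)        # average accuracy e_bar(x)
d = np.mean((h_alpha - h) ** 2)            # average diversity d(x)
assert np.isclose(e_ensemble, e_bar - d)
```

Averaging both sides over the instance distribution gives Eq. (5) directly.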
4 Experiment 4.1 Data Collection
In our experiments, we made use of two benchmark datasets: OHSUMED [8] and .Gov. These two datasets are taken from the LETOR [12] dataset released by Microsoft Research Asia. The OHSUMED dataset is a collection of documents and queries on medicine, consisting of 106 queries and 16,140 query-document pairs upon which relevance judgments were made. The relevance judgments are either d (definitely relevant), p (possibly relevant), or n (not relevant). In this dataset, each query-document pair instance consists of a vector of 25 features. The .Gov dataset is a crawl of the .gov domain from early 2002. There are in total 1,053,110 HTML documents in this collection, together with 11,164,829 hyperlinks. We use the version taken by the TREC 2004 web track [13]. There are 75 queries and 1,000 labeled corresponding documents for each query. Each query-document pair instance consists of a vector of 44 features.
Two evaluation measures are used to evaluate the ranking methods: Mean Average Precision (MAP) [14], which calculates the mean of average precisions over a set of queries, and Normalized Discounted Cumulative Gain (NDCG) [15], which focuses on the accuracy of the top N results in the ranking list. 4.2 Experiment with OHSUMED Data
We choose two classic ranking algorithms, PRank and RSVM, as the baselines in our experiments, and apply the two training set construction strategies, Bootstrap Re-sampling and Clustering by Ranking Distance, to both ranking algorithms. We denote the resulting methods BR-PRank, CRD-PRank, BR-RSVM and CRD-RSVM. The experiments use 5-fold cross-validation: the dataset is divided into five parts, and for each fold three parts are used as the training set, one as the validation set to tune the parameters, and the remaining one as the test set.

[Figure 2: two bar charts of MAP and NDCG@1/3/5/10 scores; (a) Based on PRank, comparing PRank, Bagging PRank and Clustering PRank; (b) Based on RSVM, comparing RSVM, Bagging RSVM and Clustering RSVM.]

Fig. 2. Ranking accuracies on OHSUMED data
From Figure 2, we can see that the two multiple ranker methods both outperform the single-model algorithm. Comparing the two ranking algorithms, the RSVM series achieves higher performance than the PRank series, meaning that RSVM outperforms PRank on OHSUMED. But using the multiple ranker methods on PRank yields a large relative improvement of MAP: 8% by BR-PRank and 17% by CRD-PRank, and CRD-PRank's performance is even better than RSVM's. For RSVM, both BR-RSVM and CRD-RSVM give a relative MAP improvement of about 1-2%. 4.3 Experiment with .Gov Data
For the .Gov data, we only choose Ranking SVM as the base ranker learning algorithm. Again, the experiments use 5-fold cross-validation. From Figure 3, we can see that both BR-RSVM and CRD-RSVM outperform RSVM on the .Gov data. All measures show a significant improvement: the relative improvement of BR-RSVM is about 16%, and that of CRD-RSVM is 19%.
[Figure 3: bar chart of MAP and NDCG@1/3/5/10 scores comparing RSVM, BR-RSVM and CRD-RSVM.]

Fig. 3. Ranking accuracies on .Gov data (RSVM)
4.4 Discussions
We present the MAP performance analysis in Table 1. We use the mean of the base rankers' MAP, E_MAP, to represent the base rankers' accuracy, and the standard deviation of the base rankers' MAP, σ_MAP, to represent their diversity. Besides these, we report the MAP of our multiple ranker methods and the relative improvement over the base rankers as performance evaluations. It is obvious that the multiple ranker methods obtain a higher improvement for PRank than for RSVM, and that the CRD strategy is better than BR. This is because the base rankers' accuracy and diversity determine the multiple ranker methods' performance: a higher E_MAP, representing higher accuracy, leads to a higher ensemble MAP, as in the RSVM series, and a higher σ_MAP, representing higher diversity, leads to a larger relative improvement, as in the CRD series. So how to increase the base rankers' accuracy and diversity is the key problem of multiple ranker methods, and it is worth further research in the future. Table 1. MAP performance analysis on OHSUMED
Method       E_MAP    σ_MAP     MAP      Improvement (%)
BR-PRank     0.380    0.0188    0.409     7.81
CRD-PRank    0.381    0.0249    0.444    16.64
BR-RSVM      0.426    0.0070    0.447     5.03
CRD-RSVM     0.396    0.0250    0.449    14.71
5 Conclusion In this paper, we proposed a multiple-ranker approach to improve 'learning to rank' methods for document retrieval. We propose a method to cluster the queries by their diverse ranking patterns. Based on these clusters, a set of base rankers is generated to cover the various queries. Because of the good accuracy and diversity of these base rankers, their ensemble can achieve a lower generalization ranking error than traditional single-ranker methods. Experiments on real-world datasets indicate that
our proposed approach can improve the performance of 'learning to rank' methods effectively. In future work, the prime direction is to find more effective methods to make the base rankers more accurate while covering the diverse queries well. Finding more effective measures of the similarity and diversity of queries is also a focus of our future work.
References 1. Herbrich, R., Graepel, T., Obermayer, K.: Large Margin Rank Boundaries for Ordinal Regression. In: Advances in Large Margin Classifiers, pp. 115–132 (2000) 2. Crammer, K., Singer, Y.: PRanking with Ranking. In: Proceedings of NIPS 2001, Vancouver, British Columbia, Canada (2001) 3. Burges, C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., Hullender, G.: Learning to Rank Using Gradient Descent. In: Proceedings of ICML 2005, Bonn, Germany (2005) 4. Cao, Z., Qin, T., Liu, T.Y., Tsai, M.F., Li, H.: Learning to Rank: from Pairwise Approach to Listwise Approach. In: Proceedings of ICML 2007, Oregon, USA (2007) 5. Freund, Y., Iyer, R.D., Schapire, R.E., Singer, Y.: An Efficient Boosting Algorithm for Combining Preferences. Journal of Machine Learning Research 4, 933–969 (2003) 6. Xu, J., Li, H.: AdaRank: a Boosting Algorithm for Information Retrieval. In: Proceedings of SIGIR 2007, Amsterdam, The Netherlands (2007) 7. Kullback, S.: Information Theory and Statistics. Dover, New York (1968) 8. Hersh, W.R., Buckley, C., Leone, T.J., Hickam, D.H.: OHSUMED: An Interactive Retrieval Evaluation and New Large Test Collection for Research. In: Proceedings of SIGIR 1994, Dublin, Ireland (1994) 9. Breiman, L.: Bagging Predictors. Machine Learning 24, 123–140 (1996) 10. MacQueen, J.B.: Some Methods for Classification and Analysis of Multivariate Observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability. University of California Press, Berkeley (1967) 11. Johnson, S.C.: Hierarchical Clustering Schemes. Psychometrika 2, 241–254 (1967) 12. Liu, T.Y., Qin, T., Xu, J., Xiong, W.Y., Li, H.: LETOR: Benchmark Dataset for Research on Learning to Rank for Information Retrieval. In: Proceedings of LR4IR 2007, in conjunction with SIGIR 2007, Amsterdam, The Netherlands (2007) 13. Craswell, N., Hawking, D.: Overview of the TREC-2004 Web Track. In: TREC (2004) 14. Baeza-Yates, R.A., Ribeiro-Neto, B.: Modern Information Retrieval. Addison-Wesley Longman Publishing Co., Inc., Boston, MA (1999) 15. Jarvelin, K., Kekalainen, J.: Cumulated Gain-based Evaluation of IR Techniques. ACM Transactions on Information Systems 20(4), 422–446 (2002)
An Elimination Method of Light Spot Based on Iris Image Fusion Yuqing He1, Hongying Yang1, Yushi Hou2, and Huan He1 1
Department of Opto-Electronic Engineering, Beijing Institute of Technology, Beijing, P.R. China, 100081 {Yuqinghe,angelaying,20701170}@bit.edu.cn 2 Smartiris Biometrics Co. Ltd., Beijing, P.R. China, 100081 [emailprotected]
Abstract. Iris recognition has become an effective method for personal identification. Many factors related to illumination in the image acquisition period bring light spots into the iris images, which causes the loss of iris texture. To a great extent, this affects the speed and precision of the iris recognition system. Here we propose a light spot elimination method based on image fusion. First, image preprocessing is used to spread the circular iris region into a rectangle. Next, we match two images of the same iris captured at different times and with light spots in different positions. Then we calculate the precise positions of the light spots and carry out the fusion of the two iris images. As a result, the light spots are effectively eliminated after the fusion. Experimental results show that this method can eliminate the disturbance of light spots and avoid the loss of iris texture in iris recognition. Keywords: Iris recognition, Light spot elimination, Image fusion, Image matching.
1 Introduction The iris is one of the most unique structures in the human body: a richly textured ring in the eye. Each iris has a unique structure based on the crown, crystalline lens, filaments, spots, dints, beams, wrinkles, stripes and so on. The rich texture and complex structure give the iris its characteristics, including uniqueness, stability, ease of capture, difficulty of alteration and non-invasiveness [1-3]. These make the iris suitable for personal recognition, with one of the lowest recognition error rates. Although there are many iris recognition methods [1-5] at present, iris recognition still has some problems. With the small size of the iris and the limitations of the optical imaging system, it is hard to get iris images in focus. This requires the image acquisition system to cope with low-quality iris images caused by all kinds of situations in the capture process [6-7], which leads to a low recognition rate. For instance, there can be light spots on the image caused by the illumination, and this problem has been one of the most difficult to solve in iris image
D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 415–422, 2008. © Springer-Verlag Berlin Heidelberg 2008
416
Y. He et al.
preprocessing. Generally speaking, the light spots on the iris image are often caused by the situations below: (1) With a non-contact acquisition system, the unfixed position of users in front of the system means the light spots caused by the equipment's auxiliary light source cannot be exactly confined to the pupil; they may appear on the iris (Fig. 1 shows sample images). (2) When users wear eyeglasses, there are often large light spots on the iris caused by reflection from the eyeglass lenses. (3) Light reflection can also be caused by liquid on the eye's surface.
Fig. 1. Sample images which have the light spots in different positions of the iris
The light spots on the iris image caused by the situations above lead to the loss of iris texture features to some extent, and thus influence the extraction of texture features and affect the accuracy of iris recognition. If we solve these problems, the iris recognition system will have higher accuracy and be more robust in different situations. In view of the light spots' influence, this article proposes an elimination method for light spots based on iris image fusion. After image preprocessing, the circular iris region is spread into a rectangle. Then, in rectangular form, we match the iris images and localize the precise positions of the light spots. Through image fusion, we get an iris image with no light spots. The results show that this method can effectively eliminate the disturbance of the light spots on the iris features.
2 Iris Image Preprocessing Iris image preprocessing is essential in iris recognition, as it directly influences the recognition accuracy. The iris texture lies in a circular region, so it is hard to match rotated images of the same person. The preprocessing spreads the circular iris region into a rectangle, which is more suitable for subsequent processing and feature extraction. Image preprocessing mainly includes image enhancement, iris localization, iris image normalization and so on. 2.1 Iris Localization Iris localization directly influences the accuracy of iris recognition. It includes locating the iris inner edge and the external edge. According to the gray characteristics of eye images, we can easily find the round dark region of the pupil. Then we can calculate the position and radius of the pupil, which realizes the iris inner edge location.
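The inner-edge location described above can be sketched as follows; the fixed gray threshold and the circular-blob assumption are ours, not values from the paper:

```python
import numpy as np

def locate_pupil(gray, threshold=60):
    """Rough inner-edge (pupil) location on an 8-bit gray eye image.
    The pupil is assumed to be the darkest round region; the threshold
    value is an illustrative assumption."""
    ys, xs = np.nonzero(gray < threshold)
    if xs.size == 0:
        return None
    x0, y0 = xs.mean(), ys.mean()          # centroid of the dark blob
    radius = np.sqrt(xs.size / np.pi)      # radius from the blob area,
                                           # assuming a circular pupil
    return x0, y0, radius
```

On a synthetic eye image with a dark disk, the centroid and area-derived radius recover the pupil's circle parameters closely.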
Fig. 2. Location result of iris image
There are always lid and eyelash disturbances in the iris image, and the gradient between the iris and the sclera is not large, so locating the external edge is much more difficult. After median filtering, we use the Canny operator [8] and the Hough transform to locate the iris external edge and get its center and radius. Fig. 2 shows the localization result. 2.2 Image Normalization The center of the pupil and the center of the iris nearly coincide, so we take the center of the pupil as the center of the circle when cutting the iris region from the eye image. The polar coordinate transformation [9] spreads the circle into a rectangle. It is defined by the formulas below:

X_θ(ρ) = X_0 + ρ cos(θ)
Y_θ(ρ) = Y_0 + ρ sin(θ)    (1)

Here X_θ(ρ) and Y_θ(ρ) are the image coordinates for angle θ and radial length ρ, and X_0 and Y_0 are the coordinates of the pupil's center. The angle ranges from 0 to 360 degrees. The sampling rate does not change along with ρ, and sometimes a sampling point does not fall on integer coordinates, so we use bilinear interpolation to solve this problem. As shown in Fig. 3, the image is resampled to a unified size after normalization, for the purpose of feature extraction and matching.
Fig. 3. Image normalization result of two iris images
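The normalization step of Eq. (1) with bilinear interpolation can be sketched as follows; the output sampling densities are illustrative assumptions:

```python
import numpy as np

def unwrap_iris(gray, x0, y0, r_in, r_out, n_theta=360, n_rho=64):
    """Unroll the annular iris region into an n_rho x n_theta rectangle
    using the polar mapping of Eq. (1), sampling gray values with
    bilinear interpolation.  (x0, y0): pupil center; r_in/r_out: inner
    and outer radii of the iris ring."""
    thetas = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    rhos = np.linspace(r_in, r_out, n_rho)
    g = gray.astype(float)
    h, w = g.shape
    out = np.zeros((n_rho, n_theta))
    for i, rho in enumerate(rhos):
        x = x0 + rho * np.cos(thetas)       # Eq. (1)
        y = y0 + rho * np.sin(thetas)
        xf = np.clip(np.floor(x).astype(int), 0, w - 2)
        yf = np.clip(np.floor(y).astype(int), 0, h - 2)
        dx, dy = x - xf, y - yf
        out[i] = (g[yf, xf] * (1 - dx) * (1 - dy)        # bilinear
                  + g[yf, xf + 1] * dx * (1 - dy)        # interpolation
                  + g[yf + 1, xf] * (1 - dx) * dy
                  + g[yf + 1, xf + 1] * dx * dy)
    return out
```

The fixed (n_rho, n_theta) output size corresponds to the unified size the text requires after normalization.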
3 Image Registration and Fusion After image preprocessing, we obtain the iris region as a rectangle. Then we need to register the two images, localize the positions of the light spots and fuse the two images. 3.1 Image Registration As shown in Fig. 3, there are two iris images of the same person taken in different acquisition situations. Rotation to a certain extent and other factors may bring a
position deviation of the iris image, so the iris texture often cannot be aligned completely after expansion and normalization. This brings problems to the fusion of the iris images. Therefore the two images must be matched and aligned completely before fusion. Generally speaking, there are many methods for image registration, and template matching [10-14] is a suitable one because it does not restrict the texture information of the images being matched. Here we use the template matching method for image registration. As shown in Fig. 4, suppose that a template T(m, n) of size M×M moves across the searched graph S. The part of the searched graph covered by the template is called the subgraph S_{i,j}(m, n), where i and j are the coordinates of the template's position, namely of the image point at the subgraph's top-left corner in S. Comparing T(m, n) with S_{i,j}(m, n), if the two are identical their difference is 0. We can therefore use the following correlation function to measure the similarity of T(m, n) and S_{i,j}(m, n):

R(i, j) = [ Σ_{m=1}^{M} Σ_{n=1}^{M} S_{i,j}(m, n) T(m, n) ] / [ Σ_{m=1}^{M} Σ_{n=1}^{M} ( S_{i,j}(m, n) )^2 ]    (2)

Here we suppose that the searched graph is an N×N matrix. As we can see from Fig. 4, the values of i and j both lie between 1 and (N − M + 1). The amount of computation in matching with the correlation method is very large, because similar computations must be made at (N − M + 1)^2 reference positions, most of which is useless work at the non-matching points. Therefore we use a faster method named the sequential similarity detection algorithm (SSDA) [15]. The procedure of iris image registration is as follows. As shown in Fig. 4, at first we select a template on one image, called the standard image, and record this template's central coordinates. Normalization has been completed before matching, so there is no rotation or scaling between these two images.
Fig. 4. Schematic diagram of template matching
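The SSDA idea named above can be sketched for translation-only registration; the absolute-error measure and the early-abandon rule follow the usual SSDA formulation, not code from the paper:

```python
import numpy as np

def ssda_match(search, template):
    """Sequential similarity detection algorithm (SSDA) sketch: at each
    candidate position, accumulate the absolute error sum(|S_ij - T|)
    row by row and abandon the position as soon as the error exceeds
    the best error found so far.  Returns the (row, col) of the best
    match of `template` inside `search`."""
    S = search.astype(float)
    T = template.astype(float)
    m, n = T.shape
    H, W = S.shape
    best_err, best_pos = np.inf, (0, 0)
    for i in range(H - m + 1):
        for j in range(W - n + 1):
            err = 0.0
            for r in range(m):                       # accumulate per row
                err += np.abs(S[i + r, j:j + n] - T[r]).sum()
                if err >= best_err:                  # early abandon (SSDA)
                    break
            if err < best_err:
                best_err, best_pos = err, (i, j)
    return best_pos
```

The early-abandon test is what makes SSDA cheaper than evaluating the full correlation of Eq. (2) at every one of the (N − M + 1)^2 positions.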
Therefore the difference between the testing image and the standard image is only a translation. Accordingly, we can obtain the search sector by adding and subtracting a certain value to the template coordinates of the standard image, and carry out the matching. We then use the SSDA algorithm to select the region of the testing image that has the greatest similarity with the template as the matching region. Generally we take a rectangle or a square as the template. According to the gray levels at the four corners of the template and the gradient characteristics in two directions, we can rapidly find the matching points of the template's four corners in the testing image, and then obtain the template's matching region. After obtaining the matching parameters, we cut and splice the image, completing the registration. The registration result is shown in Fig. 5.
Fig. 5. Two iris images after registration
3.2 Location of the Light Spots As shown in Fig. 5, image registration has been done. We next need to locate the light spots' positions in each of the two iris images before fusion, so that we know the parameters of the light spots, such as size, position and shape. Because the light spots' gray values are the largest in the images and their gray-value gradient is large, we select the Sobel operator to detect the edges of the light spots. Combining the light spots' approximate shape, size and gray value, we can exclude false light spots. Then we use the Hough transform to locate the edge of each light spot and get its center and radius. The results are shown in Fig. 6.
Fig. 6. Location of light spots by using the Sobel operator
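The spot-location step can be sketched by combining a brightness threshold with a hand-rolled Sobel gradient; both threshold values are illustrative assumptions:

```python
import numpy as np

def sobel_magnitude(gray):
    """Gradient magnitude via the 3x3 Sobel kernels."""
    g = gray.astype(float)
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    pad = np.pad(g, 1, mode='edge')
    gx = np.zeros_like(g)
    gy = np.zeros_like(g)
    for dy in range(3):                    # correlate with both kernels
        for dx in range(3):
            win = pad[dy:dy + g.shape[0], dx:dx + g.shape[1]]
            gx += kx[dy, dx] * win
            gy += ky[dy, dx] * win
    return np.hypot(gx, gy)

def locate_spot(gray, bright=230):
    """Center and radius of a light spot.  The spot pixels are assumed
    to be near-saturated (gray > bright); the radius comes from the
    blob area, assuming a roughly circular spot."""
    ys, xs = np.nonzero(gray > bright)
    if xs.size == 0:
        return None
    return xs.mean(), ys.mean(), np.sqrt(xs.size / np.pi)
```

The Sobel magnitude highlights the sharp edge around a spot, which supports rejecting false spots by shape as the text describes.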
3.3 Image Fusion After locating the positions and parameters of the light spots in the two images, we use image segmentation to cut out the light spots and replace the corresponding parts with the intact texture at the same positions in the other image. Considering the different sizes of the light spots, we select the diameter of the larger light spot as the standard. The image shown in Fig. 7 is the completed iris image after fusion; from it we can see that the light spots have been eliminated successfully.
Fig. 7. Iris image after image fusion
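The fusion step reduces to replacing the masked spot pixels of one registered rectangle with the other's texture; building the boolean mask from the located spot center and radius is assumed to have been done already:

```python
import numpy as np

def fuse_without_spots(img_a, img_b, mask_a):
    """Fuse two registered iris rectangles: wherever mask_a marks a
    light spot in img_a, take the (spot-free) texture from img_b at
    the same position."""
    fused = img_a.copy()
    fused[mask_a] = img_b[mask_a]
    return fused
```

Applying this twice (once per image, with each image's own spot mask) yields a spot-free pair, as the experiment section describes.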
4 Experimental Results and Analysis We built a database of iris images captured from 30 persons. For each person, we captured 8 samples of both eyes under different conditions, including iris images with light spots on the iris and images without light spots, giving a total of 240 iris samples. The resolution of all the iris images is 640 × 480, with 256 gray levels. We choose different images of the same iris sample for the experiments, with light spots in different positions, and use the method proposed in this paper to eliminate them. After iris localization and normalization, we use the template matching method to complete the registration of the two iris images, and then the Sobel operator to locate the positions of the light spots. Finally, we use image segmentation to cut out the light spots and replace the corresponding parts with the intact texture at the same positions in the other image. In the fused image, the light spots are eliminated completely. To evaluate the experimental results, we use the gray variance to compare the difference between the fused image and the same sample's standard image without light spots (we call this the "standard image"). The gray variance reflects the level of approximation between the fused image and the standard one, and is defined by the formula below:

D(X, G) = E[(X − G)^2] − (E[X − G])^2    (3)
Here X is the gray matrix of the fused image, and G is the gray matrix of the standard image. In all, there are 60 sets of data, which are shown in Fig. 8.
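Equation (3) is simply the variance of the gray-level difference image; a minimal sketch:

```python
import numpy as np

def gray_variance(X, G):
    """Eq. (3): D(X, G) = E[(X - G)^2] - (E[X - G])^2, i.e. the variance
    of the gray-level difference between the fused image X and the
    standard image G."""
    d = X.astype(float) - G.astype(float)
    return np.mean(d ** 2) - np.mean(d) ** 2
```

Identical images give a variance of zero, so small values indicate the fused image closely approximates the standard one.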
[Figure 8: plot of gray variance (scale ×10^-3, range 0 to 1) against experiment number (0 to 60).]

Fig. 8. Gray variance of fused images and standard images
As shown in Fig. 8, the difference values range between 0.0001 and 0.001, meaning the difference between the fused image and the standard image is very small. The experimental results show that the method proposed in this paper can effectively eliminate the light spots on the iris image and avoid the loss of the iris texture.
5 Conclusion Among the various kinds of low-quality iris images, those with light spots caused by illumination often appear in the image acquisition period. This problem can cause the loss of iris texture features, and thus influences texture feature extraction and the accuracy of iris recognition. This paper proposed an elimination method for light spots based on image fusion. The method rapidly matches the iris images after iris localization and normalization, locates the light spots' precise positions, and realizes the image fusion. Finally we obtain high-quality iris images with no light spots. The experimental results show that this method can effectively eliminate light spots and avoid the loss of the iris texture. This processing guarantees the integrity and clarity of the iris texture information, and thus further guarantees the iris feature extraction and matching precision. Acknowledgments. This project is supported by the National Science Foundation of China (No. 60572058) and the Excellent Young Scholars Research Fund of Beijing Institute of Technology (No. 2006Y0104).
References 1. Daugman, J.G.: High Confidence Personal Identification by Rapid Video Analysis of Iris Texture. In: Proceedings of IEEE 1992 International Conference on Security Technology, pp. 50–60 (1992) 2. Daugman, J.G.: How Iris Recognition Works. In: Proceedings of the 2002 International Conference on Image Processing, New York, USA, September 22-25, vol. 1, pp. 133–136 (2002) 3. Wildes, R.P.: Iris Recognition: an Emerging Biometric Technology. Proceedings of IEEE 85(9), 1348–1363 (1997) 4. Xiangjun, W., Zhangmin, Xinling, Z. et al.: Research on Non-contact Method of Capturing Iris Image and Extracting Feature. Acta. Optica. Sinica. 25(3), 319–323 (2005) (in Chinese) 5. Weiqi, Y., Lu, X., Zhonghua, L.: Iris Identification Method Based on Gray Surface Matching. Acta. Optica. Sinica. 26(10), 1537–1542 (2006) (in Chinese) 6. He, Y., Cui, J., Tan, T., Wang, Y.: Key Techniques and Methods for Imaging Iris in Focus. In: The 18th International Conference on Pattern Recognition (ICPR) (2006) 7. Wang, Y., He, Y., Hou, Y., Liu, T.: Design Method of ARM Based Embedded Iris Recognition System. In: International Symposium on Photoeletronic Detection and Imaging: Technology and Applications (ISPDI) (2007) 8. Xiaomei, Z., Yuanbin, H.: A New Iris Location Method. Chinese Journal of Sensors and Actuators 20(1), 218 (2007) (in Chinese)
9. Proenca, H., Alexandre, L.A.: Iris Recognition: An Analysis of the Aliasing Problem in the Iris Normalization Stage. In: Computational Intelligence and Security, vol. 2, pp. 1771–1774. IEEE, Los Alamitos (2006) 10. Rezaie, B., Srinath, M.D.: Algorithms for Fast Image Registration. IEEE Transactions on Aerospace and Electronic Systems AES-20, 716–728 (1984) 11. Pratt, W.K.: Digital Image Processing. Wiley, New York (1978) 12. Jun, G., Xuewei, L., Jian, Z., Bingheng, L.: Image Registration Algorithm Based on Template Matching. Journal of Xi'an Jiaotong University 41(3), 308 (2007) (in Chinese) 13. WebbTer, W.F.: Techniques for Image Registration. In: Proceedings of Machine Processing of Remotely Sensed Data, IEEE Catalog 73, CHO 834-2GE, pp. 181–187 (1973) 14. Li, Q., Zhang, B.: Template Matching Based on Image Gray Value. In: Shipeng, L. (ed.) Proc. of SPIE Visual Communications and Image Processing 2005, vol. 5960 (2005) 15. Shen, T., Wang, W., Yan, X.: Digital Image Processing and Pattern Recognition, pp. 175–177. Beijing Institute of Technology Press, Beijing (2007) (in Chinese)
An Improved Model of Producing Saliency Map for Visual Attention System Jingang Huang1,2,3, Bin Kong1,3, Erkang Cheng1,2,3, and Fei Zheng1,3 1
Center for Biomimetic Sensing and Control Research, Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei, Anhui Province, China 2 Department of Automation, University of Science and Technology of China, Hefei, Anhui Province, China 3 The Key Laboratory of Biomimetic Sensing and Advanced Robot Technology, Anhui Province [emailprotected], [emailprotected]
Abstract. The iLab Neuromorphic Vision Toolkit (INVT), steadily kept up to date by the group around Laurent Itti, is one of the currently best known attention systems. Their model of bottom-up or saliency-based visual attention, as well as their implementation, serves as a basis for many research groups. How to finally combine the feature maps into the saliency map is a key point for this kind of visual attention system. We modified the original model of Laurent Itti to make it correspond better with our perception. Keywords: Visual attention, Feature integration, Saliency map, Bottom-up.
1 Introduction Visual selective attention is an essential ability of primates, allowing them to process the visual information from the environment. It is also an important human function to select the interesting things from the input information. In the absence of a specific purpose or intention, bottom-up or saliency-based visual attention guides primates. Koch and Ullman introduced the first approach for a computational architecture of visual attention in 1985 [1]. One of the best known and implemented attention systems, the iLab Neuromorphic Vision Toolkit (INVT) of Itti et al., is derived from Koch and Ullman's model. The first model of Itti et al. [2], introduced in 1998, has been extended in many directions since then. The ideas of the feature maps, the saliency map, the WTA and the IOR were adopted from the Koch-Ullman model; the approaches of using linear filters for the computation of the features, of determining contrasts by center-surround differences, and the idea of the conspicuity maps were probably adopted from Milanese [3]. The main contributions of this work are detailed elaborations on the realization of theoretical concepts, a concrete implementation of the system, and its application to artificial and real-world scenes. The authors describe in detail how
D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 423–431, 2008. © Springer-Verlag Berlin Heidelberg 2008
424
J. Huang et al.
the feature maps are computed: all computations are performed on image pyramids, a common technique in computer vision that enables the detection of features at different scales. Additionally, they propose a weighting function for the weighted combination of the different feature maps, promoting maps with few peaks and suppressing those with many. This technique is computationally much faster than the relaxation process of Milanese [3] and yields good results. Since the suggested weighting function still suffered from several drawbacks, they introduced an improved procedure in 2001 [4]. The improved procedure is also not perfect; for example, it is more complex and needs iterative steps. Although over the years the INVT has become more and more refined and complex, the basic model of bottom-up visual attention has not changed much. Because of these drawbacks, we improved the model and propose another useful weighting function.
2 Model Before we introduce our model, we first analyze Laurent Itti's model, shown in Fig. 1. The conspicuity of every pixel in the original input image is represented by the value of the corresponding pixel in the saliency map. The purpose of the saliency map is to represent the conspicuity at every location in the visual field by a scalar quantity and to guide the selection of the attended location, based on the spatial distribution of saliency. The spatial scales are created by using dyadic Gaussian pyramids which progressively low-pass filter and subsample the input static color image. Each feature is computed by a set of linear "center-surround" operations akin to visual receptive fields. Center-surround is implemented in the model as the difference between fine and coarse scale images: the across-scale difference between two maps, denoted Θ below, is obtained by interpolation to the finer scale and point-by-point subtraction. More details can be found in [2]. Now, we analyze in detail some aspects of the model which have drawbacks and which guided us in establishing our own model.
Fig. 1. General architecture of Itti’s model
An Improved Model of Producing Saliency Map for Visual Attention System
425
2.1 Center-Surround Differences (On-Center and Off-Center Differences) In Itti's paper [2], they mention that center-surround differences between a "center" fine scale c and a "surround" coarser scale s yield the feature maps. The first set of feature maps (intensity feature maps) is concerned with intensity contrast, which, in mammals, is detected by neurons sensitive either to dark centers on bright surrounds or to bright centers on dark surrounds. These mechanisms are inspired by the ganglion cells in the visual receptive fields of the human visual system, which respond to intensity contrasts between a center region and its surround. The cells are divided into two types: on-center cells respond excitatorily to light at the center and inhibitorily to light at the surround, whereas off-center cells respond inhibitorily to light at the center and excitatorily to light at the surround [5]. But in Itti's model, these two aspects are not distinguished.
Fig. 2. Both the white and the black dots have the same contrast with the background
We take the intensity channel as an example. The two images shown in Fig. 2 are different in our perception, but they yield the same result in Itti's model, because only the contrast between center and surround, computed as I(c, s) = I(c) Θ I(s), is considered. Θ represents the center-surround difference and is well defined in Itti's 1998 paper [2]. In our model, the on-center difference and the off-center difference are used as follows. We define

I(c, s) = I(c) Θ I(s)   (1)

On-center differences: for each pixel (x, y),

I_on(c, s)(x, y) = I(c, s)(x, y)  if I(c, s)(x, y) > 0,  and 0 otherwise   (2)

Off-center differences: for each pixel (x, y),

I_off(c, s)(x, y) = −I(c, s)(x, y)  if I(c, s)(x, y) < 0,  and 0 otherwise   (3)
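The split of the signed center-surround difference into on- and off-center maps (Eqs. (1)–(3)) can be sketched with NumPy. This is a sketch, not the authors' implementation: nearest-neighbour upsampling stands in for the interpolation used in the original model, and the function names are illustrative.

```python
import numpy as np

def center_surround(center, surround):
    """Across-scale difference: upsample the coarse 'surround' map to the
    size of the fine 'center' map, then subtract point by point."""
    sy = center.shape[0] / surround.shape[0]
    sx = center.shape[1] / surround.shape[1]
    # nearest-neighbour interpolation to the finer scale
    ys = (np.arange(center.shape[0]) / sy).astype(int)
    xs = (np.arange(center.shape[1]) / sx).astype(int)
    return center - surround[np.ix_(ys, xs)]

def on_off_maps(center, surround):
    """Split the signed difference I(c,s) into the on-center (Eq. 2) and
    off-center (Eq. 3) feature maps."""
    d = center_surround(center, surround)
    i_on = np.where(d > 0, d, 0.0)    # bright center on dark surround
    i_off = np.where(d < 0, -d, 0.0)  # dark center on bright surround
    return i_on, i_off
```

Note how the two bright-on-dark and dark-on-bright cases of Fig. 2 now land in different maps instead of collapsing into one absolute difference.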
426
J. Huang et al.
2.2 Extraction of Early Visual Features
First, we review the extraction of early visual features in Itti's model. With r, g and b being the red, green and blue channels of the input image, an intensity image I is obtained as:
I = ( r + g + b) / 3
(4)
I is used to create a Gaussian pyramid I(σ), where σ ∈ (0...8) is the scale. The r, g and b channels are normalized by I in order to decouple hue from intensity. Four broadly-tuned color channels are created as shown in (5):

R = r − (g + b)/2
G = g − (r + b)/2
B = b − (r + g)/2
Y = (r + g)/2 − |r − g|/2 − b   (5)
Negative values are set to zero. Gaussian pyramids R(σ), G(σ), B(σ) and Y(σ) are created from these color channels. Next, we introduce how feature maps are obtained in our model. 2.2.1 Intensity Feature Maps The intensity image in our model is defined in the same way as in Itti's model. But we take both the on-center difference and the off-center difference into consideration, as introduced in Section 2.1, so we obtain twice as many intensity feature maps as Itti's model. This solves the problem described in Section 2.1. 2.2.2 Color Feature Maps We do not use Itti's method to deal with the color channels. The CIE LAB color space is currently one of the most popular uniform color spaces [6]. In uniform color spaces, "the distance in coordinate space is a fair guide to the significance of the difference between two colors as perceived by a human observer". The color transform from RGB to CIE LAB is described in [7]. In the CIE LAB color space, four color channels are created as follows:
[−a, a] is normalized to [0, 255] for red; −[−a, a] is normalized to [0, 255] for green; [−b, b] is normalized to [0, 255] for yellow; −[−b, b] is normalized to [0, 255] for blue.
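The color transform and channel split above can be sketched as follows. This is a hedged sketch of a standard RGB→LAB conversion, not the exact procedure of [7]: it assumes linear RGB values in [0, 1] and a D65 white point, omits the sRGB gamma step for brevity, and the clipping-based channel split with an assumed axis range of 128 is our reading of the normalization described in the text.

```python
import numpy as np

def rgb_to_lab(rgb):
    """Convert a linear-RGB image (floats in [0, 1], shape HxWx3) to CIE LAB."""
    m = np.array([[0.4124, 0.3576, 0.1805],
                  [0.2126, 0.7152, 0.0722],
                  [0.0193, 0.1192, 0.9505]])
    xyz = rgb @ m.T
    xyz /= np.array([0.95047, 1.0, 1.08883])  # D65 reference white

    def f(t):
        d = 6.0 / 29.0
        return np.where(t > d ** 3, np.cbrt(t), t / (3 * d ** 2) + 4.0 / 29.0)

    fx, fy, fz = f(xyz[..., 0]), f(xyz[..., 1]), f(xyz[..., 2])
    L = 116.0 * fy - 16.0
    a = 500.0 * (fx - fy)
    b = 200.0 * (fy - fz)
    return L, a, b

def color_channels(a, b, amax=128.0, bmax=128.0):
    """Map the a/b axes to four color channels on [0, 255]: positive a is red,
    negative a green, positive b yellow, negative b blue (our interpretation)."""
    red = np.clip(a, 0, amax) / amax * 255.0
    green = np.clip(-a, 0, amax) / amax * 255.0
    yellow = np.clip(b, 0, bmax) / bmax * 255.0
    blue = np.clip(-b, 0, bmax) / bmax * 255.0
    return red, green, yellow, blue
```

A neutral gray pixel maps to a ≈ 0 and b ≈ 0, so it excites none of the four opponent channels, which is the behaviour a uniform color space is meant to provide here.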
Fig. 3. The CIE LAB color space
We create the Gaussian pyramids R(σ), G(σ), B(σ) and Y(σ). Then we need only take the on-center difference into consideration, because taking the off-center difference on the red map is exactly taking the on-center difference on the green map; the same holds for the blue and yellow maps. 2.2.3 Orientation Feature Maps We do not alter Itti's method much for producing the orientation feature maps. Local orientation information is obtained from I using oriented Gabor pyramids O(σ, θ), where σ ∈ (0...8) represents the scale and θ ∈ {0°, 45°, 90°, 135°} is the
preferred orientation [8]. (Gabor filters, which are the product of a cosine grating and a 2D Gaussian envelope, approximate the receptive field sensitivity profile (impulse response) of orientation-selective neurons in primary visual cortex [9].) The only difference is that we use the on-center difference to obtain the orientation feature maps. In total, 60 feature maps are computed: 12 for intensity, 24 for color and 24 for orientation. 2.3 Combining the Feature Maps into the Saliency Map
The most important step is how to combine these feature maps into the saliency map. Itti noted that one difficulty in combining feature maps is that they represent a priori not comparable modalities, with different dynamic ranges and extraction mechanisms. If we merely sum up all the feature maps in a straightforward way, salient objects appearing strongly in only a few maps may be masked by noise or by less-salient objects present in a larger number of maps. Itti proposed a map normalization operator N(·), which globally promotes maps in which a small number of strong peaks of activity (conspicuous locations) is present, while globally suppressing maps which contain numerous comparable peak responses. For each feature map, N(·) includes:

1) Normalizing the values in the map to a fixed range [0,…,M], in order to eliminate modality-dependent amplitude differences;
2) Finding the location of the map's global maximum M and computing the average m̄ of all its other local maxima;
3) Globally multiplying the map by (M − m̄)².
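The three steps above can be sketched with NumPy. This is only a sketch of the operator's behaviour: local maxima are detected here with a simple 8-neighbour test on interior pixels, which is one of several possible choices and not necessarily what Itti's implementation does.

```python
import numpy as np

def normalize_itti(fmap, M=1.0):
    """Itti's N(.): normalize to [0, M], then multiply the whole map by
    (M - mbar)^2, where mbar is the average of the other local maxima."""
    f = fmap - fmap.min()
    if f.max() > 0:
        f = f / f.max() * M                       # step 1)
    # local maxima of interior pixels: not smaller than any 8-neighbour
    c = f[1:-1, 1:-1]
    neigh = np.stack([f[:-2, 1:-1], f[2:, 1:-1], f[1:-1, :-2], f[1:-1, 2:],
                      f[:-2, :-2], f[:-2, 2:], f[2:, :-2], f[2:, 2:]])
    is_max = np.all(c >= neigh, axis=0) & (c > 0)
    maxima = np.sort(c[is_max])[::-1]
    others = maxima[1:]               # drop one instance: the global maximum
    mbar = others.mean() if others.size else 0.0  # step 2)
    return f * (M - mbar) ** 2                    # step 3)
```

A map with a single strong peak keeps its full weight, while a map with two equal peaks is multiplied by (M − M)² = 0 — exactly the failure case criticized in the next paragraph.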
Itti used a figure to explain the reasonability of his operator in his paper. But there are problems with this approach. When M and m̄ are the same or very similar, the map will have no contribution at all. This does not match our perception: for example, when there is more than one white dot on a dark background and all the dots have the same intensity, we can still perceive them, yet the map is suppressed entirely. This is not reasonable. In 2001, Itti proposed other feature combination strategies: the "Naive", "Trained" and "Iterative" ones [4]. The "Naïve" strategy is simple but gives poor results. The "Trained" strategy gives the best results but needs supervised learning; it is more of a top-down approach.
"Iterative" gives better results but is more complex and needs several iterations. More details can be found in [4]. So we propose our own strategy––another normalization operator N′(·):

1) Normalize the values in the map to a fixed range [0,…,M], in order to eliminate modality-dependent amplitude differences;
2) Find the number of points above a threshold of 50% of M and the number of points above a threshold of 90% of M, denoting them m1 and m2 respectively; if m2 = 1, the point with value M is regarded as noise and removed, and steps 1) and 2) are repeated until m2 > 1;
3) Globally multiply the map by 1/(m1·m2)^(1/4).

That is, our weighting function can be described as follows:

N′(X) = X · 1/(m1·m2)^(1/4)   (6)
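The proposed N′(·) can be sketched as follows. The noise-removal loop follows steps 1)–2) above; the handling of several points exactly equal to the maximum is our assumption, not stated in the text.

```python
import numpy as np

def normalize_proposed(fmap, M=1.0):
    """Proposed N'(.): count points above 50% and 90% of M (m1 and m2),
    discard a lone global maximum as noise, then scale by 1/(m1*m2)^(1/4)."""
    f = fmap - fmap.min()
    while True:
        if f.max() <= 0:
            return f                     # nothing left: the map was all noise
        f = f / f.max() * M              # step 1): normalize to [0, M]
        m1 = int(np.sum(f > 0.5 * M))    # step 2): count strong responses
        m2 = int(np.sum(f > 0.9 * M))
        if m2 > 1:
            break
        f[f == f.max()] = 0.0            # lone peak treated as noise; repeat
    return f / (m1 * m2) ** 0.25         # step 3)
```

Unlike N(·), a map with two equal peaks is attenuated only by 1/(2·2)^(1/4) ≈ 0.71 rather than zeroed, while an isolated outlier pixel is stripped away by the loop.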
In this way, we not only realize the function of N(·) but also solve the problems mentioned above. Just like Itti's model, the feature maps are combined into three "conspicuity maps" at the scale (σ = 4) of the saliency map. They are obtained through across-scale addition "⊕", which consists of reduction of each map to scale four and point-by-point addition:

I = Σ_{i ∈ {on, off}} N′( ⊕_{c=2..4} ⊕_{s=c+3..c+4} N′(I_i(c, s)) )   (7)

C = ⊕_{c=2..4} ⊕_{s=c+3..c+4} [ N′(R_on(c, s)) + N′(G_on(c, s)) + N′(B_on(c, s)) + N′(Y_on(c, s)) ]   (8)

O = Σ_{θ ∈ {0°, 45°, 90°, 135°}} N′( ⊕_{c=2..4} ⊕_{s=c+3..c+4} N′(O_on(c, s, θ)) )   (9)
Finally, we get the saliency map by using:

S = (1/3) · ( N′(I) + N′(C) + N′(O) )   (10)
We do not use WTA in our model. We select the most salient point as the seed point, and the attended location is the region around the seed point consisting of all the points whose values differ by at most 20% from the maximum value. The region is shown by a red ellipse. If more than one point is maximally salient, all of them are seed points. To sum up, the architecture of our model is shown in Fig. 4.
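The seed-point and region selection just described can be sketched as a flood fill. This is a sketch under stated assumptions: 8-connectivity and growth only through points within 20% of the maximum are our reading of the text, and fitting the red ellipse is omitted.

```python
import numpy as np

def attended_region(saliency, tol=0.20):
    """Grow the attended region from the most salient point(s) over connected
    points whose values differ at most `tol` from the maximum."""
    m = saliency.max()
    mask = saliency >= (1.0 - tol) * m          # candidates near the maximum
    seeds = list(zip(*np.where(saliency == m))) # every maximal point is a seed
    region = np.zeros_like(mask)
    stack = list(seeds)
    while stack:                                # flood fill from every seed
        y, x = stack.pop()
        if region[y, x] or not mask[y, x]:
            continue
        region[y, x] = True
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]:
                    stack.append((ny, nx))
    return region
```

A bright point that clears the 20% threshold but is not connected to any seed is left outside the attended region.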
Fig. 4. The architecture of our model
3 Experiment Results and Discussion We compare our experimental results with those of Itti's model.
Fig. 5. Compared with Itti’s model
In Fig. 5, (a) is the saliency map produced by our model, (b) is our experimental result, (c) is the saliency map produced by Itti's model and (d) is Itti's result; (e) is our experimental result and (f) is Itti's result. For the artificial scene, our model corresponds better with our perception because we distinguish on- and off-center-surround differences. For real-world scenes, there is no uniform evaluation standard. Some other results can be seen in Fig. 6.
Fig. 6. Other experiment results
Finally, comparisons of "Naïve", "Trained", "N(·)", "Iterative" and our "N′(·)" are shown in Fig. 7.
Fig. 7. Comparisons with other strategies
We can see that our weighting function N′(·) achieves equally good results. Acknowledgement. This work is supported by the National Basic Research Program of China (No. 2006CB300407) and NSFC (No. 10635070).
References 1. Koch, C., Ullman, S.: Shifts in Selective Visual Attention: towards the Underlying Neural Circuitry. Human Neurobiology 4(4), 219–227 (1985) 2. Itti, L., Koch, C., Niebur, E.: A Model of Saliency-based Visual Attention for Rapid Scene Analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 20(11), 1254–1259 (1998)
3. Milanese, R.: Detecting Salient Regions in an Image: From Biological Evidence to Computer Implementation. PhD thesis, University of Geneva, Switzerland (1993) 4. Itti, L., Koch, C.: Feature Combination Strategies for Saliency-based Visual Attention Systems. Journal of Electronic Imaging 10(1), 161–169 (2001) 5. Palmer, S.E.: Vision Science, Photons to Phenomenology. MIT Press, Cambridge (1999) 6. Forsyth, D.A., Ponce, J.: Computer Vision: A Modern Approach. Prentice Hall, Berkeley (2003) 7. Color Conversion Algorithm, Accessed (May 5, 2007), http://www.cs.rit.edu/~ncs/color/t_convert.html 8. Greenspan, H., Belongie, S., Goodman, R., Perona, P., Rakshit, S., Anderson, C.H.: Overcomplete Steerable Pyramid Filters and Rotation Invariance. In: Proc. IEEE Computer Vision and Pattern Recognition, Seattle, Wash, pp. 222–228 (June 1994) 9. Leventhal, A.G.: The Neural Basis of Visual Function: Vision and Visual Dysfunction, vol. 4. CRC Press, Boca Raton (1991)
Research on License Plate Detection Based on Wavelet Junshan Pan1 and Zhiyong Yuan2 1
Department of Computer Science and Technology, Xiaogan University, Xiaogan, Hubei 432100, China 2 School of Computer Science, Wuhan University, Wuhan, Hubei 430072, China [emailprotected], [emailprotected]
Abstract. License Plate Recognition (LPR) is one of the critical techniques for ITS. It can be widely used in traffic control and traffic surveillance. License plate detection is one of the important components of LPR. In this paper, we propose a method based on the wavelet transform to locate the plate region. Firstly, we decompose and de-noise the image using the wavelet transform; secondly, the vertical gradient is calculated for the four components of the one-level wavelet decomposition, and all the vertical gradients of each detail are added to obtain a summative image. Finally, we traverse the summative image with a window and find the coordinates where the maximum summation is obtained; there the license plate region is detected. The experimental results show that our method is effective. Keywords: license plate detection, wavelet transform, vertical gradient.
1 Introduction License Plate Recognition (LPR) is one of the important components of ITS. It is widely used in traffic control and traffic surveillance. A complete license plate recognition system usually consists of vehicle image acquisition, license plate detection, character segmentation, character recognition, and other components; see Fig. 1. License plate detection is the detection of the precise position of the license plate using image processing technology. It is the foundation and prerequisite of license plate recognition, and its result directly affects the follow-up work. At present, domestic and foreign license plate detection algorithms mainly fall into two broad headings, namely those based on color images and those based on gray images. Because a license plate consists of a few colorful characters on a background, there are many ways to search for plates using color information processing technology [1][2]. Gray-scale image processing is faster, so most current license plate recognition systems are based on gray-scale images; detection methods include texture analysis, edge detection, mathematical morphology [4][5], neural networks [6][7], and the wavelet transform [3]. In this paper, we propose a method based on the wavelet transform to locate the license plate region. The algorithm consists of three stages: wavelet decomposition, calculation of the D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 440–446, 2008. © Springer-Verlag Berlin Heidelberg 2008
Fig. 1. License Plate Recognition system: Vehicle Image Acquisition → License Plate Detection → Character Segmentation → Character Recognition
Fig. 2. The flowchart of the algorithm: Image Preprocessing → Wavelet Decomposition and De-noising → Calculate Vertical Gradient for each Component → Sum Up each Component's Vertical Gradient → Detect the Plate Region by a traversing Window → Segment the License Plate Region
vertical gradient, and location of the license plate region by a traversing window. The flowchart of the algorithm is shown in Fig. 2. This paper is organized as follows: Section 2 describes the procedure of this approach in detail; Section 3 presents the experimental results and concludes the paper.
2 Algorithm for License Plate Detection 2.1 Image Preprocessing Because the quality of vehicle images is susceptible to weather, light, observation points, and so on, license plate images often suffer from poor contrast. Therefore, we need to stretch the image gray levels in order to improve the image quality and the character recognition rate. We use histogram equalization and contrast enhancement to improve the quality of the vehicle image. 2.2 Wavelet Decomposition and De-noising 2.2.1 Wavelet De-noising Wavelet analysis has been widely used in image processing, mainly in the following areas: image decomposition and reconstruction, image compression, image de-noising, image enhancement, etc.
Fig. 3. Original image (left) and De-noised image (right)
The traditional de-noising method is to pass the signal through a low-pass or band-pass filter; the drawback is that the signal is blurred at the same time as it is de-noised. Wavelet de-noising uses band-pass filters with different center frequencies, removes those coefficients at the scales that mainly reflect the noise frequency, and then reconstructs the image from the remaining coefficients by the inverse transform, so that the noise is suppressed satisfactorily. The original image and the de-noised image are shown in Fig. 3. 2.2.2 Wavelet Transform Decomposition Because the detail information at different resolutions in a license plate image often represents different physical structures of the image, the coefficients obtained from the wavelet transform are valuable. Many popular wavelets can be used in license plate detection, such as the Haar wavelet, Daubechies wavelets, Mexican Hat wavelets and the Morlet wavelet. In our approach, we use the Haar wavelet to carry out the two-dimensional wavelet transform to extract multi-resolution characteristics; the Haar wavelet transform is simple and fast to compute. Its transform equations are [8]:
Wφ(j0, m, n) = (1/√(MN)) Σ_{x=0..M−1} Σ_{y=0..N−1} f(x, y) φ_{j0,m,n}(x, y)   (1)

Wψ^i(j0, m, n) = (1/√(MN)) Σ_{x=0..M−1} Σ_{y=0..N−1} f(x, y) ψ^i_{j0,m,n}(x, y)   (2)

In these formulas:

φ_{j,m,n}(x, y) = 2^(j/2) φ(2^j x − m, 2^j y − n)   (3)

ψ^i_{j,m,n}(x, y) = 2^(j/2) ψ^i(2^j x − m, 2^j y − n)   (4)

φ_{j,m,n}(x, y) is the scaling function, and ψ^i_{j,m,n}(x, y) are the three two-dimensional wavelet functions. Wφ(j0, m, n) is the coefficient defining the approximation at scale j0 at (x, y), and Wψ^i(j, m, n) are the coefficients representing the detail information in the horizontal, vertical and diagonal directions at scales j ≥ j0, i = {H, V, D}. In digital image processing, the Mallat fast algorithm is generally used to obtain the high-frequency information. Fig. 4 shows the result of a one-level decomposition using the Haar wavelet. Clearly, these four multi-resolution sub-images represent the characteristics of the license plate region; apart from the LL sub-image, the other three illustrate the high-frequency characteristics of the original image.
Fig. 4. Four components of one level wavelet decomposition
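A single level of the 2-D Haar decomposition can be sketched directly with array slicing. This is a sketch of the standard Haar step under the orthonormal convention, not the authors' Mallat-based implementation; odd image dimensions are simply trimmed.

```python
import numpy as np

def haar_level1(img):
    """One level of the 2D Haar wavelet transform: returns the approximation
    LL and the horizontal/vertical/diagonal detail sub-images LH, HL, HH."""
    img = img[:img.shape[0] // 2 * 2, :img.shape[1] // 2 * 2].astype(float)
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 2.0   # low-frequency approximation
    lh = (a + b - c - d) / 2.0   # horizontal detail
    hl = (a - b + c - d) / 2.0   # vertical detail
    hh = (a - b - c + d) / 2.0   # diagonal detail
    return ll, lh, hl, hh
```

On a flat image all three detail bands vanish, so only regions with gray-level changes — such as the character strokes of a plate — survive into the high-frequency sub-images.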
2.3 Gradient Images 2.3.1 Vertical Gradient Images In order to further highlight these high-frequency characteristics, we calculate the vertical gradient for each high-frequency sub-image, so as to emphasize the large changes in gray value in the license plate region. The reason we choose the vertical gradient rather than the horizontal gradient is that the changes in the license plate region are mainly concentrated in the vertical gradient: in the vertical gradient image, the plate region is obviously more evident than other regions, whereas in the horizontal gradient image the difference between the plate region and other regions is not clear, making it impossible to extract the plate, as Fig. 5 shows. The horizontal and vertical gradient calculation formulas are:

gh(i, j) = |f(i+1, j) − f(i, j)|   (5)

gv(i, j) = |f(i, j+1) − f(i, j)|   (6)
Fig. 5. Image of Horizontal gradient (left), vertical gradient (right)
2.3.2 Gradient Combined Images We then combine these vertical gradients so that the sum highlights the license plate region. The summation formula is:

gg(i, j) = |h(i, j+1) − h(i, j)| + |v(i, j+1) − v(i, j)| + |d(i, j+1) − d(i, j)|   (7)

In this formula, h, v and d are the three high-frequency sub-images of the multi-resolution analysis, and gg(i, j) is the combination of the vertical gradients of the three high-frequency sub-images. From Fig. 6, we can see that the values of the total gradient within the plate region are higher than in most other places. To increase image contrast and eliminate interference, pixels with smaller values in the total gradient image are set to gray value 0: we set a threshold thres, and if a pixel's gray value is less than thres it is set to 0; thres is the average of the gradient image.
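Equation (7) together with the mean-value thresholding can be sketched as follows (a sketch; the three inputs are assumed to be the LH, HL and HH sub-images as NumPy arrays of equal shape):

```python
import numpy as np

def combined_gradient(h, v, d):
    """Sum of the vertical gradients of the three high-frequency sub-images
    (Eq. 7), followed by thresholding at the mean value of the result."""
    gg = (np.abs(np.diff(h, axis=1)) +
          np.abs(np.diff(v, axis=1)) +
          np.abs(np.diff(d, axis=1)))
    thres = gg.mean()
    gg[gg < thres] = 0.0     # suppress weak responses to raise contrast
    return gg
```

Using the mean as the threshold zeroes the scattered weak responses while the dense responses inside the plate region, which sit well above the mean, are kept.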
Fig. 6. Gradient combined images
2.4 Window Traversing and License Plate Image Segmentation In the gradient image, the summation of the gradient values in the license plate region is high (i.e., the white points are relatively concentrated). Although some parts outside the plate region also have high aggregate gradient values with high-frequency characteristics, they are scattered, unlike the concentrated values within the license plate region. Using this feature, we can quickly identify the location of the license plate and achieve a rough positioning.
The specific approach is as follows: a window slightly smaller than the estimated license plate traverses the entire gradient image, and the position where the window covers the highest aggregate gradient value is taken as the approximate location of the plate region. The search step is set to 2 in order to speed up the search. The window size can be adjusted according to the image size and the license plate size; slightly smaller than the estimated plate size is suitable. To avoid losing information when cutting, in accordance with empirical values, the initial position is expanded up, down, left and right before shearing the image, in order to ensure the integrity of the license plate region. A rough plate image segmentation is shown in Fig. 7.
Fig. 7. The license plate region
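The window traversal above can be sketched as a simple exhaustive search with step 2 (a sketch; the window size is a parameter, and the expansion and shearing of the located region are omitted):

```python
import numpy as np

def locate_plate(gg, win_h, win_w, step=2):
    """Traverse the gradient image with a win_h x win_w window and return the
    top-left corner of the window with the highest aggregate value."""
    best, best_pos = -1.0, (0, 0)
    for y in range(0, gg.shape[0] - win_h + 1, step):
        for x in range(0, gg.shape[1] - win_w + 1, step):
            s = gg[y:y + win_h, x:x + win_w].sum()
            if s > best:
                best, best_pos = s, (y, x)
    return best_pos
```

Because plate responses are concentrated while background responses are scattered, the window sum peaks over the plate even when individual background pixels are bright.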
3 Experimental Results and Conclusion We propose a method based on the wavelet transform to locate the license plate region. This algorithm first decomposes and de-noises the image using the Haar wavelet transform, and then calculates the vertical gradients of the four components. All the vertical gradients of each detail are added to obtain a summative image. Finally, we traverse the entire gradient image with a window slightly smaller than the estimated license plate; the position where the window covers the highest aggregate gradient value is the approximate location of the plate region. In the experiment, 115 images of size 512×384 are used. All of them were acquired by a CCD camera under many kinds of conditions, such as different angles and different lighting conditions. We obtained the license plate region from 106 images, and 9 images failed. The detection rate is 92.2%. The experimental results show that our method is effective.
References 1. Li, J., Xie, M.: A Color and Texture Feature Based Approach to License Plate Location. In: International Conference on Computational Intelligence and Security 2007, pp. 376–380. IEEE, Los Alamitos (2007) 2. Liu, D., Xie, M.: Detecting License-plate Based on Color Edge from Complex Scenes. In: Signal and Image Processing, Novosibirsk, Russia (2005) 3. Hung, K.M., Chuang, H.L., Hsieh, C.T.: License Plate Detection Based on Expanded Haar Wavelet Transform. In: Fourth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2007) (2007) 4. Hsieh, J.W., Yu, S.H., Chen, Y.S.: Morphology-based License Plate Detection from Complex Scenes. In: 16th International Conference on Pattern Recognition (ICPR 2002) (2002)
5. Yang, F., Ma, Z.: Vehicle License Plate Location Based on Histogramming and Mathematical Morphology. In: Fourth IEEE Workshop on Automatic Identification Advanced Technologies (AutoID 2005) (2005) 6. Chen, Y.N., Han, C.C., Wang, C.T., Jeng, B.S., Fan, K.C.: The Application of a Convolution Neural Network on Face and License Plate Detection. In: The 18th International Conference on Pattern Recognition (ICPR 2006) (2006) 7. Becerikli, Y., Olgac, A.V., Sen, E., Coskun, F.: Neural Network Based License Plate Recognition System. In: Proceedings of International Joint Conference on Neural Networks, Orlando, Florida, USA (2007) 8. Gonzalez, R., Woods, R.: Digital Image Processing, 2nd edn. Prentice Hall, Englewood Cliffs (2003)
Stereo Correspondence Using Moment Invariants Prashan Premaratne and Farzad Safaei School of Electrical, Computer & Telecommunications Engineering, The University of Wollongong, North Wollongong 2522, NSW, Australia [emailprotected]
Abstract. Autonomous navigation is seen as a vital tool in harnessing the enormous potential of Unmanned Aerial Vehicles (UAVs) and small robotic vehicles for both military and civilian use. Even though laser-based scanning solutions for Simultaneous Location And Mapping (SLAM) are considered the most reliable for depth estimation, they are not feasible for use in UAVs and land-based small vehicles due to their physical size and weight. Stereovision is considered the best approach for any autonomous navigation solution, as stereo rigs are lightweight and inexpensive. However, stereoscopy, which estimates depth information through pairs of stereo images, can still be computationally expensive and unreliable, mainly because some of the algorithms used in successful stereovision solutions have computational requirements that cannot be met by small robotic vehicles. In our research, we implement a feature-based stereovision solution using moment invariants as a metric to find corresponding regions in image pairs, which reduces the computational complexity and improves the accuracy of the disparity measures; this is significant for use in UAVs and in small robotic vehicles.
1 Introduction Stereo vision is a mechanism for obtaining depth information from digital images. The challenge in stereovision is how to find corresponding points in the left image and the right image, known as the correspondence problem. Once a pair of corresponding points is found the depth can be computed using triangulation. There are two prominent approaches to finding such corresponding pairs namely, area-based and feature-based techniques. In the area based techniques, every pixel in a designated area of one image is compared with the pixels in the same row of the other image. This is done with few constraints such as maximum disparity to avert any false matches. Some of the well-known techniques in this approach are Hierarchical Block Matching [1], Census [2], Correlation Matching [3-4] and Zitnick-Kanade (Cooperative Algorithm for Stereo Matching and Occlusion Detection) [5-6] algorithms. The feature-based methods rely on finding special features in corresponding pairs and may result in fewer depth values lowering the computational complexity. Our approach is very much aimed at controlling small robotic vehicles using stereovision for depth calculation. This depth information will be used in control algorithms to detect and avoid obstacles. If this depth information is to be useful, they D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 447–454, 2008. © Springer-Verlag Berlin Heidelberg 2008
need to be estimated in real time, which requires any stereovision algorithm to be computationally inexpensive. Given our success in using moment invariants for recognizing hand gestures, moment invariants can be of great use in finding corresponding matching regions in stereo pairs [7-8]. Moment invariants are invariant to rotation, scale and shift, and the rotation-invariance property is especially beneficial to the stereo correspondence problem, as any misalignment or non-flat ground conditions can create slightly rotated versions of a scene in either of the cameras. In our approach, we rely on edge/corner detection algorithms such as Harris corner detection [8] to produce reliable feature points. This results in fewer points of interest compared to area-based techniques [9-18]. An image can be separated into a collection of blocks, which are marked as candidates or not depending on whether they contain corners. These blocks can be matched with the help of moment invariants, and the disparity of the identified features can be simply calculated. In this paper, section 2 details the general stereo matching approaches, and the moment invariant based technique is presented in detail in section 3. This is followed by our experimental results and the conclusion.
2 Stereo Matching Approaches In area-based stereo matching, for a given pair of stereo images, the corresponding points are supposed to lie on the epipolar lines [19]. Area-based techniques rely on the assumption of surface continuity, and often involve some correlation measure to construct a disparity map with an estimate of disparity for each point visible in the stereo pair. Area-based techniques produce much denser disparity maps, which is critical in obstacle detection and avoidance. Since corresponding points are the images of the same real point of the scene projected into both pictures, we can assume that their surroundings in both pictures will be quite similar. Area-based methods use this similarity to detect corresponding points [9-13]. It is computed from the difference in local neighborhoods (usually a constant-size square) of the points. Computing the similarity of two points is the elementary step of the method and cannot be accelerated. The main problem is how to search for the corresponding point in the other picture. The naive area-based algorithm chooses a point from the first image and runs through all the points in the second image to find its corresponding point. This inefficient process can be accelerated by restraining the search area to a specific region around the corresponding pixel by specifying a maximum disparity. The most efficient restriction, adapted from epipolar geometry, is known as the epipolar constraint. Area-based methods are considered computationally expensive due to the exhaustive nature of the metrics being used: Sum of Squared Differences (SSD), Sum of Absolute Differences, Sum of Sum of Squared Differences (SSSD) and cross-correlation based metrics use every pixel, making them exhaustive. Feature-based stereo matching techniques focus on local intensity variations and generate depth information only at points where features are detected.
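An area-based match under the epipolar and maximum-disparity constraints described above might be sketched as follows (a sketch using the SSD metric; the block size and search range are illustrative, and the center point is assumed to lie safely inside both images):

```python
import numpy as np

def ssd_disparity(left, right, y, x, block=5, max_disp=16):
    """Find the disparity of the block centred at (y, x) in the left image by
    searching along the same row of the right image (epipolar constraint),
    minimizing the Sum of Squared Differences."""
    r = block // 2
    ref = left[y - r:y + r + 1, x - r:x + r + 1].astype(float)
    best_d, best_ssd = 0, np.inf
    for d in range(0, max_disp + 1):        # maximum-disparity constraint
        if x - r - d < 0:
            break
        cand = right[y - r:y + r + 1, x - r - d:x + r + 1 - d].astype(float)
        ssd = np.sum((ref - cand) ** 2)
        if ssd < best_ssd:
            best_ssd, best_d = ssd, d
    return best_d
```

Every pixel of every candidate block enters the metric, which is exactly why such exhaustive measures are costly compared with matching only at detected features.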
In general, feature-based techniques provide more accurate information in terms of locating
Fig. 1. Stereopsis approaches: area-based techniques (the Block Matching, Correlation, Census and Zitnick-Kanade algorithms) and feature-based techniques (the moment invariant based technique and the Monogenic Phase algorithm)
Fig. 2. Left and Right stereo images with ‘corners’ marked using Harris corner detection
depth discontinuities and thus achieve fast and robust matching. However, they yield very sparse range maps and may require an expensive feature extraction process. Edge elements, corners, line segments and curve segments are features that are robust against changes of perspective, and they have been widely used in many stereovision works. Features such as edge elements and corners are easy to detect but may suffer from occlusion, whereas line and curve segments require extra computation time but are more robust against occlusion. Higher-level image features such as circles, ellipses and polygonal regions have also been used as features for stereo matching; these features are, however, restricted to images of indoor scenes. Nevertheless, feature-based techniques allow the computation of conjugate pairs with subpixel accuracy. They also include object-dependent constraints in the solution of the correspondence problem, such as ‘corners’ when using the Harris corner detection algorithm.
3 Moment Invariant Based Stereo Matching Using invariant moments to locate corresponding features in stereo pairs is less computationally intensive, as the number of block comparisons depends on the disparity constraint as well as on the number of features. In our approach, ‘corners’ are used as features, as shown in Fig. 2. The left image is divided into 20x20-pixel blocks, which are marked according to whether they contain features. The blocks containing features are then used to calculate the first four moment invariants. Even though up to seven such moments can be calculated, four moments are adequate to uniquely represent a square block. Using the epipolar and disparity constraints, we can now evaluate the adjoining 20x20-pixel blocks in the right image for matches to the blocks containing features. The ‘closeness’ of these moments is decided using a threshold that depends on the image scenery. When such blocks are identified, the simple depth calculation formula can be used to calculate the depth of the identified feature as follows:
d = b · f / D

where d is the depth of the object from the camera plane, f is the focal length of the camera, D is the disparity and b is the baseline distance. The underlying assumption here is that the epipolar lines run parallel to the image lines, so that corresponding points lie on the same image lines.

3.1 Moment Invariants

The moment invariants algorithm is known as one of the most effective methods for extracting descriptive features for object recognition applications. It has been widely applied in the classification of aircraft, ships, ground targets, etc. [20, 21]. Essentially, the algorithm derives a number of self-characteristic properties from a binary image of an object; these properties are invariant to rotation, scale and translation. Let f(i, j) be a point of a digital image of size M×N (i = 1, 2, …, M and j = 1, 2, …, N). The two-dimensional moments and central moments of order (p + q) of f(i, j) are defined as:
m_pq = Σ_{i=1}^{M} Σ_{j=1}^{N} i^p j^q f(i, j)    (1)

U_pq = Σ_{i=1}^{M} Σ_{j=1}^{N} (i − ī)^p (j − j̄)^q f(i, j)    (2)

where ī = m_10 / m_00 and j̄ = m_01 / m_00    (3)

φ1 = η_20 + η_02    (4)
Stereo Correspondence Using Moment Invariants
φ2 = (η_20 − η_02)² + 4 η_11²    (5)

φ3 = (η_30 − 3η_12)² + (3η_21 − η_03)²    (6)

φ4 = (η_30 + η_12)² + (η_21 + η_03)²    (7)

where η_pq is the normalized central moment defined by η_pq = U_pq / U_00^r.
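Equations (1)-(7) can be implemented directly. The sketch below is an illustration rather than the authors' code; the normalization exponent r = 1 + (p + q)/2 is the standard choice and is an assumption here, since the text leaves r unspecified:

```python
import numpy as np

def hu_invariants(f):
    """First four moment invariants (Eqs. 1-7) of a 2-D image block f."""
    M, N = f.shape
    i, j = np.mgrid[1:M + 1, 1:N + 1]            # 1-based pixel coordinates, as in the text
    m = lambda p, q: np.sum(i**p * j**q * f)     # raw moments m_pq (Eq. 1)
    m00 = m(0, 0)                                # note U_00 = m_00
    ibar, jbar = m(1, 0) / m00, m(0, 1) / m00    # centroid (Eq. 3)
    U = lambda p, q: np.sum((i - ibar)**p * (j - jbar)**q * f)   # central moments (Eq. 2)
    eta = lambda p, q: U(p, q) / m00**(1 + (p + q) / 2)          # normalized central moments
    phi1 = eta(2, 0) + eta(0, 2)                                 # Eq. 4
    phi2 = (eta(2, 0) - eta(0, 2))**2 + 4 * eta(1, 1)**2         # Eq. 5
    phi3 = (eta(3, 0) - 3 * eta(1, 2))**2 + (3 * eta(2, 1) - eta(0, 3))**2   # Eq. 6
    phi4 = (eta(3, 0) + eta(1, 2))**2 + (eta(2, 1) + eta(0, 3))**2           # Eq. 7
    return np.array([phi1, phi2, phi3, phi4])
```

Because the invariants are built from central moments, translating a feature inside a block leaves them unchanged, which is what makes them usable as block descriptors.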
Fig. 3. Summary of the proposed technique. The flowchart comprises the steps:
1. Take the left and right image frames.
2. Run Harris corner detection on both frames.
3. Divide the left frame into 20x20 pixel blocks; if any block contains 'corners', calculate its invariant moments.
4. Look for the matching blocks in the right image, where features are found in the left block, using the disparity constraints.
5. Calculate the disparity for the matching features.
If certain blocks do not contain any features, the search area in the other image (Right) can be restricted using maximum disparity constraint. This will further cut down the computational requirements as opposed to area-based correlation techniques.
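A hedged sketch of the block-level matching with the disparity-restricted search described above. All names are illustrative; the crude mean/variance descriptor stands in for the four moment invariants, and "contains features" is approximated as "contains any nonzero pixel":

```python
import numpy as np

def match_blocks(left, right, block=20, max_disparity=60, thresh=1e-3, describe=None):
    """Match feature blocks between a rectified stereo pair.

    For every left-image block that contains features, only right-image blocks
    on the same block row (epipolar assumption) and within max_disparity to the
    left are compared, which is the search restriction described in the text.
    """
    if describe is None:
        describe = lambda b: np.array([b.mean(), b.var()])
    H, W = left.shape
    matches = []
    for r in range(0, H - block + 1, block):
        for c in range(0, W - block + 1, block):
            if not left[r:r + block, c:c + block].any():
                continue                            # featureless block: skip the search
            d_left = describe(left[r:r + block, c:c + block])
            best, best_c = None, None
            for c2 in range(0, c + 1, block):       # right-image candidate blocks
                if c - c2 > max_disparity:
                    continue                        # outside the disparity window
                dist = np.linalg.norm(d_left - describe(right[r:r + block, c2:c2 + block]))
                if dist < thresh and (best is None or dist < best):
                    best, best_c = dist, c2
            if best_c is not None:
                matches.append((c, best_c, r))      # disparity D = c - best_c
    return matches
```

A matched triple (c, c2, r) gives disparity D = c − c2, which the formula d = b·f/D converts to depth.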
4 Experimental Results

We ran the proposed algorithm on 100 stereo pairs generated using a BumbleBee™ stereo rig mounted on a mobile robot, producing 320x240 pixel frames. The path the robot covered had special markers to provide feature points, as most of the cubicles of
452
P. Premaratne and F. Safaei
the indoor setup had monotonous flat featureless walls and partitions. This was followed by running a correlation-based stereo matching algorithm using the SSD and SAD metrics. The results are shown in Fig. 4. The processing system comprised a Pentium 4 running at 2 GHz with 2 GB of RAM. The system was capable of processing 10 frames per second for both the moment-invariant-based approach and the correlation-based technique using the SAD metric; however, it managed only 5 frames per second using SSD. This was expected, as SSD involves more computational complexity than SAD. It should be pointed out, however, that the proposed approach computed only a handful of disparity values, similar to Fig. 2. We also used the Zitnick-Kanade algorithm to estimate the disparity, although those results are not presented, due to the inability of the algorithm to run anywhere near real time with our modest processing power. Since this algorithm requires multiple iterations to refine its estimates, it is simply not useful for the real-time applications we are interested in. Fig. 3 summarizes the major steps in the proposed algorithm.
Fig. 4. Comparison of the Moment Invariant technique with the correlation method using SSD and SAD metrics (panels: SSD, SAD, Moment Invariant)
The major advantage of local approaches presented here is speed and suitability for hardware implementation. Global optimization algorithms commonly require 2 to 3 orders of magnitude more time than even the software implementations of local methods.
5 Summary

In many respects, feature-based algorithms are established as the most robust way to implement stereo vision algorithms for industrial-type stereo problems. The
advantages offered by using features are that feature-based representations contain desirable statistical properties and provide algorithmic flexibility to the programmer. The flexibility is that algorithmic constraints can be applied explicitly to the data structures rather than implicitly, as with area-based correlation techniques. In particular, the use of 'corners' leads to algorithms that are as locally accurate as the precision to which the features can be extracted. Even though feature-based techniques do not produce a dense disparity map, their values are more accurate than those of area-based techniques. Some of the reasons for this are that the presence of shadows produces erroneous results, some surfaces reflect light non-uniformly, backgrounds are usually flat single-colored surfaces, and some parts of the scene seen in the first image are occluded in the other image. We demonstrated that feature-based techniques relying on moment invariants for matching can process a frame in the order of tenths of a second in software implementations. This implies that the algorithm can comfortably reach higher video rates using DSP and FPGA implementations [22]. At the moment, there is no technique for simultaneously achieving the high-quality range obtained from global optimization with the fast run times of local schemes.
References

1. Koschan, A., Rodehorst, V., Spiller, K.: Color Stereo Vision Using Hierarchical Block Matching and Active Color Illumination. In: Proc. 13th Int. Conf. Pattern Recognition, vol. 1, pp. 835–839 (1996)
2. Zabih, R., Woodfill, J.: Non-parametric Local Transforms for Computing Visual Correspondence. In: Third European Conf. Computer Vision (1994)
3. van Beek, J.C.M., Lukkien, J.J.: A Parallel Algorithm for Stereo Vision Based on Correlation. In: Proc. 3rd Int. Conf. High Performance Computing (1996)
4. Hirschmüller, H., Innocent, P.R., Garibaldi, J.M.: Real-Time Correlation-Based Stereo Vision with Reduced Border Errors. Int. Journal of Computer Vision 47(1/2/3), 229–246 (2002)
5. Zitnick, C., Kanade, T.: A Cooperative Algorithm for Stereo Matching and Occlusion Detection. Tech. Report CMU-RI-TR-99-35, Robotics Institute, Carnegie Mellon University (October 1999)
6. Zitnick, C., Kanade, T.: A Cooperative Algorithm for Stereo Matching and Occlusion Detection. IEEE Trans. Pattern Analysis and Machine Intell. 22(7), 675–684 (2000)
7. Premaratne, P., Nguyen, Q.: Consumer Electronics Control System Based on Hand Gesture Moment Invariants. IET Computer Vision 1(1), 35–41 (2007)
8. Premaratne, P., Safaei, F., Nguyen, Q.: Moment Invariant Based Control System Using Hand Gestures. In: Intelligent Computing in Signal Processing and Pattern Recognition. Lecture Notes in Control and Information Sciences, vol. 345, pp. 322–333. Springer, Heidelberg (2006)
9. Harris, C., Stephens, M.: A Combined Corner and Edge Detector. In: Proc. 4th Alvey Vision Conference, pp. 147–151 (1988)
10. Di Stefano, L., Marchionni, M., Mattoccia, S., Neri, G.: A Fast Area-Based Stereo Matching Algorithm. In: 15th IAPR/CIPRS International Conference on Vision Interface, Calgary, Canada, May 27–29 (2002)
11. Fusiello, A., Roberto, V., Trucco, E.: Experiments with a New Area-Based Stereo Algorithm. In: International Conference on Image Analysis and Processing, Florence (1997)
12. Barnard, S.T., Thompson, W.B.: Disparity Analysis of Images. IEEE Trans. PAMI PAMI-2(4) (1980)
13. Hannah, M.J.: Bootstrap Stereo. In: Proc. Image Understanding Workshop (1980)
14. Hannah, M.J.: SRI's Baseline Stereo System. In: Proc. of DARPA Image Understanding Workshop, pp. 149–155 (1985)
15. Hannah, M.J.: A System for Digital Stereo Image Matching. Photogrammetric Engineering and Remote Sensing, pp. 1765–1770 (1989)
16. Burt, P., Julesz, B.: Modifications of the Classical Notion of Panum's Fusional Area. Perception 9, 671–682 (1980)
17. Lane, R.A., Thacker, N.A., Seed, N.L.: Stretch-Correlation as a Real-Time Alternative to Feature-Based Stereo Matching Algorithms. Image and Vision Computing Journal 12(4) (May 1994)
18. Lane, R.A., Thacker, N.A., Seed, N.L., Ivey, P.A.: A Stereo Vision Processor. In: Proc. of IEEE Custom Integrated Circuits Conference (1995)
19. Weng, J.: Camera Calibration with Distortion Models and Accuracy Evaluation. IEEE Trans. Pattern Anal. Machine Intell. 14, 965–980 (1992)
20. Premaratne, P.: ISAR Ship Classification, An Alternative Approach. CSSIP-DSTO Internal Publication, Australia (March 2003)
21. Zhongliang, Q., Wenjun, W.: Automatic Ship Classification by Superstructure Moment Invariants and Two-Stage Classifier. In: ICCS/ISITA 1992 Comm. on the Move, pp. 544–547 (1992)
22. van der Horst, J., van Leeuwen, R., Broers, H., Kleihorst, R., Jonker, P.: A Real-Time Stereo SmartCam, Using FPGA, SIMD and VLIW. In: Proc. of the 2nd Workshop on Applications of Computer Vision (May 12, 2006)
The Application of the Snake Model in Carcinoma Cell Image Segment

Zhen Zhang¹, Peng Zhang¹, Xiaobo Mao¹, and Shanzhong Zhang²

¹ School of Electrical Engineering, Zhengzhou University, Zhengzhou 450001, China
² Highway Administration Bureau of Zhengzhou, Zhengzhou 450052, China
Abstract. Accurate cell nucleus segmentation is crucial for the development of automated cytological cancer recognition and diagnosis systems. This paper proposes an improved Snake model for esophageal cell images, based on a study of several main methods for esophageal cell image segmentation and an analysis of their advantages and disadvantages. The novel cell nucleus segmentation method has been tested on a number of cell images obtained from esophageal smear slides, and the results are encouraging. Experimental results show that the presented method performs well on both well-separated nuclei and some overlapped nuclei.

Keywords: snake model, image segmentation, cell image, cell boundary, esophageal cancer.
1 Introduction

With the fast development of computer technology, image processing, pattern recognition and computer vision technology have found more and more applications in the medical field. When these technologies are applied to the automated analysis and processing of cell images, the expert knowledge of pathological cancer diagnosis can be combined with the accurate calculation and rapid processing capability of a computer system. At the same time, this technology can avoid subjective factors in manual operation and provide reliable technical means for cancer diagnosis. As a result of the complexity of cell images, traditional methods based on bottom-level pixels cannot extract a cell's profile accurately, so it is better to combine new theories and methods of image analysis to make better progress. Analysis shows that model-based methods can make better progress than other cell segmentation methods, and they suit fuzzy borders or overlapping cells. The active contour model, also called the Snake model, was proposed by Kass and others in 1987. The Snake model provides a global image segmentation method, which gives us a new idea for cell segmentation. The basic Snake model has some shortcomings, such as a high requirement on the initial contour position and poor approximation of concave and pointed shapes. This paper uses an improved Snake model to segment cell images, overcoming the shortcomings of the basic model effectively through expansion of the external energy's sphere of action, the introduction of a dynamic external force, adjustment of the algorithm strategy, and control of the deformation process. Experiments indicate that the results of this new model tally with the cells' actual boundaries very well.

D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 455–462, 2008. © Springer-Verlag Berlin Heidelberg 2008
The rest of this paper is organized as follows. In Section 2, we carry out a theoretical analysis of the basic Snake model and the improved Snake model. The experimental results are given in Section 3, and Section 4 draws the conclusions. In Section 5, we sum up the innovations of the paper.
2 Theoretical Analysis

2.1 Basic Snake Model

The basic concept of the Snake model is to find a parameterized contour curve in the image along which the weighted sum of internal energy and potential energy reaches its minimum. The internal energy, determined by the characteristics of the contour curve, reflects its tension and rigidity. The potential energy, determined by the characteristics of the image and usually defined from image gradient information, attains a local minimum at image edges and attracts the curve toward the dominant features. If we express the contour curve as

ν(s) = (x(s), y(s)),   s ∈ [0, 1]

the curve will move in the image until the following energy function reaches its minimum:

E(ν) = Ei(ν) + Ep(ν)

in which Ei(ν) is the internal energy of the curve, defined as

Ei(ν) = (1/2) ∫_0^1 ( α(s) |dν(s)/ds|² + β(s) |d²ν(s)/ds²|² ) ds

and Ep(ν) is the potential energy, usually defined in terms of image gradient information:

Ep(ν) = ∫_0^1 ( −γ(s) |∇I(ν(s))|² ) ds

The model's computation is the minimization process

E_snake = ∫_0^1 [ ( α(s) |dν(s)/ds|² + β(s) |d²ν(s)/ds²|² ) / 2 − γ(s) |∇I(ν(s))|² ] ds

The internal energy restrains the contour's shape, the potential energy guides its behavior, and under both driving forces the contour moves toward the goal boundary. To implement the Snake model algorithm, it must be discretized, which gives the discrete form

E = Σ_{i=0}^{n−1} ( Ei(ν_i) + Ep(ν_i) )
in which

Ei(ν_i) = (1/2) ( α_i |ν_i − ν_{i−1}|² + β_i |ν_{i−1} + ν_{i+1} − 2ν_i|² )

Ep(ν_i) = ∫ −γ |∇I(x, y)|² ds = γ Σ_{i=1}^{n} −|∇I(x, y)|²
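The two discretized internal-energy terms above can be evaluated for a closed contour of n vertices. This is an illustrative sketch (not the authors' implementation), taking α and β constant over the contour:

```python
import numpy as np

def snake_internal_energy(pts, alpha=1.0, beta=1.0):
    """Discrete internal energy of a closed contour:
    alpha * |v_i - v_{i-1}|^2 penalizes stretching, and
    beta * |v_{i-1} + v_{i+1} - 2 v_i|^2 penalizes bending."""
    prev_pts = np.roll(pts, 1, axis=0)    # v_{i-1} (closed contour wraps around)
    next_pts = np.roll(pts, -1, axis=0)   # v_{i+1}
    elastic = np.sum(np.linalg.norm(pts - prev_pts, axis=1) ** 2)
    bending = np.sum(np.linalg.norm(prev_pts + next_pts - 2 * pts, axis=1) ** 2)
    return 0.5 * (alpha * elastic + beta * bending)
```

A greedy snake iteration would move each vertex to the neighboring position that lowers this quantity plus the potential-energy term.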
2.2 Improved Snake Model

In order to use the Snake model better, this paper performs some preprocessing operations according to the cell image quality, such as gray-scale transformation, median filtering and erosion, to obtain an image that better meets the requirements. A more accurate initial contour is then obtained using threshold segmentation. To improve the basic Snake model, we reconsider the constituents of the model:

(1) Use a Gaussian function to expand the scope of the external energy's effect.
(2) Expand the judgment range of the boundary-point grey level.
(3) Improve the energy function, as follows.

① Elastic energy: Considering that the object boundary generally has a certain length, the Snake elastic energy is calculated with the equation

E_elastic = α_i [ l − |ν_i − ν_{i−1}| ]²,   l = (1/n) Σ_{i=0}^{n−1} |ν_i − ν_{i−1}|

in which α is a coefficient and l is the mean distance between vertices.

② Rigid energy: The traditional Snake rigid energy depends on both the curvature and the vertex spacing. In order to remove the influence of the vertex spacing on the rigid energy, the Eviatar rigid energy is used:

E_rigidity = β Σ_{i=0}^{n−1} θ_i²,   −π ≤ θ_i ≤ π

in which θ_i is the included angle between the two vectors from V_{i−1} to V_i and from V_i to V_{i+1}.

③ Area energy: The Snake model approaches the target contour from the initial contour through contraction (or inflation), during which the area surrounded by the Snake decreases (or increases). In order to enhance the Snake's ability to extract the contours of irregular objects and to expand the search range, a new internal energy E_area related to the area surrounded by the target is introduced, together with the snake length L and the center of gravity C:

E_area = k_a Σ_{i=0}^{n−1} A_i = k_a Σ_{i=0}^{n−1} |V_i − C|

in which k_a is a coefficient whose sign decides whether the snake contracts or expands, and

L = Σ_{i=0}^{n−1} |ν_i − ν_{i−1}|,   C = (x̄, ȳ),   x̄ = (1/L) Σ_{i=0}^{n−1} x_i,   ȳ = (1/L) Σ_{i=0}^{n−1} y_i

Even after the above improvements, the image force far away from the boundary is still small, concave regions still cannot be detected, and noise is often mistaken for a key point; therefore an external force is introduced. Analyzing the convex, concave, pointed and other shape cases, we can determine θ, the angle between the direction of the external force applied to the key point ν_i and the horizontal direction, as

θ = arccos( ⟨ν_{i−1}ν_i, ν_iν_{i+1}⟩ / ( |ν_{i−1}ν_i| |ν_iν_{i+1}| ) )

and the direction of the external force is given by the unit vector

ν = (ν_{i+1} − ν_i) / |ν_{i+1} − ν_i|

Considering that there may be special cases when applying the external force, for example when setting its size per unit time, we must not only guarantee the speed of key-point migration but also avoid the possible mistake of overshooting the target boundary. A threshold ν_max is therefore established so that the dynamic external force speed satisfies |ν_i| < ν_max.

In the algorithmic aspect, among the three main energy-minimization methods for the Snake model, Williams' greedy algorithm currently has the smallest complexity in time and space. Considering that this project must finally realize a real-time examination system, we select Williams' greedy algorithm as the Snake model's optimization method.

2.3 Characteristics of Esophageal Cancer Cells

In medicine, the following representative characteristics are usually inspected for cancer cell recognition: the cancer cell's nucleus color is very deep; the nucleus is quite
big; the nuclear-to-cytoplasmic ratio is the opposite of that of a normal cell; the cancer cell's nucleus shape is irregular; and the intranuclear chromatin has a coarse distribution. Esophageal cancer is divided into squamous carcinoma and adenocarcinoma according to the pathological characteristics. Squamous carcinoma is more common, so this paper mainly researches the segmentation of squamous carcinoma cells. Squamous carcinoma cells have the following characteristics: the difference in size between cancer cells is very big; the cytoplasm is rich and bright red in color; the cell nucleus grows obviously bigger and abnormal, with an inverted nuclear-to-cytoplasmic ratio and deep staining; and in highly differentiated squamous carcinoma the cell shape is relatively regular and the cytoplasm is obviously more abundant. Fig. 1 gives a normal and a cancer cell image.
Fig. 1. Comparison between a normal cell and a cancer cell: (a) normal esophageal cells; (b) cancerous esophageal cells
3 Analysis of the Experimental Results

3.1 Comparison and Evaluation of Performance

The image must first undergo gray conversion, median filtering, erosion and other preprocessing. Segmentation is then carried out first with the basic Snake model, shown in Fig. 2(b), then with the GVF model, shown in Fig. 2(c), and finally with the improved Snake model, shown in Fig. 2(d). As can be seen from the results, only a poor-quality image is available because noise sources may affect the esophageal cancer cell image while it is generated and delivered, resulting in a vague border contour that cannot be well distinguished from the background; traditional image segmentation may therefore produce a false profile. The basic Snake model approaches concave, acute-angled and irregular region boundaries with difficulty: the contour curve it obtains shows a big disparity from the real outline, especially in the concave regions of the cell. The GVF model, because
Fig. 2. Comparison of the segmentation results of several Snake models: (a) the original picture; (b) divided by the basic Snake model; (c) divided by the GVF model; (d) divided by the improved Snake model
of its strengthened external force field, solves the problem of approaching concave regions well, but its contour curve still does not fully coincide with the real outline, as shown in Fig. 2(c). Observing the segmentation results, we can see that the contour curve obtained using the improved Snake model tallies accurately with the cell's real profile.

Table 1. Comparison of correction rates of several Snake methods

Randomly selected test image | Basic Snake model (%) | GVF model (%) | Improved Snake model (%)
1 | 78.18 | 93.27 | 96.22
2 | 79.18 | 92.62 | 98.41
3 | 87.58 | 93.50 | 98.36
4 | 75.42 | 91.89 | 97.14
5 | 83.80 | 93.74 | 97.86
6 | 74.65 | 91.98 | 97.72
7 | 85.94 | 92.79 | 97.35
3.2 Comparison of Several Snake Methods

Comparing the contour line obtained from the proposed algorithm with the contour line obtained from manual segmentation, we can calculate the percentage ratio between the number of overlapping contour pixels, CorrectPixelNum, and the total number of contour pixels, TotalPixelNum, namely:

CorrectRate = (CorrectPixelNum / TotalPixelNum) × 100%
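A minimal sketch of this measure, assuming TotalPixelNum is taken as the size of the manual contour (the text does not spell this out):

```python
def correct_rate(detected, manual):
    """Pixel-overlap agreement between a detected contour and a manually traced
    one, both given as sets of (row, col) pixel coordinates."""
    if not manual:
        raise ValueError("manual contour must be non-empty")
    correct = len(detected & manual)        # CorrectPixelNum
    return 100.0 * correct / len(manual)    # TotalPixelNum = manual contour size
```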
4 Conclusion and Prospect

Experiments show that the cell segmentation results using the improved Snake model agree with the actual cell borders. The improvements, namely a better external energy function, a strengthened external force, an extended sphere of external energy action and increased anti-noise capability, effectively overcome the shortcomings of the basic model, such as the high requirement on the initial contour position and the poor approximation of concave and pointed shapes. The greedy algorithm used in this paper is fast, but it has difficulty reaching the globally optimal solution because the current selection does not depend on the next selection. The DP algorithms proposed by some scholars ensure a globally optimal solution as a whole, but lead to high complexity and low efficiency, so they are not applicable to real-time systems. New algorithms are a research direction for the future, and adaptively selecting the three parameters α, β, γ is a problem to be solved in the future to ensure the generality of the system.
5 Innovations

To overcome the shortcomings of the Snake model, the paper presents an innovative method. On the basis of improving the external energy function, strengthening the external force, extending the role of the external energy and increasing anti-noise capability, the paper proposes and introduces a new dynamic external force and determines its direction from the convex, concave, pointed and other shape cases. The definition of the dynamic external force can effectively guide the movement of the key points.
References

1. Kass, M., Witkin, A.P., Terzopoulos, D.: Snakes: Active Contour Models. In: Proc. First Int. Conf. Computer Vision, London, pp. 259–268 (1987)
2. Kass, M., Witkin, A.P., Terzopoulos, D.: Snakes: Active Contour Models. International Journal of Computer Vision 1, 321–331 (1988)
3. Xu, C., Prince, J.L.: Snakes, Shapes and Gradient Vector Flow. IEEE Trans. Image Processing 7(3), 359–369 (1998)
4. Fan, J., David, K.Y.: Automatic Image Segmentation by Integrating Edge Extraction and Seeded Region Growing. IEEE Trans. on Image Processing 10, 1454–1466 (2001)
5. Dwi, A.: Cell Segmentation with Median Filter and Mathematical Morphology Operation. In: Proc. of the IEEE 10th International Conference on Image Analysis and Processing, pp. 1034–1046 (1999)
6. Lam, K.M., Yan, H.: Fast Greedy Algorithm for Active Contours. Electronics Letters 30, 21–22 (1994)
7. Yi, Y.: The New Image Segmentation Based on the Snake Model. First Military Medical University, 32–67 (2005)
8. Xue, D.J., Ping, X.J.: The New Image Segmentation Based on Geomatics Morphology. Journal of Information Engineering University 4(4), 88–92 (2003)
9. Hang, D., Yu, R.: The Snake Model Uses in Edge Detection. Journal of Shanghai Jiaotong University 34, 848–850 (2000)
10. Zhang, Y.J.: Image Segmentation Appraisal Classification and Comparison. Journal of Image and Graphics 5A(1), 39–43 (2000)
11. Zhao, B.J., Li, D.: The Improvement Snake Model for Complex Edge Detection. Journal of Beijing Institute of Technology 24(2), 162–165 (2004)
12. Wang, Y.Q., Jia, Y.D.: One New Heart Nuclear Magnetic Resonance Image Division Method. Chinese Journal of Computers 1, 129–135 (2007)
Data Clustering and Evolving Fuzzy Decision Tree for Data Base Classification Problems

Pei-Chann Chang¹, Chin-Yuan Fan², and Yen-Wen Wang³,*

¹ Department of Information Management, Yuan Ze University, Taoyuan 32026, Taiwan, R.O.C.
² Department of Industrial Engineering and Management, Yuan Ze University, Taoyuan 32026, Taiwan, R.O.C.
³ Department of Industrial Engineering and Management, Chin Yun Tech. University, Taoyuan, Taiwan, R.O.C.
Abstract. Data base classification suffers from two well-known difficulties, i.e., the high dimensionality and non-stationary variations within large historic data. This paper presents a hybrid classification model integrating a case based reasoning technique, a Fuzzy Decision Tree (FDT), and Genetic Algorithms (GA) to construct a decision-making system for data classification in various data base applications. The model is mainly based on the idea that the historic data base can be transformed into a smaller case base together with a group of fuzzy decision rules. As a result, the model can respond more accurately to the current data under classification through induction by these smaller case-based fuzzy decision trees. Hit rate is applied as a performance measure, and the effectiveness of our proposed model is demonstrated by experimental comparison with other approaches on different data base classification applications. The average hit rate of our proposed model is the highest among them.

Keywords: fuzzy decision tree, case based reasoning, genetic algorithm, classification, clustering.
1 Introduction

The classification algorithms proposed so far have applied various classifier representations and different learning methods, but relatively few researchers have considered pre-clustering the data before classification. The major incentive of data pre-clustering is to decompose the original data base into a set of more homogeneous, smaller data bases. Data within each sub-data base then has higher similarity, and the classification methods applied can be expected to perform better. The objective of this research is to develop a novel hybrid fuzzy decision tree model which combines data clustering and a fuzzy decision tree model to classify different kinds of data bases. These data bases include Iris, Breast Cancer, Liver Disorders, Wine and Contraceptive Method Choice (CMC). The hybrid model will be compared with
* Corresponding author.
D.-S. Huang et al. (Eds.): ICIC 2008, CCIS 15, pp. 463–470, 2008. © Springer-Verlag Berlin Heidelberg 2008
other classification models such as Particle Swarm Optimization (PSO), Genetic Algorithm (GA), Back Propagation Neural Networks (BPN), and K-Nearest Neighbor (KNN) methods. Our contribution is to estimate the data base hit rate by extracting and analyzing the available data with an appropriate clustering approach and a fuzzy decision tree generation procedure.
2 Literature Survey

Data clustering has been applied in many different fields such as data mining 0, image compression 0 and information retrieval 0. Many studies try to improve clustering efficiency. 0 uses several kinds of clustering techniques, drawing on machine learning, statistics and computer science, and illustrates their applications on some benchmark data sets, like T.S.P. problems and bioinformatics. Other researchers also try to combine clustering techniques with other forecasting methods to improve their efficiency. 0 0 have successfully integrated K-means, Self Organizing Maps (SOM) and fuzzy C-means with Neural Networks, the Wang & Mendel fuzzy rule system and the T.S.K fuzzy system to improve overall system performance. Recently, a case based clustering model was developed by 0, which proposes a methodology for maintaining Case Based Reasoning (CBR) systems by using fuzzy decision tree 0 induction, a machine learning technique. The methodology is mainly based on the idea that a large case library can be transformed into a small case library together with a group of adaptation rules, which are generated by fuzzy decision trees. The approach is quite different from traditional clustering methods, and many attempts 0 have applied this new technology to data base classification. Many data mining techniques exist for pattern classification 0 and for time series identification 0. However, the classification accuracy of these models is often limited when the relationships in the input/output dataset are complex and/or non-linear 0. In such situations, which are frequently found in real-world problems, machine learning methods are more suitable for building simple and interpretable pattern classification models 0. The most common models are 0: Bayesian networks 0, neural networks 0, rough sets 0, decision trees 0 and heuristic algorithm classifiers 0.
3 Development of a CBR-Based Fuzzy Decision Tree

This model is developed from a CBR-based data clustering method and a fuzzy decision tree for data base classification judgment. This classification model integrates a data
Fig. 1. A Detailed Model for CBR-FDT
clustering technique, a Fuzzy Decision Tree (FDT), and Genetic Algorithms (GA) to construct a decision-making system based on historical data. The framework of CBR-FDT is shown in Figure 1.

3.1 The Selection of Different Data Bases in the UCI Dataset Library

In this paper, our research team chooses popular data sets from the UCI database library: Iris, Wine, Liver Disorders, Breast Cancer Wisconsin and Contraceptive Method Choice. Details of each database will be discussed in Table 4.

3.2 A Case Based Weighted-Clustering Method

In this research, a case-based weighted-clustering method is applied to develop the weighted distance metric and the similarity measure used in the following.

Phase one: Finding weighted feature values from important data set indices. In this step, the gradient method is applied to find the weighted values of the important data set indices, and a feature evaluation function is defined. The smaller the evaluation value, the better the corresponding features. Thus we would like to find the weights such that the evaluation function attains its minimum. The detailed process is as follows:

Step 1. Select the parameter α and the learning rate η.
Step 2. Initialize w_j with random values in [0, 1].
Step 3. Compute Δw_j for each j using Δw_j = −η ∂E/∂w_j, where E is defined as

E_w = 2 · Σ_p Σ_{q<p} [ SM_pq^(w) (1 − SM1_pq) + SM1_pq (1 − SM_pq^(w)) ] / ( N (N − 1) )

where N is the number of cases in the case base SL.
Step 4. Update w_j with w_j + Δw_j for each j.
Step 5. Repeat Step 3 and Step 4 until convergence, i.e., until the value of E becomes less than or equal to a given threshold or until the number of iterations exceeds a certain predefined number.

Fig. 2. Detailed steps for Phase One
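The Phase-one loop can be sketched numerically as plain gradient descent. This is an illustration, not the authors' code: the evaluation function E passed in below is a stand-in (the paper's similarity-based E_w would be plugged in instead), and the finite-difference gradient is an assumed implementation choice:

```python
import numpy as np

def learn_weights(E, n_features, eta=0.1, tol=1e-6, max_iter=1000, seed=0):
    """Steps 1-5 above: w_j <- w_j - eta * dE/dw_j, with the gradient
    estimated by central finite differences."""
    rng = np.random.default_rng(seed)
    w = rng.random(n_features)            # Step 2: random weights in [0, 1)
    h = 1e-6
    for _ in range(max_iter):             # Step 5: iteration cap
        grad = np.empty(n_features)
        for j in range(n_features):       # Step 3: estimate dE/dw_j
            wp, wm = w.copy(), w.copy()
            wp[j] += h
            wm[j] -= h
            grad[j] = (E(wp) - E(wm)) / (2 * h)
        w -= eta * grad                   # Step 4: update
        if E(w) <= tol:                   # Step 5: convergence threshold
            break
    return w
```

For example, with the toy function E(w) = Σ (w_j − 0.5)², the learned weights converge to 0.5 in each coordinate.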
Phase two: Dividing the data base (4 different data bases from the UCI dataset library) into several clusters. This section attempts to partition the data base case library into several clusters by using the weighted distance metric with the weights learned in the previous section. This approach first transforms the similarity matrix into an equivalent matrix and then considers the cases that are equivalent to each other as one cluster. The detailed process is as follows:
Step 1. Give a significance level (threshold) β ∈ (0, 1].
Step 2. Determine the similarity matrix SM = ( SM_pq^(w) ) according to equations (2) and (1).
Step 3. Compute SM1 = SM ∘ SM = ( s_pq ), where s_pq = max_k min( sm_pk^(w), sm_kq^(w) ).
Step 4. If SM1 ⊆ SM then go to Step 5; else replace SM with SM1 and go to Step 3.
Step 5. Determine the clusters based on the rule that case p and case q belong to the same cluster when their similarity reaches the threshold β.
Fig. 3. A Detailed steps for Phase Two
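The Phase-two steps can be sketched as follows. The max-min composition and the β-threshold clustering rule follow Steps 1-5 above; the single labelling pass assumes the closed similarity matrix is symmetric, which holds for a similarity measure:

```python
import numpy as np

def transitive_closure_clusters(SM, beta=0.8):
    """Repeat the max-min composition SM1 = SM . SM until it stops changing
    (the fuzzy transitive closure), then put cases p and q in the same cluster
    when their closed similarity is at least beta."""
    SM = np.asarray(SM, dtype=float)
    while True:
        # (SM . SM)_pq = max_k min(sm_pk, sm_kq)
        SM1 = np.max(np.minimum(SM[:, :, None], SM[None, :, :]), axis=1)
        if np.allclose(SM1, SM):
            break                 # Step 4: SM1 contained in SM -> closure reached
        SM = SM1
    n = len(SM)
    labels = [-1] * n
    for p in range(n):            # Step 5: threshold the (now transitive) relation
        if labels[p] == -1:
            labels[p] = p
            for q in range(n):
                if SM[p, q] >= beta:
                    labels[q] = p
    return labels
```

Cases sharing a label form one sub-cluster; lowering β merges clusters, raising it splits them.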
Through a series of weighting and clustering steps, the case library is divided into different sub-clusters. The data in each sub-cluster have a more similar pattern in terms of the features and action of each data point. These case bases are then ready for further application in data base classification.

3.3 A Fuzzy Decision Tree Classification Model

This research applies the CBR-based clustering method with Fuzzy Decision Trees (FDT) to develop a classification model for different data base applications. The framework of the FDT is depicted in Fig. 4.

Data Fuzzification. In fuzzy set theory, the membership function is one of the basic concepts; through it one can process quantitative fuzzy set data and handle fuzzy information. Finding an appropriate membership function to approximate quantitative fuzzy set data is therefore very important. However, no single perfect rule suits all kinds of fuzzy set data, and researchers choose different membership functions for different problems; the most used include triangular, trapezoidal, and Gaussian membership functions. This research adopts triangular membership functions as the primary membership functions.

ID3 decision tree. The ID3 decision tree learning algorithm computes the information gain G of each attribute A, defined as follows:

G(S, A) = Entropy(S) − Σ_{v ∈ values(A)} ( |S_v| / |S| ) Entropy(S_v)    (1)
where S is the total input space and S_v is the subset of S for which attribute A has value v. The entropy of S over c classes is given by Entropy(S) = Σ_{i=1}^{c} −p_i log2(p_i), where p_i represents
the probability of class "i". The attribute with the highest information gain, say B, is chosen as the root node of the tree. Next, a new decision tree is recursively constructed over each value of B using the training subspace S − {S_B}. A leaf-node or a
Fig. 4. A Framework of CBR-FDT Model
decision-node is formed when all the instances within the available training subspace are from the same class. For detecting anomalies, the ID3 decision tree outputs a binary classification decision of "0" to indicate the normal class and "1" to indicate the anomaly class for test instances.

Evolving the Fuzzy Decision Tree by Genetic Algorithm. A Genetic Algorithm is used at this stage to improve the accuracy of the FDT (fuzzy decision tree) in data base classification forecasting. The Genetic Algorithm finds the best number of fuzzy terms for every input attribute (for the different kinds of data bases), and the fitness function is re-calculated for each new number of fuzzy terms. In this research, the fitness function is the forecasting accuracy on the different databases.

3.4 The Judgment of Output Value
This research mainly applies the evolved fuzzy decision trees to classify the different cases in the UCI database library. The judgment of the result is based on the classification accuracy.
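The information-gain splitting criterion of Eq. (1) in Section 3.3 can be sketched as follows (an illustration, not the authors' implementation):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = sum_i -p_i * log2(p_i) over the class distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """G(S, A) = Entropy(S) - sum_v (|S_v|/|S|) * Entropy(S_v), Eq. (1);
    rows are dicts of attribute values, attr is the attribute A to split on."""
    n = len(labels)
    split = {}
    for row, y in zip(rows, labels):          # partition S by the value of A
        split.setdefault(row[attr], []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in split.values())
    return entropy(labels) - remainder
```

ID3 picks the attribute with the largest gain at each node; a gain of 1.0 on a binary problem means the attribute separates the classes perfectly.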
4 Experimental Results

In this research, four datasets were employed to validate our method. These data sets, named Iris, Wine, BUPA Liver Disorders, Breast Cancer Wisconsin and Contraceptive Method Choice, cover examples of data of low, medium and high dimensions. All data sets are available at UCI machi