An Optimized Hybrid Techniques of Training set reduction for Performance Improvement of k-Nearest Neighbour Classifier
Abstract
In non-parametric algorithms such as k-nearest neighbour, the fundamental problems are the large storage and computational requirements. Moreover, the effectiveness of the classification task is significantly affected by an uneven distribution of the training data. To overcome these drawbacks of lazy learners such as the k-nearest neighbour classifier, this research work explores training set reduction by editing and condensing the training set. Additionally, the training set is reduced by two optimized hybrid training set reduction techniques, namely TSR-FkNN (Elbow method) and TSR-FkNN (Silhouette value), to improve classification performance.
Introduction
In a machine learning (ML) algorithm such as the simple k-nearest neighbour (k-NN) classifier, the input training set consists of vectors and their associated class labels [1]. This training set is used in the training phase of the ML task, and its size is not changed when it is taken as input [2]. The algorithm calculates the distance between a new input test vector and each vector of the stored training set, then assigns a class label to the test vector [3]. Hence, the k-NN classifier requires a large amount of memory to store the training dataset and a large amount of time to execute. In contrast to parametric classification algorithms, where parameters are learned from the training set and the algorithm uses these parameters to compute a similarity measure, non-parametric classifiers store all training instances [4]. This motivated us to find a way to reduce the time and space requirements of the k-NN classifier. A few solutions to this problem exist, including feature selection and training set reduction by removing noisy and unimportant training instances [5]. In this research work, we evaluate hybrid training set reduction techniques with optimization: Training Set Reduction Fast k-NN applying the sum of squared errors (SSE) (TSR-FkNN, Elbow method) and Training Set Reduction Fast k-NN applying the silhouette value (TSR-FkNN, Silhouette value).
The evaluation of the above approaches is carried out on an agricultural soil health card dataset. The results suggest that these approaches offer a significant improvement in effectiveness over existing methods. This paper is organized as follows. Section 2 presents the background of the research topic. Section 3 covers the proposed research work. Section 4 compares all methods and analyses the results. Section 5 concludes this research work.
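To make the cost argument of this section concrete, the following is a minimal sketch of the plain k-NN procedure: every query is compared against every stored training vector, and the label is decided by a majority vote among the k nearest. It is not the TSR-FkNN method. The function names, the toy two-feature vectors, the labels "low" and "high", and the hand-picked reduced set are all hypothetical and serve only to show why a smaller stored training set means less memory and fewer distance computations per query.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training vectors.

    The whole training set is scanned for every query, so both memory use
    and per-query time grow with the number of stored instances.
    """
    distances = sorted(
        ((euclidean(x, query), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated classes in two dimensions.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["low", "low", "low", "high", "high", "high"]

# A hand-picked reduced set with one representative per class (assumption:
# chosen manually here, not by any reduction algorithm from the paper).
reduced_X = [(1.0, 1.0), (5.0, 5.0)]
reduced_y = ["low", "high"]

full_prediction = knn_predict(train_X, train_y, (1.1, 1.0), k=3)
reduced_prediction = knn_predict(reduced_X, reduced_y, (1.1, 1.0), k=1)
print(full_prediction, reduced_prediction)  # prints "low low"
```

On this toy data the reduced set gives the same prediction while storing two vectors instead of six. Whether a reduction preserves accuracy on real data depends on which instances are kept, which is the question the reduction techniques in this work address.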
Conclusion
5.1 Storage reduction: The storage requirement of k-NN is very high in comparison to other algorithms. For TSR-FkNN (Elbow method), the storage requirement is lowest when the value of k is 33 and 35, respectively, followed by TSR-FkNN (Silhouette value). Hence, in terms of storage, TSR-FkNN is efficient.
5.2 Execution time: Execution time is highest for k-NN because it stores a larger number of training instances. Execution time is lowest for TSR-FkNN (Elbow method) because it stores fewer training instances.
5.3 Generalization accuracy, precision, recall and F1 measure: The generalization accuracy of TSR-FkNN (Silhouette value) is the highest of the compared algorithms; hence, in terms of accuracy, TSR-FkNN (Silhouette value) is recommended.
Considering the time, space and accuracy comparisons together, the proposed novel hybrid algorithm TSR-FkNN (Silhouette value) is the best algorithm; hence, it can be recommended for classifying soil samples into their respective nutrient deficiency categories.