Remote Sensing Scene Classification Based on Convolutional Neural Networks Pre-Trained Using Attention-Guided Sparse Filters
Semantic-level land-use scene classification is a challenging problem for which deep learning methods, e.g., convolutional neural networks (CNNs), have shown remarkable capacity. However, a lack of sufficient labeled images has hindered improvements in the land-use scene classification accuracy of CNNs. To address this problem, this paper proposes a CNN pre-training method guided by a human visual attention mechanism. Specifically, a computational visual attention model is used to automatically extract salient regions from unlabeled images. Sparse filters are then adopted to learn features from these salient regions, and the learnt parameters are used to initialize the convolutional layers of the CNN. Finally, the CNN is fine-tuned on labeled images. Experiments on the UCMerced and AID datasets show that, when combined with a demonstrative CNN, our method achieves 2.24% higher accuracy than a plain CNN, and that it obtains an overall accuracy of 92.43% when combined with AlexNet. The results indicate that the proposed method can effectively improve CNN performance using easy-to-access unlabeled images and can thus enhance land-use scene classification, especially when a large-scale labeled dataset is unavailable.
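The pre-training steps summarized above (sample patches from salient regions, fit sparse filters, reshape the weights into convolution kernels) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, patch size, saliency threshold, and filter count are illustrative assumptions, and the saliency map is a random stand-in for the output of a real visual attention model. The loss follows the commonly published sparse filtering formulation (soft absolute value, per-feature normalization, per-example normalization, L1 sum); the optimization loop and the fine-tuning stage are omitted.

```python
import numpy as np

def sample_salient_patches(image, saliency, patch=8, n=100, thresh=0.5, seed=0):
    """Sample square patches whose centre pixel has saliency >= thresh.

    Hypothetical helper: `saliency` is assumed to be an (H, W) map in [0, 1]
    produced by some visual attention model.
    """
    rng = np.random.default_rng(seed)
    half = patch // 2
    h, w = saliency.shape
    # Candidate centres: salient pixels far enough from the border.
    # Indices are relative to the cropped map, so they are also the
    # top-left corners of the corresponding patches in the full image.
    ys, xs = np.nonzero(saliency[half:h - half, half:w - half] >= thresh)
    if len(ys) == 0:
        raise ValueError("no salient pixels above threshold")
    idx = rng.choice(len(ys), size=n, replace=True)
    patches = [image[y:y + patch, x:x + patch].ravel()
               for y, x in zip(ys[idx], xs[idx])]
    return np.stack(patches, axis=1)  # shape: (patch * patch * channels, n)

def sparse_filtering_objective(W, X, eps=1e-8):
    """Sparse filtering loss for weights W (features x inputs), data X (inputs x examples)."""
    F = np.sqrt((W @ X) ** 2 + eps)                    # soft absolute value
    F = F / np.linalg.norm(F, axis=1, keepdims=True)   # normalize each feature across examples
    F = F / np.linalg.norm(F, axis=0, keepdims=True)   # normalize each example across features
    return F.sum()                                     # L1 norm, since all entries are positive

# Synthetic stand-ins for an unlabeled image and its saliency map (assumptions).
rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))
saliency = rng.random((64, 64))

X = sample_salient_patches(image, saliency, patch=8, n=200)
X = X - X.mean(axis=0, keepdims=True)       # remove each patch's mean intensity
W = rng.standard_normal((16, X.shape[0]))   # 16 filters, randomly initialized
loss = sparse_filtering_objective(W, X)     # value a gradient-based optimizer would minimize

# After optimization, each row of W becomes one convolution kernel.
filters = W.reshape(16, 8, 8, 3)            # (n_filters, height, width, channels)
```

In a full pipeline, `W` would be optimized before the reshape, and the resulting `filters` array would be copied into the first convolutional layer of the network, with axes transposed to whatever layout the chosen deep learning framework expects.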
Open access article
Citation : Chen, J., Wang, C., Ma, Z., Chen, J., He, D. and Ackland, S. (2018) Remote Sensing Scene Classification Based on Convolutional Neural Networks Pre-Trained Using Attention-Guided Sparse Filters. Remote Sensing, 10(2), p.290.
Research Group : Centre for Computational Intelligence
Peer Reviewed : Yes