<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Aerial Scene Parsing (ASP)</title>
<!-- <link rel="stylesheet" href="http://cdn.static.runoob.com/libs/bootstrap/3.3.7/css/bootstrap.min.css"> -->
<link rel="stylesheet" href="bootstrap/css/bootstrap.min.css">
<!-- <link rel="stylesheet" type="text/css" href="css/mystyle.css"> -->
<script src="http://cdn.static.runoob.com/libs/jquery/2.1.1/jquery.min.js"></script>
<link rel="stylesheet" href="bootstrap/js/bootstrap.min.js">
</head>
<body>
<div class="container">
<div class="content">
<h1 style="text-align:center; margin-top:60px; font-weight: bold">
<!-- Aerial Image Recognition: A Revisit From Tile-level Scene Classification to Pixel-level Semantic Parsing -->
Aerial Scene Parsing: From Tile-level Scene <br> Classification to Pixel-wise Semantic Labeling
</h1>
<p style="text-align:center; margin-bottom:15px; margin-top:20px; font-size: 18px">
<a href="http://www.captain-whu.com/longyang_En.html" target="_blank">Yang Long<sup>1</sup></a>,
<a href="http://www.captain-whu.com/xia_En.html" target="_blank">Gui-Song Xia<sup>2,1,*</sup></a>,
<!-- <a href="http://people.ucas.ac.cn/~shyli" target="_blank">Shengyang Li<sup>3</sup></a>, -->
<!-- <a href="http://www.captain-whu.com/yangwen.html" target="_blank">Wen Yang<sup>2,3</sup></a>, -->
<!-- <a href="https://sites.google.com/site/michaelyingyang/home" target="_blank">Michael Ying Yang<sup>5</sup></a>, -->
<!-- <a href="https://www.sipeo.bgu.tum.de/team/zhu" target="_blank">Xiao Xiang Zhu<sup>6</sup></a>, -->
<a href="http://www.lmars.whu.edu.cn/prof_web/zhangliangpei/rs/index.html" target="_blank">Liangpei Zhang<sup>1</sup></a>,
<a href="https://www.sipeo.bgu.tum.de/team/zhu" target="_blank">Gong Cheng<sup>3</sup></a>,
<a href="http://www.lmars.whu.edu.cn/prof_web/prof_lideren/ldryinwenjl.htm" target="_blank">Deren Li<sup>1</sup></a>.
</p>
<p style="text-align:center; margin-bottom:15px; margin-top:20px; font-size: 15px;font-style: italic;">
1. State Key Lab. LIESMARS, Wuhan University, Wuhan 430079, China <br>
2. School of Computer Science, Wuhan University, Wuhan 430079, China <br>
3. School of Automation, Northwestern Polytechnical University, Xi'an 710072, China <br>
<!-- 3. Key Laboratory of Space Utilization, Technology and Engineering Center for Space Utilization, Chinese Academy of Sciences, Beijing 100094, China <br> -->
<!-- 4. School of Electronic Information, Wuhan University, Wuhan 430072, China <br> -->
<!-- 5. Faculty of Geo-Information Science and Earth Observation, University of Twente, Hengelosestraat 99, Enschede, Netherlands <br> -->
<!-- 6. German Aerospace Center (DLR) and also Technical University of Munich, Germany -->
</p>
</div>
<!-- <br><hr> -->
<div class="row">
<div class="span6 offset2">
<ul class="nav nav-tabs">
<br />
</ul>
</div>
</div>
<table style= "width:100%;" align="center">
<tr>
<td style="text-align: center;"><a href="#RoadMap" ><img src="files/RoadMap-Head.jpg" class="img-responsive center-block"/> <br> Road map</a></td>
<td style="text-align: center;"><a href="https://captain-whu.github.io/DiRS/" target="_blank"><img src="files/Million-AID-Head.jpg" class="img-responsive center-block" /> <br> Million-AID</a></td>
<td style="text-align: center;"><a href="https://arxiv.org/pdf/2201.01953.pdf" target="_blank"><img src="files/Paper-Header.png" class="img-responsive center-block" /> <br> Paper</a></td>
<td style="text-align: center;"><a href="files/ASP-PPT.pdf" target="_blank" ><img src="files/PPT-Head.png" class="img-responsive center-block" /> <br> PPT</a></td>
</tr>
</table>
<div class="row">
<div class="span12">
<h2 style="text-align:left; margin-bottom:10px; margin-top:20px; ">
- Introduction -
</h2>
<p style="text-align:justify; font-size: 17px">
Given an aerial image, aerial scene parsing (ASP) aims to interpret the semantic structure of the image content, <i>e.g.</i>,
by assigning a semantic label to every pixel of the image. With the popularization of data-driven methods, the past decades have
witnessed promising progress on ASP, approached through the schemes of <i>tile-level scene classification</i> or
<i>segmentation-based image analysis</i> when high-resolution aerial images are used. However, the former scheme often produces results
with tile-wise boundaries, while the latter needs to handle the complex modeling process from pixels to semantics, which often
requires large-scale image samples with well-annotated, pixel-wise semantic labels. In this paper, we address these issues in aerial
scene parsing with perspectives ranging from tile-level scene classification to pixel-wise semantic labeling. Specifically, we first revisit
aerial image interpretation with a literature review. We then present Million-AID, a large-scale scene classification dataset that contains
one million aerial images. With the presented dataset, we also report benchmarking experiments using classical convolutional neural
networks (CNNs). Finally, we perform ASP by unifying tile-level scene classification and object-based image analysis to achieve pixel-wise
semantic labeling. Intensive experiments show that Million-AID is a challenging yet useful dataset that can serve as a benchmark for evaluating
newly developed algorithms. When transferring knowledge from Million-AID, fine-tuned CNN models pretrained on Million-AID perform consistently
better than those pretrained on ImageNet for aerial scene classification, demonstrating the strong generalization ability of the proposed dataset.
Moreover, our designed hierarchical multi-task learning method achieves state-of-the-art pixel-wise classification on the challenging GID,
a promising attempt to bridge tile-level scene classification toward pixel-wise semantic labeling for aerial image interpretation. We hope
that our work can serve as a baseline for aerial scene classification and inspire a rethinking of the scene parsing of high-resolution aerial images.
</p>
</div>
</div>
<br>
<div class="row">
<div class="span12">
<h2 id="RoadMap" style="text-align:left; margin-bottom:10px; margin-top:20px;">
- Revisiting Aerial Image Interpretation -
</h2>
<p style="text-align:justify; font-size: 17px">
With the progress of sensor technology, the spatial resolution of aerial images has improved continuously, which has also greatly promoted the development of aerial image interpretation. Consequently, the interpretation of aerial images has evolved over a long course from pixel-wise image classification, through segmentation-based image analysis, to tile-level image understanding, relying on the visual characteristics of aerial images at different resolutions.
</p>
<img src="files/RoadMap.png" width="80%" class="img-responsive center-block" />
</div>
</div>
<br>
<div class="row">
<div class="span12">
<h2 id="scbm" style="text-align:left; margin-bottom:10px; margin-top:20px;">
- Scene Classification: A New Benchmark on Million-AID -
</h2>
<p style="text-align:justify; font-size: 17px">
Data-driven algorithms represented by deep learning have been reported to hold overwhelming advantages over conventional classification methods based on handcrafted features, and have thus dominated aerial image recognition in the recent decade. In this section, we train a number of representative CNN models and conduct comprehensive evaluations for multi-class and multi-label scene classification on <a href="https://captain-whu.github.io/DiRS/"><strong><u>Million-AID</u></strong></a>, which we hope will provide a benchmark for future research.
</p>
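<p style="text-align:justify; font-size: 17px">
To make the two evaluation settings concrete, below is a minimal PyTorch sketch, not the benchmark code itself: multi-class classification assigns each tile exactly one scene label and trains with cross-entropy, while multi-label classification allows several labels per tile and trains with binary cross-entropy over per-class sigmoid outputs. The class count and the dummy batch are illustrative assumptions.
</p>
<pre>
# A minimal sketch (PyTorch) contrasting the two settings; the class count
# below is an assumption for illustration, not the exact Million-AID taxonomy.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 51                              # assumed number of scene categories
backbone = models.resnet50(weights=None)
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_CLASSES)

images = torch.randn(4, 3, 224, 224)          # a dummy batch of aerial tiles
logits = backbone(images)                     # shape: (4, NUM_CLASSES)

# Multi-class: each tile belongs to exactly one scene category.
hard_labels = torch.randint(0, NUM_CLASSES, (4,))
mc_loss = nn.CrossEntropyLoss()(logits, hard_labels)

# Multi-label: each tile may contain several semantic categories at once,
# so every class gets an independent sigmoid and a binary cross-entropy term.
multi_hot = torch.randint(0, 2, (4, NUM_CLASSES)).float()
ml_loss = nn.BCEWithLogitsLoss()(logits, multi_hot)

print(mc_loss.item(), ml_loss.item())
</pre>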
<h4 style="text-align:left; margin-bottom:10px; margin-top:10px; font-weight: bold; font-style: italic">
- Million-AID
</h4>
<img src="files/MAID-Samples.png" width="750px" class="img-responsive center-block" />
<br>
<h4 id="mcbm" style="text-align:left; margin-bottom:10px; margin-top:10px; font-weight: bold; font-style: italic">
- Multi-class classification
</h4>
<img src="files/Multi-class-classification.svg" width="65%" class="img-responsive center-block" />
<br>
<h4 id="mlbm" style="text-align:left; margin-bottom:10px; margin-top:10px; font-weight: bold; font-style: italic">
- Multi-label classification
</h4>
<img src="files/Multi-label-classification.svg" width="97%" class="img-responsive center-block" />
</div>
</div>
<br>
<div class="row">
<div class="span12">
<h2 id="Million-AID" style="text-align:left; margin-bottom:10px; margin-top:20px;">
- Transferring Knowledge From Million-AID -
</h2>
<p style="text-align:justify; font-size: 17px;">
Million-AID consists of large-scale aerial images that characterize diverse scenes. This provides Million-AID with rich semantic knowledge of scene content. Hence, it is natural for us to explore the potential to transfer the semantic knowledge in Million-AID to other domains. To this end, we consider two basic strategies, <i>i.e.</i>, fine-tuning pre-trained networks for tile-level scene classification and hierarchical multi-task learning for pixel-level semantic parsing.
</p>
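<p style="text-align:justify; font-size: 17px">
For the first strategy, the following is a minimal fine-tuning sketch in PyTorch, given under stated assumptions: the checkpoint file name is a hypothetical placeholder for a Million-AID-pretrained ResNet-50, the new head is sized for the 30 scene classes of AID, and the two learning-rate groups reflect a common fine-tuning recipe rather than the exact schedule of our experiments.
</p>
<pre>
# A fine-tuning sketch (PyTorch); "millionaid_resnet50.pth" is a hypothetical
# placeholder path, not a file released with this project.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=None)
state = torch.load("millionaid_resnet50.pth", map_location="cpu")
model.load_state_dict(state, strict=False)    # skip the old classifier head

# Replace the classification head for the target dataset (AID has 30 classes).
model.fc = nn.Linear(model.fc.in_features, 30)

# A common recipe: a smaller learning rate for the pretrained backbone than
# for the freshly initialized head.
optimizer = torch.optim.SGD(
    [
        {"params": [p for n, p in model.named_parameters()
                    if not n.startswith("fc")], "lr": 1e-3},
        {"params": model.fc.parameters(), "lr": 1e-2},
    ],
    momentum=0.9, weight_decay=1e-4,
)
</pre>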
<h4 style="text-align:left; margin-bottom:10px; margin-top:10px; font-weight: bold; font-style: italic">
- Fine-tuning pre-trained networks for scene classification
</h4>
<p style="text-align: center; font-size: 17px;">
Classification Accuracy (%) on AID Dataset Using Different Training Schemes
</p>
<img src="files/Fine-tuning4AID.svg" width="88%" class="img-responsive center-block" />
<br>
<p style="text-align: center; font-size: 17px;">
Classification Accuracy (%) on NWPU-RESISC45 Dataset Using Different Training Schemes
</p>
<img src="files/Fine-tuning4RESISC45.svg" width="88%" class="img-responsive center-block" />
<br>
<h4 style="text-align:left; margin-bottom:10px; margin-top:10px; font-weight: bold; font-style: italic">
- Hierarchical multi-task learning for semantic parsing
</h4>
<p style="text-align:justify; font-size: 17px;">
The conventional CNN learns scene features via stacked convolutional layers, and the output of the last fully connected layer is usually employed for scene representation. However, learning stable features from a single layer can be difficult because of the complexity of scene content. Moreover, data sparsity, a long-standing and notorious problem, can easily lead to model overfitting and weak generalization because of the insufficient knowledge captured from limited training data. To relieve these issues, we introduce a hierarchical multi-task learning method and further explore how much of the knowledge contained in Million-AID can be transferred to boost the pixel-level semantic parsing of aerial images. To this end, <a href="https://captain-whu.github.io/GID/"><strong><u>GID</u></strong></a>, which consists of a training set with tile-level scenes and large-size test images with pixel-wise annotations, provides us with an opportunity to bridge tile-level scene classification toward pixel-level semantic parsing. Generally, the presented framework consists of four components, <i>i.e.</i>, hierarchical scene representation, multi-task scene classification (MSC), hierarchical semantic fusion (HSF), and pixel-level semantics integration, as shown below.
</p>
<img src="files/HMTL.svg" width="87%" class="img-responsive center-block" />
<br>
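<p style="text-align:justify; font-size: 17px">
The multi-task scene classification component can be pictured as a shared backbone with one classification head per level of the label hierarchy, trained with a summed per-level loss. The following is a schematic PyTorch sketch of that idea only; the level sizes, the ResNet-50 backbone, and the plain summed cross-entropy are illustrative assumptions, not the exact configuration of our method.
</p>
<pre>
# A schematic sketch (PyTorch) of multi-task scene classification over a
# label hierarchy; the coarse-to-fine level sizes below are assumptions.
import torch
import torch.nn as nn
from torchvision import models

LEVEL_SIZES = [5, 15, 51]                     # assumed category counts per level

class HierarchicalMSC(nn.Module):
    def __init__(self, level_sizes):
        super().__init__()
        backbone = models.resnet50(weights=None)
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()           # keep the pooled features only
        self.backbone = backbone
        self.heads = nn.ModuleList(nn.Linear(feat_dim, k) for k in level_sizes)

    def forward(self, x):
        feats = self.backbone(x)
        return [head(feats) for head in self.heads]   # one logit set per level

model = HierarchicalMSC(LEVEL_SIZES)
images = torch.randn(2, 3, 224, 224)
targets = [torch.randint(0, k, (2,)) for k in LEVEL_SIZES]
loss = sum(nn.CrossEntropyLoss()(logits, t)
           for logits, t in zip(model(images), targets))
loss.backward()
</pre>
<br>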
<h5 style="text-align:left; margin-bottom:10px; margin-top:10px; font-weight: bold; font-style: italic">
- Qualitative comparisons among different classification schemes
</h5>
<img src="files/Ablation.png" width="87%" class="img-responsive center-block" />
<p style="text-align:justify; font-size: 16px; font-family: Times; width: 87%; margin-left: 6.5%">
Images in the first to fifth columns show the original image, the ground truth annotations, and the classification maps of the baseline, MSC, and the full implementation of our method, respectively.
</p>
<br>
<h5 style="text-align:left; margin-bottom:10px; margin-top:10px; font-weight: bold; font-style: italic">
- Performance comparison among different methods
</h5>
<img src="files/SOTA-GID.svg" width="47.2%" class="img-responsive center-block" />
<br>
<h5 style="text-align:left; margin-bottom:10px; margin-top:10px; font-weight: bold; font-style: italic">
- Visualization of classification results
</h5>
<img src="files/CMSOTA.png" width="87%" class="img-responsive center-block" />
<p style="text-align:justify; font-size: 16px; font-family: Times; width: 87%; margin-left: 6.5%">
Visualization of the land cover classification results on the <i>fine classification set</i> of GID. Images in the first to fourth columns show the original image, the ground truth annotations, the classification maps of PT-GID, and the classification maps of our method, respectively.
</p>
</div>
</div>
<br>
<div class="row">
<div class="span12">
<h3 id="Download" style="text-align:left; margin-bottom:10px; margin-top:10px; font-weight: bold">
Download
</h3>
<p style="text-align:justify; font-size: 17px">
For the construction of Million-AID, please refer to the second item of the following citations.
</p>
<ul>
<li style="font-size:17px">
<a href="https://whueducn-my.sharepoint.com/:f:/g/personal/longyang_whu_edu_cn/Et-SJsQYQRxMh63Z59iFyH0Bl7nPamTEj4ZQ9GZ1Ch1Ueg?e=ySI1Nt" target="_blank">
Million-AID Download
</a>
</li>
<!-- <li style="font-size:17px">
<a href="">Codes</a> (<i>Coming soon ...</i> )
</li> -->
</ul>
<h3 id="Evaluaton" style="text-align:left; margin-bottom:10px; margin-top:10px; font-weight: bold">
Evaluation
</h3>
<p style="text-align:justify; font-size: 17px">
A public evaluation platform for multi-class and multi-label scene classification based on Million-AID.
</p>
<ul>
<li style="font-size:17px">
<a href="" target="_blank">
Evaluation Server for Multi-class Scene Classification
</a>
</li>
<li style="font-size:17px">
<a href="" target="_blank">
Evaluation Server for Multi-label Scene Classification
</a>
</li>
</ul>
<div class="section bibtex">
<h3 style="text-align:left; margin-bottom:10px; margin-top:20px; font-weight: bold">
Citation
</h3>
<pre>
@misc{Long2022ASP,
title={Aerial Scene Parsing: From Tile-level Scene Classification to Pixel-wise Semantic Labeling},
author={Yang Long and Gui-Song Xia and Liangpei Zhang and Gong Cheng and Deren Li},
year={2022},
eprint={2201.01953},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
</pre>
<br>
<pre>
@article{Long2021DiRS,
title={On Creating Benchmark Dataset for Aerial Image Interpretation: Reviews, Guidances and Million-AID},
author={Yang Long and Gui-Song Xia and Shengyang Li and Wen Yang and Michael Ying Yang and Xiao Xiang Zhu and Liangpei Zhang and Deren Li},
journal={IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing},
year={2021},
volume={14},
pages={4205-4230}
}</pre>
</div>
<h3 style="text-align:left; margin-bottom:10px; margin-top:20px; font-weight: bold">
Contact
</h3>
<p>
If you have any questions, please contact:
</p>
<ul>
<li>Yang Long at <strong>[email protected]</strong></li>
<li>Gui-Song Xia at <strong>[email protected]</strong></li>
</ul>
<br />
<br />
<br />
</div>
</div>
<!-- <div class="row">
<div style="text-align:center; margin-top:0; margin-bottom: 20px;">
<embed id="map" src="http://rf.revolvermaps.com/f/f.swf" type="application/x-shockwave-flash" pluginspage="http://www.macromedia.com/go/getflashplayer" wmode="transparent" allowScriptAccess="always" allowNetworking="all" width="150" height="75" flashvars="m=0&i=5dp1mfnunae&r=10&c=fffdc0" loop="true" autostart="False"></embed>
<img class="img-responsive center-block" src="http://rf.revolvermaps.com/js/c/5dp1mfnunae.gif" width="1" height="1" alt="" value="True"/>
<a style="font-size: x-small;"> Copyight@2020, Captain</a>
<a href="http://www.revolvermaps.com/livestats/5dp1mfnunae/"></a>
</div>
</div> -->
</body>
</html>