Reinforcement Learning (Q-Learning) Pathfinding: Mountain Climbing Route from a Digital Elevation Model (DEM)
by: Moien Rangzan and Matineh Rangzan
It took humans thousands of years of experience, and countless lives, to find the optimal paths to ascend mountains. Finding the best route on an unexplored mountain requires competent mountain climbers and a lot of supervised surveying, which can be costly and dangerous. For planets other than Earth, the situation is even worse.
What I wanted to try in this project was an unsupervised method to find this path using the most widely available and accessible remote sensing data: the Digital Elevation Model (DEM).
- Downloaded SRTM for Mount Damavand from Earth Explorer - I didn't know about Geemap at that time.
- Asked my friend, a professional mountain climber, about Mount Damavand; he suggested the southern ascending path, so I clipped only the southern part of Damavand using Raster Clip to reduce computation.
- Again with his assistance, I added four locations to help the Q-Learning agent find its way - a starting node, the mountain summit, and two resting sites that mountain climbers often visit - and converted them to a raster of the same size and spatial resolution as our SRTM area.
- Then calculated Slope and Aspect from the DEM to further help the agent (a quick sketch of this step follows below).
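The slope and aspect layers came out of a GIS, but as a rough illustration, here is how they could be approximated straight from the DEM array with NumPy. The 30 m cell size matches SRTM; the compass convention for aspect is only approximate and can differ slightly from what ArcGIS or QGIS produce:

```python
import numpy as np

def slope_aspect(dem, cell_size=30.0):
    """Approximate slope (degrees) and aspect (compass degrees, downhill direction)
    from a DEM array; assumes row 0 is the northern edge of the raster."""
    # Gradients along rows (towards the south) and columns (towards the east).
    d_row, d_col = np.gradient(dem.astype(float), cell_size)
    slope = np.degrees(np.arctan(np.hypot(d_col, d_row)))
    # Downhill direction is opposite the gradient; expressed as 0-360 degrees from north.
    aspect = (np.degrees(np.arctan2(-d_col, d_row)) + 360.0) % 360.0
    return slope, aspect
```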
There are a ton of YouTube videos explaining this algorithm, so I'm not translating my full project description here :))
But here is the pseudocode:
Initialize $Q(s, a)$ for all $s \in \mathcal{S}$ and $a \in \mathcal{A}(s)$
Loop forever:
(a) $S \leftarrow$ current (nonterminal) state
(b) $A \leftarrow \varepsilon$-greedy$(S, Q)$
(c) Take action $A$; observe the resultant reward, $R$, and the next state, $S^{\prime}$
(d) $Q(S, A) \leftarrow Q(S, A)+\alpha\left[R+\gamma \max _a Q\left(S^{\prime}, a\right)-Q(S, A)\right]$
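For completeness, here is a minimal, self-contained sketch of that loop on a grid of raster cells. Everything in it (function names, the hyperparameters alpha/gamma/epsilon, the episode count) is illustrative and not taken from the project:

```python
import numpy as np

# The four moves described below: up, down, left, right, as (row, col) offsets.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def q_learning(reward_fn, start, terminals, shape,
               episodes=5000, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning on a (rows, cols) grid. `reward_fn(s, a, s_next)` returns
    the reward; `terminals` is a set of (row, col) cells that end an episode."""
    rng = np.random.default_rng(seed)
    q = np.zeros(shape + (len(ACTIONS),))
    for _ in range(episodes):
        s = start
        while s not in terminals:
            # (b) epsilon-greedy action selection
            a = rng.integers(len(ACTIONS)) if rng.random() < epsilon else int(np.argmax(q[s]))
            # (c) take the action, clamped to the raster bounds
            dr, dc = ACTIONS[a]
            s_next = (min(max(s[0] + dr, 0), shape[0] - 1),
                      min(max(s[1] + dc, 0), shape[1] - 1))
            r = reward_fn(s, a, s_next)
            # (d) Q(S, A) <- Q(S, A) + alpha * [R + gamma * max_a Q(S', a) - Q(S, A)]
            q[s][a] += alpha * (r + gamma * np.max(q[s_next]) - q[s][a])
            s = s_next
    return q
```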
The agent can go up, down, left, or right.
- Every move has a punishment unless stated otherwise.
- We calculate the reward for each action based on its neighboring cells:
- if the next cell is higher than the current elevation (going uphill), the reward is a small punishment proportional to the elevation difference;
- else, if the next cell is lower than the current elevation (going downhill), the punishment is the same elevation difference over a denominator ten times smaller.

Notice the denominator: going downhill has ten times more punishment than going uphill; we want our agent to ascend unless it is necessary. We chose 100 for the denominator since we ran a 3x3 kernel and the maximum difference in elevation between two adjacent pixels was 99.0, so the punishment is something between 0 and -1.
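As a concrete illustration, here is one way this reward could look in code. The project text only pins down the 10:1 downhill/uphill ratio, a denominator of 100, and a punishment of roughly 0 to -1, so the branch denominators (100 for downhill, 1000 for uphill) and the small constant per-move penalty below are my assumptions:

```python
STEP_PENALTY = -0.01  # small punishment for every move (hypothetical value)

def move_reward(dem, state, next_state):
    """Illustrative reward for stepping from `state` to `next_state` on the DEM grid;
    descending is punished ten times more than ascending."""
    diff = dem[next_state] - dem[state]  # positive = uphill, negative = downhill
    if diff >= 0:
        return STEP_PENALTY - diff / 1000.0        # mild punishment for climbing
    return STEP_PENALTY - abs(diff) / 100.0        # ten times harsher for descending
```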
The idea is that the steeper the slope, the higher the chance our agent will fall. I chose an exponential function with the power of 10 for this; you can see that after a slope of 60 percent, the chance of falling increases immensely.
The aspect comes in if the agent falls: if the aspect is, for example, towards the south, and the agent decides to go right but falls, it will fall into the northern pixel.
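A small sketch of how I read those two mechanics. The normalization of the fall-chance curve and the aspect-to-pixel mapping (which literally follows the "southern aspect, northern pixel" example above) are assumptions, not the project's exact code:

```python
# Which neighbouring pixel the agent slides into when it falls, keyed by the aspect
# of the cell. Follows the example above (aspect towards the south -> fall into the
# northern pixel); row 0 is assumed to be the northern edge of the raster.
FALL_OFFSET = {"south": (-1, 0), "north": (1, 0), "east": (0, -1), "west": (0, 1)}

def fall_chance(slope_percent, max_slope=100.0):
    """Chance of falling as a power-of-10 curve of the slope: near zero on gentle
    ground, rising very steeply once the slope passes roughly 60 percent."""
    return min(1.0, (slope_percent / max_slope) ** 10)

def aspect_name(aspect_deg):
    """Collapse an aspect angle (0 = north, clockwise) into four cardinal bins."""
    return ["north", "east", "south", "west"][int(((aspect_deg + 45.0) % 360.0) // 90.0)]

def resolve_move(state, intended_offset, slope_percent, aspect_deg, rng):
    """Apply the intended move, unless the agent falls; in that case the new cell
    is dictated by the aspect instead of the chosen action."""
    if rng.random() < fall_chance(slope_percent):
        dr, dc = FALL_OFFSET[aspect_name(aspect_deg)]
    else:
        dr, dc = intended_offset
    return (state[0] + dr, state[1] + dc)
```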
The edge pixels are dead zones and have a reward of -100.
The summit and our two resting points have a reward of +100.
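These fixed rewards are easy to lay over the grid; the helper and cell coordinates below are placeholders for illustration, added on top of the elevation-based move reward:

```python
import numpy as np

def special_rewards(shape, summit, rest_points, edge_reward=-100.0, goal_reward=100.0):
    """Overlay of the fixed rewards described above: edge pixels are dead zones with
    -100, while the summit and the two resting points get +100. Cells are (row, col)."""
    grid = np.zeros(shape)
    grid[0, :] = grid[-1, :] = edge_reward   # top and bottom edges
    grid[:, 0] = grid[:, -1] = edge_reward   # left and right edges
    for cell in [summit, *rest_points]:
        grid[cell] = goal_reward
    return grid
```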
For the ground truth, I created a line feature in ArcGIS from the Google base map, then converted it into a raster.
The intersection of the ground truth and the final result was used to show the matching paths.
I buffered the ground-truth line by 5 pixels to better show how close our results and the ground truth are.
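A small sketch of that comparison, assuming both the agent's path and the rasterized ground-truth line are boolean arrays of the same shape; the 5-pixel buffer is approximated here with repeated binary dilation:

```python
from scipy.ndimage import binary_dilation

def match_against_ground_truth(result_path, ground_truth_line, buffer_px=5):
    """Buffer the rasterized ground-truth line by `buffer_px` pixels and intersect it
    with the agent's path raster; returns the overlap mask and the matched fraction."""
    buffered = binary_dilation(ground_truth_line, iterations=buffer_px)
    overlap = result_path & buffered
    return overlap, overlap.sum() / max(result_path.sum(), 1)
```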
To be honest, I didn't expect such good results, as I was using the 30 m SRTM DEM and no other complementary data such as snow and rain. Hopefully, with a higher-resolution DEM and more complex inputs, this algorithm is a small step for a man :))