A* with an RL Heuristic
A research team from the Israel-based company Imagry announced at NVIDIA's annual GPU Technology Conference (GTC) that they have developed a deep-learning-powered modification of A*, which they call א* (Aleph*). Here's why their approach seems interesting:
- א* uses an RL-based heuristic. This addresses the key pain point of A* - designing heuristics that are both effective and generalize well is hard. Learning them from experience to shrink the search tree is the way to go.
- א* draws inspiration from AlphaGo Zero, but replaces MCTS with A* to reduce the size of the explored tree. Both the accumulated reward and the expected future reward are taken into account when making search decisions.
- They also mention that A* solutions are optimal, whereas MCTS only approaches optimality, with no proof that it gets there. That is what inspired them to use the good old A* instead of relying on sampling techniques. The catch: for that optimality guarantee, the heuristic has to be admissible and consistent (also called monotone). This is a bummer. Let me explain. It looks like they engineered the heuristic to eventually converge to optimal forecasts, as the algorithm maximizes the future reward, with R = 1/cost in א*. Equivalently, the heuristic function is trained to minimize cost, and a perfect A* heuristic always yields the minimal cost. The problem is that until it converges, the learned heuristic is inadmissible (it overestimates), which yields sub-optimal solutions. Maybe adding admissibility and consistency constraints to the heuristic function would help the algorithm converge to optimality faster, who knows.
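To make the admissibility point concrete, here is a minimal sketch (my own illustration, not Imagry's code): a textbook A* with a pluggable heuristic, run on a tiny made-up graph once with an admissible heuristic and once with an overestimating one, standing in for an unconverged learned heuristic.

```python
import heapq

def a_star(start, goal, edges, h):
    """Textbook A*: `h` is pluggable, so it could just as well be a
    trained model's estimate of the remaining cost."""
    # Frontier ordered by f = g + h (accumulated cost + expected remaining cost).
    frontier = [(h(start), 0.0, start, [start])]
    best_g = {start: 0.0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        for nxt, cost in edges.get(node, []):
            ng = g + cost
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(frontier, (ng + h(nxt), ng, nxt, path + [nxt]))
    return None

# Tiny graph: the route S -> A -> G costs 2, the direct edge S -> G costs 3.
edges = {"S": [("A", 1.0), ("G", 3.0)], "A": [("G", 1.0)]}

admissible = {"S": 2.0, "A": 1.0, "G": 0.0}.get      # never overestimates
overestimating = {"S": 2.0, "A": 5.0, "G": 0.0}.get  # h(A) > true remaining cost 1

print(a_star("S", "G", edges, admissible))      # (2.0, ['S', 'A', 'G']) - optimal
print(a_star("S", "G", edges, overestimating))  # (3.0, ['S', 'G']) - sub-optimal
```

With the admissible heuristic, A* expands A before committing to the direct edge and finds the optimal cost of 2. With h(A) overestimated, A looks unpromising, the direct edge wins, and the returned cost is 3. An unconverged learned heuristic can fail in exactly this way.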
I think I'll try experimenting with their solution on a real road network to see how well it behaves. For the last two months I've been working on applying RNNs to optimal route planning on real road networks, and the key component my search algorithm lacked was the robustness of A* (it gets stuck in loops quite often).