Abstract: In the Vision-and-Language Navigation (VLN) task, an agent must comprehend natural-language instructions and navigate precisely through complex environments. While significant progress ...