Deep Reinforcement Learning for Dynamic Multichannel Access (Invited Paper)


We consider the problem of dynamic multichannel access in a Wireless Sensor Network (WSN) containing N correlated channels, where the states of these channels follow a joint Markov model. At each time slot, a user selects a channel to transmit a packet and receives a reward based on the success or failure of the transmission, which is determined by the state of the selected channel. The objective is to find a policy that maximizes the expected long-term reward. The problem can be formulated as a partially observable Markov decision process (POMDP), which is PSPACE-hard and intractable. As a solution, we apply online learning and implement a Deep Q-Network (DQN) that can handle a large state space without any prior knowledge of the system dynamics. We compare the performance of DQN with a myopic policy and a Whittle Index-based heuristic through simulations and show that DQN can achieve near-optimal performance. We also evaluate DQN on traces obtained from a real indoor WSN deployment and show that it can learn a good policy in complex real scenarios, which do not necessarily exhibit Markovian dynamics.
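
The channel-access setup described above can be illustrated with a small simulation. The following is only a sketch under assumptions that the abstract does not state: exactly one of the N channels is good in each slot, the good channel advances round-robin with probability p_switch, and the reward is +1 for a successful transmission and -1 for a failed one. All class, function, and parameter names are hypothetical. The sketch does not implement DQN; it only contrasts a random policy with a hand-written heuristic that exploits the correlation between channels while observing nothing but the outcome on the selected channel.

```python
import random


class RoundRobinChannels:
    """Assumed joint Markov model: one good channel that moves round-robin."""

    def __init__(self, n, p_switch, seed=0):
        self.n = n
        self.p_switch = p_switch
        self.rng = random.Random(seed)
        self.good = self.rng.randrange(n)  # hidden state, never shown to the user

    def step(self, action):
        # The reward depends only on the state of the selected channel
        # (assumed values: +1 on success, -1 on failure).
        reward = 1 if action == self.good else -1
        # Joint transition: the good channel advances with probability p_switch.
        if self.rng.random() < self.p_switch:
            self.good = (self.good + 1) % self.n
        return reward


_policy_rng = random.Random(1)


def random_policy(last_action, last_reward, n):
    # Ignores all feedback and picks a channel uniformly at random.
    return _policy_rng.randrange(n)


def follow_policy(last_action, last_reward, n):
    # After a success, move to the next channel, where the good state is
    # likely to go; after a failure, stay and wait for it to arrive.
    if last_reward > 0:
        return (last_action + 1) % n
    return last_action


def average_reward(policy, n=8, p_switch=0.9, steps=5000, seed=0):
    env = RoundRobinChannels(n, p_switch, seed)
    last_action, last_reward, total = 0, -1, 0
    for _ in range(steps):
        action = policy(last_action, last_reward, n)
        last_reward = env.step(action)
        last_action = action
        total += last_reward
    return total / steps


if __name__ == "__main__":
    print("random policy:", average_reward(random_policy))
    print("follow policy:", average_reward(follow_policy))
```

The heuristic works here only because the transition structure is known in advance; the learning approach in the abstract targets the case where such knowledge of the system dynamics is not available.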

In International Conference on Computing, Networking and Communications (ICNC), IEEE.