FORECASTING POST-PERIODONTAL SURGERY HEALING USING FEDERATED DQN MODELS: WHEN REINFORCEMENT LEARNING ISN’T ENOUGH
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Abstract
Background: Predicting healing outcomes after periodontal flap surgery is important for personalized patient care, and the area is currently dominated by conventional machine learning methods. Reinforcement learning (RL) approaches are rarely applied to periodontal clinical datasets because of data scarcity, class imbalance, and high outcome variability. This study addresses that gap by exploring a split-federated, MonAco-style Deep Q-Network (DQN) encoder model for classifying postsurgical healing outcomes (Good, Fair, Poor) from a small multi-center dataset.
Materials and Methods: We analyzed a dataset of 300 periodontal surgical patients with five numeric features (age, probing depth, attachment loss, gingival index, and plaque index) and four categorical features (sex, smoking status, diabetes status, and procedure type). Categorical features were one-hot encoded and numeric features standardized, and the data were then split 80/20 into training and test sets. The training set was further partitioned into three “clients” to simulate federated learning across clinics. We implemented a MonAcoFed-DQN encoder consisting of an online multilayer perceptron, a momentum-based target encoder (updated via an exponential moving average of the online weights), and a classification head. A contrastive mean-squared error loss was added to the cross-entropy loss to stabilize training. Federated training used FedAvg with three local epochs per round. A baseline model with a hidden size of 64, a learning rate of 1e-3, and three federated rounds was evaluated first. A grid search over hyperparameters (hidden units {32, 64, 128}, learning rate {1e-3, 5e-4, 1e-4}, local epochs {5, 10}) was then used to maximize accuracy. An extended 10-round federated run with the best hyperparameters was used to examine learning dynamics. Performance was compared with state-of-the-art results reported in the literature on larger datasets.
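The three mechanisms named above (FedAvg aggregation, the exponential-moving-average target update, and the combined cross-entropy plus mean-squared-error loss) can be illustrated with a minimal, dependency-free sketch. It is not the study's implementation. The flat `Params` container, the function names, the momentum value of 0.99, the loss weight `mse_weight`, and the sample-size weighting of clients are all assumptions made for illustration, since the abstract does not report them.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical container: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def fedavg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """FedAvg aggregation: average client weights, weighted by sample count."""
    total = sum(client_sizes)
    averaged: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


def ema_update(target: Params, online: Params, momentum: float = 0.99) -> Params:
    """Momentum update: target <- m * target + (1 - m) * online."""
    return {
        name: [momentum * t + (1.0 - momentum) * o
               for t, o in zip(target[name], online[name])]
        for name in target
    }


def combined_loss(class_probs: Sequence[float], label: int,
                  online_emb: Sequence[float], target_emb: Sequence[float],
                  mse_weight: float = 1.0) -> float:
    """Cross-entropy on the class prediction plus a weighted MSE term
    between the online and target encoder embeddings."""
    cross_entropy = -math.log(class_probs[label])
    mse = sum((o - t) ** 2 for o, t in zip(online_emb, target_emb)) / len(online_emb)
    return cross_entropy + mse_weight * mse


# Toy usage: two clients holding 1 and 3 samples respectively.
clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
global_params = fedavg(clients, client_sizes=[1, 3])
target_params = ema_update({"w": [1.0, 1.0]}, global_params, momentum=0.9)
```

In a real run, each client would train locally for the stated number of epochs before `fedavg` is called once per round, and `ema_update` would be applied to the target encoder after each online update.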
Results: The split-fed DQN encoder reached 36% accuracy, only slightly above the 33% chance level for three classes; we attribute this near-chance performance to class imbalance. Hyperparameter tuning raised accuracy to 37.1% with a wider encoder, a lower learning rate, and 10 local epochs. Most configurations scored between 20% and 35% (median ~34%), with smaller models and higher learning rates underperforming. In the extended run, the learning curve peaked early at 33%, then oscillated near 30–31%, and ended at 31.4%. These results are far below those reported for traditional ML on larger datasets, which often reach 78–87% accuracy or AUC.
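To put the gap between 36% and the 33% chance level in perspective, the following back-of-the-envelope sketch estimates the sampling uncertainty of a test accuracy. It assumes the test set holds 60 patients (20% of 300), which the abstract implies but does not state. It also uses a normal-approximation 95% interval, which is only a rough guide at this sample size. The function name is hypothetical.

```python
import math


def accuracy_standard_error(accuracy: float, n_samples: int) -> float:
    """Binomial standard error of an accuracy estimated on n_samples cases."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_samples)


n_test = round(300 * 0.20)   # assumed test-set size: 20% of 300 patients
accuracy = 0.36              # reported accuracy of the split-fed DQN encoder
chance = 1.0 / 3.0           # uniform guessing over Good / Fair / Poor

se = accuracy_standard_error(accuracy, n_test)
low, high = accuracy - 1.96 * se, accuracy + 1.96 * se  # approximate 95% interval

print(f"n_test={n_test}, SE={se:.3f}, 95% CI=({low:.3f}, {high:.3f})")
print("chance level inside interval:", low < chance < high)
```

Under these assumptions, the interval spans roughly ±12 percentage points and contains the chance level, so the reported accuracy cannot be distinguished from guessing.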
Conclusion: This pilot study applied a split-federated DQN encoder (MonAcoFed-DQN) to periodontal surgery data and demonstrated the technical feasibility of training with momentum contrast; however, accuracy reached only 30–37% because of limitations in the dataset. The findings underscore the need for more data and balanced classes before deep reinforcement learning can be considered for clinical use. Simpler models or tree-based methods are better suited to small medical datasets. Future work should enlarge datasets, incorporate domain knowledge, and develop hybrid or tabular-specific deep models to improve periodontal treatment predictions.