Document Type
Article
Keywords
Large Language Models, Phishing Detection, Anomaly Detection, Natural Language Processing, Cybersecurity
Abstract
This paper addresses a gap in the literature: the shortage of robust detection schemes able to recognize the subtle semantic aberrations inherent in LLM-generated deceptive text. We propose Semantic Anomaly Detection with Isolation Forest (SADI), an adaptive hybrid model that couples a fine-tuned transformer-based LLM for deep semantic feature extraction with the Isolation Forest algorithm for efficient anomaly detection. For this purpose we also built LLM-Phish-Synth-2025, a new and more challenging dataset of phishing attacks synthesized by a variety of state-of-the-art LLMs. Experiments on three publicly available datasets and this new corpus show that SADI outperforms the baseline models, including standalone fine-tuned LLMs, by a wide margin. On a corpus of 10,000 messages, SADI attains an F1 score of 0.981 (95% CI 0.978–0.984) and processes a single message in 18 ms on consumer GPUs, and an expanded evaluation against the three public benchmarks and a live enterprise feed confirms its robustness to prompt variation. SADI is the first architecture to combine semantic anomaly detection with adaptive, contamination-aware isolation in the context of LLM-generated phishing, addressing both scalability and evolving attack sophistication. Its theoretical contribution is this architectural fusion of deep semantic feature extraction and anomaly detection; its practical contributions are a more robust defense against an ever-evolving cyber threat and a new benchmark dataset for the research community. Code, data splits, and a reproducible environment file accompany the paper.
This approach offers an efficient and scalable way to counter the wave of AI-generated phishing campaigns.
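The abstract describes SADI as semantic embeddings from a fine-tuned transformer that are then scored by an Isolation Forest with contamination-aware thresholding. The sketch below illustrates only the isolation stage, under stated assumptions. It implements the standard Isolation Forest of Liu et al. in pure Python. It treats "contamination-aware" as a fixed quantile cut-off on training scores, which is the convention scikit-learn's `IsolationForest(contamination=...)` also follows. It substitutes synthetic 8-dimensional Gaussian vectors for the LLM embeddings. The names `IsolationForest`, `build_tree`, `path_length`, and `c_factor`, as well as all parameter values, are hypothetical. The paper's actual encoder, feature pipeline, and adaptive contamination logic are not specified in this abstract.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence

EULER_GAMMA = 0.5772156649015329


def c_factor(n: int) -> float:
    """Average path length of an unsuccessful search in a BST of n points,
    used to normalise isolation depths (Liu et al., Isolation Forest)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


@dataclass
class Node:
    size: int                         # training points that reached this node
    feature: int = -1                 # split dimension; -1 marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(points: List[Sequence[float]], depth: int, max_depth: int,
               rng: random.Random) -> Node:
    """Recursively isolate points with random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return Node(size=len(points))
    feature = rng.randrange(len(points[0]))
    values = [p[feature] for p in points]
    lo, hi = min(values), max(values)
    if lo == hi:  # the chosen dimension is constant here, so stop splitting
        return Node(size=len(points))
    threshold = rng.uniform(lo, hi)
    left = [p for p in points if p[feature] < threshold]
    right = [p for p in points if p[feature] >= threshold]
    return Node(len(points), feature, threshold,
                build_tree(left, depth + 1, max_depth, rng),
                build_tree(right, depth + 1, max_depth, rng))


def path_length(x: Sequence[float], node: Node, depth: int = 0) -> float:
    """Depth at which x reaches a leaf, plus c(size) for unexpanded leaves."""
    while node.feature >= 0:
        node = node.left if x[node.feature] < node.threshold else node.right
        depth += 1
    return depth + c_factor(node.size)


class IsolationForest:
    """Minimal Isolation Forest with a contamination-based decision threshold.
    Hypothetical sketch; requires at least 2 training vectors."""

    def __init__(self, n_trees: int = 100, sample_size: int = 256,
                 contamination: float = 0.05, seed: int = 0) -> None:
        self.n_trees = n_trees
        self.sample_size = sample_size
        self.contamination = contamination
        self.seed = seed

    def fit(self, data: List[Sequence[float]]) -> "IsolationForest":
        rng = random.Random(self.seed)
        self.psi = min(self.sample_size, len(data))  # sub-sample size per tree
        max_depth = math.ceil(math.log2(self.psi))   # height limit from the original method
        self.trees = [build_tree(rng.sample(data, self.psi), 0, max_depth, rng)
                      for _ in range(self.n_trees)]
        # Contamination-aware cut-off: the (1 - contamination) quantile of the
        # training scores, so roughly that fraction of training data is flagged.
        scores = sorted(self.score(x) for x in data)
        cut = min(len(scores) - 1, int((1.0 - self.contamination) * len(scores)))
        self.threshold = scores[cut]
        return self

    def score(self, x: Sequence[float]) -> float:
        """s(x) = 2^(-E[h(x)] / c(psi)); values near 1 indicate anomalies."""
        mean_depth = sum(path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_depth / c_factor(self.psi))

    def predict(self, x: Sequence[float]) -> bool:
        return self.score(x) >= self.threshold


# Hypothetical stand-in for LLM embeddings of benign emails: 8-d Gaussian vectors.
data_rng = random.Random(42)
benign = [[data_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(300)]
forest = IsolationForest(contamination=0.05, seed=7).fit(benign)
print("outlier flagged:", forest.predict([6.0] * 8))
print("centre flagged:", forest.predict([0.0] * 8))
```

In a full pipeline, each vector would come from the fine-tuned encoder instead, for example a pooled hidden state. For production use, `sklearn.ensemble.IsolationForest` implements the same algorithm and exposes an equivalent `contamination` parameter.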
How to Cite This Article
Fetaji, Bekim and Samanta, Debabrata (2025) "Synthesizing Deception: Countering Large Language Model-Generated Phishing Campaigns through Adaptive Semantic Anomaly Detection," Mesopotamian Journal of Computer Science: Vol. 5: Iss. 1, Article 15.
DOI: https://doi.org/10.58496/MJCSC/2025/015
Available at: https://map.researchcommons.org/mjcsc/vol5/iss1/15