Synthesizing Deception: Countering Large Language Model-Generated Phishing Campaigns through Adaptive Semantic Anomaly Detection

Bekim Fetaji, Rochester Institute of Technology, Computing and Information Technologies, RIT Campus, Kosovo
Debabrata Samanta, Rochester Institute of Technology, Computing and Information Technologies, RIT Campus, Kosovo

Document Type

Article

Keywords

Large Language Models, Phishing Detection, Anomaly Detection, Natural Language Processing, Cybersecurity

Abstract

This paper addresses a gap in the literature: the lack of robust detection schemes that can recognize the subtle semantic aberrations inherent in LLM-generated deceptive text. We propose Semantic Anomaly Detection with Isolation Forest (SADI), a hybrid model that pairs a fine-tuned transformer-based LLM for deep semantic feature extraction with the Isolation Forest algorithm for efficient anomaly detection. On a corpus of 10 000 messages, SADI attains an F1 score of 0.981 (95 % CI 0.978–0.984) and processes a single message in 18 ms on consumer GPUs. An expanded evaluation against three public benchmarks and a live enterprise feed confirms robustness to prompt variation. We also introduce LLM-Phish-Synth-2025, a new and more challenging dataset of phishing attacks synthesized by a variety of state-of-the-art LLMs. Across the three publicly available datasets and our new corpus, SADI's F1 score of 0.981 exceeds that of the baseline models, including standalone fine-tuned LLMs, by a wide margin. The proposed SADI architecture is the first to combine semantic anomaly detection with adaptive, contamination-aware isolation in the context of LLM-generated phishing, addressing both scalability and evolving attack sophistication. The theoretical contribution is this architectural fusion for semantic anomaly detection; the practical contributions are a more robust defense against an ever-evolving cyber threat and a new benchmark dataset for the research community. Code, data splits, and a reproducible environment file accompany the paper. The approach offers an efficient and scalable way to combat the wave of AI-generated phishing campaigns.
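
The abstract describes scoring semantic features with an Isolation Forest. The following is a minimal, standard-library sketch of that general idea only, not the authors' implementation. The transformer embedding step is replaced here by synthetic Gaussian vectors, and every function name, the 0.05 contamination rate, and the tree and sample sizes are assumptions chosen for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(points, depth, max_depth, rng):
    """Isolate points with random axis-aligned splits until the depth limit."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {"dim": dim, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}

def path_length(tree, x, depth=0):
    """Depth at which x lands in a leaf, adjusted for points left unsplit there."""
    if "size" in tree:
        return depth + c_factor(tree["size"])
    branch = "left" if x[tree["dim"]] < tree["split"] else "right"
    return path_length(tree[branch], x, depth + 1)

def fit_forest(points, n_trees=100, sample_size=64, seed=0):
    """Build n_trees isolation trees, each on a random subsample (size >= 2 assumed)."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(points, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, sample_size

def anomaly_score(forest, x):
    """Score in (0, 1); values closer to 1 mean x is isolated after fewer splits."""
    trees, sample_size = forest
    mean_depth = sum(path_length(t, x) for t in trees) / len(trees)
    return 2.0 ** (-mean_depth / c_factor(sample_size))

def threshold_for(forest, points, contamination):
    """Cutoff such that roughly a `contamination` fraction of points score at or above it."""
    scores = sorted((anomaly_score(forest, p) for p in points), reverse=True)
    k = max(1, int(round(contamination * len(points))))
    return scores[k - 1]

# Synthetic stand-in for message embeddings (assumption: 4 dimensions, 200 messages).
rng = random.Random(42)
embeddings = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]

forest = fit_forest(embeddings)
cutoff = threshold_for(forest, embeddings, contamination=0.05)

score_inlier = anomaly_score(forest, [0.0, 0.0, 0.0, 0.0])   # centre of the cluster
score_outlier = anomaly_score(forest, [6.0, 6.0, 6.0, 6.0])  # far from every training point
```

In this sketch, a message whose score is at or above `cutoff` would be flagged. Re-deriving the cutoff as the assumed contamination rate changes is one simple reading of "contamination-aware"; the paper's adaptive mechanism may differ.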

How to Cite This Article

Fetaji, Bekim and Samanta, Debabrata (2025) "Synthesizing Deception: Countering Large Language Model-Generated Phishing Campaigns through Adaptive Semantic Anomaly Detection," Mesopotamian Journal of Computer Science: Vol. 5: Iss. 1, Article 15.
DOI: https://doi.org/10.58496/MJCSC/2025/015

Included in

Computer Sciences Commons
