
Document Type

Article

Keywords

Large Language Models, Phishing Detection, Anomaly Detection, Natural Language Processing, Cybersecurity

Abstract

Few existing detection schemes are robust enough to recognize the subtle semantic aberrations inherent in deceptive text generated by large language models (LLMs). This paper addresses that gap with Semantic Anomaly Detection with Isolation Forest (SADI), an adaptive, co-designed hybrid model that pairs a fine-tuned transformer-based LLM, used for deep semantic feature extraction, with the Isolation Forest algorithm for efficient anomaly detection. With this objective in mind, we also constructed LLM-Phish-Synth-2025, a new and more challenging dataset of phishing attacks synthesized by a variety of state-of-the-art LLMs. On a corpus of 10 000 messages, SADI attains an F1 score of 0.981 (95% CI 0.978–0.984), outperforming baseline models, including standalone fine-tuned LLMs, by a wide margin, and it processes a single message in 18 ms on consumer GPUs. An expanded evaluation on three publicly available benchmarks, our new corpus, and a live enterprise feed confirms robustness to prompt variation. The proposed SADI architecture is the first to combine semantic anomaly detection with adaptive, contamination-aware isolation in the context of LLM-generated phishing, addressing both scalability and evolving attack sophistication. The theoretical contribution is a new architectural fusion built around semantic anomaly detection; the practical contributions are a more robust defense against an ever-evolving cyber threat and a new benchmark dataset for the research community. Code, data splits, and a reproducible environment file accompany the paper. Together, these make SADI an efficient and scalable approach to countering the wave of AI-generated phishing campaigns.
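The abstract pairs LLM-derived semantic features with Isolation Forest scoring but gives no implementation details. As a hedged illustration of the isolation half only, the sketch below is a minimal pure-Python Isolation Forest following the standard algorithm of Liu, Ting and Zhou (2008): random axis-parallel cuts, with anomaly scores derived from average path length normalised by c(ψ). All names (`IsolationForestSketch`, `c_factor`) are hypothetical, the Gaussian vectors merely stand in for embeddings a fine-tuned transformer would produce, and the fixed `contamination` fraction used to set the decision threshold (mirroring the parameter of that name in scikit-learn's `IsolationForest`) is an assumption; the abstract does not say how SADI's contamination-aware component adapts.

```python
import math
import random
from typing import List, Optional, Sequence

EULER_GAMMA = 0.5772156649


def c_factor(n: int) -> float:
    """Average path length of an unsuccessful BST search over n points (normaliser)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


class _Node:
    """Internal node (feature/split/children set) or leaf (only size set)."""

    def __init__(self, size: int = 0, feature: int = -1, split: float = 0.0,
                 left: Optional["_Node"] = None, right: Optional["_Node"] = None):
        self.size, self.feature, self.split = size, feature, split
        self.left, self.right = left, right


def _build(points: List[Sequence[float]], depth: int, limit: int, rng: random.Random) -> _Node:
    if depth >= limit or len(points) <= 1:
        return _Node(size=len(points))
    # Only features that still vary inside this node can separate its points.
    dims = [d for d in range(len(points[0]))
            if min(p[d] for p in points) < max(p[d] for p in points)]
    if not dims:
        return _Node(size=len(points))
    feature = rng.choice(dims)
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    split = rng.uniform(lo, hi)  # random cut between the observed min and max
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return _Node(feature=feature, split=split,
                 left=_build(left, depth + 1, limit, rng),
                 right=_build(right, depth + 1, limit, rng))


def _path_length(x: Sequence[float], node: _Node) -> float:
    depth = 0
    while node.left is not None:  # descend until a leaf is reached
        node = node.left if x[node.feature] < node.split else node.right
        depth += 1
    # Leaves holding several points add the expected depth of the unbuilt subtree.
    return depth + c_factor(node.size)


class IsolationForestSketch:
    """Hypothetical minimal Isolation Forest; higher score = more anomalous."""

    def __init__(self, n_trees: int = 100, sample_size: int = 256,
                 contamination: float = 0.05, seed: int = 0):
        self.n_trees, self.sample_size = n_trees, sample_size
        self.contamination, self.seed = contamination, seed

    def fit(self, X: List[Sequence[float]]) -> "IsolationForestSketch":
        rng = random.Random(self.seed)
        psi = min(self.sample_size, len(X))
        if psi < 2:
            raise ValueError("need at least two samples")
        limit = math.ceil(math.log2(psi))  # height limit from the original algorithm
        self.trees = [_build(rng.sample(X, psi), 0, limit, rng) for _ in range(self.n_trees)]
        self._c = c_factor(psi)
        # Assumed fixed-fraction threshold: the top `contamination` share of training scores.
        ranked = sorted(self.score(X), reverse=True)
        k = max(1, int(round(self.contamination * len(X))))
        self.threshold = ranked[k - 1]
        return self

    def score(self, X: List[Sequence[float]]) -> List[float]:
        # s(x) = 2 ** (-E[h(x)] / c(psi)); short average paths give scores near 1.
        return [2.0 ** (-sum(_path_length(x, t) for t in self.trees) / len(self.trees) / self._c)
                for x in X]

    def predict(self, X: List[Sequence[float]]) -> List[bool]:
        return [s >= self.threshold for s in self.score(X)]


# Stand-in "embeddings": random vectors, NOT real LLM features.
data_rng = random.Random(42)
benign = [[data_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(300)]
odd = [[data_rng.gauss(8.0, 0.5) for _ in range(8)] for _ in range(5)]
model = IsolationForestSketch(contamination=0.02, seed=1).fit(benign + odd)
print(model.predict(odd))
```

In a real pipeline the vectors passed to `fit` would be LLM embeddings of messages, and a pure-Python forest would likely give way to an optimized library implementation; this sketch favors readability over the per-message latency reported in the abstract.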
