Objective: This study investigates web-based threat signals using predictive analytics and feature attribution to determine whether a webpage is phishing or legitimate.
Theoretical Framework: The research is grounded in Protection Motivation Theory (PMT), which offers a behavioral lens to interpret phishing indicators. PMT connects web features to users’ cognitive threat and coping appraisals, providing a theoretical rationale for selecting and organizing features.
Method: A logistic regression model with L1 (Lasso) regularization was chosen for its interpretability and for its ability to handle feature sparsity and convergence issues. The model was trained on a dataset of 11,055 labeled websites and incorporates three core feature sets: structural (e.g., IP-based URLs, SSL status), behavioral (e.g., redirection, form handler anomalies), and domain metadata (e.g., traffic rank, Google indexing).
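The modeling approach described in the Method section can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the data are synthetic, the feature names (`ip_based_url`, `ssl_valid`, `redirects`, `noise`), the simulated weights, and the hyperparameters (`lam`, `step`, `n_iter`) are all hypothetical, and a hand-written proximal-gradient solver stands in for whatever library solver was actually used.

```python
import math
import random

# Hypothetical feature names, loosely echoing the structural and behavioral
# examples in the text; "noise" is a deliberately uninformative column.
feature_names = ["ip_based_url", "ssl_valid", "redirects", "noise"]
true_weights = [2.5, -2.5, 1.5, 0.0]  # assumed values, used only to simulate labels
true_bias = -0.75


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, stopping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def predict_proba(weights, bias, row):
    """Estimated probability that a page with these features is phishing."""
    return sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)


def fit_l1_logistic(X, y, lam=0.02, step=0.5, n_iter=300):
    """Fit logistic regression with an L1 penalty by proximal gradient descent."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            err = predict_proba(weights, bias, row) - label
            grad_b += err / n
            for j in range(d):
                grad_w[j] += err * row[j] / n
        # Gradient step on the log-loss, then shrinkage for the L1 term.
        weights = [soft_threshold(w - step * g, step * lam)
                   for w, g in zip(weights, grad_w)]
        bias -= step * grad_b  # the intercept is left unpenalized
    return weights, bias


# Synthetic binary features and labels (1.0 = phishing), seeded for repeatability.
rng = random.Random(0)
X = [[float(rng.random() < 0.5) for _ in feature_names] for _ in range(300)]
y = [1.0 if rng.random() < predict_proba(true_weights, true_bias, row) else 0.0
     for row in X]

weights, bias = fit_l1_logistic(X, y)
for name, w in zip(feature_names, weights):
    print(f"{name:>14}: {w:+.3f}")
```

The sign and size of each fitted coefficient can be read directly as the direction and strength of a signal, and the L1 penalty can drive weak coefficients to exactly zero, which is the interpretability property the Method section relies on.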
Results and Discussion: The results reject the null hypothesis that website-level features are non-predictive, indicating that structural, behavioral, and metadata-based signals significantly distinguish phishing from legitimate sites. Grouping the features into these three themes supports both the conceptual framework and the empirical model design.
Research Implications: The findings offer actionable insights for cybersecurity professionals, especially those in regulated industries. The model enhances detection capability while maintaining transparency, which is crucial for compliance and risk management.
Originality/Value: This study contributes to the literature by integrating PMT into a predictive modeling framework for phishing detection, bridging behavioral theory and machine learning. Its originality lies in aligning cognitive appraisal theory with interpretable statistical methods. The results are highly relevant to cybersecurity practice, offering scalable, transparent tools that support real-time decision-making and inform strategic defenses in high-risk sectors.