Multimodal Deep Learning Based Wildlife Intrusion Perception Using YOLOv12 and YAMNet
by Arun Kumar Ankeshwarapu, Dr. B. Venkat Raman, Madhu Kumar Bolle, Vamshi Krishna Velpula
Published: April 27, 2026 • DOI: 10.51244/IJRSI.2026.1304000040
Abstract
Crop damage caused by wildlife intrusion is a major challenge for farmers near forest boundaries. Traditional monitoring methods are labor-intensive and ineffective in poor visibility. This paper proposes a multimodal wildlife intrusion detection system that combines visual object detection with environmental sound classification.
The system uses the YOLOv12 model for real-time animal detection in surveillance video and YAMNet for identifying animal sounds. By integrating visual and auditory sensing, the proposed framework improves detection reliability when the camera view is poorly lit or occluded. Experimental evaluation demonstrates improved detection accuracy compared to single-modality approaches. The system can be deployed on edge devices such as Raspberry Pi or Jetson Nano, enabling real-time monitoring of agricultural fields.
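The abstract does not state how the visual and audio outputs are combined, so the following is only a minimal sketch of one possible late-fusion rule, not the authors' method. It assumes each model has already been reduced to a single confidence in [0, 1] that an animal is present, and it combines the two with a noisy-OR, which treats the modalities as independent. The names `fuse_noisy_or` and `is_intrusion`, and the 0.5 threshold, are hypothetical and chosen for illustration.

```python
def fuse_noisy_or(visual_conf: float, audio_conf: float) -> float:
    """Combine two per-modality confidences with a noisy-OR rule.

    Assumption: each input is the probability-like confidence in [0, 1]
    that an animal is present, as reported by one modality (for example,
    the top detection score from the video model and the top animal-class
    score from the audio model). The rule treats the two as independent.
    """
    for name, value in (("visual_conf", visual_conf), ("audio_conf", audio_conf)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    # Probability that at least one modality is right about a presence:
    # 1 - P(both miss).
    return 1.0 - (1.0 - visual_conf) * (1.0 - audio_conf)


def is_intrusion(fused_conf: float, threshold: float = 0.5) -> bool:
    """Raise an alert when the fused confidence reaches the threshold.

    The default threshold is an illustrative value, not one from the paper.
    """
    return fused_conf >= threshold


if __name__ == "__main__":
    # Night-time case: the camera is unsure, the microphone is confident.
    fused = fuse_noisy_or(visual_conf=0.2, audio_conf=0.9)
    print(f"fused confidence: {fused:.2f}, alert: {is_intrusion(fused)}")
```

With this rule a strong score from either sensor is enough to raise an alert, which matches the motivation of covering low-light or occluded scenes. The cost is a higher false-alarm rate than a weighted average would give. A deployed system would need to tune the rule and threshold on validation data.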