We introduce DentalNet, a multi-modal deep learning framework that fuses 2D views and 3D geometry through cross-attention for fine-grained relational reasoning. DentalNet is the first work on IOTN classification and sets a new benchmark for the task; the approach may also generalize to broader 3D reasoning domains.
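
To make the fusion idea concrete, the following is a minimal sketch of single-head scaled dot-product cross-attention between two token sets. It is not the DentalNet implementation: the function `cross_attention`, the token counts, the feature width `d_model`, the random projection matrices, the residual connection, and the choice to let 3D geometry tokens act as queries over 2D view tokens are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries: (n_q, d_model) tokens from one modality
    context: (n_c, d_model) tokens from the other modality
    Returns the attended features (n_q, d_model) and the
    attention weights (n_q, n_c).
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each query row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
d_model = 16  # hypothetical feature width

# Hypothetical inputs: 32 tokens from a 3D geometry encoder and
# 6 tokens from a 2D view encoder, already projected to d_model.
geom_tokens = rng.normal(size=(32, d_model))
view_tokens = rng.normal(size=(6, d_model))

# Random projections stand in for learned weights.
w_q, w_k, w_v = (
    rng.normal(size=(d_model, d_model)) / np.sqrt(d_model) for _ in range(3)
)

# Assumption: geometry tokens query the view tokens.
attended, weights = cross_attention(geom_tokens, view_tokens, w_q, w_k, w_v)

# Residual fusion: each geometry token keeps its own features and adds
# a weighted summary of the 2D views.
fused = geom_tokens + attended
```

In this sketch each row of `weights` indicates how strongly one geometry token attends to each view token, which is the kind of pairwise relation that cross-attention exposes. A trained model would learn the projections and typically use multiple heads and layers.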