fn mul_wide(a: &[u64; 4], b: &[u64; 4]) -> [u64; 8]
Schoolbook multiplication of two 4-limb (256-bit) operands, producing an 8-limb (512-bit) result.
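
The source gives only the signature, so the body below is a minimal sketch of how a schoolbook multiply with this shape might look, not the actual implementation. It assumes little-endian limb order (index 0 is the least significant limb), which the source does not state, and it uses `u128` for the intermediate products.

```rust
/// Sketch of a schoolbook 256 x 256 -> 512-bit multiply.
/// Assumption (not stated in the source): limbs are little-endian,
/// i.e. a[0] and b[0] are the least significant 64 bits.
fn mul_wide(a: &[u64; 4], b: &[u64; 4]) -> [u64; 8] {
    let mut out = [0u64; 8];
    for i in 0..4 {
        let mut carry: u64 = 0;
        for j in 0..4 {
            // Partial product plus the limb already accumulated plus the carry.
            // This cannot overflow u128:
            // (2^64 - 1)^2 + 2 * (2^64 - 1) = 2^128 - 1.
            let t = (a[i] as u128) * (b[j] as u128)
                + (out[i + j] as u128)
                + (carry as u128);
            out[i + j] = t as u64; // low 64 bits stay in this column
            carry = (t >> 64) as u64; // high 64 bits move to the next column
        }
        // Earlier rows have written at most index i + 3, so this slot is still zero.
        out[i + 4] = carry;
    }
    out
}
```

Under the little-endian assumption, squaring the largest 256-bit value gives (2^256 - 1)^2 = (2^256 - 2) * 2^256 + 1, so the low four limbs are `[1, 0, 0, 0]` and the high four are `[u64::MAX - 1, u64::MAX, u64::MAX, u64::MAX]`.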