The fundamental problem with transforming normals is largely
a product of our mental model of what a normal really is. A normal
is not a geometric property relating to points of the surface,
like a quill on a porcupine. Instead, normals represent geometric
properties on the surface. They are an implicit representation of
the tangent space of the surface at a point.
In three dimensions the tangent space at a point is a plane.
A plane can be represented by two basis vectors, but such
a representation is not unique. The set of vectors orthogonal to
such a plane is, however, unique, and a vector from this set is
what we use to represent the tangent space; we call it a normal.
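To make the uniqueness argument concrete, here is a minimal sketch in Python. It is an illustration only: the helper names (cross, dot, is_parallel) and the two example bases are assumptions made for this example and do not come from the text. It takes two different pairs of basis vectors spanning the same plane and checks that the vectors orthogonal to each pair point along the same line.

```python
def cross(a, b):
    """Cross product of two 3D vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Dot product of two vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


def is_parallel(a, b, eps=1e-12):
    """True when a and b lie along the same line (their cross product vanishes)."""
    c = cross(a, b)
    return dot(c, c) <= eps


# Two different bases for the same plane (here the plane z = 0).
# Both pairs are hypothetical example values.
basis_a = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
basis_b = ((1.0, 1.0, 0.0), (-2.0, 3.0, 0.0))

# A vector orthogonal to each basis pair.
normal_a = cross(*basis_a)
normal_b = cross(*basis_b)

# The bases differ, but the orthogonal vectors differ only by a scale factor.
print(normal_a, normal_b, is_parallel(normal_a, normal_b))
```

The two normals have different lengths, which is consistent with the text: what is unique is the set of vectors orthogonal to the plane, and any nonzero member of that set can stand in for the tangent space.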