Legend is an... unpredictable element at best. Much like the rest of the form elements, it does not take width, height or padding the same way in any two browsers.
This is why I usually put a span inside the legend, pad the top of my fieldset and absolute position the span -- a technique Dan Schulz showed me a few years ago.
Same goes for fieldset, which can also react unpredictably; that's why you sometimes end up having to wrap the fieldset in a predictable tag like DIV.
That way you have the fieldset/legend for accessibility and semantics, with tags whose behavior you can actually predict... and trust me, it's more than just legacy IE where differences are going to rear their ugly heads.
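A rough sketch of that approach might look something like the following. The class name, legend text, field names and em values are all made up for illustration, and putting position:relative on the wrapper DIV (rather than on the fieldset itself) is an assumption; treat it as a starting point to test in the browsers you care about, not a guaranteed recipe.

```html
<style>
/* Hypothetical wrapper: the DIV carries the box styling because it
   behaves predictably, and serves as the positioning context. */
.fieldsetWrapper {
	position:relative;
	padding:2.5em 1em 1em; /* top padding leaves room for the legend text */
	border:1px solid #888;
}

/* Strip the formatting off the elements that can't be trusted. */
.fieldsetWrapper fieldset {
	margin:0;
	padding:0;
	border:0;
}

.fieldsetWrapper legend {
	padding:0;
}

/* The span, not the legend, is what actually gets positioned. */
.fieldsetWrapper legend span {
	position:absolute;
	top:0.5em;
	left:1em;
	font-weight:bold;
}
</style>

<div class="fieldsetWrapper">
	<fieldset>
		<legend><span>Contact Details</span></legend>
		<label for="contactName">Name:</label>
		<input type="text" id="contactName" name="contactName">
	</fieldset>
</div>
```

The fieldset and legend stay in the markup for accessibility and semantics, while every width, padding and position value lands on a DIV or SPAN.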
It feels dirty, it's more markup, but it's the best solution I've found and means you don't need browser specific hacks to make things work.
Besides, you can always find markup savings elsewhere. Forms are a royal pain in the ass, and your best bet in almost any case is to strip the formatting off anything whose behavior you can't predict in CSS, and wrap it in tags that actually behave. See: attempting to style an input.
... and you can blame this on the specification, since it doesn't actually say how fieldset, legend, input, textarea or select are supposed to accept styling, or even if they should!