The easiest (though not very semantic) method is to use a table with two rows of six cells: put the images in the top row and the captions in the row below. The cells in each row will then automatically equalise in height. You can do the same thing with display:table (IE8+), but again it needs two separate rows.
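Something like the following is a rough sketch of the display:table version. The class names (`gallery`, `row`, `cell`), the image file names and the caption text are all made up for illustration; a real `<table>` with two `<tr>` rows of six `<td>` cells would have the same shape.

```html
<style>
  /* Hypothetical class names, for illustration only */
  .gallery { display: table; width: 100%; table-layout: fixed; }
  .row     { display: table-row; }
  .cell    { display: table-cell; padding: 5px; vertical-align: top; text-align: center; }
  .cell img { max-width: 100%; }
</style>

<div class="gallery">
  <!-- Row 1: the six images -->
  <div class="row">
    <div class="cell"><img src="img1.jpg" alt=""></div>
    <div class="cell"><img src="img2.jpg" alt=""></div>
    <div class="cell"><img src="img3.jpg" alt=""></div>
    <div class="cell"><img src="img4.jpg" alt=""></div>
    <div class="cell"><img src="img5.jpg" alt=""></div>
    <div class="cell"><img src="img6.jpg" alt=""></div>
  </div>
  <!-- Row 2: the six captions; every cell in this row gets the height of the tallest caption -->
  <div class="row">
    <div class="cell">Short caption</div>
    <div class="cell">A much longer caption that wraps onto several lines</div>
    <div class="cell">Caption three</div>
    <div class="cell">Caption four</div>
    <div class="cell">Caption five</div>
    <div class="cell">Caption six</div>
  </div>
</div>
```

The drawback is visible in the markup: each caption is separated from its image, which is why this isn't very semantic.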
A pure CSS method (without the two-row structure) would require fixed heights, which means either truncating content or reserving a maximum height large enough to cater for all the captions. You'll have to wait for some of the new CSS3 modules to do this properly.
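For comparison, the fixed-height approach might look roughly like this. Again the class names (`item`, `caption`), the file name and the heights (120px images, 3em captions) are assumptions picked for the example, not values from the original layout.

```html
<style>
  /* Hypothetical class names and sizes, for illustration only */
  .item { float: left; width: 16.66%; }
  .item img { display: block; width: 100%; height: 120px; }
  .item .caption {
    height: 3em;          /* room for two lines at line-height 1.5 */
    line-height: 1.5;
    overflow: hidden;     /* anything longer than two lines is cut off */
    padding: 0 5px;
    text-align: center;
  }
</style>

<div class="item">
  <img src="img1.jpg" alt="">
  <div class="caption">A caption that may be truncated if it runs past two lines</div>
</div>
<!-- ...repeat .item for the other five images... -->
```

Here the image and caption stay together in the markup, but the caption height has to be guessed in advance, so you either clip long captions or leave empty space under short ones.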