It can be done easily with a table-row approach, but then you have to separate the text from the images and put each in its own row, which isn't very good semantically.
If you want to keep the markup logical there are some things you can do, but you may have to make compromises.
For example, you could align the text neatly, but then the tops of the images would be uneven, because you would simply vertical-align the items to the bottom (using inline-block instead of floats). This just moves the unevenness from the text to the tops of the images.
You also have to consider whether the text under an image will run to two lines. That would again break the alignment unless everything sat in separate cells and rows, as mentioned at the start.
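A minimal sketch of the inline-block approach, assuming hypothetical markup in which a .gallery wrapper holds several .item boxes, each containing an image, a caption paragraph and a button (the class names are illustrative only). Each item is aligned to the bottom, so single-line captions and the buttons share a common bottom edge, while shorter images leave uneven space at the top.

```css
/* Hypothetical markup (class names are illustrative only):
   <div class="gallery">
     <div class="item"><img src="a.jpg" alt=""><p>Caption</p><button>Buy</button></div>
     <div class="item"> ... </div>
   </div> */

.item {
  display: inline-block;   /* used instead of float */
  vertical-align: bottom;  /* item bottoms line up, so the text and buttons align */
  width: 200px;            /* arbitrary width */
  text-align: center;
}

.item img {
  display: block;          /* avoids the small gap below an inline image */
  margin: 0 auto;
}
```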
So what else can you do?
1) You could set min-heights on the elements so that they line up, and then keep your content within those limits.
2) Assuming the text under each image is only one line, you could add some padding-top above the image and absolutely position the image in that padding. The text would then line up, but you would need to keep the images within those constraints (or hide the overflow).
padding-top:210px;/* arbitrary height */
3) Use display:table and display:table-cell, but you would need to do it in separate rows: all the images go in the first row, the text in the second row and the buttons in the third row. Something like this example, but you need an extra row and should align the images with vertical-align:top.
4) The flexible box model allows for layouts like this, but browser support is minimal at the moment, so I would wait unless you have a fallback for other browsers.
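For option 1, a rough sketch of the min-height idea, assuming hypothetical markup where each .item holds a .thumb wrapper around the image, a p.caption and a button. The 210px and 2.6em values are arbitrary; the layout only holds while the real images and text stay within them.

```css
/* Hypothetical markup (class names are illustrative only):
   <div class="item">
     <div class="thumb"><img src="a.jpg" alt=""></div>
     <p class="caption">Caption</p>
     <button>Buy</button>
   </div> */

.item {
  float: left;
  width: 200px;            /* arbitrary width */
  margin-right: 10px;
}

.item .thumb {
  min-height: 210px;       /* must be at least as tall as the tallest image */
}

.item .caption {
  line-height: 1.3;
  min-height: 2.6em;       /* room for two lines of text (2 x 1.3em) */
  margin: 0.5em 0;
}
```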
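Option 2 might look something like this, assuming each hypothetical .item contains an img followed by a caption and a button. It reuses the arbitrary 210px padding-top from option 2: the image is taken out of the flow and placed inside that padding, so every caption starts at the same offset. Here max-height keeps the image inside the reserved space; overflow:hidden on the item would be the alternative.

```css
/* Hypothetical markup (class names are illustrative only):
   <div class="item">
     <img src="a.jpg" alt="">
     <p>Caption</p>
     <button>Buy</button>
   </div> */

.item {
  position: relative;      /* containing block for the absolutely positioned image */
  float: left;
  width: 200px;            /* arbitrary width */
  margin-right: 10px;
  padding-top: 210px;      /* arbitrary height reserved for the image */
}

.item img {
  position: absolute;      /* out of the flow, placed at the top of the padding */
  top: 0;
  left: 0;
  max-width: 100%;
  max-height: 210px;       /* keeps the image inside the reserved padding */
}
```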
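A sketch of option 3, assuming hypothetical markup split into three rows (images, captions, buttons) of .cell elements inside a .gallery wrapper. This is the compromise mentioned at the start: in the markup each image is separated from its caption and button.

```css
/* Hypothetical markup (class names are illustrative only):
   <div class="gallery">
     <div class="row"><div class="cell"><img src="a.jpg" alt=""></div> ...</div>
     <div class="row"><div class="cell"><p>Caption</p></div> ...</div>
     <div class="row"><div class="cell"><button>Buy</button></div> ...</div>
   </div> */

.gallery {
  display: table;
  border-spacing: 10px 0;  /* horizontal gap between the columns */
}

.gallery .row {
  display: table-row;
}

.gallery .cell {
  display: table-cell;
  width: 200px;            /* arbitrary column width */
  vertical-align: top;     /* image tops line up across the first row */
  text-align: center;
}
```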
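For completeness, option 4 might look like this using the standard flexbox syntax (class names again hypothetical). It only works in browsers that implement that syntax, so per the caveat in option 4 you would still want a float or inline-block fallback. Items in each row stretch to the same height, and the auto top margin on the caption pushes the caption and button to the bottom of each item while the images stay at the top.

```css
/* Hypothetical markup (class names are illustrative only):
   <div class="gallery">
     <div class="item"><img src="a.jpg" alt=""><p>Caption</p><button>Buy</button></div>
     <div class="item"> ... </div>
   </div> */

.gallery {
  display: flex;
  flex-wrap: wrap;
  align-items: stretch;    /* items in a row become as tall as the tallest one */
}

.item {
  display: flex;
  flex-direction: column;  /* image, caption and button stacked vertically */
  width: 200px;            /* arbitrary width */
  margin: 0 5px;
}

.item p {
  margin-top: auto;        /* absorbs the spare height above the caption */
}
```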
The easiest solution would be number 2 above, but of course it does not allow full flexibility.
Years ago I made a few examples that might be of interest, although they may need updating.