There are two things you might be mixing up here.
What people typically call the "duplicate content penalty" isn't an actual penalty at all; it's just badly targeted link juice. Say you have a page that can be reached at example.com, example.com/index.htm, www.example.com or www.example.com/index.htm. If you don't have a canonical tag set and you don't do any visible rewriting of the URL (a redirect to one preferred form, for example), there's a danger that Googlebot treats those four URLs as four separate pages. If a quarter of your incoming links then point to each one, you're dividing your link juice four ways. None of the four URLs gets the full benefit of all the available link juice, so none is likely to rank as well as it would if you channelled all of it into a single URL.
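To make the "four URLs, one page" problem concrete, here is a minimal Python sketch that collapses those variants into one preferred form. The choice of www.example.com as the preferred host, the `http` scheme on the sample URLs, and the `canonicalize` helper itself are assumptions for illustration only, not something Google or your server provides.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration: www.example.com is the preferred host and
# index.htm / index.html are the directory index files.
PREFERRED_HOST = "www.example.com"
HOST_ALIASES = ("example.com", "www.example.com")
INDEX_FILES = ("index.htm", "index.html")


def canonicalize(url: str) -> str:
    """Map equivalent URL variants of a page onto one preferred URL."""
    parts = urlsplit(url)

    # Fold the bare domain and the www form into the preferred host.
    host = parts.netloc.lower()
    if host in HOST_ALIASES:
        host = PREFERRED_HOST

    # An empty path and "/" are the same page.
    path = parts.path or "/"

    # Strip a trailing index file so /index.htm becomes /.
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break

    # Keep the scheme and query string as given; drop any fragment.
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


variants = [
    "http://example.com",
    "http://example.com/index.htm",
    "http://www.example.com",
    "http://www.example.com/index.htm",
]

# All four variants collapse to a single URL.
canonical_urls = {canonicalize(u) for u in variants}
```

The single URL this produces is the one you would redirect the other variants to with a 301, or name in a `<link rel="canonical">` tag, so that all incoming links count towards one page.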
The second issue is plagiarised, stolen or scraped content. If Google believes that the content on your site has been illegitimately copied from another website, it is likely to actively penalise you, and may well drop your site from its index altogether.