## User Tag List

1. Originally Posted by SJH
Oh, and don't be fooled into thinking that more rows = less efficient, or that more rows = lack of flexibility for displaying the data at the front end. If data is stored in an efficient manner, you will be able to display it however you want. Trust me on this
You can't tell me that matthijsA's schema is more efficient or flexible than mine?

Originally Posted by SJH
Wardrop, please explain to me, using your proposed schema, how choosing 'cotton' as the material could increase the price by \$2 if the shirt is in medium, and by \$3 if large is selected. Not that you'd necessarily need to, but I still think that's another potential shortcoming of your table structure.
I just noticed that I had cotton repeated twice in my example table. Can you look at that post again and tell me if your question still applies?

EDIT: Oh wait, I think I just got what you were asking. You're asking me to demonstrate something to you if I'm not mistaken. There's two ways I can interpret that though. You're either saying, how can the total price of the shirt be increased by \$2 if medium cotton was chosen, and by \$3 if the large cotton combination was chosen, in which case I'd present you with this...

Code:
```ID	Product_ID	Attribute	Value		Price_Diff
--	----------	---------	-----		----------
388	45		size		medium		+\$1
389	45		size		Large		+\$2
390	45		material	Cotton		+\$1```
That would make a medium cotton shirt cost \$2 more than the base price, and a large cotton shirt cost \$3 more than the base price. However, you may be asking, how do you (using my schema), make the cost of selecting cotton, an extra \$3, if large has been selected, but only an extra \$2 if medium has been selected. So if you were to use the price_diff values for the two sizes in the above example table, it would make the cost of a medium cotton shirt \$3, but the \$cost of a large cotton shirt, \$5. In which case, that would be a minor limitation of my schema I guess, but it could be easily worked around by merging the two separate attributes into one (essentially putting the possible combinations as a value), which you'd probably want to do anyway seeing as though in such a circumstance, both attributes and value sets directly relate to one another. So what you'd end up with is a 'size/material' attribute which would look like the following...

Code:
```ID	Product_ID	Attribute	Value		Price_Diff
--	----------	---------	-----		----------
388	45		size/material	medium/cotton	+\$3
389	45		size/material	large/cotton	+\$5```
Not really a limitation when you think about it, as I can't think of (off the top my head), a schema that could achieve such a thing. Being a complex pricing scheme, I imagine it would require a relatively complex schema - probably too complex to worry about implementing unless it was an absolute requirement of the application.

2. Originally Posted by matthijsA
I just proposed an alternative schema. I really don't feel like or need to defend it.
The only reason we had this debate (I guess you could call it), and the only reason you have needed to defend it is because you said when you first proposed it, that it essentially achieved the same thing (which I guess technically it does, albeit in a more impractical, cumbersome way - that is however, if you were trying to achieve the same goal as what I was trying to achieve). At least that's how I interpreted it.

3. Originally Posted by Wardrop
I don't think there was a stage where I didn't understand your schema and that it was "possible", it was just so damn impractical I thought that maybe you were missing something.
Originally Posted by Wardrop
This whole time I've been trying to prove to you that your solution didn't support such dynamic pricing, and only supported one price altering attribute (when there could potentially be many). However it's clear now that you weren't trying to achieve such a thing, which makes me ask what relevance your solution has to the original post, as the whole point of this thread I thought, was about linking prices to attributes
...
That's my reason for trying to prove the point I'm trying to prove - your schema does not meet those requirements. Unfortunately it's taken me many posts to try and explain that
...
Your schema does not allow any more than one price changing attribute
Can you understand that if I read this, I interpret it as meaning: "your schema is not doing what the requirements are, allowing price differences for different attributes"?

Again, what is "impractical" is something else and depends on other factors. For example, SJH's example, is a situation which my schema allows, without adaptation. Maybe allspiriteve needs that specific flexibility, maybe not

4. i just thought i'd jump in and correct a misunderstanding about normalization

normalization is ~not~ about removing duplication

you can have a perfectly good normalized database schema with duplication all over the place

normalization has to do with the relationship between attributes and keys

that is all

now back to your regularly scheduled discussion of the intricacies of EAV models

5. Originally Posted by matthijsA
Can you understand that if I read this, I interpret it as meaning: "your schema is not doing what the requirements are, allowing price differences for different attributes"?
Yes, that's how it should be understood. If you're trying to emphasise the point that it is possible to achieve variable pricing based on attributes using your schema, then don't. I mean, it's possible to achieve that with a single products table. For example...

Code:
```ID	Title					Price_Diff
--	-----					----------
388	T-Shirt - Green Large Cotton		\$23
389	T-Shirt - Orange Medium Polyster	\$20
389	T-Shirt - Orange Small Satin		\$24```
Would you look at that, I've got price altering product attributes.

All of these schemas essentially store the same data. I didn't think this thread was about finding a schema that can achieve the outcome any old way, I was under the impression this thread was about finding the most efficient/reliable/friendly method of doing so.

Originally Posted by matthijsA
Again, what is "impractical" is something else and depends on other factors. For example, SJH's example, is a situation which my schema allows, without adaptation. Maybe allspiriteve needs that specific flexibility, maybe not
SJH's example can also be achieved in that single products table example above. It's of course possible in mine as well as I demonstrated. In all 3 schemas, SJH's example is stored as a combined set of attribute values (with a pre-determined price). You could say mine goes a little further by allowing multiple sets of combined attribute values which mixes it all together (although you could probably claim the same thing about your schema)...

Code:
```ID	Product_ID	Attribute	Value		Price_Diff
--	----------	---------	-----		----------
388	45		size/material	medium/cotton	+\$3
389	45		size/material	large/cotton	+\$5
390	45		colour/finish	green/matte	+\$0
391	45		colour/finish	orange/matte	+\$0
392	45		colour/finish	orange/gloss	+\$3```
Anyway, it's bed time for me.

6. Originally Posted by r937
i just thought i'd jump in and correct a misunderstanding about normalization
I think we've moved on

However, if we ever revisit normalisation, I'll be sure to refer back to your post

7. Originally Posted by Wardrop
You can't tell me that matthijsA's schema is more efficient or flexible than mine?
I'm afraid so

Originally Posted by Wardrop
SJH's example can also be achieved in that single products table example above. It's of course possible in mine as well as I demonstrated. In all 3 schemas, SJH's example is stored as a combined set of attribute values (with a pre-determined price). You could say mine goes a little further by allowing multiple sets of combined attribute values which mixes it all together (although you could probably claim the same thing about your schema)...

Code:
```ID    Product_ID    Attribute    Value        Price_Diff
--    ----------    ---------    -----        ----------
388    45        size/material    medium/cotton    +\$3
389    45        size/material    large/cotton    +\$5
390    45        colour/finish    green/matte    +\$0
391    45        colour/finish    orange/matte    +\$0
392    45        colour/finish    orange/gloss    +\$3```
Anyway, it's bed time for me.
This breaks first normal form. What happens if I want to query the database for all shirts that are in medium, or all shirts that are in orange and in cotton? I guess we could use WHERE Value LIKE '%cotton%' or some variant thereof, but that's going to be one hell of a slow query.

8. Originally Posted by SJH
This breaks first normal form. What happens if I want to query the database for all shirts that are in medium, or all shirts that are in orange and in cotton?
You do have a point, and I'm glad you pointed out that fact. I'll probably think of a response to it after I get some sleep

9. Ok, I've had some sleep.

I'll admit, my schema doesn't allow for such information to be stored, but it really doesn't worry me when I think about it. That's about the only minor pitfall of the schema.

To compare, my schema, and matthijsA's schema store information in a completely different way. In my schema a single product, with multiple attributes (and values), is treated as such (a single product). matthijsA's schema takes the approach that every possible combination of attributes and values, should be it's own product.

matthijsA's schema would work if every product only had very few attributes and attribute values (3 attributes, each with 3 values would probably be as far as you'd want to go). In other words, it doesn't scale well (takes up a lot of space in the database for complex products, and data entry for such products would be a nightmare to say the least). That would be suitable however for the majority of online stores, but that's the main pitfall.

My schema on the other hand, scales very well. It's most suited to shops that sell made-to-order products, such as Dell, but can accommodate simple products at the same time (such different colour t-shirts). The only pitfall of my schema (that has be found at least), is that you can't do complex price altering which involves linking to values from separate attributes. For this you need a schema such as
matthijsA's which stores all the combinations of a product.

So there you go. Both schema's do essentially the same thing, but both have their pitfalls I guess. From allspiritsteves original post, I got the impression that he wanted a schema that supported scalability in terms of the number of attributes (so it could support customisable made-to-order products). matthijsA's interpreted the original the other way. Who's right? allspiritsteve will have to clarify won't he.

I'm off to work anyway... again (bloody work).

10. Originally Posted by Wardrop
From allspiritsteves original post, I got the impression that he wanted a schema that supported scalability in terms of the number of attributes (so it could support customisable made-to-order products). matthijsA's interpreted the original the other way. Who's right? allspiritsteve will have to clarify won't he.
Nobody's right or wrong here. However, I'm looking for a schema that is not EAV-based. Both yours and Matthijs' solutions are EAV-based. We've established EAV is extremely slow and hard to search. Thus, I've come to the conclusion EAV is not a 'best practice', but I have yet to hear what a best practice schema is for a complex e-commerce site that has to deal with many and varying attributes.

11. Originally Posted by allspiritseve
Thus, I've come to the conclusion EAV is not a 'best practice'...
okay, let's move on to the next approach

do some research on supertype/subtype

you would have a main products table with as many common attributes as possible -- e.g. description, supplier, price, etc.

(yes each product has a single price, so if there are 24 size/colour variations in t-shirts, then there are 24 separate products -- this is how the global SKU system works)

then for products which require additional oddball attributes, another table to hold just those attributes, with a foreign key back to the main products table, a one-to-one relationship for ech different product subtype table

12. Originally Posted by r937
okay, let's move on to the next approach

do some research on supertype/subtype
Also known as "Class Table Inheritance"

13. Originally Posted by r937
okay, let's move on to the next approach

So basically a simple site would have just a products table, and then as attribute columns became specialized, move them out into a separate table. Is there a rule of thumb for when to separate columns out into the subtype table?

14. Originally Posted by allspiritseve
Is there a rule of thumb for when to separate columns out into the subtype table?
well, my particular rule of thumb is whenever the number of specialized columns becomes much larger than the number of common columns

which, in the number of times i have implemented it, has never happened

i see nothing wrong with several sets of columns, that are all NULLable, where, depending on the product type, only one set will have any values

15. Originally Posted by allspiritseve
We've established EAV is extremely slow and hard to search.
Mind elaborating on that. I believe I asked this earlier in the thread but don't believe I received a response.

Originally Posted by allspiritseve
Nobody's right or wrong here.
Just to clarify, I meant who's interpretation (out of myself and matthijsA) of your original post/requirements was correct. I don't really care about getting an answer though, I was just making a point.

16. Originally Posted by r937
you would have a main products table with as many common attributes as possible -- e.g. description, supplier, price, etc.

Then for products which require additional oddball attributes, another table to hold just those attributes, with a foreign key back to the main products table, a one-to-one relationship for each different product subtype table
Isn't that essentially what my schema does? The attributes table is completely optional, and a major intention of the products table is to store common attributes as you mentioned.

I personally don't believe any attributes, which apply only to a minority of products, should be stored as columns in the products table. If you start doing this, you'll never know where to draw the line and what you'll eventually end up with is a schema which stores uncommon attributes in two separate locations. On top of that, if you stop selling products which would normally populate that specialised column, you end up with an extra, completely useless column in your products table, unless you modify the schema of the database. I don't know about you, but modifying the schema whenever as a change the products I sell, isn't something I'd like to be doing.

17. Originally Posted by r937
well, my particular rule of thumb is whenever the number of specialized columns becomes much larger than the number of common columns

which, in the number of times i have implemented it, has never happened

i see nothing wrong with several sets of columns, that are all NULLable, where, depending on the product type, only one set will have any values
Geez, might as well go back to my boss's favorite schema: one table for the whole application

In all seriousness though, do you have some links on where I can read up on when it is appropriate to move columns into another table and when to leave them in the first one? I'd like to better understand the tradeoffs involved in each decision.

18. <ignore this post, it was an accident.>

19. What I wonder though: do people have actual experience with the scaling (or not) of different schema's? What kind of numbers are we talking about? With how many products and attributes do things start to get slow? Have people put up benchmarks with dummy products to test this stuff?

I've been reading up a bit about when databases become the limiting factor and what measures are taken to scale, but the only articles I find are about massive sites like Flickr, Digg, Delicious, etc. Now with millions of users, millions of visits a day, and millions of requests an hour, I can understand some draconian measures are needed to scale the stuff. But I assume that, in this thread, we are not talking about building another Amazon.com, aren't we?

I still have to find out why an app like Magento is slow (if it is). Maybe they did some weird stuff so that they have loops of 100 queries instead of one single smart join?

20. I still have to find out why an app like Magento is slow (if it is)
http://www.crucialwebhost.com/blog/i...l-query-cache/
http://www.byte.nl/blog/2008/09/03/speeding-up-magento/
It seems that Magento needs a lot of server tuning.

21. Mastodont, thanks for the links. However, those articles still don't explain exactly why Magento is slow. It's something to do with complex code and slow queries. But then we still don't know how much the the EAV db schema itself, independently, is responsible for the slowdown.

By the way, I'm fine leaving the EAV subject behind (or split it in a new thread) and exploring other possibilities, if everybody already agrees that it is not a useful pattern in this case. I can do my own research on EAV to find out why it is so slow

22. Originally Posted by matthijsA
However, those articles still don't explain exactly why Magento is slow. It's something to do with complex code and slow queries.
I do not know Magento and am not involved in any OS shop, but lately I see some interesting presentation:
So I think that both mentioned sources are valid ...

23. @mastodont: please read carefully what I said. I said: yes, magento might be slow. And yes, there is bad code in there. But that's not the question here. The relevant question is:
"is it the EAV-db schema used in Magento that is causing the slowdown?"

This question is relevant to the discussion here, because Allspririteve want to know what is considered best practice for designing his database model. Both Wardrop and I brought forward some sort of EAV model (at least my schema looks like that). So the question is, does that schema scale or not?

In my schema, if you have 10,000 products, with each on average 10 attributes (color, size, weight, etc), you end up having a table products with 10,000 rows. A table attributes with say a few hundred attributes. And a lookup Product2Attribute table with 10x10,000 = 100,000 rows.

24. Originally Posted by matthijsA
What I wonder though: do people have actual experience with the scaling (or not) of different schema's? What kind of numbers are we talking about? With how many products and attributes do things start to get slow? Have people put up benchmarks with dummy products to test this stuff?
It depends on how the schema stores the data, and how many rows it uses to do so. As I said in my PM to you, storing a single product using your previously proposed schema, which had 10 attributes, each of which had 5 values each, would require 100 millions rows to store just that single product (if anyone doubts this, I've got equations to prove it). Obviously that schema wasn't designed to store products with that many attributes and values, but it's an example of why your question can't be answered as is.

The two things which can slow a database are size, and the queries which are required to retrieve certain information. The latter being the most common. That's why it's not possible to say that the EAV model is slow, without knowing what data is being stored, and without knowing how it needs to be accessed.

That's about all I can say in reply to such a general question.

25. You are right that in a situation in which each attribute can be combined with any of the other values for other attributes, you get millions of combinations.

I understand your calculation. But what we are thinking/talking about are different situations. Your calculation is indeed correct for a situation like the computer store in which you have many, many (almost endless) possible combinations of attributes. Each motherboard can have each memory and can have each case, can have each mouse, can have each monitor, etc. Then, indeed you are right, my schema will not scale.

However, I was more thinking about for example a clothing store. That clothing store has a limited inventory. There are not 100 million different clothing items lying in the store. There might be 10,000 or 100,000 different variations of products in total. That is doable for a database, as far as I know.