Help with database design for personal project

Hi,

This is a bit of a longish post, sorry.

For a bit of practice I am working on a personal project. The idea being I take some json data, wrangle it into a relational database, and hopefully at the end create a nice form that will filter results e.g. based on mileage, colour choice, make, model etc.

My question relates to the database design, something I am bit of a novice on.

This is an example of the json data I have created.

locations.json

[
    {"county": "Mid Glamorgan", "town": "Aberdare"},
    {"county": "Aberdeenshire", "town": "Aberdeen"},
    {"county": "Gwynedd", "town": "Aberdovey"},
    {"county": "Gwent", "town": "Abergavenny"},
    {"county": "Clwyd", "town": "Abergele"},
    {"county": "Gwent", "town": "Abertillery"}
	...
]

A limited car choice in that I haven’t included the model variation, M series etc.
cars.json

[
	...
	{
        "model": "Puma",
        "make": "Ford",
        "transmissions": ["manual", "automatic"],
        "fuel_types": ["petrol", "diesel", "hybrid", "electric"],
        "colours": ["black", "blue", "grey", "white", "silver", "red", "green", "brown", "gold", "orange", "purple", "yellow"],
        "retail_price": 28000
    },
    {
        "model": "model y",
        "make": "tesla",
        "transmissions": ["automatic"],
        "fuel_types": ["electric"],
        "colours": ["black", "blue", "grey", "white", "silver", "red"],
        "retail_price": 56000
    },
	...
]

Using the above with a bit of randomness and some calculations taken into account e.g depreciation for year, mileage I can generate however many cars for sale.

cars_for_sale.json

[
	...
    {
        "make": "Mercedes-Benz",
        "model": "C-Class",
        "year": 2017,
        "fuel_type": "hybrid",
        "transmission": "automatic",
        "color": "bronze",
        "location": {
            "county": "Bedfordshire",
            "town": "Biggleswade"
        },
        "mileage": 42596,
        "price": 13330
    },
    {
        "make": "BMW",
        "model": "1 Series",
        "year": 2016,
        "fuel_type": "petrol",
        "transmission": "manual",
        "color": "green",
        "location": {
            "county": "West Sussex",
            "town": "Arundel"
        },
        "mileage": 71571,
        "price": 9191
    }
	...
]

Using Pandas and SQL alchemy I will be creating the tables.

I have been considering the relationships and come up with the following

Towns and Counties (many-to-many)
A county has many towns
A town has many counties

e.g. The town Newport can be found in Gwent, Isle of Wight and Shropshire

Makes and Models of car (one-to-many)
A make can have many models
A model has only one make

Models and Colours (many-to-many)
A model can have many colours
A colour can have many models

Models and Fuel Types (many-to-many)
A model can have a few fueltypes
A fueltype can have many models

Models and Transmissions (many-to-many)
A model can have a few transmission types
A transmission type can have many models

Cars for sale
I am thinking the following schema (note have removed on delete cascade etc. just to simplify)

TABLE cars_for_sale {
    car_id INTEGER [primary key]
    model_id INTEGER
    colour_id INTEGER
    fuel_type_id INTEGER
    transmission_id INTEGER
    town_county_id INTEGER
    year INTEGER
    price FLOAT
    mileage INTEGER
	FOREIGN KEY(model_id) REFERENCES models (model_id)
    FOREIGN KEY(colour_id) REFERENCES colours (colour_id)
    FOREIGN KEY(fuel_type_id) REFERENCES fuel_types (fuel_type_id)
    FOREIGN KEY(transmission_id) REFERENCES transmissions (transmission_id)
    FOREIGN KEY(town_county_id) REFERENCES towns_counties (town_county_id)
}

Finally thanks to the wonderful dbdiagram.io here is an ERD

Any pointers would be appreciated. If you could please bare in mind I am a bit of simpleton when it comes to this :biggrin:

1 Like

I would question this part. Newport in Gwent is not the same location as Newport in Shropshire, so to my mind it should be a different entity with a different ID.
If I’m in Shropshire looking for a car in Newport, I don’t want to look at ones in the Isle of Wight.
Though there are many ways to store things like places where there is a “parent child” type hirearchy. So you could have a table of unique pairs of place_name_id and parent_id if you wanted to, but I’m not sure about that method.

3 Likes

I was wondering about this. I know in the US a town can (rarely) span multiple counties but didn’t know if they happen elsewhere.

The model_* tables confuse me a bit unless you’re using them just as filters for the for sale functionality (though you’re going to need to include year on them as well since the options can change from year to year). I will say that models_colours could cause you problems since a person can easily paint a car any color they want.

I don’t think this spanning, but different towns sharing the same name.

1 Like

Isn’t that the point of the town_county_id?

town_county_id  town_id  county_id
1               1        2
2               1        27
3               1        63

Or have I got that wrong?

An excuse to try out JS’s new Object.groupBy
https://codepen.io/rpg2019/pen/ExqpyOm

I think there are Whitchurches all over the UK, just as an example

1 Like

You should not share the town name in different counties. So let’s say you have town name A in country A and county B. What would be the advantage to have town A only as one entry in the database? Nothing as town A in county A has nothing to do with town A in county B. No one will ever search for a car available in town named A doesn’t matter if it is in county A or B.

So on that basis, what’s the fix? Would you just keep it as a comma separated string?

First I should think of the big picture. A quick review I found 2 main tables.

  1. The car
  2. Where is the car? And what status?

This means that each car can have several status (for sale, in stock, sold etc)
All other tables may be “lookup tables” that supports these main tables.

Or do I miss something obvious?

This means that each car can have several status (for sale, in stock etc)

I am not going that deep with this. I’m just thinking about cars that are available for sale. My focus really is on a bit of data wrangling, database creation and a dynamic form at the end.

@SamA74 Again isn’t that the junction table? Could that table just be by name instead e.g.

id County Town
1 Shropshire Newport
2 Gwent Newport

I only wanted to say, that county to town is a 1:n relation and one county can have multiple town, but one town can definitely only have one county

1 Like

I confess I can’t see the wood for the trees at the moment. Trying to take onboard comments, I need a bit more practice at this.

I will say that models_colours could cause you problems since a person can easily paint a car any color they want.

This has raised a question for me, whether I should have the junction table links to models at all. Within the context of what I am doing, what purpose do they serve?

Here are a couple of refactored ERDs. In the cars for sale, I have substituted town_id for customer_id. The customer table holds the town and county. I am intentionally keeping the customer table simple, as I am only interested in the location at this point.

Option A with junction tables

Option B without

As mentioned the point of this exercise is not to create a fully fledged car auction app/site.

Option B seems cleaner to me. But why counties?

Option B seems cleaner to me. But why counties?

I think I agree.

Re counties, my thinking is that you can see what cars are available in your county. That seems a sensible filter to me.

Inspired somewhat by indeed’s job search

every time you have only one attribute next to the id in a table, you should think about the need. For me all this tables for county’s, colors, transmissions etc make no sense.

@Thallius

So are you saying that rather than cars_for_sale being like this

Table: cars_for_sale
car_id
model_id
colour_id
fuel_type_id
transmissions_id

it should be
Table: cars_for_sale
car_id
model_id
colour_name
fuel_type
transmissions_type

Each row of cars for sale would have the string names/values of these properties (repeated), rather than an id?

Or have I got the wrong end of the stick?

What I will add, in the back of my mind is the thought that when I come to work with this data, selections etc. the mistakes in my design will become apparent. I guess that is how you learn though.

If you solve this issue, you could perhaps sell it to Ford / Microsoft for their sat-nav systems. Sitting in my mate’s Ford Edge in the middle of the Shropshire countryside having delivered some parts with no idea how to get back to civilisation, we asked it to navigate to Newport. It presented three options, which were “Newport, England”, “Newport, England” and “Newport, Wales”. Neither of the England ones were the one we wanted, but by the time we realised it was taking us to the Isle of Wight we were back on a main road we recognised.

2 Likes

Welcome to Shropshire :biggrin: I like your comment about getting back to civilisation.

1 Like

So new to me ENUMS, is this a bad idea/bad practice?

I have just been having a play

CREATE TYPE fuel_type AS ENUM ('petrol', 'diesel', 'electric', 'hybrid');
CREATE TYPE transmission_type AS ENUM ('manual', 'automatic');

CREATE TABLE cars_for_sale2 (
    car_id SERIAL PRIMARY KEY,
    model_id INTEGER NOT NULL,
	colour VARCHAR(50) NOT NULL,
	fuel_type fuel_type NOT NULL,
	transmission_type transmission_type NOT NULL
);
INSERT INTO cars_for_sale2 
	(model_id, colour, fuel_type, transmission_type) 
VALUES 
	(5, 'green', 'petrol', 'automatic');
-- INSERT 0 1
INSERT INTO cars_for_sale2 
	(model_id, colour, fuel_type, transmission_type) 
VALUES 
	(5, 'green', 'jelly', 'automatic');
-- ERROR:  invalid input value for enum fuel_type: "jelly"