SitePoint Sponsor

User Tag List

Results 1 to 11 of 11
  1. #1
    SitePoint Zealot RyanKing1809's Avatar
    Join Date
    Oct 2011
    Location
    Melbourne, Australia
    Posts
    170
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Article Database Design Scalability

    Ok lets say I have a page and the body of that page is made up of blocks of text, lets call them sections. Each section is made up of a title and a body text, it may also have sub-sections containing also a title and body text. So we have something like this:

    <page>
    <section>
    <section>
    <sub-section>
    <sub-section>
    <section>
    <section>
    </page>


    When it comes down to designing the database for this, it would be easier to have 3 tables, one for each level, the page, the section and the sub-section (the section/sub-section could probably easily function as one table).

    Would there be scalability issues when storing the data this way?

    Say the site developed 100 000 pages and each contained 10 sections then there'd be 1 000 000 rows in one table.
    Would mysql significantly slow down searching a table that size?

    Is it a better option to store all sections and sub-sections as a string in the page table then split it with php?

    Thankyou.

  2. #2
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,338
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    unless you're planning to re-use the sections and subsections, i.e. the same section on multiple pages, then there's no reason why you'd need more than one table, containing one chunk of page text

    on the other hand, things like nav bars and footers would be re-used, so you might want sections after all

    me, i'd still put them all into one table, such that each section links to the page it belongs to, in a hierarchical structure
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  3. #3
    SitePoint Zealot RyanKing1809's Avatar
    Join Date
    Oct 2011
    Location
    Melbourne, Australia
    Posts
    170
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    The page and section have different attributes so they can't be stored in the hierarchal structure I think you're suggesting.

    A page would contain something like:
    id
    title
    type
    parent_id
    path
    img_url
    body
    --> this is where sections could be stored in a large string


    where as section:
    id
    parent_page_id
    parent_section_id
    title
    body




    The sections are unique to each page, and wont be used anywhere else, so storing them in one of the page table columns as a large block of text might be the best way of doing things. It seems much easier now i think about it.

    Thanks

  4. #4
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2006
    Location
    Augusta, Georgia, United States
    Posts
    4,191
    Mentioned
    17 Post(s)
    Tagged
    4 Thread(s)
    Considering a section looks like a categorization for a page I would use a m:n approach rather than 1:m.

    page
    - id
    - ...

    section
    - id
    - parent_id | fk: section(id)
    - ....

    page_section
    - page_id | fk: page(id)
    - section_id | fk: section(id)
    The only code I hate more than my own is everyone else's.

  5. #5
    SitePoint Zealot RyanKing1809's Avatar
    Join Date
    Oct 2011
    Location
    Melbourne, Australia
    Posts
    170
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Ah ok, I haven't thought of it like that. That seems a simpler database structure. I would prefer to use a structure like this but it would take 3 queries to display read the sections, wouldn't it? Do you think that would suffer from performance issues on a large scale? I would like to have the structure for this website to grow very large and suffer as little performance issues as possible.

  6. #6
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2006
    Location
    Augusta, Georgia, United States
    Posts
    4,191
    Mentioned
    17 Post(s)
    Tagged
    4 Thread(s)
    Adjacency list a is proven method/pattern to handle hierarchical relationships. There are other techniques such as; nested set, closure table and path enumeration. However, given the initial question I would recommend sticking with the adjacency list considering it is the simplest one grasp. There are quit a few people that swear by nested set but it is much more complex and difficult to grasp. Especially true for someone who does not yet have a solid grasp of relational database fundamentals. Just stick with the adjacency list.
    The only code I hate more than my own is everyone else's.

  7. #7
    SitePoint Zealot RyanKing1809's Avatar
    Join Date
    Oct 2011
    Location
    Melbourne, Australia
    Posts
    170
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Sorry just so I can gain a proper understanding, I apologise if it's at all frustrating to explain.
    I've been using this as a reference: http://www.datanamic.com/support/lt-...-modeling.html



    The relationship between the page & section are as follows:

    page --> section: 1 page contains many sections -- 1:n
    section --> page: 1 section is contained within 1 page -- 1:1

    Based one what I understand from the method linked above, I've concluded that the relationship between page:section is 1:n

    Just to be clear both page & section are recursive and use the adjacency list method.



    Why do you suggest it should be an m:n relationship?

    Does it have anything to do with both page & section being recursive?

    Why shouldn't the section & page_section tables you suggested be one table?


    Thanks for your help.

  8. #8
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,338
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    why is page recursive?

    i understand how you want a section to have subsections, but then why would a page have subpages? wouldn't those subpages just be other sections?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  9. #9
    SitePoint Zealot RyanKing1809's Avatar
    Join Date
    Oct 2011
    Location
    Melbourne, Australia
    Posts
    170
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    No. Sub-pages would be a way of easily creating 'categories'. With the data i'm dealing with, really the only difference between a page and category is how it's displayed. The sections would be the content displayed within the page.

    Say I had apage on architecture. This page on architecture would have lots of content broken up into sections. There would be an introduction section, a styles section, a materials section etc. Each section would consist mainly of a title, a body of text and maybe a few images.

    Each section may have sub-sections. For example the styles section may have subsections georgian, roman or egyptian architecture.

    Then say I come along and expand the georgian architecture section. It becomes really big, in fact so big that it should be its own page. So I separate it into it's own page but it is still really a part of the architecture page, it should be under an architecture category. Making it a sub-page is a way of creating that category without knowledge it was needed before hand.

    Data used in both a category and page are the same, just displayed differently depending on whether they have children. This is also convenient as the situation arises where it is better to display a page with children as a page and not a category. It's a bit hard to explain, but it's the best way to display my content and have that flexibility. I think.


    Does that make sense?

  10. #10
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,338
    Mentioned
    63 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by RyanKing1809 View Post
    Does that make sense?
    yup

    seems you've thought this through fairly well

    i'm wondering why you posted this thread, now
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  11. #11
    SitePoint Zealot RyanKing1809's Avatar
    Join Date
    Oct 2011
    Location
    Melbourne, Australia
    Posts
    170
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Haha me too, but I have learnt a lot as a result.

    To be honest I think the question has been misinterpreted. I'm really new to using databases and am quite unfamiliar with its limitations.

    Instead of having a sections table, I was considering having sections written as a large block of text in the pages table which would be in a coded format like html, or wikitext for example, that would then be later decoded for display by php. This would save creating a second connection to the database to retrieve the section table, i was wondering if it would cut down processing costs? I thought this may be a minimal performance issue that might increase as the tables grew.

    I have been looking at wikimedia's database schema though and they seem to be running many database queries without any issues. So I don't think it will be a problem.

    I just want to allow for this website to get large without any performance issues, and would rather deal with them now than many years down the track after it has all been implemented.


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •