Select by first 2 characters only

Hey all

I have inherited a table like this


|catCode|catDesc|

BO bosch
BO01 bosch DIY
BO02 bosch trade
MK makita

I need to select all records with only 2 letters as these are the parent categories.

I tried:

SELECT LEFT (`catCode`,2) FROM `cats` GROUP BY `catCode`

but obviously this will select all records not just the ones with 2 characters.

am I looking for something to do with length? select where max length = ???

wow! great work! I’ve read through that Sitepoint article and some others and feel I have a better handle on it now.

Many thanks again!

Yup, that all makes sense, I just wondered if there is a term that describes the method you have used here so I can go away and learn more about it.

The solution is derived from the section The Adjacency List Model which can be found here :slight_smile:

Also, I just thought of two things:

  1. The query doesn’t return “childless” parents. The following query doesn’t have that problem:

SELECT
   c1.catCode AS parentCatCode
 , c1.catDesc AS parentCatDesc
 , c2.catCode AS childCatCode
 , c2.catDesc AS childCatDesc
FROM
  cat c1
  LEFT JOIN
  cat c2
    ON
  CHAR_LENGTH(c2.catCode) > 2
  AND SUBSTRING(c2.catCode,1,2)=c1.catCode
WHERE CHAR_LENGTH(c1.catCode) = 2

Output:


parentCatCode  parentCatDesc  childCatCode  childCatDesc
-------------  -------------  ------------  ------------
BO             bosch          BO01          bosch DIY   
BO             bosch          BO02          bosch trade 
MK             makita         (NULL)        (NULL)      

Note that MK is now shown where it’s wasn’t before, and the fact that the fields childCatCode and childCatDesc are both (NULL) for that row indicates it doesn’t have any child categories.

  1. You don’t necessarily need to create an extra table for the childs. You can also number all the records and add an extra field called “parent”.

The data you provided will then look like this:


id | catCode | catDesc     | parent
-----------------------------------
1  | BO      | bosch       | (NULL)
2  | BO01    | bosch DIY   | 1
3  | BO02    | bosch trade | 1
4  | MK      | makita      | (NULL)

And the query to retrieve all data will look like this


SELECT
   c1.catCode AS parentCatCode
 , c1.catDesc AS parentCatDesc
 , c2.catCode AS childCatCode
 , c2.catDesc AS childCatDesc
FROM
  cat c1
  LEFT JOIN
  cat c2
    ON
      c2.parent=c1.id
WHERE
   c1.parent IS null

So that basically does the same as the query I showed earlier but is WAY faster because it doesn’t have to execute any functions :slight_smile:
You could the same if you make the parent field a varchar field and set the values to “BO” for example. Then you don’t need an id field. This does however have the drawback that everything breaks when there are new categories with more or less than two characters for the catCode.

You can get it all in one go. Assuming the table is called “cat”:

SELECT
   c1.catCode AS parentCatCode
 , c1.catDesc AS parentCatDesc
 , c2.catCode AS childCatCode
 , c2.catDesc AS childCatDesc
FROM
  cat c1
  INNER JOIN
  cat c2
    ON
      CHAR_LENGTH(c1.catCode) = 2
  AND CHAR_LENGTH(c2.catCode) > 2
  AND SUBSTRING(c2.catCode,1,2)=c1.catCode

Note however that the problem I outlined above is even worse now since you have to perform 2 CHAR_LENGTH functions and 1 SUBSTRING function on every combination of rows. Basically for your example you need to perform 4 x 4 x 3 = 48 functions.
Since this is a quadratic problem you will get serious performance issues once the table grows. This is not something you should take lightly!
As an example for a 1,000 row table you need to perform 1,000 x 1,000 x 3 = 3,000,000 functions!

ok, now heres the doozie…

I need to produce a list of all top level cats (2 character ones) and display them with their child cats below all on one page.

I’d like to display this in a tree structure.

do I do this with 2 queries? or one query with sub query? (not done one of these before!)

many thanks ScallioXTX

I completely agree.

I’ve grabbed this out of our borland DB and just need to get this working for now. I plan to completely redo this.

SELECT
   something
 , anything
 , just_not_the_star
FROM
  `cats`
WHERE
   CHAR_LENGTH(`catCode`)=2
GROUP BY
  `catCode`

:slight_smile:

PS. I would seriously consider creating two tables for this problem. One for the parent categories and one for the child categories. What if some point in time there is a new parent category with 3 characters, or 1 character? Exactly, mayhem :slight_smile:
Also, if the table becomes large and you have to perform the CHAR_LENGTH() function on each and every row the query will become terribly slow. As a general rule of thumb you should never use a function in the WHERE clause unless you absolutely have to.

That is exactly what I needed!! Really appreciate that.

I really appreciate the advice too and I wont be using this as the final system. Its more to demonstrate something to the powers that be using what they’ve already got.

So I can go away and learn more about this…what is ‘parentCatCode’ called in this instance? It would appear to be a variable that you are creating within a mysql statement?? I this the right terminology?

Select fieldName AS someThingElse

When you run this query on the data you supplied you will get the following result


parentCatCode  parentCatDesc  childCatCode  childCatDesc
-------------  -------------  ------------  ------------
BO             bosch          BO01          bosch DIY   
BO             bosch          BO02          bosch trade 

the parentCatCode and the parentCatDesc indicate the catCode and catDesc of the parent category respectively and the childCatCode and the childCatDesc indicate the catCode and catDesc of the child category respectively. Because they are all on the same row you can see the relation right there.

So when you start you output the parent information, and then indent and display all the child data for the rows that have the same parent category.

Does that make sense?

Note that the solution doesn’t scale; you can only use for parent-child, not for parent-child-grandchild or more.