  1. #1
    SitePoint Evangelist
    Join Date
    Jul 2004
    Location
    USA
    Posts
    594
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Grabbing Text from Complex Page

    Hello,

    I want to grab 5 "levels" from a dynamically generated page like the one below:

    http://www.bungie.net/Stats/PlayerSt...yer=Comrade2k7

    I just want the actual level number and the percentage through that level.

    I've seen other sites do it, but I have no idea how to even begin.
    BKerr

  2. #2
    SitePoint Member
    Join Date
    Feb 2005
    Location
    Zurich, Switzerland
    Posts
    1
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Grabbing information from an HTML document is always a bad idea:

    - It's hard: you have to parse a document that was never meant or designed to be parsed
    - Hence it's slow
    - The document's structure could change without notice, making your parser useless

    Ask the webmaster if there is some sort of interface for the information (like an XML version, web service, etc.).

    If you'd still like to grab the information from the HTML page, use http://www.php.net/preg_match along with 'regex coach' to build the right regular expressions. For more information, visit www.regular-expressions.info
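
    To give a rough idea of the regular-expression route, here is a minimal sketch in Python (the same pattern should carry over to preg_match with little change). The sample markup, the "Level N" wording, and the names SAMPLE_HTML, ROW_PATTERN and extract_levels are all assumptions made up for illustration; the real page's HTML isn't shown in this thread, so the pattern would have to be adapted to it.

```python
import re

# Hypothetical markup: the real stats page is not shown in this thread,
# so this layout (playlist, "Level N", "NN%") is only an assumption.
SAMPLE_HTML = """
<tr><td>Team Slayer</td><td>Level 12</td><td>45%</td></tr>
<tr><td>Rumble Pit</td><td>Level 8</td><td>70%</td></tr>
"""

# One match per table row: playlist name, level number, percent complete.
ROW_PATTERN = re.compile(
    r"<td>\s*(?P<playlist>[^<]+?)\s*</td>\s*"
    r"<td>\s*Level\s+(?P<level>\d+)\s*</td>\s*"
    r"<td>\s*(?P<percent>\d+)%\s*</td>",
    re.IGNORECASE,
)


def extract_levels(html):
    """Return a list of (playlist, level, percent) tuples found in html."""
    return [
        (m.group("playlist"), int(m.group("level")), int(m.group("percent")))
        for m in ROW_PATTERN.finditer(html)
    ]


if __name__ == "__main__":
    for playlist, level, percent in extract_levels(SAMPLE_HTML):
        print(playlist, level, percent)
```

    If the page layout changes, the pattern simply stops matching and the function returns an empty list, so it is worth checking for that before displaying anything.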

  3. #3
    SitePoint Evangelist comfixit's Avatar
    Join Date
    Dec 2004
    Location
    Pasadena
    Posts
    537
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    What do you mean when you say 5 levels? I play Halo 2 so I understand all the terms and this looks like a cool idea you have. So tell me what are you trying to do exactly?

    I know you want information off a page, but specifically what information are you trying to get?

  4. #4
    SitePoint Evangelist
    Join Date
    Jul 2004
    Location
    USA
    Posts
    594
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    The 5 levels: Team Slayer, Team Skirmish, Rumble Pit....

    They're listed on that player stats page I linked to.

    And emailing the webmaster would never change anything. It's a major game producer (makers of the famous Halo) that doesn't have time for these things.

    I want to get the player's level in each playlist, including the percentage they are through the level.
    BKerr

  5. #5
    SitePoint Evangelist comfixit's Avatar
    Join Date
    Dec 2004
    Location
    Pasadena
    Posts
    537
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    OK, I think I see what you're looking for. The percentage complete may be the hard part..... be back in a sec with an idea

  6. #6
    SitePoint Evangelist comfixit's Avatar
    Join Date
    Dec 2004
    Location
    Pasadena
    Posts
    537
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    It looks like this information will be pretty consistent. I think you could grab most of the relevant information via the DOM. It kind of sucks that most of the non-DIV elements don't have names.

    If you can't get it via the DOM, then it will have to be scraped using regular expressions, which is a pain in the ***.

    I'll tell you what, this looks like a fun project. If you just want to be able to display the information on your website, I will see about writing a component that scrapes Halo stats from Bungie's site and lets you access them programmatically with ease.

    If this is what you were planning to do, then the two methods I would suggest exploring are JavaScript with the DOM, or regular expressions for pattern matching.
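
    As a rough illustration of the parser route (as opposed to regular expressions), here is a sketch in Python. It uses the standard library's event-based html.parser rather than a browser DOM, so it is a stand-in for the idea and not the JavaScript approach itself. The CellCollector and table_rows names are made up for this example, and it assumes the stats sit in plain tr/td rows, which I haven't verified against the real page.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td>, grouped by the <tr> it belongs to."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left unclosed in this row


def table_rows(html):
    """Return every table row in html as a list of stripped cell strings."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return parser.rows
```

    Since the cells have no names, you would still have to pick the level and percentage out of each row by position, which is exactly the fragile part if the layout ever changes.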

  7. #7
    SitePoint Zealot
    Join Date
    Aug 2004
    Location
    Madison, WI
    Posts
    191
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    So before you get headlong involved in this project, I would take a look through Bungie.net's terms of use:
    http://www.bungie.net/Help/NoticePag...ion=TermsOfUse

    In particular, I would pay attention to these phrases:

    "You may not modify, copy, distribute, transmit, display, perform, reproduce, publish, license, create derivative works from, transfer, or sell any information, software, products or services obtained from the Bungie Web Sites."

    and

    "You may not obtain or attempt to obtain any materials or information through any means not intentionally made available or provided for through the Bungie Web Sites."

    Just my $0.02. I'm sure Microsoft wouldn't be happy if you used their resources for your own personal project, especially if you wind up making money off of it.

  8. #8
    SitePoint Evangelist
    Join Date
    Jul 2004
    Location
    USA
    Posts
    594
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I couldn't care less what Microsoft wants.
    BKerr

  9. #9
    SitePoint Zealot
    Join Date
    Aug 2004
    Location
    Madison, WI
    Posts
    191
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I'm just saying that:

    a) what you want to do seems to be legally restricted, and

    b) if you do it and Microsoft finds out, there is a good chance that they'll shut you down and all your work will have been in vain.

    I'm just hoping to keep SitePoint's reputation as a place where good, legal things are discussed. There are a lot of bad things you can do over the web, and SitePoint isn't the place to discuss them. The principle behind this bit of programming is cool, but in this particular case the application of the principle isn't the best.

  10. #10
    SitePoint Evangelist
    Join Date
    Jul 2004
    Location
    USA
    Posts
    594
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I'm sure this is perfectly legal, as I've seen other sites use this information (Sig Builder with the levels) and Bungie has even posted this on their website.
    BKerr

  11. #11
    SitePoint Evangelist comfixit's Avatar
    Join Date
    Dec 2004
    Location
    Pasadena
    Posts
    537
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I have to side with BKerr.

    First of all, MS does not usually get involved with what happens at third-party companies such as Bungie (whose franchise happens to be based on the MS Xbox platform).

    Second, Bungie does seem to support the concept of fan sites and anything else that gets people more involved with playing Halo 2. MS is happy with anything that encourages people to keep paying for their Xbox Live.

    You have to ask why these terms are written. They are written to protect Bungie in this case. They give Bungie the ability to go after another site that tries to do something harmful, perhaps trying to take traffic away from the Bungie site or otherwise harm Bungie.

    If BKerr is scraping the stats so that he can make his clan website look cool, and that helps immerse people in their brand more, I don't think he will be seeing a cease and desist letter anytime soon.

    I respect the fact that you don't think people should discuss things that are illegal, like pirating software, but I think in this case the spirit of what he is most likely trying to accomplish is pretty reasonable and will not, advertently or inadvertently, harm or anger Bungie.

  12. #12
    SitePoint Evangelist Daijoubu's Avatar
    Join Date
    Oct 2002
    Location
    Canada QC
    Posts
    454
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    It's inside a table, tr, td, so have fun with strpos, substr and arrays
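
    For what that looks like without regular expressions, here is a small sketch in Python, where str.find and slicing play the roles of strpos and substr. The helper names text_between and all_cells are made up for this example, and it assumes bare <td> tags with no attributes, which may not hold for the real page.

```python
def text_between(haystack, start_marker, end_marker, offset=0):
    """Return (text, next_offset) for the first span found at or after offset.

    Returns (None, -1) when either marker is missing.
    """
    start = haystack.find(start_marker, offset)       # like strpos()
    if start == -1:
        return None, -1
    start += len(start_marker)
    end = haystack.find(end_marker, start)
    if end == -1:
        return None, -1
    return haystack[start:end].strip(), end + len(end_marker)  # like substr()


def all_cells(html):
    """Collect the text of every <td>...</td> pair, in document order."""
    cells = []
    offset = 0
    while True:
        text, offset = text_between(html, "<td>", "</td>", offset)
        if text is None:
            break
        cells.append(text)
    return cells
```

    Plain string searching like this is fast and has no backtracking surprises, but it breaks as soon as a tag picks up an attribute or changes case.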
    Speed & scalability in mind...
    If you find my reply helpful, feel free to give me a point

