  1. #1
    SitePoint Enthusiast
    Join Date
    Sep 2005
    Posts
    49

    Python or C for a web search engine

    Hello,

    I am going to start working on a customized and comprehensive web search engine project.

    I am undecided about which programming language is the best, fastest, and most reliable for this project.
    I'll be running 24/7 web robots which need to be super fast.

    My choices came down to these two:

    Python
    C/C++

    Which one do you think is best in terms of speed?

    P.S. I don't care if C is harder to program. I am looking for performance.

  2. #2
    SitePoint Wizard chris_fuel's Avatar
    Join Date
    May 2006
    Location
    Ventura, CA
    Posts
    2,750
    C, because guess what Python is written in. If you're concerned about speed as well, try to keep C++ out of the loop. That's just another library / set of headers you're going to have to bring in.

  3. #3
    SitePoint Enthusiast
    Join Date
    Dec 2004
    Posts
    42
    C is obviously going to be faster than Python.

    That said, Python is so quick to write that it may be a better choice if you're just experimenting with new ideas. You can always port it to C once you've worked out what you're doing.

  4. #4
    Afraid I can't do that Dave Hal9k's Avatar
    Join Date
    Mar 2004
    Location
    East Anglia, England.
    Posts
    640
    Quote Originally Posted by chris_fuel View Post
    C, because guess what Python is written in. If you're concerned about speed as well, try to keep C++ out of the loop. That's just another library / set of headers you're going to have to bring in.
    Actually, I'd have to say that is very misleading. So much expertise has gone into the C++ STL that, if you want even the most basic high-level language features, C++ beats C combined with whatever you build yourself to make it usable.

    Of course if you're a kernel developer, you wouldn't want these high level features, but you would still be working with an existing highly-scrutinised infrastructure.

    It's ridiculous not to want objects / templates / basic memory management when you're doing generic programming. It's like the argument that assembler would be quicker than C, but considering the extra amount of time it would take the programmer, it ceases to be economical.

  5. #5
    SitePoint Wizard chris_fuel's Avatar
    Join Date
    May 2006
    Location
    Ventura, CA
    Posts
    2,750
    Misleading how? C has to link against less, so it's going to be faster (unless you can't program well, but I don't think C++ is going to help much with that). I never said anything about maintainability, as the poster specifically said:

    P.S. I don't care if C is harder to program. I am looking for performance.

  6. #6
    Afraid I can't do that Dave Hal9k's Avatar
    Join Date
    Mar 2004
    Location
    East Anglia, England.
    Posts
    640
    Quote Originally Posted by chris_fuel View Post
    Misleading how? C has to link against less, so it's going to be faster (unless you can't program well, but I don't think C++ is going to help much with that). I never said anything about maintainability, as the poster specifically said:
    I wasn't referring to maintainability; I was looking at it from a speed standpoint.

    But that's the point: if you can't program well, C++ and the STL will help you to some extent. Unless you're doing something inefficient, like using a list for random access, using something out of the STL is going to save you from messing up your own data structures.

    Linking against less isn't really a compelling argument. For one, the general rule of thumb is that as long as you aren't using certain features of C that are incompatible with C++, programs you write in C and compile with a C++ compiler will run at comparable speed. C++ doesn't add much overhead unless you use certain features, and if you wanted those features in C you'd have to write your own version anyway and, at best, incur the same performance overhead.

    Have a look at this sorting example demonstrating the point I'm trying to make. Especially this part:

    STL has optimized algorithms that I could write, if I had the time and desire to read research papers in journals about the state of the art in sorting algorithms.* However, I don't have a lot of time, so it is likely that if I were forced to write a sorting algorithm, I would end up writing insertion sort or (if running time was important) quicksort, and my own quicksort is unlikely to be as fast as the one included with STL.
    ...

    Template functions save both development time and run time. They retain the flexibility of general-case library routines. No longer do we have to make this tradeoff. We can get better algorithms, a good implementation, less coding time, and fewer bugs.
    Considering the original poster is even comparing Python in terms of speed to C or C++, it is a fair assumption to make that they haven't spent most of their time reading research journals to create highly optimised algorithms.
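
As a rough illustration of the quoted point, here is a sketch in Python rather than C++ (the quoted article is about the STL, so this is an analogy, not its example). It compares a hand-written insertion sort with the built-in `sorted`. The names `insertion_sort` and `compare` are made up for this sketch, and the timings will vary by machine.

```python
import random
import timeit


def insertion_sort(items):
    """Hand-written O(n^2) sort, the kind most of us would write in a hurry."""
    result = list(items)  # work on a copy so the input is left untouched
    for i in range(1, len(result)):
        key = result[i]
        j = i - 1
        # Shift larger elements one slot to the right to make room for key.
        while j >= 0 and result[j] > key:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = key
    return result


def compare(n=2000):
    """Return (hand_written_seconds, library_seconds) for one sort of n ints."""
    data = [random.randrange(10_000) for _ in range(n)]
    hand = timeit.timeit(lambda: insertion_sort(data), number=1)
    lib = timeit.timeit(lambda: sorted(data), number=1)
    return hand, lib


if __name__ == "__main__":
    hand, lib = compare()
    print(f"hand-written: {hand:.4f}s, built-in: {lib:.4f}s")
```

Both functions return the same ordering; the difference is in how much tuning has already gone into the library routine.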

  7. #7
    chown linux:users\ /world Hartmann's Avatar
    Join Date
    Aug 2000
    Location
    Houston, TX, USA
    Posts
    6,455
    From a simple memory management perspective, I'd go with C++.

    You could build an extension of Python that does the brunt of the work and use Python for the front-end.
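
One lightweight way to try the "C does the heavy lifting, Python is the front-end" idea is the standard-library `ctypes` module, which calls into compiled C code without writing a full extension module. The sketch below assumes a POSIX system where the C library can be loaded, and uses `strlen` purely as a stand-in for whatever compiled routine would do the real work. The wrapper name `c_strlen` is made up for this example.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system. find_library("c") locates the C library;
# if it returns None, CDLL(None) falls back to symbols already loaded
# into the current process, which include libc on Linux and macOS.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text: str) -> int:
    """Python front-end: encode the string, then let the C function do the work."""
    return libc.strlen(text.encode("utf-8"))


if __name__ == "__main__":
    print(c_strlen("web robot"))
```

A real project would more likely put its own hot loops (parsing, hashing, index lookups) in the compiled part and keep scheduling and glue code in Python.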

  8. #8
    SitePoint Wizard samsm's Avatar
    Join Date
    Nov 2001
    Location
    Atlanta, GA, USA
    Posts
    5,011
    I wouldn't think language matters much. It seems like the wrong question at this stage.

    We also don't know how much of the web is being crawled. If it is all (or a lot) of it, that brings up some substantial data storage and indexing issues, as well as queuing, etc. These concerns dwarf the choice of language for the robot.

    Given the unavoidable delay when sending a request and getting a response back, I don't think you'd see much of a difference in the robot portion of the application.
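
To make the latency argument concrete, here is a small sketch that simulates network delay with `asyncio.sleep` instead of making real HTTP requests. The `fetch` function, the 50 ms `SIMULATED_LATENCY`, and the URLs are all hypothetical. It suggests that how requests are scheduled (one at a time versus many in flight) matters far more than how fast the language runs between requests.

```python
import asyncio
import time

SIMULATED_LATENCY = 0.05  # seconds; hypothetical round-trip time per request


async def fetch(url: str) -> str:
    """Stand-in for an HTTP request: wait for the 'network', return a fake body."""
    await asyncio.sleep(SIMULATED_LATENCY)
    return f"<html>{url}</html>"


async def crawl_sequential(urls):
    """Fetch one page at a time; total time grows with len(urls) * latency."""
    return [await fetch(u) for u in urls]


async def crawl_concurrent(urls):
    """Keep all requests in flight at once; total time is roughly one latency."""
    return list(await asyncio.gather(*(fetch(u) for u in urls)))


def timed(crawl, urls):
    """Run a crawl coroutine function and return (pages, elapsed_seconds)."""
    start = time.perf_counter()
    pages = asyncio.run(crawl(urls))
    return pages, time.perf_counter() - start


if __name__ == "__main__":
    urls = [f"http://example.com/page{i}" for i in range(10)]
    _, slow = timed(crawl_sequential, urls)
    _, fast = timed(crawl_concurrent, urls)
    print(f"sequential: {slow:.2f}s, concurrent: {fast:.2f}s")
```

The same structure could be written in C or C++ with non-blocking sockets; the point is that the wait on the network, not the interpreter, sets the pace.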

    It's going to be your architecture (not the language) that is going to make this work.
    Using your unpaid time to add free content to SitePoint Pty Ltd's portfolio?

