Freelancer

Website crawler for HTML content  

Website crawler for HTML content is project number 556542
posted at Freelancer.com.


 

Status: Cancelled

Selected Providers: -

Budget: $30-250

Created: 11/23/2009 at 4:26 EST

Bid Count: 6

Average Bid: $177

11/28/2009 at 4:26 EST

Project Creator: tompoes
Employer Rating: (No Feedback Yet)

 

Description

I need a crawler to identify phrases in the HTML of websites, for example "google analytics".

There will be about 5 phrases in total; I want the phrase list to be an input that I can control. I also want to be able to control the depth of the crawl, i.e., how many levels "deep" the crawler goes into the website (e.g., home page --> about us --> management would be 3 levels deep).

Also, I want to be able to control the total number of pages crawled per site, e.g., cut off the search after 100 pages crawled.
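A minimal sketch of the phrase, depth, and page limits described above, in Python and using only the standard library. The names `crawl_site`, `fetch_html`, and `LinkParser` are hypothetical, the home page is assumed to count as level 1 (so home page --> about us --> management is 3 levels), and phrase matching is assumed to be a case-insensitive substring search on the raw HTML.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Hypothetical default fetcher: download a page as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl_site(start_url, phrases, max_depth=3, max_pages=100, fetch=fetch_html):
    """Breadth-first crawl of one site; returns {phrase: [urls containing it]}."""
    site = urlparse(start_url).netloc
    hits = {phrase: [] for phrase in phrases}
    queue = deque([(start_url, 1)])  # assumption: the home page is level 1
    seen = {start_url}
    crawled = 0
    while queue and crawled < max_pages:
        url, depth = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load
        crawled += 1
        lowered = html.lower()
        for phrase in phrases:
            if phrase.lower() in lowered:
                hits[phrase].append(url)
        if depth >= max_depth:
            continue  # do not follow links below the depth limit
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href).split("#")[0]
            # stay on the same site and never queue a page twice
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return hits
```

Passing `fetch` as a parameter keeps the limits testable without network access; a call such as `crawl_site("http://example.com/", ["google analytics"], max_depth=3, max_pages=100)` would use the default downloader.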

The crawler also needs to be able to crawl 20,000 sites in about a week, so the winning bidder needs to be able to build a "fast" crawler, e.g., one that uses multi-threading. I will also need to be able to upload the URLs of the websites I want to crawl.
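To put the speed requirement in numbers: if every one of the 20,000 sites reached a 100-page cut-off, that is 2,000,000 pages in 7 days, or roughly 3.3 pages per second sustained. The sketch below, in Python, shows one way the uploaded URL list and the multi-threading could fit together. The function names, the one-URL-per-line file format, and the default of 20 worker threads are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def load_urls(path):
    """Read one URL per line from an uploaded text file, skipping blanks."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def required_pages_per_second(sites, pages_per_site, days):
    """Worst-case page rate needed to finish within the deadline."""
    return sites * pages_per_site / (days * 24 * 60 * 60)


def crawl_all(urls, crawl_one, workers=20):
    """Run crawl_one(url) for each site on a thread pool; returns {url: result}."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(crawl_one, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as error:
                results[url] = error  # record the failure and keep going
    return results
```

Threads suit this workload because each worker spends most of its time waiting on the network, so one slow site does not hold up the rest.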

Finally, this crawler needs to be completed in a couple of days.

This is something that was already asked for a couple of months ago by somebody else, but I need it now as well.

Messages Posted: 1

 


 

150

2 days

11-23-2009 05:55 EST

Hi, Please check your PM. Thanks.


 

150

7 days

11-23-2009 05:01 EST

I'm interested in it. Check PMB for details.


 

150

2 days

11-23-2009 05:07 EST

(No Feedback Yet)

Contact me to clarify details on the project


 

230

3 days

11-23-2009 07:44 EST

(No Feedback Yet)

Hi, Please see some websites we've developed: http://yamaha-motor.com.vn/ http://megashares.vn/ ... and at http://vngia.com/ we've created a price search engine website, which has several crawler modules that crawl information from the internet. So I believe we can do your project well. You can email these webmasters to confirm; my name is Nguyen Minh Tuan. We are waiting for your reply. Thank you.


 

200

7 days

11-23-2009 15:59 EST

(No Feedback Yet)

Hello, Thank you for your clear specification and requirements. I wish all jobs on getafreelance.com were as clear and concise as your post.

I suggest having a screen where you would enter:
(a) the phrases to search
(b) search depth
(c) max number of pages to search per site
(d) file path for websites to process
(e) file path for the output
(f) other control information that may be required to help with the performance of the tool, like a "restart from last site processed" checkbox.

The data entered above would be stored in the registry so that when you start the program again you would not have to re-enter it. You would press the 'crawl' button and away it would go.

I propose building you a stand-alone program in Microsoft VB.NET to do this work, not PHP as you have indicated in the job type. The reasons for this are performance and usage related. You will get a much higher processing rate with VB.NET than with PHP. With PHP you have to spend time working with a web server, which adds another layer of complexity and more things you have to do; with a stand-alone VB.NET program you simply run it from your PC that has an internet connection.

I'm a seasoned programmer/developer with 30 years' experience building and supporting IT systems. I live in Wellington, New Zealand. I am, however, very new to getafreelance.com; in fact this is my first bid ever. I pride myself on producing high quality software and I'm sure you won't be disappointed with my work.

On winning the bid I would start immediately and have a first cut program for you to look at within 3 days. I would then proceed to complete fine tuning and adjustments to the development as required. Thanks again for considering my bid. Kind regards Nik.
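The saved-settings idea in this bid can be sketched as follows. This is an illustration in Python, not the bidder's VB.NET design: a JSON file stands in for the Windows registry, and the `CrawlSettings` fields, their defaults, and the file names are hypothetical names mapped onto items (a) to (f).

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class CrawlSettings:
    phrases: list = field(default_factory=list)  # (a) phrases to search
    max_depth: int = 3                           # (b) search depth
    max_pages: int = 100                         # (c) max pages per site
    urls_path: str = "urls.txt"                  # (d) websites to process
    output_path: str = "results.csv"             # (e) output file
    last_site_index: int = 0                     # (f) restart from last site


def save_settings(settings, path):
    """Persist the settings so they need not be re-entered on the next run."""
    Path(path).write_text(json.dumps(asdict(settings), indent=2), encoding="utf-8")


def load_settings(path):
    """Load saved settings, or fall back to defaults on the first run."""
    file = Path(path)
    if not file.exists():
        return CrawlSettings()
    return CrawlSettings(**json.loads(file.read_text(encoding="utf-8")))
```

Updating `last_site_index` and saving after each finished site would let an interrupted run resume where it stopped.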


 

180

7 days

11-26-2009 02:34 EST

(No Feedback Yet)

I can do this in PHP. This will be a multi-threaded script, so to speak: PHP doesn't natively support multi-threading, but there are some tricks to implement it. I have similar experience.


