Project Detail

Website Crawler to identify phrases in html source code  

Website Crawler to identify phrases in html source code is project number 293178
posted at GetAFreelancer.com.

 

 

Status: Cancelled

Selected Providers: -

Budget: $30-250

Created: 07/28/2008 at 21:36 EDT

Bid Count: 10

Average Bid: $197

07/30/2008 at 21:36 EDT

Project Creator: PatrickKahuna
Employer Rating: 10/10 (43 reviews)

 

Description

I need a crawler to identify phrases in the HTML of websites, for example "google analytics". There will be about 10 phrases in total, and I want the phrase list to be an input that I can control. I want to be able to control the depth of the crawl, i.e., how many levels "deep" the crawler goes into the website (e.g., home page --> about us --> management would be 3 layers deep). I also want to be able to control the total number of pages crawled per site, e.g., cut off the search after 100 pages crawled.
The crawler needs to be able to crawl 20,000 sites in about a week. Therefore, the winning bidder needs to be able to build a "fast" crawler, e.g., by utilizing multi-threading. I will also need to be able to upload the URLs of the websites I want to crawl.
Finally, this crawler needs to be completed in a couple of days.
This crawler should be straightforward for a skilled programmer.
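
The requirements above (a controllable phrase list, a depth limit, a per-site page cap, and multi-threading across sites) can be illustrated with a minimal sketch. This is not the code of any bidder, only one possible shape under stated assumptions: it uses the Python standard library, treats the home page as depth 1 to match the "3 layers deep" example, matches phrases case-insensitively, and stays on the starting host. The names `crawl_site`, `crawl_many`, `find_phrases`, and `fetch` are hypothetical.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url, timeout=10):
    """Default fetcher (assumption): download a page and decode it as UTF-8."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def find_phrases(html, phrases):
    """Return the subset of phrases that occur in the HTML, ignoring case."""
    lowered = html.lower()
    return {p for p in phrases if p.lower() in lowered}


def crawl_site(start_url, phrases, max_depth=3, max_pages=100, fetcher=fetch):
    """Breadth-first crawl of one site; returns {phrase: [urls where found]}."""
    host = urlparse(start_url).netloc
    queue = deque([(start_url, 1)])  # the home page counts as depth 1
    seen = {start_url}
    hits = {p: [] for p in phrases}
    pages = 0
    while queue and pages < max_pages:
        url, depth = queue.popleft()
        try:
            html = fetcher(url)
        except Exception:
            continue  # skip pages that cannot be fetched
        pages += 1
        for phrase in find_phrases(html, phrases):
            hits[phrase].append(url)
        if depth >= max_depth:
            continue  # do not follow links below the depth limit
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urldefrag(urljoin(url, href)).url  # absolute, no #fragment
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return hits


def crawl_many(urls, phrases, workers=20, **kwargs):
    """Crawl several sites in parallel threads; returns {start_url: hits}."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda u: crawl_site(u, phrases, **kwargs), urls)
        return dict(zip(urls, results))
```

In this sketch the uploaded URL list would be passed to `crawl_many`, and `phrases`, `max_depth`, and `max_pages` are the inputs the description asks to control. A production crawler would also need politeness delays, robots.txt handling, and retry logic, none of which are shown here.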

Job Type

Messages Posted: 0

 


 

215

2 days

07-28-2008 21:57 EDT

Hi, Please check PMB.

 

150

5 days

07-28-2008 22:01 EDT

I'm the skilled developer that you need, regards...

 

200

5 days

07-29-2008 04:22 EDT

Hi Patrick, I can develop such a crawler as a Python/Django web service (more time is required) or as a standalone Python script (less time). The optimistic timeframe is about two days. The pessimistic estimate (with a large allowance for risk) is the whole current week. I have experience with site parsing/crawling, and a script can usually parse about 50,000 pages per day without multi-threading. A multi-process/multi-threaded crawler will be even faster. Have a nice day, Nikolay.
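
As a rough check of the figures in this bid against the project description, the arithmetic below estimates the required throughput. It is a back-of-the-envelope sketch under loud assumptions: every one of the 20,000 sites reaches the example cap of 100 pages (a worst case), the crawl runs for a full 7 days, and throughput scales linearly with the number of parallel workers. The function name `required_pages_per_day` is hypothetical.

```python
import math


def required_pages_per_day(sites, pages_per_site, days):
    """Pages that must be crawled per day to finish within the deadline."""
    return sites * pages_per_site / days


# Worst case from the description: 20,000 sites, 100 pages each, one week.
needed = required_pages_per_day(20_000, 100, 7)

# Single-threaded rate quoted in the bid (pages per day).
single_worker_rate = 50_000

# Assumes linear scaling, which real crawls rarely achieve exactly.
workers_needed = math.ceil(needed / single_worker_rate)

print(f"{needed:,.0f} pages/day needed, about {needed / 86_400:.1f} pages/second")
print(f"at least {workers_needed} parallel workers at the quoted rate")
```

Under these assumptions the quoted single-threaded rate would not meet the one-week target on its own, which is consistent with the description's request for multi-threading.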

 

188

5 days

07-28-2008 22:21 EDT

please check pm

 

150

2 days

07-29-2008 01:15 EDT

I have already done a very similar script. Please see PMB for more info and a demo. Thanks for your time.

 

250

5 days

07-29-2008 03:30 EDT

Hi, I am highly skilled and experienced with crawling applications. Please see PMB. Regards, Shyam

 

200

9 days

07-29-2008 03:17 EDT

Dear sir, Thank you very much for giving us the opportunity to participate in your project. We possess 5 years of experience in such operations. Please check the PM for more details. We provide 100% perfect results. We look forward to hearing from you. Thank you for your consideration. Thanking you, Wasimul Haque

 

175

5 days

07-28-2008 22:53 EDT

(No Feedback Yet)

I'm a skilled programmer in need of work! Good communication & experience.

 

250

7 days

07-29-2008 01:44 EDT

(No Feedback Yet)

We have a similar project and can offer you professional services. Our team has 10+ years of working experience and was established in July 2004. Please check PMB for details. Thank you.

 

190

3 days

07-29-2008 02:51 EDT

(No Feedback Yet)

Concoct Information Technology is a global technology solutions provider to enterprises, consumers & technology companies. Concoct follows a support-centric model for all its services that helps its clients leverage IT to align business objectives.

