Project Detail

News Aggregator/Crawler for 27 Sites - Perl/python  

News Aggregator/Crawler for 27 Sites - Perl/python is project number 272574
posted at GetAFreelancer.com.

 

 

Status: Closed
(Cancelled by Service Buyer)

Selected Providers: -

Budget: $250-750

Created: 06/09/2008 at 18:52 EDT

Bid Count: 6

Average Bid: N/A

06/10/2008 at 18:52 EDT

Project Creator: programmingbids
Employer Rating: 10.00/10 (42 reviews)


Description

Hi there,

We need experienced professionals in the following coding languages: Perl, or possibly Python. We are not sure whether it can be done in PHP, though we are open to suggestions.

The work will involve three main aspects:

1) A bot to crawl each site (27 sites).

2) A script to find related/similar news through linguistic text analysis (pattern analysis), or any other way you know this can be done.

3) What is crawled then has to be indexed into a database and made searchable, possibly using this open source search software: http://www.sphinxsearch.com
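
As a rough illustration of aspect 2, one common way to find related stories is to compare TF-IDF weighted word vectors of the titles and descriptions with cosine similarity. The sketch below is only one possible approach, not a requirement; the function names, the tokenizer and the `threshold` value are assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic words longer than two letters."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 2]


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    token_lists = [tokenize(d) for d in docs]
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))  # count each term once per document
    n = len(docs)
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed IDF: terms found in every document get weight 0.
            term: count * math.log((1 + n) / (1 + doc_freq[term]))
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related(docs, index, threshold=0.2):
    """Return indices of documents similar to docs[index], best match first.

    The threshold is an arbitrary illustrative value and would need tuning.
    """
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine(vectors[index], v), i)
        for i, v in enumerate(vectors)
        if i != index
    ]
    return [i for score, i in sorted(scores, reverse=True) if score >= threshold]
```

In practice each document would be a story's title plus description, and the threshold would be adjusted against real headlines.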

We have 27 news sites that we want crawled/spidered by a bot; since the 27 sites are different, we assume the code for each may have to be slightly different as well.
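
One way to keep the per-site differences small is a shared link extractor plus a table of site-specific rules. The following is a minimal sketch under that assumption; the host names and URL rules are hypothetical placeholders, not the actual 27 sites.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs from a front page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html, base_url):
    """Return absolute (url, text) pairs for every link with visible text."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [(urljoin(base_url, href), text) for href, text in parser.links if text]


# Hypothetical per-site rules: each decides whether a URL is a news story.
SITE_RULES = {
    "news.example.com": lambda url: "/story/" in urlparse(url).path,
    "daily.example.org": lambda url: urlparse(url).path.endswith(".html"),
}


def extract_stories(html, base_url):
    """Apply the rule for the page's host; keep every link if none is defined."""
    host = urlparse(base_url).netloc
    keep = SITE_RULES.get(host, lambda url: True)
    return [(url, title) for url, title in extract_links(html, base_url) if keep(url)]
```

With this layout, adding a site means adding one entry to the rules table instead of writing a new crawler.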

Most sites will need to be crawled every 10 to 15 minutes, and some others every 30 minutes. Only the front page of each site is to be checked, although for some sites we may be able to just check the RSS feed and get the data from there.
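
The polling intervals and the RSS option could look roughly like the sketch below. It assumes plain RSS 2.0 feeds (un-namespaced `item` elements); the URLs and the interval table are placeholders, and fetching the feed itself is left out.

```python
import xml.etree.ElementTree as ET


def parse_rss(xml_text):
    """Extract title, link and description from each RSS 2.0 <item>."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        })
    return items


# Hypothetical crawl intervals in seconds: 15 minutes and 30 minutes.
INTERVALS = {
    "http://news.example.com/rss": 15 * 60,
    "http://daily.example.org/": 30 * 60,
}


def due_sites(intervals, last_crawled, now):
    """Return the URLs whose interval has elapsed since their last crawl.

    Sites that have never been crawled are always due.
    """
    return [
        url
        for url, interval in intervals.items()
        if now - last_crawled.get(url, float("-inf")) >= interval
    ]
```

A scheduler such as cron could call `due_sites` once a minute with the current timestamp and crawl whatever it returns.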

Which coding language to use: we know that Perl is a very good language for text processing, which is why we suggest that the work (the crawling plus the related-stories script) be done in Perl. The other reason is that we know that sites like www.techmeme.com and www.megite.com have been coded in Perl, and as you can see, the stories under “RELATED” on techmeme.com, for example, are very good. It therefore seems that Perl, as a text-oriented language, can achieve good results here. At the end of the day we leave it to your expertise. We also know that a web spider can be written in PHP, but we are not sure of its capabilities.

Note: all the work you do has to be documented, because if in the future you or another coder has to fix something, we want them to be able to read and understand the code. THEREFORE, WE WANT VERY PROFESSIONAL WORK.

So let us know your experience/expertise with scripts that can spider web pages and with scripts that can do linguistic text analysis and find related news by analysing the news titles and news descriptions. We want to build a relationship with you as well. One of the reasons why we will probably not work with previous suppliers is that they don’t have expertise in Perl or Python, so this is an opportunity for you or your company to join us as a future long-term partner.

You will also be working on your local production server until we are ready to move to our live server, which we still need to set up; once we are happy with your work, we will move it to our server.

You will only be doing programming; another company will do all the graphics and some other small things like the user account section, etc. So the three aspects mentioned above are all you are bidding for: crawling + related script + search. In other words, you will do the core of the whole project.

Send your bid as soon as possible; we have a detailed description of each section and can provide it to you upon request. If possible, it would be nice to know whether you have extensive experience with web crawlers or aggregators, and especially whether you can accomplish the “Related” news script, which is very important to us.

We can probably use an escrow account and pay as each step is accomplished...

PB.

Messages Posted: 0
