GetAFreelancer.com
 
 

News Aggregator/Crawler for 27 Sites - Perl/python


News Aggregator/Crawler for 27 Sites - Perl/python is project number 272574 posted at GetAFreelancer.com.

Status: Closed (Cancelled by Service Buyer)
Selected Providers: -
Budget: $250-750
Created: 06/09/2008 at 18:52 EDT
Bidding Ends: 06/10/2008 at 18:52 EDT
Project Creator: programmingbids
Buyer Rating: 10.00/10 (37 reviews)
Description: Hi there,

We need experienced professionals in the following languages: Perl, or possibly Python. We are not sure whether this can be done in PHP, though we are open to suggestions.

The work will involve three main aspects:

1) A bot to crawl each site (27 sites).

2) A script to find related/similar news through text analysis (linguistic pattern analysis), or any other method that you know can do this.

3) What is crawled then has to be indexed in a database and made searchable, possibly using this open source search software: http://www.sphinxsearch.com
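As a rough illustration of aspect 3, the sketch below stores crawled items in a database table keyed by URL, so that re-crawling a front page does not create duplicates. It uses Python's built-in sqlite3 module purely as a stand-in. The table layout, the function names, and the simple LIKE lookup are all assumptions for illustration; in practice a search engine such as Sphinx would index the table instead.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the article store. The schema is a hypothetical example."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS articles (
               id INTEGER PRIMARY KEY,
               site TEXT NOT NULL,
               url TEXT NOT NULL UNIQUE,
               title TEXT NOT NULL,
               description TEXT NOT NULL DEFAULT '',
               fetched_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def save_article(conn, site, url, title, description=""):
    """Insert one article. Return True if it was new, False if the URL was already stored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO articles (site, url, title, description) "
        "VALUES (?, ?, ?, ?)",
        (site, url, title, description),
    )
    conn.commit()
    return cur.rowcount == 1


def search_titles(conn, term):
    """Naive substring search, a placeholder for a real full-text engine."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT title FROM articles "
        "WHERE title LIKE ? OR description LIKE ? ORDER BY id",
        (pattern, pattern),
    )
    return [row[0] for row in rows]
```

The UNIQUE constraint on the URL is the design choice that matters here: with front pages polled every few minutes, most links seen on each pass will already be in the table.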

We have 27 news sites that we want crawled/spidered by a bot. Since the 27 sites are all different, we assume that the code for each may have to be slightly different as well.
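One way to keep the per-site differences small is to share a single front-page link extractor and put only the site-specific rules in a table. The sketch below assumes that story links on a site can be recognised by a URL prefix. The site name, URLs, and rule fields are hypothetical, and a real site may need more elaborate rules.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (absolute_url, link_text) pairs from <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical per-site rules: only these values differ from site to site.
SITE_RULES = {
    "example-news": {
        "front_page": "http://news.example.com/",
        "story_prefix": "http://news.example.com/story/",
    },
}


def extract_stories(site, html):
    """Return (url, headline) pairs for links that look like stories on this site."""
    rule = SITE_RULES[site]
    parser = LinkCollector(rule["front_page"])
    parser.feed(html)
    parser.close()
    return [
        (url, text)
        for url, text in parser.links
        if url.startswith(rule["story_prefix"])
    ]
```

Fetching the front page itself (for example with urllib.request) is left out so that the extraction logic can be tested on saved HTML.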

Most sites will need to be crawled every 10 to 15 minutes, and some others every 30 minutes. Only the front page of each site is to be checked, though for some sites we may be able to just check the RSS feed and get the data from there.
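The polling schedule and the RSS option could look roughly like the sketch below. The site names and intervals are placeholders, the feed is assumed to be plain RSS 2.0 with `<item>` elements, and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical poll intervals in seconds: 10 minutes for most sites, 30 for others.
POLL_SECONDS = {"fast-site": 10 * 60, "slow-site": 30 * 60}


def due_sites(last_polled, now):
    """Return the sites whose interval has elapsed. Sites never polled are always due."""
    return sorted(
        site
        for site, interval in POLL_SECONDS.items()
        if now - last_polled.get(site, float("-inf")) >= interval
    )


def parse_rss(xml_text):
    """Extract title, link and description from each RSS 2.0 <item>."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append(
            {
                "title": (item.findtext("title") or "").strip(),
                "link": (item.findtext("link") or "").strip(),
                "description": (item.findtext("description") or "").strip(),
            }
        )
    return items
```

A cron job or a simple loop could call `due_sites` once a minute and crawl whatever it returns. Where a site offers a feed, `parse_rss` avoids site-specific HTML parsing altogether.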

Which language to use: we know that Perl is a very good language for text processing, which is why we suggest that the work (the crawling plus the related-stories script) be done in Perl. The other reason is that we know sites like www.techmeme.com and www.megite.com have been coded in Perl and, as you can see, the stories under “RELATED” on techmeme.com, for example, are very good. It therefore seems that Perl, as a text-oriented language, can achieve good results here. At the end of the day, we leave it to your expertise. We also know that a web spider can be written in PHP, but we are not sure of its capabilities.

Note: all the work you do has to be documented, because if in the future either you or another coder has to fix something, we want them to be able to read and understand the code. THEREFORE, WE WANT VERY PROFESSIONAL WORK.

So let us know your experience/expertise both with scripts that spider web pages and with scripts that do text analysis and can find related news from the news titles and news descriptions. We want to build a relationship with you as well. One of the reasons we will probably not work with our previous suppliers is that they don’t have expertise in Perl or Python, so this is an opportunity for you or your company to join us as a long-term partner.
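To give bidders a concrete starting point, here is a minimal sketch of one common way to find related news from titles and descriptions: TF-IDF weighting with cosine similarity. This is an assumption about a workable approach, not a description of how any existing site does it. The stopword list, the smoothing formula, and the 0.2 threshold are arbitrary illustrative choices.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "on", "for", "and", "is", "as"}


def tokenize(text):
    """Lowercase the text and keep alphanumeric words that are not stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per document, with smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append(
            {
                term: count * (math.log((1 + n) / (1 + doc_freq[term])) + 1)
                for term, count in counts.items()
            }
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def related(docs, index, threshold=0.2):
    """Return indexes of documents similar to docs[index], most similar first."""
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine(vectors[index], vec), i)
        for i, vec in enumerate(vectors)
        if i != index
    ]
    return [i for score, i in sorted(scores, reverse=True) if score >= threshold]
```

Each entry in `docs` would be a story's title and description joined together. Stemming, named-entity matching, or clustering across time windows are possible refinements beyond this sketch.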

You will also be working on your local production server until we are ready to move to our live server, which we still need to set up; once we are happy with your work, we will move it to our server.

You will only be doing programming; another company will do all the graphics and some other small things such as the user account section, etc. So the three aspects mentioned above are all that you are bidding for: crawling + related script + search. In other words, you will do the core of the whole project.

Send your bid as soon as possible. We have a detailed description of each section and can provide it upon request. If possible, it would be nice to know whether you have extensive experience with web crawlers or aggregators, and especially whether you can accomplish the “Related” news script, which is very important to us.

We can probably use an escrow account and pay as each step is completed.

PB.
Job Type:
  • Perl/CGI
  • Python
Database: (None)
Operating system: (None)
Bid count: 6
Average bid: N/A

 


