Recipe Crawler is project number 485843
posted at Freelancer.com.
Bid Count: 9
Average Bid: $508
08/15/2009 at 5:48 EDT
Project Creator:
travisdh
Employer Rating: (11 reviews)
08/10/2009 at 17:17 EDT:
Edit: Can run on Windows if needed, but Linux is preferred!
08/11/2009 at 5:34 EDT:
I have updated the list of sites I would like to crawl. We may as well keep it simple to start and aim to keep costs low, as this is a small home project with a small budget. The list of sites is given below; I would like to crawl each one and get all recipe information from the entire domain.
The information collected should cover everything about each recipe, including title, ingredients, description / summary, serving sizes, notes, categories, recipe types (dinner, supper, etc.), recipe page URL, recipe source, and any nutritional information. Basically, any part of the site that is used for the recipe. Contact me for more info.
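The fields listed above could be captured in a single record type. This is a minimal Python sketch only: the class name `Recipe`, every field name, and the example URL are illustrative assumptions, not part of the posting or of phprecipebook.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Recipe:
    """One scraped recipe. Field names are hypothetical, chosen for illustration."""
    title: str
    page_url: str                       # URL of the recipe page on the original site
    source: str                         # where the recipe came from, e.g. the site domain
    ingredients: list[str] = field(default_factory=list)
    description: str = ""               # description / summary
    servings: Optional[str] = None      # kept as text, since sites format this differently
    notes: str = ""
    categories: list[str] = field(default_factory=list)
    recipe_types: list[str] = field(default_factory=list)   # dinner, supper, ...
    nutrition: dict[str, str] = field(default_factory=dict)  # any nutritional information


# Placeholder data, not taken from any of the listed sites.
example = Recipe(
    title="Pumpkin Soup",
    page_url="http://recipes.example.com/pumpkin-soup",
    source="recipes.example.com",
    ingredients=["1 kg pumpkin", "1 onion"],
    recipe_types=["dinner"],
)
```

Optional fields default to empty values so a recipe that lacks, say, nutritional information can still be stored.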
The raw information (the entire page, images, etc.) should be stored in the database first. That information should then be processed and stored in the database with your own fields, and finally taken from there and inserted into the phprecipebook database I mentioned earlier.
So the HTML pages stored in your database would be processed and put into your own database table, which would include ingredients, descriptions, instructions, titles, sources, images, categories, etc. The information from that table should then be inserted into phprecipebook.
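The first two stages described above (store the raw page, then process it into a table of parsed fields) might be sketched as follows. This is only an illustration under stated assumptions: it uses SQLite from the Python standard library, the table and column names are hypothetical, and `toy_parse` stands in for a real site-specific parser. The final insert into phprecipebook is left out because its schema is not given here.

```python
import sqlite3


def init_db(conn):
    """Create the two hypothetical tables: raw pages and parsed recipes."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS raw_pages (
            url  TEXT PRIMARY KEY,   -- recipe-site URL, never a proxy URL
            html TEXT NOT NULL       -- the entire page as fetched
        );
        CREATE TABLE IF NOT EXISTS recipes (
            url          TEXT PRIMARY KEY,
            title        TEXT,
            ingredients  TEXT,       -- one ingredient per line
            instructions TEXT
        );
    """)


def store_raw(conn, url, html):
    """Stage 1: keep the raw page so it can be re-processed later."""
    conn.execute(
        "INSERT OR REPLACE INTO raw_pages (url, html) VALUES (?, ?)", (url, html)
    )
    conn.commit()


def process_all(conn, parse):
    """Stage 2: run a parser over every stored page and save its fields."""
    rows = conn.execute("SELECT url, html FROM raw_pages").fetchall()
    for url, html in rows:
        fields = parse(html)
        conn.execute(
            "INSERT OR REPLACE INTO recipes (url, title, ingredients, instructions) "
            "VALUES (?, ?, ?, ?)",
            (url, fields["title"], "\n".join(fields["ingredients"]),
             fields["instructions"]),
        )
    conn.commit()


def toy_parse(html):
    """Placeholder parser; a real one would extract fields per site."""
    return {"title": "Untitled", "ingredients": [], "instructions": ""}


conn = sqlite3.connect(":memory:")
init_db(conn)
store_raw(conn, "http://recipes.example.com/1", "<html>...</html>")
process_all(conn, toy_parse)
```

Keeping the raw HTML separate from the parsed table means a parser can be fixed and re-run without crawling the site again, which fits the goal of keeping costs down.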
www.taste.com.au
www.epicurious.com
www.recipesource.com
www.cooking.com
www.recipezaar.com
www.allrecipes.com
www.Foodnetwork.com
http://fooddownunder.com/
http://www.yumyum.com/
www.chow.com
www.cdkitchen.com
http://recipes.alastra.com/
In the future I will contact you to add new sites to the crawler. I want to be able to run the crawler on my computer, or on multiple computers if possible, but I also want to keep costs down, so do whatever you can.
MUST BE ABLE TO WORK THROUGH PROXY SITES SO MY HOME IP ADDRESS DOES NOT GET BLOCKED, BUT MUST NOT RECORD ANY HTML FROM THOSE PROXY SITES. LINKS, IMAGES, ETC. THAT ARE STORED MUST REFERENCE THE FOOD RECIPE SITE AND MUST NOT INCLUDE THE PROXY INFORMATION / URLS.
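One way to keep proxy URLs out of the stored data is to convert every link back to its recipe-site form before saving it. The sketch below assumes a hypothetical web proxy at `proxy.example` that carries the target address in a `u` query parameter; real proxy sites differ, so `PROXY_HOST`, `PROXY_PARAM`, and the function name `original_url` are all illustrative assumptions. The example paths are placeholders too.

```python
from urllib.parse import parse_qs, urljoin, urlsplit

PROXY_HOST = "proxy.example"   # hypothetical proxy host
PROXY_PARAM = "u"              # hypothetical query parameter holding the target URL


def original_url(link, page_url):
    """Return the recipe-site URL for a link found on a proxied page.

    page_url is the original (non-proxy) URL of the page the link was found on.
    Returns None for links that belong to the proxy itself, so they are not stored.
    """
    parts = urlsplit(link)
    if parts.hostname == PROXY_HOST:
        target = parse_qs(parts.query).get(PROXY_PARAM)
        if not target:
            return None          # proxy's own navigation or ads
        return target[0]         # parse_qs has already percent-decoded the value
    # Relative or direct link: resolve it against the original page URL.
    return urljoin(page_url, link)


wrapped = "http://proxy.example/browse.php?u=http%3A%2F%2Fwww.chow.com%2Frecipes%2F1"
print(original_url(wrapped, "http://www.chow.com/"))
print(original_url("/images/a.jpg", "http://www.chow.com/recipes/1"))
print(original_url("http://proxy.example/about", "http://www.chow.com/"))
```

Running every stored link and image address through a function like this, and discarding the `None` results, keeps the saved records pointing at the food recipe site only.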