Web Scraper 01
Web Scraper 01 is project number 95861 posted at GetAFreelancer.com.
Status: Closed
Selected Service Provider: thenetfire
Budget: $30-100
Created: 10/08/2006 at 1:50 EDT
Bidding Ends: 10/09/2006 at 1:50 EDT
Project Creator: broomie123
Buyer Rating: (2 reviews)

Description:
I require a program that runs on Windows, ideally with a GUI, that will harvest Australian fax numbers from the internet.
I have found a program at http://www.troyeesoft.com/download.htm that gives a good idea of what I want the program to do (mainly in terms of the output), but this software does not allow me to batch the keywords used in the searches.
For example, I have a LIST1 (which can be a local .TXT file on my hard drive) containing the keywords "automotive, air, car, electric, house, ant, ..." and a LIST2 containing the keywords "fax 02, fax 03, fax 05, ...".
Basically, take a keyword from LIST1, combine it with each entry in LIST2, and run each combined search on all the search engines.
For example, the first search would be for "automotive fax 02", then "automotive fax 03", and so on, with the results written to a CSV file called automotive.csv.
Then repeat the entire search for the next keyword from LIST1.
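The batching described above is a nested loop: each LIST1 keyword is paired with every LIST2 entry, and all of that keyword's results go into one CSV named after it. A minimal Python sketch of the query-planning step only, assuming both lists are plain-text files with one keyword per line; the actual search and page fetching are left out because no particular search engine or API is specified. The function names and the `template` parameter (a possible hook for editing the search string) are hypothetical.

```python
from pathlib import Path
from typing import Iterator, List, Tuple


def read_keywords(path: str) -> List[str]:
    """Read one keyword per line, skipping blank lines (assumed file format)."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def plan_searches(
    list1: List[str], list2: List[str], template: str = "{keyword} {suffix}"
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (csv_filename, queries) for each LIST1 keyword.

    `template` is a hypothetical hook for customizing the search string.
    """
    for keyword in list1:
        # Combine this keyword with every LIST2 entry, in LIST2 order.
        queries = [template.format(keyword=keyword, suffix=suffix) for suffix in list2]
        # One output file per LIST1 keyword, e.g. automotive.csv.
        yield f"{keyword.lower()}.csv", queries


if __name__ == "__main__":
    for filename, queries in plan_searches(["automotive", "air"], ["fax 02", "fax 03"]):
        print(filename, queries)
```

Run as a script, this prints `automotive.csv ['automotive fax 02', 'automotive fax 03']` followed by `air.csv ['air fax 02', 'air fax 03']`. Keywords containing characters that are not valid in file names would need sanitizing before being used as CSV names.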
Each row of the output CSV file would contain the original fax number, the normalized fax number in the format +61nnnnnnn, the website it came from, and any adjacent telephone number.
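Each row therefore has four columns. A minimal sketch using Python's standard `csv` module; the `FaxRecord` type, the `write_records` function and the header names are hypothetical, and an empty string stands in when no adjacent telephone number was found.

```python
import csv
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FaxRecord:
    original: str             # fax number exactly as it appeared on the page
    normalized: str           # +61... form produced by the normalization rules
    website: str              # URL of the page the number came from
    adjacent_phone: str = ""  # nearby telephone number, empty if none found


# Hypothetical column names for the four fields in the description.
HEADER = ["original_fax", "normalized_fax", "website", "adjacent_phone"]


def write_records(path: str, records: Iterable[FaxRecord]) -> None:
    """Write one CSV file (e.g. automotive.csv) with a header row."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        for r in records:
            writer.writerow([r.original, r.normalized, r.website, r.adjacent_phone])
```

Using `csv.writer` rather than joining strings by hand means values that contain commas or quotes are escaped correctly.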
There should be a few rules for creating the numbers, such as:
i) ignore numbers with fewer than 10 digits;
ii) ignore numbers not beginning with 02, 03, 05, 06, 07, 08, 1800, 1300 or +61;
iii) remove hyphens, underscores, brackets and spaces from the numbers to create the normalized numbers.
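These rules fit in one small function. A minimal Python sketch under a few assumptions not stated in the description: the digit count is checked after the separators are stripped (so "+61" counts as two digits), a leading trunk "0" is dropped when "+61" is added, and 1800/1300 numbers are simply prefixed with "+61" to match the requested format, even though such numbers are generally only dialled within Australia. The name `normalize_fax` is hypothetical.

```python
import re
from typing import Optional

# Prefixes allowed by rule ii (other than "+61", which is handled separately).
ALLOWED_PREFIXES = ("02", "03", "05", "06", "07", "08", "1800", "1300")

# Rule iii: hyphens, underscores, brackets and whitespace.
SEPARATORS = re.compile(r"[-_()\[\]{}\s]")


def normalize_fax(raw: str) -> Optional[str]:
    """Return the number as +61..., or None if a rule rejects it."""
    cleaned = SEPARATORS.sub("", raw)
    # Reject anything other than an optional leading "+" followed by digits.
    if not re.fullmatch(r"\+?\d+", cleaned):
        return None
    # Rule i: ignore numbers with fewer than 10 digits.
    if len(cleaned.lstrip("+")) < 10:
        return None
    # Rule ii: accept only the listed prefixes.
    if cleaned.startswith("+61"):
        return cleaned
    if cleaned.startswith(ALLOWED_PREFIXES):
        # Assumption: drop the trunk "0"; keep 1800/1300 numbers whole.
        national = cleaned[1:] if cleaned.startswith("0") else cleaned
        return "+61" + national
    return None


if __name__ == "__main__":
    for sample in ["(02) 9876-5432", "+61 2 9876 5432", "1800 123 456", "0412 345 678"]:
        print(sample, "->", normalize_fax(sample))
```

With these rules, "(02) 9876-5432" and "+61 2 9876 5432" both normalize to +61298765432, while mobile numbers starting with 04 are rejected because 04 is not in the allowed list. A number written as "+61 (0)2 ..." would keep its bracketed 0 after stripping; how to treat that form is left open.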
The ability to modify the search string sent to the search engines would be desirable.
I would like the program to search through as many pages of search results as possible :)
I require the full source code.
Any ideas you have for making this better would be appreciated :-)

Database: (None)
Operating system: MS Windows
Bid count: 7
Average bid: $83