Project Detail

crawling and extracting program  

crawling and extracting program is project number 425800
posted at Freelancer.com.

 

 

Status: Cancelled

Selected Providers: -

Budget: $30-250

Created: 05/11/2009 at 9:46 EDT

Bid Count: 12

Average Bid: N/A

07/10/2009 at 9:46 EDT

Project Creator: demols
Employer Rating: 10/10 (5 reviews)

 

Description

This is part of the description.

The program starts from the output of an extraction program, like this:

"Company","Address","Telephone","Mobile","Website","Email"
"Apotheek Centrum Schelle","Provinciale Steenweg 95 2627 Schelle","03 887 54 72","","",""
"De Lindeboom","Nationalestraat 119 2000 Antwerpen","","","",""
"Morel E","Brusselsesteenweg 298 2800 Mechelen","015 41 55 65","","",""
"Van De Mierop-Mestdagh BVBA","Lindenlaan 66 2340 Beerse","014 61 13 64","","",""
"Hooijmaaijer J","Clemenceaustraat 43 2860 Sint-Katelijne-Waver","015 21 22 93","","",""
"Horsten L NV","Vrijheid 98 2320 Hoogstraten","03 314 57 24","","",""
"Vandeweyer R","Oranjestraat 94 2060 Antwerpen","03 233 82 75","","",""
"Danckaert J","Ter Heydelaan 173-175 2100 Deurne (Antwerpen)","03 324 95 30","","",""
"Vermylen K BVBA","Leo Kempenaersstraat 7 2223 Schriek (Heist-Op-Den-Berg)","015 23 33 70","","",""
"Onze Apotheek cv","Antwerpsesteenweg 146 Bus 1 2500 Lier","","","",""
"Peleman","Schipstraat 1 2870 Puurs","03 889 23 63","","",""
"Ter Borcht BVBA","Bernard van Orleyplein 5 2650 Edegem","03 440 64 91","","",""
"De Lindeboom-Apotheek","Nationalestraat 119 2000 Antwerpen","","","",""




The program must work in several steps.

The first step is checking whether the business has a site. (If the input file already has a URL for
the listing, the program can go immediately to step 2.)
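
A minimal sketch of that routing, assuming the input is the quoted CSV shown above and that the URL sits in the "Website" column. The function names read_listings and split_by_step are made up for illustration and are not part of the original request.

```python
import csv
import io

# Two rows in the format shown in the description; the second one already has a URL.
SAMPLE = '''"Company","Address","Telephone","Mobile","Website","Email"
"De Lindeboom","Nationalestraat 119 2000 Antwerpen","","","",""
"Luigi Lloyd Loom","Puursesteenweg 392B 2880 Bornem","03 899 26 35","","http://www.luigi.be"
'''

def read_listings(text):
    """Parse the extractor output into dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

def split_by_step(rows):
    """Rows without a Website value start at step 1, the others at step 2."""
    def has_site(row):
        # Short rows leave missing columns as None, so guard before stripping.
        return bool((row.get("Website") or "").strip())
    step1 = [row for row in rows if not has_site(row)]
    step2 = [row for row in rows if has_site(row)]
    return step1, step2

step1_rows, step2_rows = split_by_step(read_listings(SAMPLE))
```

Rows with fewer fields than the header (as in several examples in this description) are tolerated here, because the missing columns simply come back empty.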


Example: "Pica Pica","Hofkwartier 20 2200 Herentals","014 22 02 55","",""

Try the URLs www.picapica.be and www.pica-pica.be. Both have a site, so both must be
crawled to look for the address. In this example www.pica-pica.be is the right one. The program can
use Pica Pica as the business name in the output and must add the URL to the output.

If the business name contains bvba, nv, a single letter, or 't, that part must not be used in the URL.
Example: for pica pica nv, try only www.picapica.be or www.pica-pica.be, not www.pica-pica-nv.be.
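
The name-to-URL rule above could be sketched like this. The helper names url_words and be_candidates are hypothetical, and treating a hyphenated name as one token before filtering is an assumption made so that a single letter inside such a name is not dropped.

```python
import re

# Parts the description says must not end up in the URL.
LEGAL_FORMS = {"bvba", "nv"}
ARTICLE_T = {"'t", "\u2019t"}  # plain and typographic apostrophe

def url_words(business_name):
    """Return the lower-case name parts that may be used in a host name."""
    kept = []
    for token in business_name.lower().split():
        if token in LEGAL_FORMS or token in ARTICLE_T or len(token) == 1:
            continue
        # Split on anything that is not a-z or 0-9 (hyphens, dots, quotes).
        # Limitation of this sketch: accented letters also act as separators.
        kept.extend(part for part in re.split(r"[^a-z0-9]+", token) if part)
    return kept

def be_candidates(business_name):
    """Return the joined and the hyphenated .be host names to try."""
    words = url_words(business_name)
    if not words:
        return []
    hosts = ["www." + "".join(words) + ".be"]
    if len(words) > 1:
        hosts.append("www." + "-".join(words) + ".be")
    return hosts

print(be_candidates("Pica Pica NV"))
```

Each candidate would still have to be fetched and checked for the listing's address before it is accepted.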

Another example: "Sleepwise","Turnhoutsebaan 328B 2970 Schilde","03 385 31 21","",""
The URL www.sleepwise.be is a site; crawl it for the address.

This is the address on the site:
Turnhoutsebaan 225 - B-2970 Schilde

Only the number 225 is different. Make the program accept the site in that case, because only one
thing is different, and use the address from the site in the output.
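
One possible way to test "only one thing is different" is to compare the two addresses token by token. This is a sketch under assumptions: the names address_tokens and differs_in_at_most_one are invented here, the "B-" country prefix before a four-digit postal code is ignored, and addresses with a different number of tokens are simply rejected.

```python
import re

def address_tokens(address):
    """Normalise an address to a list of lower-case word and number tokens."""
    text = address.lower()
    # "B-2970" and "2970" should compare equal, so drop the country prefix.
    text = re.sub(r"\bb-(?=\d{4}\b)", "", text)
    return re.findall(r"\w+", text)

def differs_in_at_most_one(listing_address, site_address):
    """True when both addresses have the same tokens except for at most one."""
    a = address_tokens(listing_address)
    b = address_tokens(site_address)
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) <= 1

listing = "Turnhoutsebaan 328B 2970 Schilde"
on_site = "Turnhoutsebaan 225 - B-2970 Schilde"
# When the check passes, the address from the site goes into the output.
chosen = on_site if differs_in_at_most_one(listing, on_site) else None
```

A real crawler would first have to locate the address text on the page; this only covers the comparison once both strings are known.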

Also, if there is no .be site, try .com, as in this example:
"Poppels Meubelhuis","Zandkuilstraat 23 2382 Poppel (Ravels)","","",""

There is no .be site, but www.PoppelsMeubelhuis.com is a site, and on that site is the right address:
Poppels meubelhuis, Tilburgseweg 64 (Slaapwinkel),
Zandkuilstraat 23 (Woonwinkel), B-2382 Poppel, België
tel.: +32 (0)14 65 78 54, fax: +32 (0)14 65 94 69
e-mail:

Here the e-mail address and fax number can also be added to the output. The parts between ( ) are not important.
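
A rough sketch of pulling those contact details out of the page text with regular expressions. The function name extract_contacts and the patterns are assumptions for illustration; real pages will label these fields in many other ways.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
NUMBER = r"(\+?\d[\d ./]*\d)"  # digits with spaces, dots or slashes in between

def extract_contacts(page_text):
    """Return e-mail, telephone and fax found in plain text; missing items are None."""
    # The parts between ( ) are not important, so remove them first.
    # This also turns "+32 (0)14" into "+32 14".
    text = re.sub(r"\([^)]*\)", "", page_text)

    def labelled(label):
        match = re.search(r"\b" + label + r"\.?\s*:?\s*" + NUMBER, text, re.IGNORECASE)
        return " ".join(match.group(1).split()) if match else None

    email = EMAIL_RE.search(text)
    return {
        "email": email.group(0) if email else None,
        "telephone": labelled("tel"),
        "fax": labelled("fax"),
    }

SITE_TEXT = """Poppels meubelhuis, Tilburgseweg 64 (Slaapwinkel),
Zandkuilstraat 23 (Woonwinkel), B-2382 Poppel, België
tel.: +32 (0)14 65 78 54, fax: +32 (0)14 65 94 69
e-mail:"""

contacts = extract_contacts(SITE_TEXT)
```

For the text quoted above, the e-mail field stays empty because no address follows the "e-mail:" label in the quoted text.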

Example: "C-Meubel","Antwerpsesteenweg 19 2840 Rumst","015 31 77 16","",""
There is no www.c-meubel.be, so also try www.cmeubel.be, which does exist. The address is also
on the site, so it is a good one. The e-mail address can also be added to the output.


Another example: "VI-Spring","Dorp 78 2230 Herselt","014 54 55 11","",""
It has a site, http://www.vi-spring.be/, but the address is not on it, so the site cannot be used and this
business name must be checked in the next step.

Another example: business name Hof van Aragon NV (anything between ( ) must not be used).
www.hofvanaragon.be is a site but immediately redirects you to www.hva.be. The program must then look
for the address on www.hva.be.
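
The redirect case could be handled roughly as follows. This sketch relies on urllib.request.urlopen following HTTP redirects on its own and on geturl() reporting the final URL; it would not catch redirects done with a meta refresh tag or JavaScript. The function name final_host and the opener parameter (there so the function can be tried without network access) are assumptions.

```python
from urllib.parse import urlsplit
from urllib.request import urlopen

def final_host(url, opener=urlopen, timeout=10):
    """Open url, follow HTTP redirects, and return the host that finally answered."""
    with opener(url, timeout=timeout) as response:
        return urlsplit(response.geturl()).hostname

def was_redirected(url, opener=urlopen, timeout=10):
    """True when the answering host differs from the host that was requested."""
    requested = urlsplit(url).hostname
    return final_host(url, opener=opener, timeout=timeout) != requested
```

With the example above, calling final_host("http://www.hofvanaragon.be") would be expected to return the host the crawler should search for the address, provided the site still redirects the same way.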

Another example: business name Zuid-West.
www.zuid-west.be is a site and has the right address.

Another example: "Odrada Interieur NV","Molsesteenweg 46 2490 Balen","014 34 66 00","",""
There is no site for www.odradainterieur.be, www.odrada-interieur.be, www.odradainterieur.com, or www.odrada-interieur.com,
but www.odrada.be is a site, and that site has the right address.

Another example: the business name is "de lindeboom". The URLs that need to be tried are www.lindeboom.be,
www.delindeboom.be, www.de-lindeboom.be, www.delindeboom.com, www.de-lindeboom.com, and www.lindeboom.com.
If any of them are sites, each site must be crawled for the address (if the URL is redirected, then the site it redirects to must be crawled).
In this case www.delindeboom.be is the right site.
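
Taken together, the examples suggest a list of host names to try per business. The sketch below is one reading of them, not a confirmed rule set: it takes the already cleaned name words as input, tries all .be hosts before .com, drops a leading "de", and falls back to the first word alone. The names host_stems and candidate_hosts are hypothetical, and the order within each domain is an assumption.

```python
# Leading articles that may be left out of the host name.
# Only "de" appears in the description; extend the set as needed.
ARTICLES = {"de"}

def host_stems(words):
    """Return the distinct name variants to place between 'www.' and the domain."""
    stems = []

    def add(stem):
        if stem and stem not in stems:
            stems.append(stem)

    add("".join(words))      # delindeboom
    add("-".join(words))     # de-lindeboom
    if len(words) > 1 and words[0] in ARTICLES:
        add("".join(words[1:]))   # lindeboom
        add("-".join(words[1:]))
    elif len(words) > 1:
        add(words[0])        # odrada, from "odrada interieur"
    return stems

def candidate_hosts(words, tlds=("be", "com")):
    """List every host to try, all .be variants first and then .com."""
    return ["www." + stem + "." + tld for tld in tlds for stem in host_stems(words)]

print(candidate_hosts(["de", "lindeboom"]))
```

For "odrada interieur" this also produces www.odrada.com, which the description does not mention; whether such extra candidates are wanted is up to the project creator.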






If there is an e-mail address, also add it to the output.





----------------------------------------------------------------------------------------------------

Step 2 (if the listing already has a site)

If the listing has a site, the program must check whether the site is still online.
Then the program must check whether the business name is in the title.

Example: "Luigi Lloyd Loom","Puursesteenweg 392B 2880 Bornem","03 899 26 35","","http://www.luigi.be"


The site http://www.luigi.be/ is still online. The title is
<title>:: Luigi - Original Lloyd loom - Exclusive Rattan furniture - Outdoor furniture - Bedrooms ::</title>
Luigi Lloyd Loom is in the title, so the business name can stay the same.
(If there were only Luigi in the title, the business name would have to be changed to Luigi.)

If there is nothing in the title, crawl the site looking for the address and the business name. If it doesn't
find an address and a part of the business name, it must go to step 3. If it finds the address and part of the
business name (for example only Luigi), then it must use that part of the business name (Luigi, not Luigi Lloyd Loom)
and the address in the output.
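
The title check in step 2 might look something like this. It is only a sketch: title_of and name_from_title are invented names, the title is pulled out with a simple regular expression rather than an HTML parser, and a name word counts as found only when it appears as a whole word in the title.

```python
import re

def title_of(html):
    """Return the text inside the first <title> element, or '' when there is none."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else ""

def name_from_title(business_name, html):
    """Return the part of the business name that occurs in the title, or None.

    None means nothing matched, so the caller should crawl the page itself
    for the address and the business name.
    """
    title_words = set(re.findall(r"\w+", title_of(html).lower()))
    found = [word for word in business_name.split() if word.lower() in title_words]
    return " ".join(found) if found else None

PAGE = ("<title>:: Luigi - Original Lloyd loom - Exclusive Rattan furniture"
        " - Outdoor furniture - Bedrooms ::</title>")
name = name_from_title("Luigi Lloyd Loom", PAGE)
```

Checking that the site is still online is left out here; that would be an ordinary HTTP request made before the title is read.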

Messages Posted: 1
