Project Detail

3 easy straightforward scripting tasks - data scraping  

3 easy straightforward scripting tasks - data scraping is project number 343612
posted at Freelancer.com.

 

 

Status: Cancelled

Selected Providers: -

Budget: $30-250

Created: 11/14/2008 at 14:46 EST

Bid Count: 10

Average Bid: $151

11/17/2008 at 14:46 EST

Project Creator: parsingjobs2
Employer Rating: (No Feedback Yet)

 

Description

_________

A list of 1800 URLs will be provided.
Required for each contact: firstname, lastname
If present on the page: title, field_n, position, picture_url, email, tel, fax
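
One way to hold the results is a flat CSV with one column per requested field. The sketch below is only an illustration: the column order, the expansion of field_n into field1 and field2, and the helper name write_contacts are assumptions, not part of the task description.

```python
import csv
import io

# Column names follow the task description; expanding "field_n" to
# field1/field2 and this column order are assumptions.
COLUMNS = [
    "firstname", "lastname", "title", "field1", "field2",
    "position", "picture_url", "email", "tel", "fax",
]


def write_contacts(rows, out):
    """Write contact dicts as CSV; optional fields that are missing stay empty."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_contacts(
        [{"firstname": "Reiner", "lastname": "Füth", "title": "Dr. med."}],
        buffer,
    )
    print(buffer.getvalue())
```

Leaving missing optional fields empty keeps every row the same width, which makes the three result sets easier to merge later.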

example1:
http://www.helios-kliniken.de/klinik/wuppertal/fachabteilungen/medizinische-klinik-3-kardiologie-herzzentrum/mitarbeiter.html
firstname: Reiner
lastname: Füth
title: Dr. med.
field1: Internist und Kardiologe (directly under the name)
field2: Medizinische Klinik 3 - Kardiologie - Herzzentrum (taken from the last step in the breadcrumb navigation on top)
position: Oberärztinnen / Oberärzte
picture_url: http://www.helios-kliniken.de/uploads/pics/fueth.png
email: reiner.fueth AT helios-kliniken.de
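
If a page gives the name as a single string such as "Dr. med. Reiner Füth" (an assumption about the markup), the title, firstname and lastname could be separated roughly as follows. The set of title tokens and the function name split_name are hypothetical and would need adjusting to the real pages.

```python
# Tokens treated as part of an academic title. This set is an assumption
# for illustration; real pages may use other titles.
TITLE_TOKENS = {"Dr.", "med.", "Prof.", "Priv.", "Doz.", "Dipl.", "PD"}


def split_name(full_name):
    """Split 'Dr. med. Reiner Füth' into (title, firstname, lastname)."""
    tokens = full_name.split()
    # Consume leading tokens for as long as they look like title parts.
    i = 0
    while i < len(tokens) and tokens[i] in TITLE_TOKENS:
        i += 1
    title = " ".join(tokens[:i])
    rest = tokens[i:]
    if not rest:
        return title, "", ""
    # The last remaining token is the last name; the rest is the first name.
    return title, " ".join(rest[:-1]), rest[-1]


if __name__ == "__main__":
    print(split_name("Dr. med. Reiner Füth"))
```

This simple rule would mis-split multi-word last names (for example names with "von"), so such cases would need manual review or an extra rule.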

some pages will only have one or two contacts:
example2:

http://www.helios-kliniken.de/klinik/bad-nauheim/fachabteilungen/anaesthesie-und-schmerztherapie/anaesthesiologie-notfallmedizin-und-schmerztherapie.html
title: Dr. med.
firstname: Klaus-Peter
lastname: Ratthey
position: Chefarzt (says so on the right, in the blue box)
field: Anästhesiologie, Notfallmedizin und Schmerztherapie (taken from the last step in the breadcrumb navigation on top)
field2: Anästhesie und Intensivmedizin (in big font, like on all other pages)
email: Klaus-Peter.Ratthey AT helios-kliniken.de
picture_url: http://www.helios-kliniken.de/uploads/pics/Dr._Ratthey_01.jpg

some random examples from the 1800 list:
http://www.helios-kliniken.de/klinik/borna-leipziger-land/fachabteilungen/frauenheilkunde-und-geburtshilfe/helios-brustzentrum-nordsachsen/team.html?0=
http://www.helios-kliniken.de/klinik/bad-saarow/fachabteilungen/anaesthesiologie-intensivmedizin-notfallmedizin-und-schmerztherapie/intensivmedizin.html?0=
http://www.helios-kliniken.de/klinik/berlin-buch/fachabteilungen/allgemein-und-viszeralchirurgie/unser-team.html
http://www.helios-kliniken.de/klinik/bad-groenenbach/fachabteilungen/depressive-stoerungen.html?0=
http://www.helios-kliniken.de/klinik/krefeld/fachabteilungen/innere-medizinische-klinik-iii/unser-team.html?0=

ZIP, CITY and CLINIC need to be added to each contact. There are only 62 different combinations, and they can be derived from the base URL.
For example, every contact that came from http://www.helios-kliniken.de/klinik/wuppertal
has ZIP: 42283, CITY: Wuppertal, CLINIC: Helios- Wuppertal
A list of the 62 base URLs with their corresponding ZIP, CITY, CLINIC values will be provided.
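
The base-URL lookup described above could be sketched like this. Only the Wuppertal row is taken from the description; the dictionary layout and the function names base_url and clinic_info are assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Lookup keyed by base URL. Only the Wuppertal row comes from the task
# description; the real table would hold all 62 entries.
CLINICS = {
    "http://www.helios-kliniken.de/klinik/wuppertal": {
        "zip": "42283", "city": "Wuppertal", "clinic": "Helios- Wuppertal",
    },
}


def base_url(url):
    """Return scheme://host/klinik/<name> for a helios-kliniken.de page URL."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2 or segments[0] != "klinik":
        raise ValueError(f"unexpected URL layout: {url}")
    return f"{parts.scheme}://{parts.netloc}/klinik/{segments[1]}"


def clinic_info(url):
    """Look up ZIP, CITY and CLINIC for a contact page; None if unknown."""
    return CLINICS.get(base_url(url))


if __name__ == "__main__":
    page = ("http://www.helios-kliniken.de/klinik/wuppertal/fachabteilungen/"
            "medizinische-klinik-3-kardiologie-herzzentrum/mitarbeiter.html")
    print(clinic_info(page))
```

Returning None for an unknown base URL makes it easy to log pages that do not match any of the 62 entries instead of silently guessing.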


_______

around 1000 contacts from:

STARTURL: http://www.kfh-dialyse.de/kfh-nierenzentren/nierenzentrum,,1,2.html
STOP: http://www.kfh-dialyse.de/kfh-nierenzentren/nierenzentrum,,254,2.html
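
The STARTURL and STOP differ only in one number (1 and 254), so the page list can be generated rather than collected by hand. This is a minimal sketch that assumes every number in between is a valid page; the names URL_TEMPLATE and center_urls are made up for the example.

```python
# The numeric part of the URL runs from 1 (STARTURL) to 254 (STOP).
URL_TEMPLATE = (
    "http://www.kfh-dialyse.de/kfh-nierenzentren/nierenzentrum,,{n},2.html"
)


def center_urls(first=1, last=254):
    """Return the page URLs from STARTURL to STOP, inclusive."""
    return [URL_TEMPLATE.format(n=n) for n in range(first, last + 1)]


if __name__ == "__main__":
    urls = center_urls()
    print(len(urls), urls[0], urls[-1])
```

A real run would still have to handle numbers that return no page or an empty contact list.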

Only scrape the contacts under "Ärztliche Ansprechpartner", NOT the ones at the bottom under "Ansprechpartner Verwaltung".
example: http://www.kfh-dialyse.de/kfh-nierenzentren/nierenzentrum,,100,2.html
in this case there are two contacts, one of them:
title: Dr. med.
firstname: Otmar
lastname: Dörner
field: Internist
specialty: Nephrologie
E-Mail: Otmar.Doerner AT kfh-dialyse.de
ZIP: 55743
City: Idar-Oberstein
clinic: KfH Kuratorium für Dialyse und Nierentransplantation e.V. Idar-Oberstein Saarstraße 2 (can be found in breadcrumbs on top)

The other contacts will have the same ZIP, city, and clinic, as these apply to all contacts on the page.

note: the emails are in images, but any simple OCR script (or even AutoIt) can read them (example: http://www.kfh-dialyse.de/dz_cnt/e-mail/254)

__________


all 500 or so contacts from this page:
http://www.klinikverbund-suedwest.de/315.0.html?&no_cache=1&tx_kvswma[submit]=alle

For each contact, extract the URL of the "Visitenkarte" link and the contact's field from this list page (the field is not shown on the detail page behind the link).

example: the 10th contact from the top of the above page:
visitenkarte-URL: http://www.klinikverbund-suedwest.de/315.0.html?&no_cache=1&tx_kvswma[submit]=alle&tx_kvswma[uid]=104
field: Klinik für Neurologie
...and the rest from the Visitenkarte page:
title: Priv. Doz. Dr.
firstname: Guy
lastname: Arnold
position: Chefarzt
Telefon: 07031 98-12362
Fax: 07031 98-12364
E-Mail: neurologie.si AT klinikverbund-suedwest.de
clinic: Klinikum Sindelfingen-Böblingen
zip: 71065
city: Sindelfingen
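
Collecting the "Visitenkarte" links from the list page could look roughly like the sketch below, using only the Python standard library. It assumes each link is a plain <a> element whose text is exactly "Visitenkarte"; the class name VisitenkarteLinks and the helper contact_uid are hypothetical. Pairing each link with its field depends on the actual page markup and is left out here.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urljoin, urlsplit


class VisitenkarteLinks(HTMLParser):
    """Collect the href of every <a> whose text is 'Visitenkarte'."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if "".join(self._text).strip() == "Visitenkarte":
                # Resolve relative links against the list page URL.
                self.links.append(urljoin(self.page_url, self._href))
            self._href = None


def contact_uid(url):
    """Return the tx_kvswma[uid] query value of a Visitenkarte URL, or None."""
    values = parse_qs(urlsplit(url).query).get("tx_kvswma[uid]")
    return values[0] if values else None
```

Usage would be to create VisitenkarteLinks with the list page URL, call feed() with the downloaded HTML, and then fetch each entry in links for the remaining fields.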

______

Messages Posted: 0

 


 

200

15 days

11-14-2008 19:51 EST

Hi. I have a lot of experience with extracting data. Thanks.


 

250

20 days

11-14-2008 15:21 EST

(No Feedback Yet)

I am doing this for the first time. Need more info on this


 

100

15 days

11-14-2008 15:31 EST

(No Feedback Yet)

Hi, Ready for this. Please check PM


 

100

4 days

11-14-2008 20:25 EST

(No Feedback Yet)

I have done several web scraping scripts to harvest data. Would be happy to help. Please let me hear from you on this.


 

50

3 days

11-14-2008 23:05 EST

(No Feedback Yet)

Sir, I have looked at all three websites and have changed my scraping scripts to work with them. Please see my attached examples. I would be glad to work with you; just close the project and award it to me. Thank you!


 

250

3 days

11-15-2008 01:22 EST

(No Feedback Yet)

Hello, the job is ready. Please check PM for more details. Thanks :) Soma


 

210

18 days

11-16-2008 06:45 EST

(No Feedback Yet)

Hi, I am willing to work on this project. Thanks.


 

50

4 days

11-17-2008 00:10 EST

(No Feedback Yet)

Please consider me in this project. I have understood what you want for the project and I am willing to work for it. Thank you


 

45

4 days

11-17-2008 11:07 EST

(No Feedback Yet)

pls see pm.


 

250

1 day

11-17-2008 13:19 EST

(No Feedback Yet)

I am a very dedicated worker. I can have this done for you in 24 hrs from the time of your approval! Thanks for the opportunity. Rhonda S. Twitty United States


