Project Detail

Parsing messy HTML using PERL/REGEX  

Parsing messy HTML using PERL/REGEX is project number 419157, posted at Freelancer.com.

Status:

Selected Providers: zeke

Budget: $30-250

Created: 04/15/2009 at 15:59 EDT

Bid Count: 12

Average Bid: $68

04/25/2009 at 15:59 EDT

Project Creator: docdudetheman
Employer Rating: 10/10 (1 review)

Description

This project consists of writing a Perl script that parses a large number of messy HTML files using non-standard regular expressions. I have attached a representative HTML file to give you an idea. I would like to transform the content of each HTML file into a CSV-style file, delimited by semicolons, of the following form:

Category 1; Subcategory A; Text of paragraph 1 in Subcategory A
Category 1; Subcategory A; Text of paragraph 2 in Subcategory A
Category 1; Subcategory A; Text of paragraph 3 in Subcategory A

Category 1; Subcategory B; Text of paragraph 1 in Subcategory B
Category 1; Subcategory B; Text of paragraph 2 in Subcategory B

Category 2; Subcategory C; Text of paragraph 1 in Subcategory C
Category 2; Subcategory C; Text of paragraph 2 in Subcategory C

Category 2; Subcategory D; Text of paragraph 1 in Subcategory D
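The output format above is essentially semicolon-delimited CSV. A minimal sketch of writing it follows; it is in Python purely for illustration (the project itself calls for Perl). It assumes each record is a (category, subcategory, text) triple, and it writes no space after each separator, on the assumption that the spaces in the example are cosmetic. `write_rows` is a hypothetical helper name.

```python
import csv
import io


def write_rows(rows, out):
    """Write (category, subcategory, text) triples as semicolon-delimited lines.

    csv.writer quotes any field that itself contains a semicolon, a quote
    character, or a line break, so paragraph text cannot shift the columns.
    """
    writer = csv.writer(out, delimiter=";", lineterminator="\n")
    for category, subcategory, text in rows:
        writer.writerow([category, subcategory, text])


# Usage: write two records to an in-memory buffer and print them.
rows = [
    ("Category 1", "Subcategory A", "Text of paragraph 1 in Subcategory A"),
    ("Category 1", "Subcategory A", "Paragraph text with a ; inside"),
]
buf = io.StringIO()
write_rows(rows, buf)
print(buf.getvalue(), end="")
```

Using a CSV writer rather than joining strings with ";" matters here because free-text paragraphs scraped from HTML may well contain semicolons; the second record above comes out wrapped in double quotes instead of being split into an extra column.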


Each HTML page contains between 1 and 12 categories; the number varies from page to page, as does the number of subcategories and paragraphs within them. I have played around with Perl and identified some useful regular expressions. However, I am not a programmer, and I don't have the time at the moment to learn Perl and code it myself. I think it is a relatively straightforward task for someone who knows how to program in Perl.
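One way to cope with a varying number of categories, subcategories and paragraphs is to walk each page in document order and remember the most recent category and subcategory heading. The sketch below is in Python for illustration only (the project asks for Perl) and swaps regular expressions for the standard-library `html.parser` module, which tolerates unclosed tags. The tag mapping (`h2` for categories, `h3` for subcategories, `p` for paragraphs) is purely an assumption, since example.htm is not reproduced here; `SectionParser` and `parse` are hypothetical names.

```python
from html.parser import HTMLParser

# ASSUMPTION: hypothetical tag mapping. Replace with whatever actually marks
# categories, subcategories and paragraphs in the real HTML files.
CATEGORY_TAG = "h2"
SUBCATEGORY_TAG = "h3"
PARAGRAPH_TAG = "p"
BLOCK_TAGS = {CATEGORY_TAG, SUBCATEGORY_TAG, PARAGRAPH_TAG}


class SectionParser(HTMLParser):
    """Collect (category, subcategory, paragraph) triples in document order."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turn &amp; etc. into text
        self.category = None
        self.subcategory = None
        self.rows = []
        self._tag = None  # block element whose text is being collected
        self._buf = []

    def _flush(self):
        # Messy HTML often omits closing tags, so a block also ends when the
        # next block starts or the document ends.
        if self._tag is None:
            return
        text = " ".join("".join(self._buf).split())  # collapse whitespace
        if text:
            if self._tag == CATEGORY_TAG:
                self.category, self.subcategory = text, None
            elif self._tag == SUBCATEGORY_TAG:
                self.subcategory = text
            else:
                self.rows.append((self.category, self.subcategory, text))
        self._tag, self._buf = None, []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so <H2> and <h2> are treated alike.
        if tag in BLOCK_TAGS:
            self._flush()
            self._tag = tag

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._flush()

    def handle_data(self, data):
        # Text inside inline tags such as <b> is kept as part of the block.
        if self._tag is not None:
            self._buf.append(data)

    def close(self):
        super().close()  # deliver any buffered text first
        self._flush()


def parse(html):
    parser = SectionParser()
    parser.feed(html)
    parser.close()
    return parser.rows


sample = (
    "<H2>Category 1</H2><h3>Subcategory A</h3>"
    "<p>First &amp; only<p>Second\n  para"
    "<h2>Category 2</h2><h3>Subcategory C</h3><p>Third</p>"
)
for row in parse(sample):
    print(row)
```

In the sample, the unclosed `<p>` tags and the uppercase `<H2>` are still split into the expected rows. A paragraph that appears before any subcategory heading gets `None` as its subcategory; whether such rows should be kept, dropped, or given a placeholder depends on the real files.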


Additional files submitted:
example.htm

Messages Posted: 0

Bids

Bid: $60; Delivery: 2 days; Posted: 04-15-2009 17:55 EDT
I can do this work. Thanks, Suresh

Bid: $30; Delivery: 1 day; Posted: 04-16-2009 02:07 EDT
I can do this job for you. See PM for details.

Bid: $50; Delivery: 0 days; Posted: 04-15-2009 19:07 EDT
I am an experienced Perl programmer, ready to start right now and finish within several hours. My bid is for a fast, professional job. Please contact me via PMB if you have any questions. Best regards, Zeke

Bid: $60; Delivery: 2 days; Posted: 04-15-2009 19:46 EDT
Please check PM. Thanks.

Bid: $50; Delivery: 2 days; Posted: 04-15-2009 19:57 EDT
Please check the PM.

Bid: $35; Delivery: 1 day; Posted: 04-15-2009 23:50 EDT
Hi, I am an expert in Perl and have written several HTML parsing scripts in Perl that power some nice websites. I can do this job easily and start right now. Kindly get back to me to take this forward. Nick

Bid: $45; Delivery: 1 day; Posted: 04-17-2009 11:37 EDT
I have a working prototype that parses your sample, and most of the output formatting is ready now. I expect to be able to deliver the completed program in a few hours.

Bid: $100; Delivery: 5 days; Posted: 04-15-2009 19:04 EDT; Feedback: none yet
Looks like an interesting project; see PM.

Bid: $70; Delivery: 2 days; Posted: 04-15-2009 22:15 EDT; Feedback: none yet
I can do it! I have 8 years of experience in Perl.

Bid: $150; Delivery: 7 days; Posted: 04-16-2009 00:03 EDT; Feedback: none yet
Experienced Perl programmer; I have scraped a lot of sites before.

Bid: $100; Delivery: 1 day; Posted: 04-16-2009 12:58 EDT; Feedback: none yet
I can complete this task. Warm regards, Sagar

Bid: $70; Delivery: 3 days; Posted: 04-17-2009 16:58 EDT; Feedback: none yet
Hi! I'm ready to do it. Delivery: about 3 days, including testing and fixes.