Project Detail

Perl script to match text in different files  

Perl script to match text in different files is project number 130764
posted at Freelancer.com.

 


Status:

Selected Providers: dims

Budget: $30-100

Created: 03/08/2007 at 23:34 EST

Bid Count: 8

Average Bid: $60

03/22/2007 at 23:34 EDT

Project Creator: coombs
Employer Rating: 10/10 (6 reviews)


Description

We need a simple Perl script.

Basically this is what needs to be done:

Each line in file 1 must be matched against the lines in file 2 and inserted into file 2 immediately after its closest match.

Format of the files:
File 1 "Newdata.txt" is a list of addresses. Each line in this file contains one to four segments of text, separated by commas. Some lines contain 3 segments and others contain all 4.

Here is an example:

NEWCASTLE/UNIV NEWCASTLE,FAC MED & HLTH SCI,HLTH SERV FLIGHT 302,RAAF BASE WILLIAMTOW

Segment 1: Format is LOCATION/INSTITUTION NAME. Maximum length of this segment is 51 characters
Segment 2: Maximum length is 30 characters
Segment 3: Maximum length is 20 characters
Segment 4: Maximum length is 20 characters

This file is not sorted in any order and may be sorted if necessary.

File 2 "Database.txt"
This is quite a large file, with more than 200,000 lines arranged in address groups. The groups are separated by blank lines, and the first line of each group ends with a colon followed by a letter code of one to three characters. All the other lines are in the same format as those in Newdata.txt.
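The group layout described above can be sketched as follows. This is only an illustration in Python, not the requested Perl script. The function names `split_groups` and `group_code` and the sample header lines are hypothetical, and the sketch assumes the code after the colon consists of letters only.

```python
import re

# Assumption: a group header ends with ":" plus one to three letters.
HEADER_RE = re.compile(r":([A-Za-z]{1,3})$")

def split_groups(lines):
    """Split lines into address groups separated by blank lines."""
    groups, current = [], []
    for line in lines:
        if line.strip():
            current.append(line)
        elif current:
            # A blank line closes the group collected so far.
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups

def group_code(group):
    """Return the letter code from a group's first line, or None."""
    found = HEADER_RE.search(group[0].rstrip())
    return found.group(1) if found else None

# Hypothetical sample data, not taken from the real Database.txt.
sample = ["A/B,X:AB", "A/B,X,Y", "", "C/D,Z:C", "C/D,Z,W"]
groups = split_groups(sample)
```

The matching itself does not depend on the groups, but recognising header lines can help when checking that an inserted line landed in the expected group.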

This is what needs to be done in detail:

For nearly all the lines in Newdata.txt, there is an almost identical line in Database.txt. However, the matching line in the database file may contain only 19 characters (instead of 20) in segments 3 and 4.
Therefore, the Perl script should match:
segment 1 exactly as it is in both files
segment 2 exactly as it is in both files
segment 3, if there is one, up to the first 19 characters in both files
segment 4, if there is one, up to the first 19 characters in both files
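The four rules above amount to comparing a normalised key for each line. A minimal sketch in Python, given only to illustrate the logic (the deliverable is a Perl script): the name `match_key` is hypothetical, and the sketch assumes that two lines match only if they have the same number of segments and that surrounding whitespace is significant.

```python
def match_key(line):
    """Build a comparison key for one address line.

    Segments 1 and 2 are kept exactly as written; segments 3 and 4
    (when present) are cut to their first 19 characters.
    """
    segments = line.rstrip("\r\n").split(",")
    return tuple(seg if i < 2 else seg[:19] for i, seg in enumerate(segments))

# The example line from the description, and a hypothetical database
# line whose segments 3 and 4 hold only 19 characters.
new_line = "NEWCASTLE/UNIV NEWCASTLE,FAC MED & HLTH SCI,HLTH SERV FLIGHT 302,RAAF BASE WILLIAMTOW"
db_line = "NEWCASTLE/UNIV NEWCASTLE,FAC MED & HLTH SCI,HLTH SERV FLIGHT 30,RAAF BASE WILLIAMTO"
same = match_key(new_line) == match_key(db_line)
```

Two lines are treated as a match when their keys are equal, so the keys can also serve as hash keys for fast lookup against the 200,000-line database.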

Once a match is found, the line should be inserted immediately after the closest match in Database.txt and removed from Newdata.txt. Lines that are not matched remain in Newdata.txt.

Also, whenever a line is inserted into Database.txt, please mark it with an asterisk (*), preferably at the beginning of the line, so we can check the accuracy of the script.

Please make sure that the Perl script will not alter the existing data or the order of each line in Database.txt.
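The insert, mark, and remove steps could be sketched as below. This is a Python illustration under stated assumptions, not the Perl script the project asks for. The names `match_key` and `merge` are hypothetical, both files are assumed to fit in memory as lists of lines without trailing newlines, and "closest match" is interpreted as the first database line with an equal key.

```python
def match_key(line):
    """Segments 1-2 verbatim, segments 3-4 cut to 19 characters."""
    segments = line.rstrip("\r\n").split(",")
    return tuple(seg if i < 2 else seg[:19] for i, seg in enumerate(segments))

def merge(new_lines, db_lines):
    """Return (updated database lines, unmatched new lines)."""
    # Index the new lines by key; blank lines are never candidates.
    pending = {}
    for idx, line in enumerate(new_lines):
        if line.strip():
            pending.setdefault(match_key(line), []).append(idx)

    matched = set()
    updated = []
    for db_line in db_lines:
        updated.append(db_line)  # existing lines keep their text and order
        for idx in pending.pop(match_key(db_line), []):
            updated.append("*" + new_lines[idx])  # mark each inserted line
            matched.add(idx)

    # Whatever was not inserted stays behind, in its original order.
    unmatched = [l for i, l in enumerate(new_lines) if i not in matched]
    return updated, unmatched

# Hypothetical sample data, not taken from the attached files.
new = ["A/B,X,HLTH SERV FLIGHT 302,RAAF BASE WILLIAMTOW", "NO/MATCH,Y"]
db = ["A/B,X:AB", "A/B,X,HLTH SERV FLIGHT 30,RAAF BASE WILLIAMTO", "", "C/D,Z:C"]
updated, left = merge(new, db)
```

Writing `updated` back to Database.txt and `left` back to Newdata.txt would complete the job. Because the database lines are only copied, never edited or reordered, stripping the lines that start with * should reproduce the original file.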

I have included two sample files. The first 11 lines in "Newdata.txt" each have a match in "Database.txt". For example, line 1 in Newdata.txt matches line 3 in Database.txt, except for the last character in segment 3.


Additional files submitted:
newdata.txt
database.txt

Messages Posted: 2


 

50

1 day

03-09-2007 02:12 EST

Can do it easy


 

30

1 day

03-10-2007 11:48 EST

Dear Sir, I am an experienced Perl developer. I have already completed a number of Perl jobs successfully. I have gone through your project description and can provide the solution in one day.


 

35

1 day

03-09-2007 16:56 EST

Hello, I'm ready to start right now


 

100

3 days

03-09-2007 01:19 EST

I have more than 7 years of perl programming experience, so this is not a difficult task at all. With best regards, Gaspar Chilingarov


 

75

1 day

03-09-2007 04:45 EST

should be a fairly easy task to accomplish. - Mickalo


 

40

2 days

03-09-2007 02:51 EST

(No Feedback Yet)

details on pm!


 

100

3 days

03-09-2007 04:12 EST

(No Feedback Yet)

This is an easy task.


 

50

0 days

03-09-2007 06:38 EST

(No Feedback Yet)

Good job for perl programmer.


