Freelancer

Duplicate Word Report Generator Supporting English and Chinese

Project number 59070, posted at Freelancer.com.


Status:

Selected Providers: basestring

Budget: $30-100

Created: 05/04/2006 at 6:46 EDT

Bid Count: 6

Average Bid:
N/A

05/19/2006 at 6:46 EDT

Project Creator: davidchan
Employer Rating: (No Feedback Yet)

Description

Input A:
Multiple files in CSV or TXT format, each containing a list of several thousand words separated by commas, spaces, or newlines, e.g. level 1.txt, level 2.txt, level 3.txt, level 4.txt, dictionary.txt.
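
A minimal sketch of reading one such word list, assuming the files decode as UTF-8 and that a full-width comma may also appear as a separator in Chinese lists. The function names `parse_word_list` and `load_word_list` are hypothetical and not part of the project files.

```python
import re

# Separators: ASCII comma, full-width comma (U+FF0C, an assumption for
# Chinese lists), and any run of whitespace including newlines.
_SEPARATORS = re.compile(r"[,\uFF0C\s]+")


def parse_word_list(text):
    """Return the words in `text`, without duplicates, in first-seen order."""
    seen = set()
    words = []
    for token in _SEPARATORS.split(text):
        if token and token not in seen:  # skip empty pieces and repeats
            seen.add(token)
            words.append(token)
    return words


def load_word_list(path, encoding="utf-8"):
    """Read a level or dictionary file; the encoding is an assumption."""
    with open(path, encoding=encoding) as handle:
        return parse_word_list(handle.read())
```

For example, `parse_word_list("a, ab\nabc,,abcd a")` yields the four words a, ab, abc, and abcd.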

Input B:
Multiple files in TXT format, each containing a story from 1 page to several hundred pages long.

Output:
A TXT file and an HTML file that contain the whole of Input B, together with the total word count, the frequency of each word, the word count after duplicates are removed, the list of words found in each level file, the percentage of words from level 1.txt, level 2.txt, level 3.txt, and level 4.txt, and the percentage of unidentified words. When a dictionary file is present, the explanation is inserted beside each word.
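
The statistics requested above could be computed roughly as follows. This is only a sketch under two assumptions that the posting does not settle: percentages are measured over all word occurrences (not over distinct words), and a word listed in several level files is credited to the first level that contains it. The name `level_breakdown` is hypothetical.

```python
from collections import Counter


def level_breakdown(tokens, levels):
    """Summarise `tokens` (a list of words) against `levels`,
    a dict mapping a level name to the set of words in that level file."""
    frequency = Counter(tokens)          # word -> number of occurrences
    total = len(tokens)
    per_level = {name: 0 for name in levels}
    unidentified = 0

    for word, count in frequency.items():
        for name, vocabulary in levels.items():
            if word in vocabulary:
                per_level[name] += count  # credit the first matching level
                break
        else:
            unidentified += count         # word appears in no level file

    def percent(n):
        return 100.0 * n / total if total else 0.0

    return {
        "total_words": total,
        "unique_words": len(frequency),
        "frequency": frequency,
        "level_percent": {name: percent(n) for name, n in per_level.items()},
        "unidentified_percent": percent(unidentified),
    }
```

The returned dictionary holds everything needed to render the TXT and HTML reports; the rendering itself is left out here.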

Tricky part:
A Chinese character takes 2 bytes, sometimes 3 bytes, depending on the encoding. A word can consist of more than one character, and there are no spaces between words.
E.g. a, ab, abc, and abcd are different words.
The sentence aabcdababaabc has 6 different words.
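
One common way to split text that has no spaces is forward maximum matching: at each position, take the longest word from the word lists that fits. The sketch below is an assumption about how the task could be approached, not a method stated in the posting, and `segment` is a hypothetical name. It works on decoded Python strings, so whether a character occupies 2 or 3 bytes on disk no longer matters once the file has been read with the right encoding.

```python
def segment(text, vocabulary):
    """Split `text` into words by greedy longest match against `vocabulary`
    (a set of known words). Characters that start no known word are
    emitted as single-character tokens."""
    max_len = max((len(word) for word in vocabulary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in vocabulary:
                break
        else:
            candidate = text[i]  # no known word starts here
        tokens.append(candidate)
        i += len(candidate)
    return tokens
```

With the vocabulary {a, ab, abc, abcd}, the sentence aabcdababaabc splits into a, abcd, ab, ab, a, abc, which is six words and matches the count in the example above under this reading.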

The budget can increase if more options are provided.


Additional files submitted:
Level 1.txt
Level 2.txt
story.txt

Messages Posted: 0
