Project Detail

Words used in Wikipedia  

Words used in Wikipedia is project number 387392
posted at Freelancer.com.

 

 

Status:

Selected Providers: egor10257

Budget: $30-250

Created: 02/16/2009 at 17:01 UTC

Bid Count: 3

Average Bid:
N/A

02/19/2009 at 17:01 UTC

Project Creator: cmbant
Employer Rating: 9.9444/10 (36 reviews)

 

Description

Using the full latest English Wikipedia database, write a program to generate a frequency-ranked, case-sensitive list of words used in the main entry pages. The list should include single words and groups of up to four words (hyphen- or space-separated), drawn only from text (not Wiki tags) and taken from the middle of sentences (not the first word in each sentence, so that all are correctly capitalized).
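A minimal Python sketch of the counting step described above may make the requirement concrete. It assumes the text has already been stripped of wiki markup, uses a naive sentence splitter (abbreviations such as "e.g." will fool it), and treats only a single space or hyphen as a valid join inside a group. The names `count_ngrams`, `SENTENCE_END` and `WORD` are illustrative, not part of the project.

```python
import re
from collections import Counter

# Naive sentence boundary: whitespace that follows ., ! or ?
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# A word is a run of letters, optionally with internal apostrophes.
WORD = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")

def count_ngrams(text, max_n=4):
    """Count case-sensitive words and groups of up to max_n words,
    skipping the first word of every sentence."""
    counts = Counter()
    for sentence in SENTENCE_END.split(text):
        # Drop the sentence-initial word so capitalization stays meaningful.
        words = list(WORD.finditer(sentence))[1:]
        for i, first in enumerate(words):
            for j in range(i, min(i + max_n, len(words))):
                if j > i:
                    # Extend the group only across a single space or hyphen.
                    gap = sentence[words[j - 1].end():words[j].start()]
                    if gap not in (" ", "-"):
                        break
                counts[sentence[first.start():words[j].end()]] += 1
    return counts
```

Keeping the original separator in the key means "well-known" and "well known" are counted as different entries; whether they should be merged is a choice the description leaves open.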

Provide a list of all words and word groups that appear at least 10 times in Wikipedia, and provide a file containing ten complete sentences in which each word appears, together with the name of the wiki page on which each sentence appears, e.g.

hypothesized
[page: Prion]
Prions are hypothesized to infect and propagate by refolding abnormally into a structure which is able to convert normal molecules of the protein into the abnormally structured form.
[page: Mars_Ocean_Hypothesis]
The blue region of low topography in the Martian northern hemisphere is hypothesized to be the site of a primordial ocean of liquid water.
...
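One possible way to collect and print examples in the layout shown above is sketched here in Python. The helper names (`add_example`, `format_examples`) and the choice to skip duplicate sentences are assumptions for illustration; the posting itself leaves the exact format open.

```python
from collections import defaultdict

MAX_EXAMPLES = 10  # ten sentences per word, as requested in the description

def add_example(examples, term, page, sentence, limit=MAX_EXAMPLES):
    """Store (page, sentence) for term until it has `limit` distinct examples."""
    bucket = examples[term]
    if len(bucket) < limit and (page, sentence) not in bucket:
        bucket.append((page, sentence))

def format_examples(term, bucket):
    """Render one term as the word, then a [page: ...] line before each sentence."""
    lines = [term]
    for page, sentence in bucket:
        lines.append(f"[page: {page}]")
        lines.append(sentence)
    return "\n".join(lines)

# Usage: examples = defaultdict(list), then call add_example for each hit.
```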

I'm flexible about the exact format in which the data is provided, and you can skip groups starting and ending with common stop words (a, the, etc.).
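The threshold of 10 occurrences and the stop-word allowance could be applied as a final filter, roughly as in the Python sketch below. It reads the stop-word rule as "drop a multi-word group whose first or last word is a stop word", which is only one interpretation of the sentence above. The `STOP_WORDS` set is a short illustrative sample, and `keep_term` and `ranked` are hypothetical names.

```python
import re

# Illustrative sample only; a real run would use a fuller stop-word list.
STOP_WORDS = frozenset({"a", "an", "the", "of", "and", "in", "to"})

def keep_term(term, count, min_count=10, stop_words=STOP_WORDS):
    """Return True if the term is frequent enough and, when it is a group,
    neither its first nor its last word is a stop word."""
    if count < min_count:
        return False
    words = re.split(r"[ -]", term)  # groups are space- or hyphen-separated
    if len(words) > 1 and (
        words[0].lower() in stop_words or words[-1].lower() in stop_words
    ):
        return False
    return True

def ranked(counts, **options):
    """Return (term, count) pairs, most frequent first, ties broken by term."""
    kept = [(t, c) for t, c in counts.items() if keep_term(t, c, **options)]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))
```

Single words are never dropped by the stop-word rule here, so "the" on its own still appears in the ranking.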

The main objective is the result, so you can write the program in any language you like. You'll need to download the Wikipedia database from download.wikimedia.org; the project is very straightforward, but the database is quite large.
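Because the database is large, one approach is to stream it instead of loading it whole. The Python sketch below assumes the dump is a bzip2-compressed MediaWiki XML export in which each `<page>` carries `<title>`, `<ns>` and `<text>` elements, and that main entry pages have `<ns>` equal to 0; that layout should be checked against the file actually downloaded. The function name `iter_main_pages` and the file name in the usage comment are illustrative.

```python
import bz2
import xml.etree.ElementTree as ET

def local(tag):
    """Strip the XML namespace prefix, e.g. '{uri}page' becomes 'page'."""
    return tag.rsplit("}", 1)[-1]

def iter_main_pages(source):
    """Yield (title, wikitext) for namespace-0 pages from a binary file object."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title = ns = text = None
        for child in elem.iter():
            name = local(child.tag)
            if name == "title":
                title = child.text
            elif name == "ns":
                ns = child.text
            elif name == "text":
                text = child.text
        if ns == "0" and title is not None:
            yield title, text or ""
        # Drop the page's children and text once it has been handled.
        elem.clear()

# Usage (file name is an assumption about the downloaded dump):
# with bz2.open("enwiki-latest-pages-articles.xml.bz2", "rb") as dump:
#     for title, wikitext in iter_main_pages(dump):
#         ...
```

The yielded text is still raw wiki markup; removing tags and templates before counting is a separate step not shown here.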

Messages Posted: 0
