Freelancer

Data Mining - Scrape a database from a website  

Data Mining - Scrape a database from a website is project number 323328
posted at Freelancer.com.



Status:

Selected Providers: SigmaVisual

Budget: $30-250

Created: 10/02/2008 at 7:44 EDT

Bid Count: 17

Average Bid:
$ 141

10/04/2008 at 7:44 EDT

Project Creator: Lyricist
Employer Rating: 10/10 (33 reviews)


Description

I need you to make a script that will scrape either www.metrolyrics.com or www.lyrics007.com and populate a MySQL database (it needs to be a script because there are probably 700,000+ entries in total). I would prefer the script to be in PHP, but if you have other, faster methods, you can use those as well.

The scraped data should go into 2 tables in MySQL.
1st table: artists

fields:
a_name (name of the artist, e.g. "Britney Spears"),
a_id (artist ID, incrementing by 1 and starting from 1)
a_alias_plain (URL field with the structure "artist-name": multiple words are separated by dashes, all words are lower case, and all non-alphanumeric characters must be stripped out. Make sure there is only 1 dash separating each word)
a_alias_lyrics (URL field with the structure "artist-name-lyrics": multiple words are separated by dashes and "-lyrics" is appended at the end. All words are lower case, and all non-alphanumeric characters must be stripped out. Make sure there is only 1 dash separating each word)
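The alias rules above can be sketched as a small helper. This is only an illustration in Python (the brief prefers PHP), and the function name `make_alias` is hypothetical. Two choices here are assumptions that the brief does not settle: punctuation inside a word is removed rather than turned into a dash (so "AC/DC" becomes "acdc"), and accented letters are folded to plain ASCII instead of being dropped.

```python
import re
import unicodedata


def make_alias(name, lyrics_suffix=False):
    """Build a URL alias: lower case, letters and digits only, single dashes."""
    # Assumption: fold accented letters to ASCII ("é" -> "e") before filtering.
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    words = []
    for word in ascii_name.lower().split():
        # Strip everything that is not a lower-case letter or a digit.
        cleaned = re.sub(r"[^a-z0-9]", "", word)
        if cleaned:  # skip words that were pure punctuation
            words.append(cleaned)
    if lyrics_suffix:
        words.append("lyrics")
    # Joining non-empty words guarantees exactly one dash between them.
    return "-".join(words)


print(make_alias("Britney Spears"))                      # britney-spears
print(make_alias("Britney Spears", lyrics_suffix=True))  # britney-spears-lyrics
```

The same helper would serve both the artist and the song alias fields, since the rules are identical.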


2nd table: songs
fields:
s_id (ID of the song, incrementing by 1 and starting from 1)
s_name (the name of the song, e.g. "Feel The Way")
s_text (the actual text of the song; I only want the lyrics and not any other content on the page)
s_artist (the artist's ID from a_id, so that I can associate each song with its artist)
s_alias_plain (URL field with the structure "song-name": each word is separated by dashes, all words are lower case, and all non-alphanumeric characters must be stripped out. Make sure there is only 1 dash separating each word)
s_alias_lyrics (a 2nd URL field, just in case: each word is separated by dashes with "-lyrics" appended at the end. All words are lower case, and all non-alphanumeric characters must be stripped out. Make sure there is only 1 dash separating each word)
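A minimal sketch of the two tables and the s_artist link, for illustration only. It uses Python's built-in sqlite3 module as a stand-in so that it runs without a server; the real target is MySQL, where the column types and auto-increment syntax differ. The column types and the sample row pairing are assumptions, not part of the brief.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL target.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE artists (
    a_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- starts at 1, increments by 1
    a_name TEXT NOT NULL,
    a_alias_plain TEXT NOT NULL,
    a_alias_lyrics TEXT NOT NULL
);
CREATE TABLE songs (
    s_id INTEGER PRIMARY KEY AUTOINCREMENT,
    s_name TEXT NOT NULL,
    s_text TEXT NOT NULL,
    s_artist INTEGER NOT NULL REFERENCES artists(a_id),
    s_alias_plain TEXT NOT NULL,
    s_alias_lyrics TEXT NOT NULL
);
""")

# Sample rows reuse the example names from the brief; pairing this artist
# with this song is purely illustrative.
cur = conn.execute(
    "INSERT INTO artists (a_name, a_alias_plain, a_alias_lyrics) VALUES (?, ?, ?)",
    ("Britney Spears", "britney-spears", "britney-spears-lyrics"),
)
artist_id = cur.lastrowid  # the a_id that s_artist will point to

conn.execute(
    "INSERT INTO songs (s_name, s_text, s_artist, s_alias_plain, s_alias_lyrics) "
    "VALUES (?, ?, ?, ?, ?)",
    ("Feel The Way", "(lyrics text placeholder)", artist_id,
     "feel-the-way", "feel-the-way-lyrics"),
)

# Join songs back to their artist through s_artist.
rows = conn.execute(
    "SELECT a.a_name, s.s_name FROM songs s JOIN artists a ON a.a_id = s.s_artist"
).fetchall()
print(rows)
```

Inserting the artist first and reusing its generated ID is what keeps s_artist consistent with a_id as the scraper walks from artist pages to song pages.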

The database should have a proper collation so that all special characters are displayed correctly.
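To illustrate why the encoding matters, here is a short Python sketch of what happens when UTF-8 text is read back with the wrong character set. The artist name is just an example chosen for its accented letter. In MySQL terms, the usual remedy is to use one Unicode character set (such as utf8mb4 on current versions) for the tables and the connection alike, though the exact setting depends on the server version.

```python
name = "Björk"  # example name with a non-ASCII letter

# Text stored as UTF-8 bytes, as a scraper would typically write it.
utf8_bytes = name.encode("utf-8")

# Read back with the same encoding: the name survives intact.
round_trip = utf8_bytes.decode("utf-8")

# Read back as Latin-1: each UTF-8 byte becomes its own character,
# which is the classic "garbled special characters" symptom.
garbled = utf8_bytes.decode("latin-1")

print(round_trip)  # Björk
print(garbled)     # BjÃ¶rk
```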

The whole database should probably have 700,000+ entries. I don't want to wait more than 5 days, so if you can complete it within that time frame, feel free to bid. I am not paying more than $100 so please don't bid higher. I need to start as soon as possible, so if you give me a good bid, you could even start working today.
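As a rough feasibility check, the figures above imply a minimum fetch rate. The sketch below assumes one page request per entry and that the full 5 days are available for crawling; neither is stated in the brief, and the function name `required_rate` is hypothetical.

```python
def required_rate(entries, days):
    """Average pages per second needed to fetch `entries` pages in `days` days."""
    seconds = days * 24 * 60 * 60
    return entries / seconds


rate = required_rate(700_000, 5)
print(f"{rate:.2f} pages per second")     # 1.62 pages per second
print(f"{1 / rate:.2f} seconds per page")  # 0.62 seconds per page
```

Any time spent writing and testing the script comes out of the same 5 days, so the sustained rate during the crawl itself would need to be higher than this average.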

Please only bid if you have read the requirements fully.


Additional information submitted:

10/02/2008 at 9:31 EDT:
Just to clarify, I want the whole database completed, and I also want to have the script from you just in case.


Messages Posted: 2


100

5 days

10-03-2008 04:29 EDT

Please check PMB.


100

0 days

10-02-2008 08:48 EDT

Hi, Kindly have a look at PM, Thanks.


100

4 days

10-02-2008 09:13 EDT

I can help with that


80

2 days

10-02-2008 09:20 EDT

I am a Linux-based programmer with 5 years of experience. See PM for details.


100

2 days

10-02-2008 10:56 EDT

Highly excited to do this job. Please check PMB.


200

5 days

10-02-2008 12:22 EDT

Hi. Those who bid $100 or less are either fakes or don't know what scraping is all about. Doing this for that amount of money in that time is really tough. I'm saying this because I've just finished scraping/crawling a site using regular expressions, and I cannot express how tough and hectic it was. So please mark my words: even if I don't get the project, it's not a big problem, but go with real programmers. Regards


250

2 days

10-02-2008 14:59 EDT

Hi, please check PM.


230

5 days

10-02-2008 07:50 EDT

We can do this for you. The task is interesting. Please let us know which of the two sites you want to give priority to for fetching data. The task is not critical at all, but it should be completed with proper care, as we have to deal with HTML formats to fetch the data. Please check your PM. We are ready to start :) and are just waiting for you to select us. We will definitely deliver the expected output with no compromise. Regards


100

5 days

10-02-2008 09:13 EDT

Hello, This is a placeholder bid - please see pm for details. regards, Satsco.


50

3 days

10-02-2008 09:32 EDT

Please check my PMB. Thanks.


245

10 days

10-02-2008 08:00 EDT

We will deliver what you need in the specified time with 100% surety of clear data. Waiting for your PM. Thanks


200

2 days

10-02-2008 08:37 EDT

(No Feedback Yet)

Kapow Robot is the solution to this problem. WayNwill ( http://www.waynwill.com/kapow.htm ) is a recognized expert in Kapow Robot development, and having a development facility in India helps us provide extremely cost-effective robot development. Thanks, WayNWill


200

1 day

10-02-2008 09:37 EDT

(No Feedback Yet)

I can do that with another, faster method. Contact me for details.


50

2 days

10-02-2008 10:28 EDT

(No Feedback Yet)

Hello, I can implement it in Perl in 2 days max.


50

1 day

10-02-2008 11:48 EDT

(No Feedback Yet)

Web scraping is my strong point. Please view pmb for an example.


250

7 days

10-02-2008 13:16 EDT

(No Feedback Yet)

Hi! I have Kapow Mashup Server to do it faster, but I need a Windows or Linux server to run the bot.


100

4 days

10-02-2008 14:00 EDT

(No Feedback Yet)

Please check your PM.
