Project Detail

Data collection program  

Data collection program is project number 364777
posted at Freelancer.com.

 

 

Status:

Selected Providers: Zlatipln

Budget: $30-250

Created: 01/02/2009 at 20:52 EST

Bid Count: 45

Average Bid: $122

01/09/2009 at 20:52 EST

Project Creator: andrew4gaf
Employer Rating: 10/10 (1 review)

 

Description

I need to harvest data from a web site on a weekly basis and need a program to do the work. I currently do it with a program I wrote, but I need something more robust so that I can run it more often.

The project is simple: download some HTML files, extract the data in them, and put it in a standard delimited ASCII file.

The program will query a web site to get the HTML pages that are available. There are about 100 per calendar date, and the program must be able to download the files starting with the current date and going to an end date. This means downloading between 35,000 and 100,000 HTML files each time the program runs.

The program should begin by getting the html files and storing them in a temporary folder.
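As a rough illustration of this first step, the following Python sketch walks a date range and saves each page into a folder. The URL pattern, the fixed `pages_per_day` count, the file naming scheme, and the injectable `fetch` parameter are all assumptions made for the example; the real site information is in the attached ZIP file, and the actual list of available pages would come from querying the site.

```python
from datetime import date, timedelta
from pathlib import Path
from urllib.request import urlopen

# Hypothetical URL pattern; the real one is described in the project's ZIP file.
URL_TEMPLATE = "http://example.com/listing?date={day}&page={page}"


def date_range(start: date, end: date):
    """Yield every calendar date from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def local_path(folder: Path, day: date, page: int) -> Path:
    """Build a stable file name so a rerun can skip pages already saved."""
    return folder / f"{day.isoformat()}_{page:03d}.html"


def download_all(start: date, end: date, folder: Path,
                 pages_per_day: int = 100, fetch=None):
    """Save each page into folder and return the list of files written."""
    # fetch is injectable so the loop can be exercised without network access.
    fetch = fetch or (lambda url: urlopen(url, timeout=30).read())
    folder.mkdir(parents=True, exist_ok=True)
    written = []
    for day in date_range(start, end):
        for page in range(1, pages_per_day + 1):
            target = local_path(folder, day, page)
            if target.exists():  # already downloaded on an earlier run
                continue
            url = URL_TEMPLATE.format(day=day.isoformat(), page=page)
            target.write_bytes(fetch(url))
            written.append(target)
    return written
```

Skipping files that already exist means an interrupted run can be restarted without downloading everything again, which matters at tens of thousands of files per run.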

The second step of the process is to parse the HTML files and extract the data contained in them. The HTML files keep the same format, and the data is easy to extract because the files are simple lists. ALL fields must be extracted.

The lists have header information, so the header information must be repeated on each record of data created to ensure that the information stays together. For example, a file will give a name and then a list of all the clients for that name.
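To make the "repeat the header on each record" requirement concrete, here is a minimal Python sketch using the standard library's `html.parser`. The markup it expects (an `<h2>` header followed by `<li>` items) is purely an assumption for illustration; the real sample files may use different tags, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class HeaderListParser(HTMLParser):
    """Collect (header, item) pairs from <h2> headers followed by <li> items.

    The tag names are assumptions; the real sample files may differ.
    """

    def __init__(self):
        super().__init__()
        self.records = []
        self._header = ""   # most recent header text seen
        self._tag = None    # tag whose text is currently being collected
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "li"):
            self._tag = tag
            self._text = []

    def handle_data(self, data):
        if self._tag:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        text = "".join(self._text).strip()
        if tag == "h2":
            self._header = text
        else:
            # Repeat the current header on every item so each record is complete.
            self.records.append((self._header, text))
        self._tag = None


def extract_records(html: str):
    """Return a list of (header, item) tuples found in the HTML text."""
    parser = HeaderListParser()
    parser.feed(html)
    parser.close()
    return parser.records
```

With this shape, a name followed by three clients becomes three records, each carrying the name in its first field.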

The extracted data must be saved in a standard ASCII file format where each field is delimited by a character that is configurable in the program (for example, a tab). I will then take this data and import it into a database system for processing.
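One detail worth handling with a configurable delimiter is a field that itself contains the delimiter character. The short Python sketch below uses the standard `csv` module, which quotes such fields by default; the function name and sample rows are made up for the example.

```python
import csv
import io


def write_delimited(rows, stream, delimiter="\t"):
    """Write rows using a caller-chosen single-character delimiter."""
    writer = csv.writer(stream, delimiter=delimiter, lineterminator="\r\n")
    writer.writerows(rows)


# Demonstration with a comma delimiter and a field that contains a comma.
buffer = io.StringIO()
write_delimited([("Acme", "Client A"), ("Acme", "Smith, John")], buffer, delimiter=",")
print(buffer.getvalue())
```

When writing to a real file, opening it with `newline=""` and `encoding="ascii"` keeps line endings and the character set under the program's control; how to treat non-ASCII characters in the source pages is a decision the sketch leaves open.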

I do not need a program with a fancy user interface. It needs to be simple and functional. It must work on Windows XP Pro.

Attached is a ZIP file containing samples of the html files as well as the web site information.

The provider must submit a final program in executable format with all necessary files, and also the source code. He must have tested the program and must submit the results of one run between two dates spanning a one-year period (for example, the provider can run the program on February 1st 2009 and set the end date to February 1st 2010). The data collected must be submitted to show that the program works.

If the provider does a good job on this, there are several other similar projects available for him. In the future when the format of these html files changes, I will ask the provider to modify the program.

This is a simple project but please only bid if you have done this type of work before and are sure you can deliver the work. I do not want to waste your time or mine. If you have questions please message me before you bid.

Thank you


Additional information submitted:

01/02/2009 at 21:43 EST:
Question asked: Why keep the HTML files? Why not just parse them and create the resulting output file?
Answer: The HTML files are kept in a temporary directory because they must be saved as backups in case the data is damaged in the future and needs to be re-parsed.

01/02/2009 at 22:02 EST:
Payment for this project will be made via escrow only.

01/03/2009 at 13:02 EST:
The final output should be in ONE delimited ASCII text file. All fields from all HTML files that are downloaded should be included on each record. The variations between the 3 types of HTML files in this project are minor, so each record line may have about 30 fields, some used and some unused. A field is simply left blank if it is unused.
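A sketch of how records from the different file types could share one column layout, with unused fields left blank, is shown below in Python. The field names are invented (the real files have about 30 fields), and the header row is an optional extra that is not part of the stated requirements.

```python
import csv
import io

# Hypothetical field names; the real files have about 30 fields in total.
ALL_FIELDS = ["name", "client", "address", "phone", "license"]


def write_records(records, stream, delimiter="\t"):
    """Write dict records in a fixed column order, blank where a field is unused."""
    writer = csv.DictWriter(
        stream,
        fieldnames=ALL_FIELDS,
        delimiter=delimiter,
        restval="",             # value written for fields missing from a record
        extrasaction="raise",   # fail loudly if a record has an unknown field
        lineterminator="\n",
    )
    writer.writeheader()        # optional header row, an assumption of this sketch
    writer.writerows(records)


# Two records from different (hypothetical) file types with different fields.
buffer = io.StringIO()
write_records(
    [
        {"name": "Acme", "client": "A", "phone": "555"},
        {"name": "Beta", "client": "C", "license": "X1"},
    ],
    buffer,
)
print(buffer.getvalue())
```

Raising on unknown fields is a deliberate choice here: if the site's format changes and a new field appears, the run stops instead of silently dropping data.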

The program is to be run manually. It will not be run by MS Scheduler or another automated tool. No fancy automated feature is necessary, just the ability to specify a date range, specify the delimiter character, specify an output file path and name, and start and stop buttons. The program should also have a small option to play a sound file when the process is done, and another option to shut down the computer after the process is done.

01/04/2009 at 22:57 EST:
"We'll charge you $30 per 200 entries"
Bids with comments like this are unprofessional and will be ignored. This defeats the purpose of using GAF.



Additional files submitted:
project-info.zip

Messages Posted: 2

 

250

0 days

01-03-2009 02:37 EST


 

200

5 days

01-03-2009 10:40 EST

Hello, please refer to your PMB. Thank you.

 

100

3 days

01-02-2009 21:02 EST

Hello, please refer to your PMB. Thank you.

 

250

4 days

01-03-2009 09:12 EST

We can help in your project, please check PMB to see our related experience.


 

245

3 days

01-02-2009 23:53 EST

Hi, More info is in the PM. Best Regards, Yousef


 

90

3 days

01-03-2009 01:39 EST

See private message.


 

40

6 days

01-02-2009 23:31 EST

We are ready to do this project


 

80

2 days

01-03-2009 11:27 EST

I can do this job for you. See PM for details.


 

250

2 days

01-03-2009 04:48 EST

I am an expert in such tasks. Ready to start right now and finish as soon as possible. My bid is for fast professional service exciting my customers. Please contact in PMB to discuss details. Best Regards, Zeke


 

100

8 days

01-03-2009 04:36 EST

Hi, I am currently working on a scraper project which is quite similar to this one. I will do this using C#. I am an expert in data processing and text extraction; that is my field of work. I would like to have a long-term working relationship with you, and I'm sure I will deliver up to your needs. I'm open for discussion, so I hope to hear back from you soon. Regards, Ancosys

 

150

3 days

01-08-2009 22:46 EST

Very interested in your data collection project. Please check your PMB. Thanks.


 

200

3 days

01-03-2009 00:02 EST

Please see PMB for details


 

220

2 days

01-03-2009 18:47 EST

Hi there, I am an expert data extractor; I have been doing it for over 11 years. I have completed many tasks both on GAF and other sites, including extracting info from websites and other places. Please see my reviews for previous references and let me know if you require anything further. Kind regards, Nash

 

100

1 day

01-03-2009 19:47 EST

i like this kind of job, will do it easily.


 

70

2 days

01-02-2009 21:55 EST

Hello, Please Check PMB


 

100

2 days

01-03-2009 10:29 EST

Hi, please check your PM.

 

30

1 day

01-03-2009 14:27 EST

plz check pm


 

100

2 days

01-04-2009 15:17 EST

I have worked on many similar scraping projects before. I'm a professional scraper working in C#, C++, and PHP. I can finish and deliver the program in the fastest possible time.

 

200

3 days

01-03-2009 00:20 EST

Please refer to PMB


 

50

1 day

01-02-2009 22:48 EST

I've done a lot of similar projects. I have special modules in Python: PyCurl + multithreading (with reprocessing of errors).

 

30

1 day

01-05-2009 18:29 EST

Hi! Please check PMB for demo.


 

100

7 days

01-03-2009 05:55 EST

Please See PMB.


 

150

2 days

01-08-2009 17:02 EST

I have ready-to-go software for you which saves data in CSV, text (ASCII), and a few other formats as well. It will help you extract data directly from the website without first saving the HTML files. But if it is compulsory for you to save the HTML files and then extract the data from them, this software has that capacity also. I have a complete solution for you whether you want to extract directly or indirectly via HTML files. I am a security expert with world records in my field and lots of global achievements. I kindly request you to have a look at my GAF profile for further details. I provide exclusive and unique services which require excellent talent and expertise and which no one other than me can perform. I hope for a long-term relationship with you for such extraction services. Kalpesh Sharma

 

150

2 days

01-03-2009 07:46 EST

Hi. I can do this quickly in C# or PHP. Best regards, Ruben

 

126

2 days

01-04-2009 22:04 EST

I can complete it for you. I am an expert in AJAX, WSDL, Web Services & Clients, JSF, J2ME, J2SE, J2EE/EJB3, JavaScript, NetBeans IDE v6.0, JBoss, Tomcat, HTML, XSLT, XML, Oracle, MS Access, MySQL, SQL Server, etc. I am also readily available for chat (10 hrs/day) on Skype / GTalk / MSN / Yahoo Messenger. Available days: Mon - Fri. Available time: 8:30 AM to 6:30 PM Singapore Time [GMT + 8]. Winners never quit, quitters never win.

 

95

1 day

01-04-2009 05:50 EST

Hi Andrew(?) I have done this sort of thing for the last 2 years. I am a professional software developer with 12 years of commercial experience. (Coincidentally, my daughter was in Montreal, studying at McGill, until last month!) To prove to you that I can do it, I'll post the output of the 3 sample files as soon as I'm done. And if my output doesn't arrive by the time you're ready to award the project, then please pick someone else. The trick to making it more bulletproof is to convert the HTML content into XHTML using a free Microsoft utility. Then it is easy to convert the data to CSV using XSLT.

Four questions:
1. Do you want the CSV output of all the files in the folder put into a single CSV file (for a single DB import operation)?
2. Do you want a console application (with configuration in a .config file), to make it easier to schedule it using something like Windows Scheduler?
3. My preferred tools are C#.NET. Any restriction on the version of .NET Framework? (Otherwise I'd automatically use v3.5 SP1.)
4. Do you want my unit tests too? I'll write them anyway, as that's my style to ensure it all works correctly.

I look forward to working with FineSoftware.com. regards, david

 

55

2 days

01-02-2009 22:49 EST

(No Feedback Yet)

Please see PM


 

50

2 days

01-02-2009 23:12 EST

(No Feedback Yet)

we are ready to start.


 

80

5 days

01-02-2009 23:49 EST

(No Feedback Yet)

I have worked on this type of project: I scraped some websites to get information. The pages on the remote website were created at runtime. I have good experience in this field and good knowledge of .NET HTTP requests and regular expressions. I also added functionality to send requests from different available IP addresses. I am ready to work on this project.

 

180

20 days

01-03-2009 02:02 EST

(No Feedback Yet)

Fugenx Technologies is a team of experts dedicated to Web development with up-to-date technologies like Ajax, Ruby on Rails, Joomla, and Web 2.0 services. Professionals in PHP, HTML, XML, and JavaScript.

 

155

6 days

01-03-2009 02:58 EST

(No Feedback Yet)

Kindly check PMB.


 

30

2 days

01-03-2009 03:24 EST

(No Feedback Yet)

I have done programs that do data extraction from websites, and I have prebuilt custom components for that.

 

40

2 days

01-03-2009 03:41 EST

(No Feedback Yet)

I am experienced in .NET 3.5/2.0 and Visual Studio 2008/2005 with SQL Server 2005. I have good working knowledge of this type of project.

 

200

20 days

01-03-2009 04:15 EST

(No Feedback Yet)

Hello, my name is Ren Yang, and I am interested in your project. I have more than 2 years of experience designing VB projects and have mastered ADO, COM, Windows applications, and ASP, so I can do it very well.

 

250

1 day

01-03-2009 06:21 EST

(No Feedback Yet)

hi i have this project already..... kindly PM me


 

100

2 days

01-03-2009 09:06 EST

(No Feedback Yet)

I have a very robust service that does this type of work, highly configurable and I would just need to adapt for your need sir. Please contact me for more info. Here: Senior Developer Engineer in US with 12+ years of experience.


 

30

1 day

01-03-2009 12:52 EST

(No Feedback Yet)

Hi Sir, I am a veteran regarding your queries; everything you have mentioned I have done in several projects on a freelance basis. For your particular task I made the bid lower, as per the entry in getafreelancer. Please help me achieve a review. Thanks, Samir Ram

 

50

3 days

01-03-2009 16:52 EST

(No Feedback Yet)

I have done this type of work earlier also.


 

60

3 days

01-05-2009 06:46 EST

(No Feedback Yet)

Please read my PM and reply me!!!


 

150

5 days

01-03-2009 18:29 EST

(No Feedback Yet)

I have experience in building parsing applications. Please provide more info.

 

100

20 days

01-04-2009 04:37 EST

(No Feedback Yet)

I can do it in 20 days


 

30

14 days

01-04-2009 08:01 EST

(No Feedback Yet)

Hello. See your PM, please.


 

100

3 days

01-06-2009 00:30 EST

(No Feedback Yet)

Hi, a trial version of a program fulfilling your requirements is done. Please see PM for further details.

 

250

12 days

01-07-2009 05:33 EST

(No Feedback Yet)

Please see the PM and attached documents.

 

100

3 days

01-09-2009 03:42 EST

(No Feedback Yet)

I have done a similar project; I just need to adapt the parser to your files and the output to the format you want.

