Here’s a request that happens fairly regularly in different offices I’ve been associated with:
- We receive surveys/orders/documents that have been filled out by various people as Excel spreadsheets
- We need to put the information into a database
- We want you to write a script to automate the imports.
My thoughts as a scripter:
- This is an excellent way to keep a contractor in a job for a long time.
- You will never get everybody to input the data the same way into Excel. Nobody ever has.
- People will put things on different lines, because it is Excel, and they can.
- People will never enter data the same way. If you give them a yes/no column in a spreadsheet, some time you will get “yes/no”, sometimes “y/n”, “t,f”, “X, ”, “☑,☒”, “☺,☹”, “+,-”. Every time somebody creates a new item, the script breaks, and has to be modified to compensate, or you have to send the form back or correct it. With a web form, you just give them a yes/no box, or a check box, and they can’t put in different values (it’s self-correcting).
- Whenever the user changes the values, if you haven’t predicted what they will enter, you will have to manually redo those files, because they will break the script. This always happens. Then you have to go back and fix the script.
- If you want to run the script on a server, you either have to work out how to dissect XLS files, or install Excel on the server. The first is a royal pain, and the second shouldn’t be allowed to happen.
The best way to deal with this is to avoid it altogether:
- Get your database in a SQL database back end (and don’t confuse Excel with a database.)
- Set up a web form that looks like the spreadsheet, to do data input straight into the database
- Set up list boxes, etc, so that the data is picked, and not created on the fly.
- Instead of filling out the spreadsheet, users just fill out a web page.
- If you really want, you can send out Access-via-Outlook forms to anybody with Outlook. They are like Access forms, except they get emailed, and they send the data straight into a database.
Miscellaneous advantages to web form:
- It takes almost all of the work out
- You don’t have to correct data for people. If there’s a problem, it can let them know as they enter it, and they can correct it on the spot.
- You don’t have to worry about different versions of Excel and interoperability.
- You don’t even have to worry if people have Excel.
- People can fill in a web form from any machine–PC, Macintosh, Blackberry.
- If you change things, the web form changes along with the back end, and anybody going to the form automatically gets the new version–with Excel, you get back what you sent out any time ago.
Doing this as a script will cost you a lot of time and effort (which probably translates into money), and will never work right (which will cost you a lot more money). Doing this as a web form will be relatively easy (and cheap), and the data will be self-correcting. If you want to do Access-via-Outlook forms, you will have plenty of money left over, in case you need to hire somebody to create that (it shouldn’t be hard, nor take long to make.) Besides, starting with a website, and just having an input form added to it is barely more work than the website by itself–and whoever builds the website will probably build the form as they build the site, anyway, just to do test data (I always do, when I build databases.)
If you still insist on doing this as a script, do it in Perl. Perl has modules to parse XLS files (so it doesn’t need Excel) and is built to parse native language text–which it will have to do for anything people can type in wrong–and can compile into an EXE, if you need to run it on a server without installing a new language. If you don’t need to run it on a server, Perl is still the best language to do it in. It’d still be an expensive and troublesome pain, but not as much as other scripting languages.
This kind of project is a knock-off for a web-developer. It’s a career for a scripter.
15 responses so far ↓
1 Ibrahim // Aug 9, 2008 at 6:20 am
Hi, I randomly came across this through reddit, and I’d like to add that you could use a Google Spreadsheet if your work allows that, because it lets you set up web forms automatically.
Reply to Ibrahim
2 Ian Welsh // Aug 9, 2008 at 6:46 am
As long as you can export to excel, folks will be happy. Most managers/analysts are pretty excel-addicted.
Reply to Ian Welsh
3 Sid // Aug 9, 2008 at 8:50 am
I completely agree. At previous jobs, I’ve seen people burned by the same thing.
In fact, now just about anytime I see some potential for data analysis, my gut reaction is to keep the door open (mentally) to doing it with some sort of web based database backed system. Of course, this sometimes also leads to “when all you have is a hammer, everything is a nail” syndrome =)
Reply to Sid
4 Anonymous // Aug 9, 2008 at 12:08 pm
I wish my boss would understand this *sigh*
Reply to Anonymous
5 Python Crusader // Aug 9, 2008 at 12:38 pm
Reply to Python Crusader
6 Debra Dalgleish // Aug 9, 2008 at 12:55 pm
Reply to Debra Dalgleish
7 Spune // Aug 9, 2008 at 1:15 pm
Reply to Spune
8 Mark // Aug 9, 2008 at 3:30 pm
It’s a knock-off for a scripter, too. And then another knock-off. And another. And another. Hey, you’re right, pretty soon that is a career.
I once worked with someone who was in charge of managing all the data for a team, who seemed to have the job only because she was incapable of doing anything else. She did everything with Excel, and there problems everywhere. What a mess. When I needed some data collected and processed, I had to go through her, according to the rules (she inserted herself in the pipeline between data and build). So I built a web app, as you suggest, for my portion of the data collection, and had her use that. She never got to touch the data, only the web UI. Worked like a charm.
Reply to Mark
9 Kevin Merritt // Aug 9, 2008 at 4:37 pm
This is one of the primary use case scenarios that we are solving at blist. We have created an online database that allows mainstream users to work in an environment that looks and feels like Excel, but the columns are “typed” - meaning the creator of the table can control the type of content people enter. We have checkboxes, T/F booleans that can be customized (hire/no-hire; pass/fail; go/no-go), picklists, etc.
In addition to its ease of use, there’s a huge benefit downstream in that it really is a database. You can analyze, print, chart, export, etc.
We support import and export from and to Excel.
It’s free to use. Try it out at http://www.blist.com.
Reply to Kevin Merritt
10 Neil // Aug 9, 2008 at 6:09 pm
Interesting thought, Ibrahim. And I’ll have to look into it.
It won’t happen in the office, as it’s a secure site. If infomration got posted on Google’s servers, our security depeartment would have a well-earned cow <g>
Reply to Neil
11 Neil // Aug 9, 2008 at 6:14 pm
PC, I have nothing against Python…I haven’t used it simply because it hasn’t shown up as the right tool for the jobs I’ve done. (I will say, though, that flaming Perl isn’t much of a reccomendation, in and of itself. How is Python in regards to parsing English text? I know it rocks at number-crunching. And how is Python for compiling into an EXE? That was one of the specific points.)
BTW, I’m also not a Perl developer. The email was about *not* doing this sort of project in Excel, not about any given language.
Reply to Neil
12 Neil // Aug 9, 2008 at 6:35 pm
Nice, Mark.
The last time I actually wrote an Excel import script was for a manager who had already sent out several thousand questionnaires, and was starting to get them back. He couldn’t figure out how to turn them into usable data.
I wrote him a script (VBS, monitored a folder, and ripped any XLS that was dropped into it, cleaned it up and shoved it into Access.) Told him how to set it up, and how he’d have to fix the spreadsheets if the script spit them back at him, and that if he detailed anywhere it was consistently breaking, we could update the script.
After that, he started asking for my opinion earlier in the decision tree.
Reply to Neil
Leave a Comment