Here’s a request that happens fairly regularly in different offices I’ve been associated with:
- We receive surveys/orders/documents that have been filled out by various people as Excel spreadsheets
- We need to put the information into a database
- We want you to write a script to automate the imports.
My thoughts as a scripter:
- This is an excellent way to keep a contractor in a job for a long time.
- You will never get everybody to input the data the same way into Excel. Nobody ever has.
- People will put things on different lines, because it is Excel, and they can.
- People will never enter data the same way. If you give them a yes/no column in a spreadsheet, some time you will get “yes/no”, sometimes “y/n”, “t,f”, “X, ”, “☑,☒”, “☺,☹”, “+,-”. Every time somebody creates a new item, the script breaks, and has to be modified to compensate, or you have to send the form back or correct it. With a web form, you just give them a yes/no box, or a check box, and they can’t put in different values (it’s self-correcting).
- Whenever the user changes the values, if you haven’t predicted what they will enter, you will have to manually redo those files, because they will break the script. This always happens. Then you have to go back and fix the script.
- If you want to run the script on a server, you either have to work out how to dissect XLS files, or install Excel on the server. The first is a royal pain, and the second shouldn’t be allowed to happen.
The best way to deal with this is to avoid it altogether:
- Get your database in a SQL database back end (and don’t confuse Excel with a database.)
- Set up a web form that looks like the spreadsheet, to do data input straight into the database
- Set up list boxes, etc, so that the data is picked, and not created on the fly.
- Instead of filling out the spreadsheet, users just fill out a web page.
- If you really want, you can send out Access-via-Outlook forms to anybody with Outlook. They are like Access forms, except they get emailed, and they send the data straight into a database.
Miscellaneous advantages to web form:
- It takes almost all of the work out
- You don’t have to correct data for people. If there’s a problem, it can let them know as they enter it, and they can correct it on the spot.
- You don’t have to worry about different versions of Excel and interoperability.
- You don’t even have to worry if people have Excel.
- People can fill in a web form from any machine–PC, Macintosh, Blackberry.
- If you change things, the web form changes along with the back end, and anybody going to the form automatically gets the new version–with Excel, you get back what you sent out any time ago.
Doing this as a script will cost you a lot of time and effort (which probably translates into money), and will never work right (which will cost you a lot more money). Doing this as a web form will be relatively easy (and cheap), and the data will be self-correcting. If you want to do Access-via-Outlook forms, you will have plenty of money left over, in case you need to hire somebody to create that (it shouldn’t be hard, nor take long to make.) Besides, starting with a website, and just having an input form added to it is barely more work than the website by itself–and whoever builds the website will probably build the form as they build the site, anyway, just to do test data (I always do, when I build databases.)
If you still insist on doing this as a script, do it in Perl. Perl has modules to parse XLS files (so it doesn’t need Excel) and is built to parse native language text–which it will have to do for anything people can type in wrong–and can compile into an EXE, if you need to run it on a server without installing a new language. If you don’t need to run it on a server, Perl is still the best language to do it in. It’d still be an expensive and troublesome pain, but not as much as other scripting languages.
This kind of project is a knock-off for a web-developer. It’s a career for a scripter.
15 responses so far ↓
1 Ibrahim // Aug 9, 2008 at 6:20 am
Hi, I randomly came across this through reddit, and I’d like to add that you could use a Google Spreadsheet if your work allows that, because it lets you set up web forms automatically.
2 Ian Welsh // Aug 9, 2008 at 6:46 am
As long as you can export to excel, folks will be happy. Most managers/analysts are pretty excel-addicted.
3 Sid // Aug 9, 2008 at 8:50 am
I completely agree. At previous jobs, I’ve seen people burned by the same thing.
In fact, now just about anytime I see some potential for data analysis, my gut reaction is to keep the door open (mentally) to doing it with some sort of web based database backed system. Of course, this sometimes also leads to “when all you have is a hammer, everything is a nail” syndrome =)
4 Anonymous // Aug 9, 2008 at 12:08 pm
I wish my boss would understand this *sigh*
5 Python Crusader // Aug 9, 2008 at 12:38 pm
Perl? Really? The one language that isn’t developing the slightest?
Tell you what, you tell me when your Perl 6 comes out, and I’ll tell you when the next major Python release is out (1st of October is the current plan.)
6 Debra Dalgleish // Aug 9, 2008 at 12:55 pm
You’re right, Excel isn’t a database. ;-)
However, for data input, if you set up the Excel file correctly, with data validation drop downs and locked cells, you can avoid most of the problems that you mentioned.
7 Spune // Aug 9, 2008 at 1:15 pm
WOW, you hit the nail on the head. the excuse that we get alot at work is we can’t modify hundreds of entries at once thru the web like we can in an excel file. The problem is parsing as you’ve already stated and we still have to treat the whole excel sheet as a single transaction. Most of this data is coming from other systems, but each time I mention some kind of API to have those applications upload their data automatically I get stares of confusion, anger, or fear.
8 Mark // Aug 9, 2008 at 3:30 pm
It’s a knock-off for a scripter, too. And then another knock-off. And another. And another. Hey, you’re right, pretty soon that is a career.
I once worked with someone who was in charge of managing all the data for a team, who seemed to have the job only because she was incapable of doing anything else. She did everything with Excel, and there problems everywhere. What a mess. When I needed some data collected and processed, I had to go through her, according to the rules (she inserted herself in the pipeline between data and build). So I built a web app, as you suggest, for my portion of the data collection, and had her use that. She never got to touch the data, only the web UI. Worked like a charm.
9 Kevin Merritt // Aug 9, 2008 at 4:37 pm
This is one of the primary use case scenarios that we are solving at blist. We have created an online database that allows mainstream users to work in an environment that looks and feels like Excel, but the columns are “typed” – meaning the creator of the table can control the type of content people enter. We have checkboxes, T/F booleans that can be customized (hire/no-hire; pass/fail; go/no-go), picklists, etc.
In addition to its ease of use, there’s a huge benefit downstream in that it really is a database. You can analyze, print, chart, export, etc.
We support import and export from and to Excel.
It’s free to use. Try it out at http://www.blist.com.
10 Neil // Aug 9, 2008 at 6:09 pm
Interesting thought, Ibrahim. And I’ll have to look into it.
It won’t happen in the office, as it’s a secure site. If infomration got posted on Google’s servers, our security depeartment would have a well-earned cow <g>
11 Neil // Aug 9, 2008 at 6:14 pm
PC, I have nothing against Python…I haven’t used it simply because it hasn’t shown up as the right tool for the jobs I’ve done. (I will say, though, that flaming Perl isn’t much of a reccomendation, in and of itself. How is Python in regards to parsing English text? I know it rocks at number-crunching. And how is Python for compiling into an EXE? That was one of the specific points.)
BTW, I’m also not a Perl developer. The email was about *not* doing this sort of project in Excel, not about any given language.
12 Neil // Aug 9, 2008 at 6:22 pm
Debra, I’m certainly not knocking Excel as a product. In fact, it’s pretty handy as an output device (charts go over well with management) and it’s easily automated.
The article was in response to a couple of projects that consisted of sending out a lot (literally tens of thousands) of spreadsheets, which would have a fair number of free-text fields) and then importing the data on an ongoing basis.
In this project, just the version overlap alone (ie, you come up with a glitch in the spreadsheet, you discover it will import better with a different value (or management changes something…); you still have thousands of spreadsheets floating around. There’s no updgrade potential.) suits it for a centralized input device.
For smaller, simpler (and certainly quicker) jobs, Excel is low-overhead, and low learning-curve.
13 Neil // Aug 9, 2008 at 6:25 pm
Spune, if you have any database people where you work, get their support. Update queries are easier (ie, less money spent on labor) than updating spreadsheets.
14 Neil // Aug 9, 2008 at 6:35 pm
Nice, Mark.
The last time I actually wrote an Excel import script was for a manager who had already sent out several thousand questionnaires, and was starting to get them back. He couldn’t figure out how to turn them into usable data.
I wrote him a script (VBS, monitored a folder, and ripped any XLS that was dropped into it, cleaned it up and shoved it into Access.) Told him how to set it up, and how he’d have to fix the spreadsheets if the script spit them back at him, and that if he detailed anywhere it was consistently breaking, we could update the script.
After that, he started asking for my opinion earlier in the decision tree.
15 Johan L // Aug 10, 2008 at 10:35 pm
Nice troll.
Reality check
Leave a Comment