14 Dec 2010

Statistical software for audience research

Retrieved from audiencedialogue




------------------------------------------------------------------------------
A free program for managing qualitative data
can be downloaded from
here
------------------------------------------------------------------------------

A guide to some of the widely available statistical and survey software packages, with comments on their suitability for various types of research.

If you're a statistician this page may seem insultingly over-simplified - but it's not aimed at you. For a thorough overview of research software, try this Glasgow University stats site instead.

With audience research (as with social and market research) there are three types of things that software needs to do:
(1) Get data into a computer (input)
(2) Process data (analysis)
(3) Get data out of the computer, in a form which human brains can understand (output).

The data comes in two quite different varieties:
(a) The numbers and coded data produced by questionnaires (covered on this page)
(b) The words produced by qualitative research (covered on a separate page).

This page covers...
1. Software for coded data entry
2. Statistical software
3. Survey tabulation programs
4. Software for web surveys
5. Using spreadsheets and database programs for surveys
...

1. Software for coded data entry

This sounds incredibly technical, but what it actually means is typing in all the answers given in a survey. If you've only used spreadsheet or database software before, you may not have encountered coded data entry. Because most survey questions are multiple-choice, and a respondent is asked to choose from a limited range of answers, the tradition in surveys has been to give each possible answer a code - usually a single-digit number. 1 means the first possible answer to the question, 2 is the second listed, and so on.

For example, the survey might include the question:

Q26. How much do you like reading this page?
To answer, tick the appropriate box
[]1 Not at all
[]2 A little
[]3 A lot
[]4 Won't know till I've finished it

Notice the number beside each box? Instead of keying in "not at all", the data entry operator keys in 1. A "codebook" is set up in a computer file, which tells the survey software that "1" in Q26 means "not at all" - and so on.

The survey software turns the codes back into full wording when the results are displayed. This method of coding produces small data files, fast processing, and very fast data entry. But it's also possible to make serious mistakes. If the first possible answer for every question is 1, you can easily miss a question or enter the same answer twice. Such errors are prevented by good coding design, and good data entry software. If the operator tries to answer "5" to that question, the software should beep, display an error message, and refuse to continue.
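As a rough illustration of the codebook idea, here is a minimal Python sketch. It is not how any particular survey package works: the names CODEBOOK, enter_answer and decode are invented for this example. It shows a keyed code being checked against the codebook (so that "5" is refused for Q26) and turned back into full wording for display.

```python
# Hypothetical codebook for Q26: maps each valid code to its full wording.
CODEBOOK = {
    "Q26": {
        1: "Not at all",
        2: "A little",
        3: "A lot",
        4: "Won't know till I've finished it",
    }
}


def enter_answer(question, keyed):
    """Return the keyed code as an int, or raise ValueError if it is not valid."""
    valid = CODEBOOK[question]
    try:
        code = int(keyed)
    except ValueError:
        raise ValueError(f"{question}: {keyed!r} is not a number") from None
    if code not in valid:
        raise ValueError(f"{question}: code {code} is not one of {sorted(valid)}")
    return code


def decode(question, code):
    """Turn a stored code back into the full answer wording."""
    return CODEBOOK[question][code]


stored = enter_answer("Q26", "1")
print(decode("Q26", stored))
```

Real data entry software would beep and wait for a corrected keystroke rather than raise an exception, but the check itself is the same.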

Most of the statistical programs allow spreadsheet-style data entry. After a survey has been done, with printed questionnaires, you can sit down and enter all the results into this spreadsheet. It works, but it's not very efficient. And it's very easy to make a mistake.

Of the statistical packages listed below, SPSS has a data entry module, but that costs a lot extra. Epi Info shines here: it has a built-in data entry system which enables extensive checking - though not as thorough as a purpose-built data entry program.

The easiest data entry program I've ever seen is Epidata. A friend of mine did a small survey - his first survey ever - and he's no computer expert. Not knowing how to process survey data on a computer, he asked me for help. At this stage I'd discovered Epidata, but hadn't yet tried it. My friend agreed to be a guinea pig for Epidata. So one afternoon I visited him. We downloaded Epidata (which is freeware) from the Web, unzipped the file, created a data entry template, and set up a file to enter the data into. In less than an hour he was entering data from his questionnaires. That's how easy it was.

I left him to it. A few days later I called him, and asked how he was going with Epidata. No problems, he said - in fact he had changed the template to include another question, which he hadn't originally planned to enter into the file. Any beginners who tried that with most other statistical software would either create horrible errors, or not be able to convert the file.

Computer based interviewing systems

Many statistical packages seem to assume that data from a survey has already been entered into a computer system, somehow. (They don't care how - they don't see it as their concern.) This was a safe assumption back in the days when printed questionnaires were filled in by interviewers, and the codes were punched onto cards. These days, things are different - and not all the software has caught up. A lot of this software has been around for decades. I first used SPSS in the late 1970s, and apart from the Windows interface, most aspects of it have hardly changed.

These days, many interviews are done by phone. The interviewer sits in front of a computer, reading questions off the screen and typing the answers in directly, without ever seeing a printed questionnaire. And with more and more people able to use computers, the respondents themselves can sit at a computer and answer the questions. Typically, each question occupies a full screen. When the answer is given, the screen is cleared and the next question appears. Elaborate branching is thus possible. Instead of everybody being asked the same questions, you can design a tree-like questionnaire, of which most respondents might see only a small part. Some programs which enable this are:

Sawtooth Ci3 and Sensus Q&A

Produced by Sawtooth Software, one of the pioneers of CATI (computer-assisted telephone interviewing), as it used to be called. A powerful program, designed for regular users. It's easy for them to set up a survey quickly, not so easy for beginners.

Scytab

Surveycraft (recently acquired by SPSS) has produced an incredibly powerful program. You want to enter dates in the Japanese Imperial Calendar? No problem. But data entry can be surprisingly complex, as I discovered when I designed a customized CATI program. Scytab seems to do everything possible - but the disadvantage is the learning time involved. Not for casual users.
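The tree-like branching described earlier in this section can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the questions, the codes and the names QUESTIONS and run_interview are made up, and real CATI packages use their own scripting languages.

```python
# Hypothetical questionnaire: each question lists, for every valid answer
# code, which question comes next (None means the interview is finished).
QUESTIONS = {
    "Q1": {"text": "Did you listen to the radio yesterday?",
           "next": {1: "Q2", 2: "Q3"}},          # 1 = yes, 2 = no
    "Q2": {"text": "Which station did you listen to most?",
           "next": {1: "Q3", 2: "Q3", 3: "Q3"}},
    "Q3": {"text": "What is your age group?",
           "next": {1: None, 2: None, 3: None}},
}


def run_interview(answers, start="Q1"):
    """Follow the branching and return the (question, code) pairs actually asked.

    `answers` maps a question id to the code the respondent would give.
    """
    asked = []
    current = start
    while current is not None:
        code = answers[current]
        routes = QUESTIONS[current]["next"]
        if code not in routes:
            raise ValueError(f"{current}: code {code} is not valid")
        asked.append((current, code))
        current = routes[code]
    return asked


# A respondent who did not listen to the radio never sees Q2.
print(run_interview({"Q1": 2, "Q3": 1}))
```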

2. Statistical software

SPSS

SPSS is the most widely used software in social research. It's been around since the 1970s, seems to be used in every university, and includes the most common procedures of social statistics. Its manuals are comprehensive, clear, and well-indexed - I suspect they were a large part of the reason for the success of SPSS. It has always been reasonably easy to use (compared with other statistical programs) but until recently has been fairly slow, and its output has looked messy - not suitable for including in reports directed to non-statisticians. In the late 1990s (at last!), the usability of SPSS took a leap forward. Even so (and it's now up to version 15) the interface could still stand a lot of improvement. In many ways the "easy" GUI version is harder to use than the ancient version for which commands had to be written. All the regular users I know prefer writing syntax files to endless pointing and clicking with the mouse.

One problem with SPSS is the price - around 1000 US dollars for the basic system. You can do a lot with the basic system, but if you want to use advanced statistical techniques, you have to buy more modules. If you buy them all, it costs around 10,000 dollars. At the other extreme, there's an annoyingly limited student version, available (only to full-time students) for about 50 dollars.

On top of the initial price, you have to pay an annual fee to use it. If you forget to renew the license, SPSS stops working. (Of course, this always happens at the most inconvenient possible time.) To add to the bad news, the manuals are no longer supplied with the software - nor are they as clear as they used to be. Our view: SPSS has become stale, resting on its laurels - and probably can't change much without upsetting its millions of users. So it's ready to be swept away by some new software that's much more convenient and user-friendly: the only question is when this will happen. When it does happen, it will be quick, because of the annual licensing system. If your SPSS stops working till you type in a new 80-character(!) serial number, it's a great incentive to look around for an alternative.

Epi Info 6

Epi Info 6 is designed for epidemiologists, so it includes a slightly different set of statistical procedures from most of the other programs: mapping tools for measuring the spread of epidemics, and so on. I've found Epi Info very useful for audience research: some key epidemiological concepts are media research concepts under another name - and audiences spread in much the same way as contagious diseases do. The two big advantages of Epi Info are that it's free (it was funded by the World Health Organization) and that it's available in 13 languages. Courses on it are taught at schools of public health in many countries. Its main disadvantage is that it's not good at labelling data. The key programs fit on a single floppy disk, so it's ideal for anybody using an old, slow PC.

Unlike SPSS - which only deals with the actual statistical analysis, unless you buy extra modules - Epi Info 6 is a complete statistical system. It includes a word processing program for producing questionnaires and writing reports, and has a powerful data entry program which can detect many errors as soon as they're typed in. If you're doing mail surveys, it can automatically generate reminder letters to people who haven't returned their questionnaires yet.

Sounds great, doesn't it? There's only one catch - Epi Info 6 is a DOS program. Remember MS-DOS? Maybe you don't... it seems like a long, long time ago. But if you hunt for "Command prompt" in Windows, it's still there.

But if you don't want to learn DOS, there's a Windows alternative...

Epi Info 2002

Epi Info 2002 (or whatever year it's up to now) is more difficult to use than the DOS version, but more powerful. It's a 37 megabyte download, so its former slimness has gone. It uses the Microsoft Access file format, though you don't need to have MS Access to run Epi Info 2002. Don't even think of running this on an old PC.

We tried Epi Info 2000 (very similar to the 2002 version) for one project, and found it cumbersome and awkward to use, specially for data entry. Other people think so too, because there's a third-party piece of software that does data entry in the old Epi Info style, but runs on Windows. This is Epidata (described above). Epi Info 2002 doesn't seem to include the mail-merge features for generating reminder letters, etc, which were a useful part of Epi Info 6. You could probably achieve the same result by writing a Visual Basic for Applications macro, but that would be a lot more difficult.

Update, February 2006: The analysis module of Epidata is working now, though it still has a few minor bugs. (We've tried it, and haven't found anything that affects results - just a few annoying usability problems.) So former Epi Info users will no longer need to use DOS to analyse their files. The program offers only basic statistical analysis (frequencies, tables, regression), so you'll need other software for anything more elaborate, but for simple surveys not analysed by professionals, Epidata should be fine.

Statistica

This program, similar to SPSS in scope and cost, seems easier to use than SPSS. (I haven't used the real thing, only a demo version.) Its particular strength is its graphing capability. There's a slightly cheaper cut-down version called Quick Statistica, which might handle most people's needs.

SAS

SAS is at heart a database system, which also does statistics. The complete set of manuals is a horrifying sight: it sprawls for a whole metre across a bookcase. The particular strength of SAS is handling large samples: a million cases can be processed in a few seconds, as long as the computer has enough memory. SAS is more of a mainframe program, though there is a PC version. It's not difficult to use for basic statistics, but to me it feels like using the end of an elephant's tail as a paintbrush. Recent versions include a useful "wizard" interface, which helps inexpert users work out which statistical procedures they should use.

Statview

Once this was a Macintosh-only program, but now there's a Windows version as well. The user interface is rather different from other statistical software I've seen: confusing at first, but when you get used to it, its advantages become more obvious. Statview is very fast, and as its name implies it can produce a wide variety of graphs.

Datadesk and JMP

The strength of these two programs is in exploratory statistics, displayed in graphic form. Both of these help you to understand your data better - something the heavy-duty programs such as SPSS often fail to do.

The trouble with all the programs in the above section (except Epi Info and Epidata) is the price. You won't get much change from $1000 for most of these. In my view, a fair price for software is about the same as a large textbook - around $50 to $100: about what most shareware programs charge. Writing a big book and writing software involve about the same amount of work, but CDs and downloads are far cheaper to produce than printed books.

Other general-purpose statistical programs

The peculiarity of surveys, compared with other kinds of statistical data, is that most survey data is measured at a fairly primitive level: mostly nominal, occasionally ordinal, but seldom ratio data. (If you didn't understand that, maybe you need to learn more about statistics. Consult our Learning statistics page.)

However, most statistical programs are designed around ratio or interval data. Even SPSS, designed specifically for the social sciences, expects its input to be in numeric form. There's nothing numeric about a survey question like "Which TV programs did you watch yesterday?" However, to do statistical analysis, each program has to be allocated an arbitrary number. Statistical programs, which pride themselves on their numerical accuracy, will report the mean TV program to an accuracy of at least 8 decimal places.
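To make the point concrete, here is a small Python sketch with invented data (the program names and codes are assumptions for illustration only). The mean of the codes can be computed, but it means nothing; a frequency table is the summary that suits nominal data.

```python
from collections import Counter
from statistics import mean

# Hypothetical codes: the numbers are arbitrary labels, not quantities.
PROGRAMS = {1: "News", 2: "Drama", 3: "Sport", 4: "Quiz show"}
answers = [1, 3, 3, 4, 1, 3, 2, 3]

# Arithmetically valid, but meaningless: there is no "program 2.5".
meaningless_mean = mean(answers)


def frequency_table(codes, labels):
    """Return (label, count, percentage) rows, most common answer first."""
    counts = Counter(codes)
    total = len(codes)
    return [(labels[code], n, 100 * n / total) for code, n in counts.most_common()]


for label, n, pct in frequency_table(answers, PROGRAMS):
    print(f"{label:10} {n:3} {pct:5.1f}%")
```

Renumbering the programs would change the mean but leave the frequency table untouched, which is a quick test of whether a statistic is meaningful for coded data.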

If you're dealing with data where the difference between 1.000012 and 1.000013 is meaningful, and big numbers give your brain a nice tingling feeling, I recommend these heavy-duty number-crunchers - the software that real statisticians use:

Stata - it's "intercooled" - i.e. like a truck, complete with a plain DOS-style interface. (Truck drivers are supposedly interested in power, not looks!) The 2004 version (Stata/SE 8) can produce publication-quality graphics.
S-Plus includes a powerful statistical programming language, and has excellent graphics capabilities, but the way it handles files is a bit tricky. Like Stata, S-Plus was designed for professional statisticians, who use it every day. They don't need their hands held - they just want the job done without fuss.
R, like S-Plus, is based on the statistical programming language S, but is freeware. Windows and Macintosh versions are available. A good introduction to R is the book of that title, An Introduction to R, by W N Venables, D M Smith, and the "R Development Core Team" (echoes of Sesame Street!). Development teams often don't realize that others have problems with their concepts, but this book avoids such traps. However, this is not a book for novices at either statistics or programming.

Other software - more specialized

John Pezzullo's Statpages website has a lot of interesting material, including a detailed page on free and "free, but" statistical software. Very comprehensive, and regularly updated. Interesting specialized software described there includes:

Instat Plus - interactive statistics package
WinIDAMS - free statistical package produced for UNESCO
SISA - Simple Interactive Statistical Analysis. You don't need statistical software at all, for this: just feed in your data and SISA calculates results online. Of course, feeding in data is the most tedious part of statistical computing - but when there's not much to be fed in, SISA is great.
The Bayesware Discoverer: handles incomplete data.
Amelia: imputation software that substitutes plausible values for missing data.
Clarify: software for interpreting and presenting statistical results
TURNER: Mac software for exploratory data analysis. Also, there's Turner's colleague...
MANET - "Missings are now equally treated" - another interactive tool that takes account of missing data. (Do you begin to get the impression that a recurring issue for Audience Dialogue is how to handle missing data? Our stance: however you handle missing data, you're making some assumption, even if you don't realize it. For accurate analysis, you must be aware of the assumptions you're making.)

Also at www.statpages.net is a page on websites that do statistical calculations. So if you're using Excel, you need to do some statistical tests, and you don't want to do them the long-winded Excel way, find an appropriate calculator here, and paste your data into it.

You'd like more still? Try the ASC Software Register, which has details of more than 200 software packages for statistical and social survey analysis, but not including most of the programs described on this page. See also the Wikipedia page on statistics.

3. Survey tabulation programs

A slightly different class of program is those intended only for processing survey data. They tend to be more limited than statistical software, but this also makes them easier to use. They're designed for market research companies, and their strength lies in producing tables of numbers rather than exploring hypotheses. Many of these programs are described in the Market Research Software Archive, which also offers downloadable demos for most of them.

Like the statistical packages described above, most of these tabulation programs are designed to be used constantly. If you use them only occasionally, you forget the details, and you run up against the same problems every time you begin a new study. Two that I've used and found OK are Statpac, and The Survey System. Serving the same purpose, in a different way, is Quantrix.

Statpac

Statpac is user-friendly statistical software, designed mainly for use with surveys. Only a Windows version is available. Its ability to automatically code open-ended questions looks very promising.

The Survey System

This is more of a market research system than a statistics system - though the differences are becoming less marked. Its strength is in producing extra-wide "banner tables" that market researchers love - but most other people have trouble interpreting accurately.
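For readers who haven't met one, a banner table is essentially several cross-tabulations set side by side, with one row per answer and a block of columns for each breakdown. The Python sketch below builds a tiny one from invented data; the respondents and the name banner_table are assumptions for illustration, and this is not how The Survey System itself does it.

```python
from collections import Counter

# Hypothetical respondents: (answer to Q26, sex, age group)
respondents = [
    ("A lot", "Male", "Under 40"),
    ("A little", "Female", "Under 40"),
    ("A lot", "Female", "40+"),
    ("Not at all", "Male", "40+"),
    ("A lot", "Female", "Under 40"),
]

COLUMNS = ["Total", "Male", "Female", "Under 40", "40+"]


def banner_table(rows):
    """Count each answer under a Total column and under every banner column."""
    table = {}
    for answer, sex, age in rows:
        counts = table.setdefault(answer, Counter())
        counts["Total"] += 1
        counts[sex] += 1
        counts[age] += 1
    return table


table = banner_table(respondents)
print(f"{'':12}" + "".join(f"{column:>10}" for column in COLUMNS))
for answer, counts in table.items():
    print(f"{answer:12}" + "".join(f"{counts[column]:>10}" for column in COLUMNS))
```

With only two breakdowns the table is easy to read. Real banner tables often run to dozens of columns, which is where the trouble with interpreting them begins.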

Quantrix

Around 1990, Lotus released Improv: a program that was not quite a spreadsheet, not quite a database, but between the two. Think of something like a spreadsheet with any number of dimensions instead of just two, with easy access to reordering the data. It was great for statistical tables, but Lotus dropped Improv after a few years. The good news is that it's been improved and re-released (for Windows only) under the name of Quantrix.

Like the statistical packages, the tabulation programs aren't cheap: mostly around 1000 US dollars. I've tried various shareware tabulation programs, but haven't yet found one without serious bugs and limitations. If you know of a good one, please tell me about it, and I'll include it here.

4. Software for web surveys

When I first wrote this page in 1997, I could find only one piece of software for use with web surveys. A few years later, several appeared, such as Powertab for Macintosh and Perseus Survey Solutions for Windows. By 2007, Powertab had vanished, and Perseus had been repurposed on a much larger scale. What happened there? I think the big statistics companies like SPSS decided they'd get into this market - and they already had the customers: large market research companies, government agencies, and the like. But if you just want to do occasional small surveys, such software is overkill.

Web-based software can be very different. Sometimes you don't have to buy it, or install it on your own computer. You can have a questionnaire on your own web site, and get the answers sent to another site for analysis. Two companies which do this particularly well (both American) are Customersat and Surveypro.

If you're doing small surveys, there are free services around, usually easy to use. Good examples are Zoomerang, Free Online Surveys, and Sysurvey. These suppliers are constantly coming and going. Zoomerang is the best known, Free Online Surveys is the easiest to use (according to students on a course I ran), and Sysurvey is about the most powerful. You can check out the current suppliers by going to the Google search engine and typing in "free web online surveys". About 60 come up - but beware! A lot aren't free at all, some are free only for small surveys (often up to 50 cases), and sometimes you don't discover till you finish your survey that you have to pay to see the results! If you're thinking of using one of these, check its privacy statement very carefully - some have no qualms about passing respondents' email addresses on to advertisers. However, even when these services aren't free, they are cheap, and easier to use than most statistical software.

A different approach is taken by Webstat: free software that you download to run in your browser. Similar to this is OpenEpi - open source software for epidemiological statistics in Javascript and HTML. It can be run from a web site, or downloaded and run without a web connection.

A program with no direct competition is Inspiredata, from www.inspiration.com, which has long produced excellent graphical software. Inspiredata is designed for use in schools, but it looks as if it would work nicely for small surveys that don't need complex analysis, both online and offline. It comes in both PC and Mac versions, making it one of the few low-cost options for survey analysis on the Macintosh.

5. Using spreadsheets and database software for surveys

Spreadsheets

All the main spreadsheet programs - Excel and its competitors - have statistical routines built in. But they are cumbersome to use. It's all too easy to make a serious mistake - and never know it. Missing data can be a problem with spreadsheets, too: if you leave a cell blank when a respondent couldn't answer a question, the cell is usually ignored. If you use survey or statistical software, you can make the often-important distinction between "don't know", "not applicable", and "not answered".
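Here is a minimal Python sketch of that distinction, using made-up data. The negative codes and the name summarise are assumptions chosen for this example, not a standard. Each kind of missing answer gets its own code, so it can be counted separately and kept out of the average, instead of vanishing into a blank cell.

```python
from collections import Counter

# Hypothetical coding for a 1-5 rating question. The negative codes for the
# three kinds of missing answer are an assumption made for this sketch.
MISSING = {-1: "Don't know", -2: "Not applicable", -3: "Not answered"}


def summarise(codes):
    """Return (mean of the valid ratings, count of each kind of missing answer)."""
    valid = [code for code in codes if code not in MISSING]
    missing = Counter(MISSING[code] for code in codes if code in MISSING)
    average = sum(valid) / len(valid) if valid else None
    return average, dict(missing)


ratings = [4, 5, -1, 3, -3, -2, 4, -1]
average, missing = summarise(ratings)
print(average)
print(missing)
```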

Excel has pivot tables and a Frequency command, but both are cumbersome to use, compared with software like Epi Info or SPSS. I don't recommend using Excel for survey analysis - but it's better than counting questionnaires by hand! If Excel is all you have, consult our Beginner's guide for survey analysis with Excel. See also these helpful notes by Eva Goldwater.

Another problem with spreadsheets is that they're weak on data entry. Most statistical software will let you enter answers to survey questions in time-saving coded form (e.g. "M" for male) and check the validity of answers as you type them in. Though most spreadsheet software allows that, it's not easy to set up - and just because you select a value with a mouse doesn't always mean it's the correct value. (It's very easy to let go of the mouse button in the wrong place, and not notice.)

But the biggest problem with spreadsheets is that it's horrifyingly easy to make mistakes - specially when you're modifying an existing spreadsheet for a new purpose. Two tricks we've found helpful are

showing entered and calculated data in different colours,
calculating everything in two different ways, comparing the results, and showing the difference between the two results in another cell. When that difference is not zero, you know there's a problem.

Errors are specially likely when you add rows or columns to an existing spreadsheet. If you remove columns, and a formula is based on them, you'll get an error message - but if you add rows or columns, the software assumes you know what you're doing. The meaning you intend may not be the meaning the software assumes. Beware!

The University of Leeds has an excellent summary of spreadsheets, and what they are good at.
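The second trick (calculating everything two ways and showing the difference) is easy to imitate outside a spreadsheet. In this Python sketch, with invented numbers and an invented function name cross_check, the grand total is computed once from the row totals and once from the column totals. The stale column totals stand in for spreadsheet formulas whose ranges were not extended when a row was added.

```python
def cross_check(row_totals, column_totals):
    """Difference between the grand total computed two ways; should be 0."""
    return sum(row_totals) - sum(column_totals)


# Hypothetical table: rows are age groups, columns are answer codes 1 to 3.
table = [
    [12, 30, 8],
    [15, 22, 13],
]
row_totals = [sum(row) for row in table]
column_totals = [sum(column) for column in zip(*table)]
difference_before = cross_check(row_totals, column_totals)

# A third row is added. The row totals pick it up, but the column "formulas"
# still cover only the first two rows, as can happen in a spreadsheet.
table.append([9, 18, 23])
row_totals = [sum(row) for row in table]
stale_column_totals = [sum(column) for column in zip(*table[:2])]
difference_after = cross_check(row_totals, stale_column_totals)

print(difference_before, difference_after)
```

A non-zero difference doesn't say which formula is wrong, only that the two no longer agree, which is the cue to go looking.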

Using database programs for statistical analysis

Databases are better than spreadsheets for data entry, but weaker for analysis. Some programs I've tried are Filemaker Pro, Lotus Approach, Microsoft Access, and several variants of Dbase, such as Foxpro. These programs let you set up validity checking for data entry, but can't manage coded data efficiently. Quantrix (described above) fits this category too.

When all the questionnaires have been entered on the database, you then face the problem of summarizing the results. Database software is mostly designed for business record-keeping, so these programs have extremely limited ability to find patterns in data - the main purpose of statistics.

Recommendation: by all means use a database for entering data, but export the data to another program for analysis - even a spreadsheet is better. This is most annoying, because there's no reason why you shouldn't be able to do statistics with a database. Epi Info 2000 (see above), which uses the Microsoft Access format, proves that it's possible.
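As a sketch of that recommendation, the Python example below uses SQLite (through the standard sqlite3 module) as a stand-in for the database programs named above. The table, column names and the enter function are invented for illustration. The database refuses an out-of-range code at entry time, and the accepted rows are then exported as tab-delimited text for analysis elsewhere.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
# Validity checking: q26 must be one of the codes 1 to 4.
conn.execute(
    "CREATE TABLE answers ("
    " respondent INTEGER PRIMARY KEY,"
    " q26 INTEGER NOT NULL CHECK (q26 BETWEEN 1 AND 4))"
)


def enter(respondent, q26):
    """Insert one answer; return False if the database rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO answers VALUES (?, ?)", (respondent, q26))
        return True
    except sqlite3.IntegrityError:
        return False


results = [enter(1, 3), enter(2, 5), enter(3, 1), enter(4, 3)]

# Export the accepted rows as tab-delimited text for another program.
exported = conn.execute(
    "SELECT respondent, q26 FROM answers ORDER BY respondent"
).fetchall()
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(["respondent", "q26"])
writer.writerows(exported)
tab_delimited = buffer.getvalue()
print(tab_delimited)
```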

6. How to choose research software

If you want to produce detailed tables, in which the unit is cases or interviews, go for a survey tabulation program.
To produce detailed tables based on aggregates rather than cases (e.g. monthly sales in each province), without doing statistical calculations, I recommend database software that's fairly simple to use, such as Filemaker.
If you have a set of inherently numerical data and want to analyse causes and effects, use a statistical package.
If you have repetitive tasks (e.g. doing the same survey regularly) go for software that allows scripting (e.g. SPSS, Epi Info, or SAS).
If you are trying to make sense of a mountain of words, you need a qualitative data analysis program.
If you have survey data with open-ended answers of more than 80 characters, avoid SPSS, which can truncate long answers if you resize columns - deleting the end of long comments. (A spreadsheet is much better for this.)

Which program is best?

There's no single answer to this question. It depends on:

What level of analysis you need to use
How much money you have available
Which operating system you are using
How much time you're willing to spend learning the program
What skills and statistical knowledge you already have
What software your friends are using (maybe you can ask them for help)

Whichever option applies, you should have - or be prepared to acquire - a reasonable command of descriptive statistics. For a start, you need to understand averages, medians, percentages, frequency distributions, and so on. If you know nothing about statistics to begin with, and decide to do a course, it would need to include the equivalent of a solid week of class time. Even if you plan a qualitative approach, you should at least be able to defend your choice when somebody asks "Why didn't you do a survey?"

As well as knowing something about statistics, you also need to know how to use the software. If you're not willing to do that, find somebody else who will do your survey analysis for you - perhaps in a social science department in a nearby university.

If you're already an expert with a spreadsheet and/or database program, and not planning to do more than one survey, use the program you know. You may have to stretch its facilities, but the latest spreadsheets and databases have some capability for statistical analysis.

If money's not a problem, and you have a computer running Windows, with at least 256 megabytes of memory, I recommend SPSS. It's far from cheap, but it's widely used - so there's a good chance you can take a local course on it, or get help from a local expert. Also, it has several well written books, such as Statistics with Confidence by Mark Rodeghier, and (more detailed) the SPSS Guide to Statistics, by Marija J Norusis. (The latter book can be used as an excellent introduction to statistics, replacing a basic statistics textbook.)

The latest version of SPSS (version 15) is a slight improvement on earlier versions. Though it still has many annoying features, so do its competitors. Its weakness is perhaps in data management - something that SAS and Epi Info do better. The graphical presentation is also a bit lame by 21st-century standards: Statistica and S-Plus are much more versatile with graphs.

If you want to use your file as a database, rather than simply doing statistics with it, SAS could be a better option than SPSS. The SAS manuals, though, are daunting: not just one thick book, but an entire shelf of them. If you're using it every day, SAS would be great. If you're using it only now and again, you'll forget how it works. For one-off surveys, SAS is even less convenient than SPSS.

SPSS is available for the Mac, but a cheaper option is Powertab, which seems very easy to use, judging from the demo version I tried (but is maybe no longer available). (I'm being cautious here - you never discover the problems with a piece of software unless you use it for a real project.) Another Mac program is Datadesk, which is reported to be excellent, though it's very expensive.

If money's a problem for you, and you have access to a computer running Windows, currently the best option is Epidata. Its weak point is the labelling of variables, so the tables and graphs it produces always need to be embedded in an explanation of the exact variables and values being analysed. However, adding labels is not particularly difficult.

If you use neither a Mac nor a PC, SPSS is available on a huge range of computers, though some versions are more advanced than others. SAS is also available for a very wide range of operating systems, including Linux, as is (or will be) PSPP.

Please bear in mind that the above comments are based on my own experience. As it's a few years since I've used some of the programs, some of the problems I've mentioned may no longer exist. Please let me (Dennis List) know if you've found a mistake, and I'll correct it.

For all I know, there's a marvellous piece of survey analysis software that I've never heard of. (After all, I discovered Epi Info only in 1997 - after working with surveys since the 1970s.) If you know of such a software gem, please tell me about it, instantly! I'm thinking of something easier for beginners than recent versions of SPSS, and as lean as Epidata, but with...

1. Data entry facilities optimized for accuracy and speed (in other words, not a spreadsheet format - which is great for checking data, but terrible for entering it).
2. No problems processing long text data - e.g. open-ended answers of 500 characters or more.
3. Powerful data-sifting abilities (e.g. automatically flagging doubtful data).
4. Allowing macros or scripting, to automate repetitive work.
5. Online help that can solve your immediate problems without your having to scroll through endless screens of irrelevant stuff (i.e. as good as the DOS version of Epi Info).
6. A clearly written manual - which exactly matches the software.
7. Quickly producing presentation-quality tables and graphs, in standard formats that can be pasted into reports.
8. Population projection options that take account of the various types of missing data.
9. A "wizard" to help less expert users choose the most appropriate statistical tests for their needs - and explain why.
10. Ability to handle multiple record types - e.g. both household questionnaires and individual questionnaires, for one study.
11. Using auto-saving to make it all but impossible to lose your data or work.
12. Using a standard data format that can be read by other software without using special filters - e.g. tab-delimited. (Not XML - too clumsy!)
13. No errors in the calculations - an obvious criterion, but many statistical programs sometimes produce incorrect results, as noted in an article by B D McCullough in the International Journal of Forecasting in 2000.
14. Finally: a reasonable price - no more than about 100 US dollars.

I haven't yet found any software that meets all those criteria. In fact, most of the software I've tried meets only 3 or 4 of them. Epidata meets nearly all the above criteria, but has no real manual for analysis - yet. The old Epi Info DOS manual often applies, but not always.