Hal Berghel, Cybernautica: Cybercensus, July-August 1996

Hal Berghel's Cybernautica

Cybercensus

WHO'S CONNECTED?

This may be the most interesting (and most important) question about cyberspace at the moment, and no one knows the answer for sure.

One of the problems facing the cybercensus taker is that there is no tradition of electronic surveying and polling on, and about, high-speed computer networks. With television, we have a fifty-year history of surveying viewers and of codifying, quantifying, standardizing and summarizing the survey data. There is now widespread acceptance that current television surveys provide, on balance, fairly reliable statistics on the preferences of the viewing audience. This experience has also produced a steady supply of electronic gadgets that automatically and reliably record viewing data in the surveyed homes.

At this point neither the techniques nor the technology for surveying users of digital networks has been adequately refined. The result is considerable disagreement about how, and how much, the networks are used.

TELEPHONE SURVEYS MEET THE INTERNET

The most extensive survey of Internet use to date is the CommerceNet/Nielsen Report, commissioned by CommerceNet and conducted by Nielsen Media Research. It has also provoked the hottest debate among those interested in collecting accurate statistics about Internet use.

The Nielsen survey was conducted in the late summer of 1995. The report, which estimated 22 million Internet users, was then marketed to businesses for $5,000 per copy. Many businesses that were tracking Internet use to gauge its commercial potential found the report an invaluable resource.

The two-part Nielsen survey drew on net users and telephone households in the U.S. and Canada. The first part was a month-long, random-sample telephone survey that began August 3. Over 280,000 telephone calls were made, yielding 4,200 completed interviews (this ratio betrays the fact that I'm not the only one who dislikes having dinner interrupted for these things). The goal was to have data for at least 1,000 individuals in each of three "cells": people with direct access to the Internet, people with indirect access via service providers, and people with no access at all. The second part, a self-selected online survey of 32,000 respondents conducted from August 18 through September 13, was used to measure the biases inherent in online surveys of this sort.
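The ratio of calls to completed interviews is easy to make concrete. Here is a minimal Python sketch that uses only the two figures quoted above; the function name is hypothetical. Because the text says "over" 280,000 calls, the computed completion rate is an upper bound.

```python
def completion_rate(completed: int, attempted: int) -> float:
    """Fraction of attempted contacts that ended in a completed interview."""
    if attempted <= 0:
        raise ValueError("attempted must be positive")
    return completed / attempted


# Figures quoted in the text for the Nielsen telephone survey.
calls_made = 280_000          # "over 280,000", so treat as a lower bound
interviews_completed = 4_200

rate = completion_rate(interviews_completed, calls_made)
print(f"completion rate: {rate:.1%}")                                   # 1.5%
print(f"calls per interview: {calls_made / interviews_completed:.1f}")  # 66.7
```

Roughly 67 calls per completed interview is the price of a random telephone sample; the self-selected online survey avoided that cost but introduced biases of its own.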

The summary results were impressive. Over 37 million people in North America over age 16 had access to the Internet, or 17% of the population. In the same age range, 24 million (11% of the population) had used the Internet in the three months before the survey, and 18 million (8%) had used the World Wide Web during the same period. The majority of Web surfers were characterized as "...upscale, professional and well-educated..."

The report came out to rave reviews. Then things began to unravel. First came a public dissent last December from one of the survey's original designers, Donna Hoffman, who claimed that the statistical analysis of the data was fundamentally flawed. In her words, "... [Nielsen's] estimates of Internet size appear 'too high' and ... the weighted sample on which the estimates are based does not appear to be representative of the North American population." In short, Hoffman and her colleagues believe that the survey was skewed in several important respects: (1) it under-represented individuals with less education, lower incomes, and ages above 55, and (2) it was inconsistent in defining what constitutes Internet access and use.

Overall, Hoffman and her colleagues believe that the Nielsen Report inflated the estimates of Internet use by an average of 38% across all categories, with slightly over half of this error due to deficiencies in weighting the samples. According to Hoffman, et al, "...[because of these] critical flaws ... these [Nielsen Report] estimates lack validity and are of little value to decision makers."
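The column does not describe how Nielsen actually weighted its samples. As a generic illustration of why weighting matters, here is a minimal Python sketch of textbook post-stratification, in which each demographic cell is re-weighted by its population share divided by its sample share. The cell names and all shares are hypothetical, chosen only to mirror the kind of skew Hoffman and her colleagues describe (under-representation of the less educated).

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """One demographic cell (hypothetical shares, for illustration only)."""
    name: str
    population_share: float  # share of the target population in this cell
    sample_share: float      # share of completed interviews in this cell
    access_rate: float       # share of this cell's respondents reporting access


def weight(cell: Cell) -> float:
    """Post-stratification weight: population share divided by sample share."""
    return cell.population_share / cell.sample_share


def unweighted_estimate(cells: list[Cell]) -> float:
    """Access rate computed as if the sample mirrored the population."""
    return sum(c.sample_share * c.access_rate for c in cells)


def weighted_estimate(cells: list[Cell]) -> float:
    """Access rate after re-weighting each cell to its population share."""
    return sum(c.sample_share * weight(c) * c.access_rate for c in cells)


# Hypothetical numbers: less-educated people are under-represented in the sample.
CELLS = [
    Cell("no college", population_share=0.55, sample_share=0.35, access_rate=0.10),
    Cell("some college or more", population_share=0.45, sample_share=0.65, access_rate=0.30),
]

for c in CELLS:
    print(f"{c.name:22s} weight = {weight(c):.2f}")
print(f"unweighted estimate: {unweighted_estimate(CELLS):.1%}")  # 23.0%
print(f"weighted estimate:   {weighted_estimate(CELLS):.1%}")    # 19.0%
```

With these invented numbers the under-represented cell receives a weight of about 1.57, and the access estimate drops from 23% to 19%. If the weights themselves are wrong, that error passes straight through to the published figure.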

In their own analysis of the Nielsen data, Hoffman and co-authors William Kalsbeek and Thomas Novak estimate that 28.8 million people in the U.S. 16 and older have access to the Internet, 16.4 million actually use it, and 11.5 million use the Web. Of the Webbers, only 1.5 million actually use it for commerce. Table 1 compares the Nielsen and Hoffman, et al, estimates based upon the original Nielsen data.

Table 1. A COMPARISON OF TWO ESTIMATES OF INTERNET USE IN NORTH AMERICA (adapted from Hoffman, Kalsbeek and Novak, 1995; numbers in millions)

Measure	Nielsen	Hoffman, et al
access to the Internet	36.8	28.8
recent (3 mo.) use of Internet	24.0	18.4
access to the World Wide Web	18.2	14.1
use Web for commerce	2.5	1.58
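To see how far apart the two sets of estimates are, row by row, here is a minimal Python sketch that expresses each Nielsen figure as a percentage above the corresponding Hoffman, et al figure. The data are copied from Table 1; the variable and function names are hypothetical. The text describes the 38% figure as an average across all categories, so these four rows need not reproduce it exactly.

```python
# Table 1 figures, in millions of people: (Nielsen, Hoffman et al.)
TABLE_1 = {
    "access to the Internet": (36.8, 28.8),
    "recent (3 mo.) use of Internet": (24.0, 18.4),
    "access to the World Wide Web": (18.2, 14.1),
    "use Web for commerce": (2.5, 1.58),
}


def excess_percent(nielsen: float, hoffman: float) -> float:
    """How much larger the Nielsen estimate is, as a % of the Hoffman et al. one."""
    return (nielsen - hoffman) / hoffman * 100.0


excess = {row: excess_percent(n, h) for row, (n, h) in TABLE_1.items()}
for row, pct in excess.items():
    print(f"{row:32s} +{pct:5.1f}%")
mean_excess = sum(excess.values()) / len(excess)
print(f"mean of these four rows: {mean_excess:.1f}%")  # 36.4%
```

The gap is fairly uniform (roughly 28% to 30%) for access and general use, but nearly doubles for Web commerce, the number business readers cared about most.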

More importantly, Hoffman, Kalsbeek and Novak claim that the Nielsen analysis may have been too coarse to be useful in predicting trends. Figure 1 provides a further breakdown of Internet users by category.

Figure 1. PERSONAL USE OF THE INTERNET BY CATEGORY OF USE (source: Hoffman, et al). Note that most use is not "hard core." The question remains: exactly how revealing are surveys of non-regular users?

Needless to say, the criticism of the CommerceNet/Nielsen Report virtually killed its sales. According to a recent N.Y. Times report, one Nielsen executive labeled the criticism by Hoffman, et al, a "brutal, bitter and unprofessional attack..." and one which "...caused Nielsen all kinds of trouble..."

This story is the stuff dime-store novels are made of. It combines themes of impure science, corporate greed, maverick investigators, big business and futuristic technology. If only George Orwell were here to enjoy it.

THE FUTURE OF THE CYBERCENSUS

Of course there are a number of other surveys of the Internet in its many manifestations. There is even a cyberatlas for the Internet (see inset). And future studies, estimates, predictions, etc. of the net will follow the commercial interests as surely as night follows day. Eventually, we'll be in a position to do the same sorts of things with these Internet surveys, and analyses thereof, as we currently do with television surveys. The future holds great promise for specialists in the emerging field of what we'll call, for want of a better term, "digital market analysis." That this will take place seems quite obvious. Whether it will be a social good is not so obvious.

One good thing that has already come from the interest in Internet surveying is a preliminary understanding of the differences between traditional and electronic surveying methodologies. For example, network surveying adds a new dimension to the problem of self-selection, i.e., individuals deciding for themselves whether to participate. No one knows whether, or to what degree, network respondents resemble telephone respondents. Are cybernauts who decline to participate in surveys the same sort of folks as those who hang up on surveyors? We don't know.

Other insights will come from studies of sampling. Most network surveying at the moment is based upon self-selection. As electronic surveying matures, new techniques for random sampling will be perfected that minimize such biases. Recall that an unrepresentative sample was one of the key objections to the Nielsen survey!
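Why self-selection distorts results can be shown with a small deterministic calculation. In this minimal Python sketch the population, its usage levels and the opt-in probabilities are all invented; the only assumption that matters is that heavy users are far more willing than light users to volunteer for an online survey. A simple random sample is unbiased, so its expected mean equals the population mean; the self-selected respondent pool is not.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """A population segment; all values are hypothetical."""
    name: str
    size: int            # number of people in the segment
    weekly_hours: float  # hours online per week
    p_opt_in: float      # probability a member volunteers for an online survey


POPULATION = [
    Group("heavy users", size=200, weekly_hours=10.0, p_opt_in=0.50),
    Group("light users", size=800, weekly_hours=1.0, p_opt_in=0.05),
]


def population_mean(groups: list[Group]) -> float:
    """True mean; also the expected mean of a simple random sample."""
    total = sum(g.size for g in groups)
    return sum(g.size * g.weekly_hours for g in groups) / total


def self_selected_mean(groups: list[Group]) -> float:
    """Mean over the expected respondents when each person decides alone."""
    respondents = [(g.size * g.p_opt_in, g.weekly_hours) for g in groups]
    n = sum(count for count, _ in respondents)
    return sum(count * hours for count, hours in respondents) / n


print(f"random-sample expectation: {population_mean(POPULATION):.2f} hours/week")    # 2.80
print(f"self-selected expectation: {self_selected_mean(POPULATION):.2f} hours/week")  # 7.43
```

Here the volunteers overstate weekly usage by more than a factor of 2.5. A random sample removes the dependence on p_opt_in, which is why perfecting random sampling for networks matters.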

Because the methodological problems are unprecedented, the science and technology behind such electronic sampling may well prove indispensable to future generations of marketers, communicators, and organizers. The interest in surveying the Internet has forced the scientific community to revisit the question of how populations look under this new, digital lens.

For further reading:
The original CommerceNet/Nielsen results are summarized in "The CommerceNet/Nielsen Internet Demographics Survey" which is on the Web at http://www.commerce.net/information/surveys/execsum/exec_sum.html.
The criticism of the analysis by Donna Hoffman and Thomas Novak may be found at http://www2000.ogsm.vanderbilt.edu/surveys/cn.questions.html.
The survey of Internet use conducted by Donna Hoffman, William Kalsbeek and Thomas Novak, "Internet Use in the United States: 1995 Baseline Estimates and Preliminary Market Segments," is available at http://www2000.ogsm.vanderbilt.edu/baseline/1995.Internet.estimates.html.

A cyberatlas now exists for up-to-date information on the Internet and its constituency, viewed from seemingly unlimited perspectives: http://www.cyberatlas.com/.