
The Ganzfeld Experiments



In the James Randi is a Pompous Twit thread, the Ganzfeld experiments were cited as evidence that there may actually be such a thing as psychic abilities.


All I have done so far is read the Wikipedia article. Summarizing what I read there, the experiment has been repeated quite a few times, with some of the later experiments addressing the need for better methodological controls. The outcome appears to indicate a statistically significant better-than-chance rate of selecting the right answer, but the right answer is not selected anywhere close to reliably. There is typically a hit nearly a third of the time, instead of the quarter of the time that would be expected by random chance.


I'm curious as to what is going on in these tests. There are criticisms listed in the Wikipedia article, but they are not necessarily global to the entire set of experiments conducted. If I look deeper than the Wikipedia article, is there a global objection to the set of experiments that might invalidate them? Has this already been debunked? Alternatively, what might account for a hit rate greater than chance under the conditions in which the experiments were performed? Why would the hit rate increase from 25% to around 32%, rather than either staying at 25% or jumping to near 100%? It would be a cop-out to conclude it was just a hocus-pocus, unexplained, unknowable phenomenon that we'll call "psychic phenomenon."


Thoughts? Discussion?


Here's a good article, and Ray Hyman's critique: http://www.skepdic.com/ganzfeld.html


Not until I was asked to write a response to a new presentation of these experiments in the January 1994 issue of the Psychological Bulletin did I get an opportunity to scrutinize the raw data. Unfortunately, I did not get all of the data, especially the portion that I needed to make direct tests of the randomizing procedures. But my analyses of what I did get uncovered some peculiar and strong patterns in the data. All of the significant hitting was done on the second or later appearance of a target. If we examined the guesses against just the first occurrences of targets, the result is consistent with chance. Moreover, the hit rate rose systematically with each additional occurrence of a target. This suggests to me a possible flaw.

And other things...


I see lots of words and no numbers. The fact is that a "statistically significant correlation" is not necessarily a very strong correlation at all - it simply suggests that something more than chance is going on. Doesn't mean that the predictive value is even remotely reliable, nor does it imply a cause/effect relationship.


Extraordinary claims require extraordinary evidence - and from what I can see, this evidence is a long LONG way from being extraordinary.



I have kept an eye on the Ganzfeld experiments since they started. I'm not impressed, though I wish they could have discovered some real evidence for psi. There has been controversy over the project from the beginning, but in all these years the experimenters have yet to "wow" anyone outside their own lab. Confirmation bias? I know the True Believers think skeptics have something to prove and protect, but in my experience it isn't so. I and the skeptics I know of, including Randi, would like nothing better than to find a real paranormal phenomenon, but our level of proof is higher than most - and that's a good thing, because people such as the Fox sisters, Peter Hurkos, Uri Geller, James Hydrick, Peter Popoff, and the thousands who prey on the gullible for personal gain should be exposed as soon as possible. Debunking won't ever put a stop to it because people want to be fooled - Popoff, Geller and the Fox sisters' Spiritualism live on long after being definitively debunked. But it's a start.


Here is a quote from parapsychology researcher Susan Blackmore about the ganzfeld experiments and why she concluded their results cannot be relied upon as evidence for psi. "How can one draw reliable and impartial conclusions in such circumstances? I do not believe one can. My own conclusion is based not just on reading these published papers but also on my personal experience over many years. I have carried out numerous experiments of many kinds and never found any convincing evidence for psi (Blackmore 1996). I tried my first ganzfeld experiment in 1978, when the procedure was new. Failing to get results myself I went to visit [Carl] Sargent's laboratory in Cambridge where some of the best ganzfeld results were then being obtained. Note that in Honorton's database nine of the twenty-eight experiments came from Sargent's lab. What I found there had a profound effect on my confidence in the whole field and in published claims of successful experiments. These experiments, which looked so beautifully designed in print, were in fact open to fraud or error in several ways, and indeed I detected several errors and failures to follow the protocol while I was there. I concluded that the published papers gave an unfair impression of the experiments and that the results could not be relied upon as evidence for psi. Eventually the experimenters and I all published our different views of the affair (Blackmore 1987; Harley and Matthews 1987; Sargent 1987). The main experimenter left the field altogether. I would not refer to this depressing incident again but for one fact. The Cambridge data are all there in the Bem and Honorton review but unacknowledged. Out of twenty-eight studies included, nine came from the Cambridge lab, more than any other single laboratory, and they had the second highest effect size after Honorton's own studies. Bem and Honorton do point out that one of the laboratories contributed nine of the studies but they do not say which one.
Not a word of doubt is expressed, no references to my investigation are given, and no casual reader could guess there was such controversy over a third of the studies in the database. Of course the new autoganzfeld results appear even better. Perhaps errors from the past do not matter if there really is a repeatable experiment. The problem is that my personal experience conflicts with the successes I read about in the literature and I cannot ignore either side. I cannot ignore other people's work because science is a collective enterprise and publication is the main way of sharing our findings. On the other hand I cannot ignore my own findings—there would be no point in doing science, or investigating other people's work, if I did. The only honest reaction to the claims of psi in the ganzfeld is for me to say "I don't know but I doubt it."





I see lots of words and no numbers. The fact is that a "statistically significant correlation" is not necessarily a very strong correlation at all- it simply suggests that something more than chance is going on. Doesn't mean that the predictive value is even remotely reliable, nor does it imply a cause/effect relationship.


Extraordinary claims require extraordinary evidence- and from what I can see, this evidence is a long LONG way from being extraordinary.


If the experiment can't be repeated, it's likely that a higher-than-chance count could be accounted for by tonal cues in the researchers' voices, eye direction, or some other minor factor that gave the receiver a slight edge. This is one reason why experiments must be replicated.


Interesting article, Ouroboros. There was also a link to the definition of a term that put a name to one of my initial thoughts--the psi assumption: the assumption that any significant departure from the laws of chance in a test of psychic ability is evidence that something anomalous or paranormal has occurred. Curious indeed that the hit rate increased after the first appearance of a target. Rank, agreed that no utility has been demonstrated as to predictive value. Even if the experimental design were rock solid and a correlation were found, you don't just decree ESPdidit and rest on your laurels. I also have to wonder how much the file drawer effect is in play here, where experiments with negative results were not published.



This is the most information I could find anywhere about these early experiments and the math involved. There is a lot of information here:



An extensive debate took place in the mid-1980s between a parapsychologist and critic, questioning whether or not a particular body of parapsychological data had demonstrated psi abilities. The experiments in question were all conducted using the ganzfeld setting (described below). Several authors were invited to write commentaries on the debate. As a result, this data base has been more thoroughly analyzed by both critics and proponents than any other and provides a good source for studying replication in parapsychology. The debate concluded with a detailed series of recommendations for further experiments, and left open the question of whether or not psi abilities had been demonstrated. A new series of experiments that followed the recommendations were conducted over the next few years. The results of the new experiments will be presented in Section 5.


4.1 Free-Response Experiments


Recent experiments in parapsychology tend to use more complex target material than the cards and dice used in the early investigations, partially to alleviate boredom on the part of the subjects and partially because they are thought to "more nearly resemble the conditions of spontaneous psi occurrences" (Burdick and Kelly, 1977, page 109). These experiments fall under the general heading of "free-response" experiments, because the subject is asked to give a verbal or written description of the target, rather than being forced to make a choice from a small discrete set of possibilities. Various types of target material have been used, including pictures, short segments of movies on video tapes, actual locations and small objects.


Despite the more complex target material, the statistical methods used to analyze these experiments are similar to those for forced-choice experiments. A typical experiment proceeds as follows. Before conducting any trials, a large pool of potential targets is assembled, usually in packets of four. Similarity of targets within a packet is kept to a minimum, for reasons made clear below. At the start of an experimental session, after the subject is sequestered in an isolated room, a target is selected at random from the pool. A sender is placed in another room with the target. The subject is asked to provide a verbal or written description of what he or she thinks is in the target, knowing only that it is a photograph, an object, etc.


After the subject's description has been recorded and secured against the potential for later alteration, a judge (who may or may not be the subject) is given a copy of the subject's description and the four possible targets that were in the packet with the correct target. A properly conducted experiment either uses video tapes or has two identical sets of target material and uses the duplicate set for this part of the process, to ensure that clues such as fingerprints don't give away the answer. Based on the subject's description, and of course on a blind basis, the judge is asked to either rank the four choices from most to least likely to have been the target, or to select the one from the four that seems to best match the subject's description. If ranks are used, the statistical analysis proceeds by summing the ranks over a series of trials and comparing the sum to what would be expected by chance. If the selection method is used, a "direct hit" occurs if the correct target is chosen, and the number of direct hits over a series of trials is compared to the number expected in a binomial experiment with p = 0.25.


Note that the subjects' responses cannot be considered to be "random" in any sense, so probability assessments are based on the random selection of the target and decoys. In a correctly designed experiment, the probability of a direct hit by chance is 0.25 on each trial, regardless of the response, and the trials are independent. These and other issues related to analyzing free-response experiments are discussed by Utts (1991).
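The direct-hit analysis described above amounts to a one-tailed exact binomial test. Here's a minimal sketch of that calculation (the 30-trial, 13-hit study below is hypothetical, purely for illustration, and the function name is my own):

```python
import math

def direct_hit_p_value(hits, trials, p_chance=0.25):
    """One-tailed exact binomial test: P(X >= hits) under chance guessing."""
    return sum(
        math.comb(trials, k) * p_chance**k * (1 - p_chance)**(trials - k)
        for k in range(hits, trials + 1)
    )

# Hypothetical study: 30 trials, 13 direct hits (illustrative numbers only).
p = direct_hit_p_value(13, 30)
print(f"P(13 or more hits in 30 trials by chance) = {p:.4f}")
```

The rank-sum analysis works the same way in spirit: compute the statistic, then ask how often chance alone would produce a value at least that extreme.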


4.2 The Psi Ganzfeld Experiments


The ganzfeld procedure is a particular kind of free-response experiment utilizing a perceptual isolation technique originally developed by Gestalt psychologists for other purposes. Evidence from spontaneous case studies and experimental work had led parapsychologists to a model proposing that psychic functioning may be masked by sensory input and by inattention to internal states (Honorton, 1977). The ganzfeld procedure was specifically designed to test whether or not reduction of external "noise" would enhance psi performance.


In these experiments, the subject is placed in a comfortable reclining chair in an acoustically shielded room. To create a mild form of sensory deprivation, the subject wears headphones through which white noise is played, and stares into a constant field of red light. This is achieved by taping halved translucent ping-pong balls over the eyes and then illuminating the room with red light. In the psi ganzfeld experiments, the subject speaks into a microphone and attempts to describe the target material being observed by the sender in a distant room.


At the 1982 Annual Meeting of the Parapsychological Association, a debate took place over the degree to which the results of the psi ganzfeld experiments constituted evidence of psi abilities. Psychologist and critic Ray Hyman and parapsychologist Charles Honorton each analyzed the results of all known psi ganzfeld experiments to date, and they reached strikingly different conclusions (Honorton, 1985b; Hyman, 1985b). The debate continued with the publication of their arguments in separate articles in the March 1985 issue of the Journal of Parapsychology. Finally, in the December 1986 issue of the Journal of Parapsychology, Hyman and Honorton (1986) wrote a joint article in which they highlighted their agreements and disagreements and outlined detailed criteria for future experiments. That same issue contained commentaries on the debate by 10 other authors.


The data base analyzed by Hyman and Honorton (1986) consisted of results taken from 34 reports written by a total of 47 authors. Honorton counted 42 separate experiments described in the reports, of which 28 reported enough information to determine the number of direct hits achieved. Twenty-three of the studies (55%) were classified by Honorton as having achieved statistical significance at 0.05.


4.3 The Vote-Counting Debate


Vote-counting is the term commonly used for the technique of drawing inferences about an experimental effect by counting the number of significant versus nonsignificant studies of the effect. Hedges and Olkin (1985) give a detailed analysis of the inadequacy of this method, showing that it is more and more likely to make the wrong decision as the number of studies increases. While Hyman acknowledged that "vote-counting raises many problems" (Hyman, 1985b, page 8), he nonetheless spent half of his critique of the ganzfeld studies showing why Honorton's count of 55% was wrong.


Hyman's first complaint was that several of the studies contained multiple conditions, each of which should be considered as a separate study. Using this definition he counted 80 studies (thus further reducing the sample sizes of the individual studies), of which 25 (31%) were "successful." Honorton's response to this was to invite readers to examine the studies and decide for themselves if the varying conditions constituted separate experiments.


Hyman next postulated that there was selection bias, so that significant studies were more likely to be reported. He raised some important issues about how pilot studies may be terminated and not reported if they don't show significant results, or may at least be subject to optional stopping, allowing the experimenter to determine the number of trials. He also presented a chi-square analysis that "suggests a tendency to report studies with a small sample only if they have significant results" (Hyman, 1985b, page 14), but I have questioned his analysis elsewhere (Utts, 1986, page 397).


Honorton refuted Hyman's argument with four rejoinders (Honorton, 1985b, page 66). In addition to reinterpreting Hyman's chi-square analysis, Honorton pointed out that the Parapsychological Association has an official policy encouraging the publication of nonsignificant results in its journals and proceedings, that a large number of reported ganzfeld studies did not achieve statistical significance and that there would have to be 15 studies in the "file-drawer" for every one reported to cancel out the observed significant results.


The remainder of Hyman's vote-counting analysis consisted of showing that the effective error rate for each study was actually much higher than the nominal 5%. For example, each study could have been analyzed using the direct hit measure, the sum of ranks measure or one of two other measures used for free-response analyses. Hyman carried out a simulation study that showed the true error rate would be 0.22 if "significance" was defined by requiring at least one of these four measures to achieve the 0.05 level. He suggested several other ways in which multiple testing could occur and concluded that the effective error rate in each experiment was not the nominal 0.05, but rather was probably close to the 31% he had determined to be the actual success rate in his vote-count.
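Hyman's actual simulation isn't reproduced in the text, but the kind of inflation he found is easy to demonstrate. The sketch below is a simplified stand-in, not his exact setup: it simulates pure-chance guessing and tests each simulated study on three correlated measures (direct hits, rank-1-or-2 "binary" hits, and the sum of ranks), counting a study as "significant" if any one measure reaches the nominal 0.05 level. The trial and study counts are arbitrary choices of mine.

```python
import math
import random

def binom_sf(k, n, p):
    """P(X >= k) for a Binomial(n, p) variable."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def simulate_error_rates(n_trials=30, n_studies=20_000, seed=0):
    """Return (any-measure rate, sum-of-ranks-only rate) under pure chance."""
    rng = random.Random(seed)
    # One-tailed 0.05 critical hit counts for the two count-based measures.
    crit_direct = min(k for k in range(n_trials + 1) if binom_sf(k, n_trials, 0.25) <= 0.05)
    crit_binary = min(k for k in range(n_trials + 1) if binom_sf(k, n_trials, 0.50) <= 0.05)
    any_hits = rank_hits = 0
    for _ in range(n_studies):
        ranks = [rng.randint(1, 4) for _ in range(n_trials)]  # chance: ranks uniform on 1..4
        direct = sum(r == 1 for r in ranks)                   # measure 1: direct hits
        binary = sum(r <= 2 for r in ranks)                   # measure 2: rank 1 or 2
        # Measure 3: sum of ranks (low is good); mean 2.5, variance 1.25 per trial.
        z = (2.5 * n_trials - sum(ranks)) / math.sqrt(1.25 * n_trials)
        rank_sig = z >= 1.645
        rank_hits += rank_sig
        any_hits += rank_sig or direct >= crit_direct or binary >= crit_binary
    return any_hits / n_studies, rank_hits / n_studies

any_rate, rank_rate = simulate_error_rates()
print(f"sum-of-ranks alone: {rank_rate:.3f}; significant on any measure: {any_rate:.3f}")
```

With four measures, and the freedom to choose among them after the fact, the inflation is larger still, which is the substance of Hyman's 0.22 figure.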


Honorton acknowledged that there was a multiple testing problem, but he had a two-fold response. First, he applied a Bonferroni correction and found that the number of significant studies (using his definition of a study) only dropped from 55% to 45%. Next, he proposed that a uniform index of success be applied to all studies. He used the number of direct hits, since it was by far the most commonly reported measure and was the measure used in the first published psi ganzfeld study. He then conducted a detailed analysis of the 28 studies reporting direct hits and found that 43% were significant at 0.05 on that measure alone. Further, he showed that significant effects were reported by six of the 10 independent investigators and thus were not due to just one or two investigators or laboratories. He also noted that success rates were very similar for reports published in refereed journals and those published in unrefereed monographs and abstracts.


While Hyman's arguments identified issues such as selective reporting and optional stopping that should be considered in any meta-analysis, the dependence of significance levels on sample size makes the vote-counting technique almost useless for assessing the magnitude of the effect. Consider, for example, the 24 studies where the direct hit measure was reported and the chance probability of a direct hit was 0.25, the most common type of study in the data base. (There were four direct hit studies with other chance probabilities and 14 that did not report direct hits.) Of the 24 studies, 13 (54%) were "nonsignificant" at α = 0.05, one-tailed. But if the 367 trials in these "failed replications" are combined, there are 106 direct hits, z = 1.66, and p = 0.0485, one-tailed. This is reminiscent of the dilemma of Professor B in Section 3.
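The pooled figure quoted above can be checked directly. A one-tailed z-test with a continuity correction (which appears to be the convention used here, since it reproduces the quoted values) on 106 hits in 367 trials:

```python
import math

def pooled_z_test(hits, trials, p_chance=0.25):
    """One-tailed z-test for pooled direct hits, with continuity correction."""
    expected = trials * p_chance
    sd = math.sqrt(trials * p_chance * (1 - p_chance))
    z = (hits - 0.5 - expected) / sd
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return z, p

# The 367 pooled trials and 106 direct hits from the passage above.
z, p = pooled_z_test(106, 367)
print(f"z = {z:.2f}, one-tailed p = {p:.4f}")  # quoted values: z = 1.66, p = 0.0485
```

So the 13 individually "failed" replications, taken together, clear the 0.05 bar - exactly the point being made about vote-counting.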


Power is typically very low for these studies. The median sample size for the studies reporting direct hits was 28. If there is a real effect and it increases the success probability from the chance 0.25 to an actual 0.33 (a value whose rationale will be made clear below), the power for a study with 28 trials is only 0.181 (Utts, 1986). It should be no surprise that there is a "repeatability" problem in parapsychology.
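That power figure can be reproduced with an exact binomial calculation. A sketch (it also explains the vote-count: with a true hit rate of 0.33 and 28 trials per study, roughly four studies in five would come out "nonsignificant" even though the effect is real):

```python
import math

def binom_sf(k, n, p):
    """P(X >= k) for a Binomial(n, p) variable."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def exact_power(n, p_chance, p_true, alpha=0.05):
    """Power of the one-tailed exact binomial test against a given true hit rate."""
    # Smallest hit count that is significant at level alpha under chance guessing.
    critical = min(k for k in range(n + 1) if binom_sf(k, n, p_chance) <= alpha)
    return binom_sf(critical, n, p_true)

power = exact_power(28, 0.25, 0.33)  # the median ganzfeld study size
print(f"Power with 28 trials: {power:.3f}")  # close to the quoted 0.181
```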


4.4 Flaw Analysis and Future Recommendations


The second half of Hyman's paper consisted of a "Meta-Analysis of Flaws and Successful Outcomes" (1985b, page 30), designed to explore whether or not various measures of success were related to specific flaws in the experiments. While many critics have argued that the results in parapsychology can be explained by experimental flaws, Hyman's analysis was the first to attempt to quantify the relationship between flaws and significant results. Hyman identified 12 potential flaws in the ganzfeld experiments, such as inadequate randomization, multiple tests used without adjusting the significance level (thus inflating the significance level from the nominal 5%) and failure to use a duplicate set of targets for the judging process (thus allowing possible clues such as fingerprints). Using cluster and factor analyses, the 12 binary flaw variables were combined into three new variables, which Hyman named General Security, Statistics and Controls.


Several analyses were then conducted. The one reported with the most detail is a factor analysis utilizing 17 variables for each of 36 studies. Four factors emerged from the analysis. From these, Hyman concluded that security had increased over the years, that the significance level tended to be inflated the most for the most complex studies and that both effect size and level of significance were correlated with the existence of flaws.


Following his factor analysis, Hyman picked the three flaws that seemed to be most highly correlated with success, which were inadequate attention to both randomization and documentation and the potential for ordinary communication between the sender and receiver. A regression equation was then computed using each of the three flaws as dummy variables, and the effect size for the experiment as the dependent variable. From this equation, Hyman concluded that a study without these three flaws would be predicted to have a hit rate of 27%. He concluded that this is "well within the statistical neighborhood of the 25% chance rate" (1985b, page 37), and thus "the ganzfeld psi data base, despite initial impressions, is inadequate either to support the contention of a repeatable study or to demonstrate the reality of psi" (page 38).


Honorton discounted both Hyman's flaw classification and his analysis. He did not deny that flaws existed, but he objected that Hyman's analysis was faulty and impossible to interpret. Honorton asked psychometrician David Saunders to write an Appendix to his article, evaluating Hyman's analysis. Saunders first criticized Hyman's use of a factor analysis with 17 variables (many of which were dichotomous) and only 36 cases and concluded that "the entire analysis is meaningless" (Saunders, 1985, page 87). He then noted that Hyman's choice of the three flaws to include in his regression analysis constituted a clear case of multiple analysis, since there were 84 possible sets of three that could have been selected (out of nine potential flaws), and Hyman chose the set most highly correlated with effect size. Again, Saunders concluded that "any interpretation drawn from [the regression analysis] must be regarded as meaningless" (1985, page 88).


Hyman's results were also contradicted by Harris and Rosenthal (1988b) in an analysis requested by Hyman in his capacity as Chair of the National Academy of Sciences' Subcommittee on Parapsychology. Using Hyman's flaw classifications and a multivariate analysis, Harris and Rosenthal concluded that "Our analysis of the effects of flaws on study outcome lends no support to the hypothesis that ganzfeld research results are a significant function of the set of flaw variables" (1988b, page 3).


Hyman and Honorton were in the process of preparing papers for a second round of debate when they were invited to lunch together at the 1986 Meeting of the Parapsychological Association. They discovered that they were in general agreement on several major issues, and they decided to coauthor a "Joint Communiqué" (Hyman and Honorton, 1986). It is clear from their paper that they both thought it was more important to set the stage for future experimentation than to continue the technical arguments over the current data base. In the abstract to their paper, they wrote:

We agree that there is an overall significant effect in this data base that cannot reasonably be explained by selective reporting or multiple analysis. We continue to differ over the degree to which the effect constitutes evidence for psi, but we agree that the final verdict awaits the outcome of future experiments conducted by a broader range of investigators and according to more stringent standards [page 351].



The paper then outlined what these standards should be. They included controls against any kind of sensory leakage, thorough testing and documentation of randomization methods used, better reporting of judging and feedback protocols, control for multiple analyses and advance specification of number of trials and type of experiment. Indeed, any area of research could benefit from such a careful list of procedural recommendations.




Replication and Meta-Analysis in Parapsychology

Author: Jessica Utts. Source: Statistical Science, Vol. 6, No. 4 (Nov. 1991), pp. 363-378

Published by: Institute of Mathematical Statistics

This isn't the whole paper, of course. It is only the sections that speak to the original Ganzfeld tests, the meta-analysis of that data and so forth. It does contain numbers, though (I do believe I saw them requested). Finding any information on these tests is difficult at best. It seems only the first numbers posted by Honorton are the ones that continue to make the rounds, even though this text shows they aren't as concrete as we're led to believe.


Oh, and I'll post the conclusion of the paper:



Parapsychologists often make a distinction between "proof-oriented research" and "process-oriented research." The former is typically conducted to test the hypothesis that psi abilities exist, while the latter is designed to answer questions about how psychic functioning works. Proof-oriented research has dominated the literature in parapsychology. Unfortunately, many of the studies used small samples and would thus be nonsignificant even if a moderate-sized effect exists.


The recent focus on meta-analysis in parapsychology has revealed that there are small but consistently nonzero effects across studies, experimenters and laboratories. The sizes of the effects in forced-choice studies appear to be comparable to those reported in some medical studies that had been heralded as breakthroughs. (See Section 5; also Honorton and Ferrari, 1989, page 301.) Free-response studies show effect sizes of far greater magnitude.


A promising direction for future process-oriented research is to examine the causes of individual differences in psychic functioning. The ESP/extroversion meta-analysis is a step in that direction.


In keeping with the idea of individual differences, Bayes and empirical Bayes methods would appear to make more sense than the classical inference methods commonly used, since they would allow individual abilities and beliefs to be modeled. Jeffreys (1990) reported a Bayesian analysis of some of the RNG experiments and showed that conclusions were closely tied to prior beliefs even though hundreds of thousands of trials were available.
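Jeffreys' point about prior sensitivity can be illustrated with a conjugate beta-binomial calculation. Everything below is hypothetical - invented trial counts and priors, not Jeffreys' data - chosen only to show how the same data can strongly favor "pure chance" under one prior and not under another:

```python
import math

def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def bayes_factor_01(hits, trials, a, b):
    """Bayes factor for H0: p = 0.5 against H1: p ~ Beta(a, b).

    The binomial coefficient is common to both marginal likelihoods and cancels.
    """
    log_m0 = trials * math.log(0.5)
    log_m1 = log_beta(hits + a, trials - hits + b) - log_beta(a, b)
    return math.exp(log_m0 - log_m1)

# Hypothetical RNG experiment: 1,000,000 trials, 500,900 hits
# (z = 1.8, nominally "significant" one-tailed at 0.05).
bf_flat = bayes_factor_01(500_900, 1_000_000, 1, 1)                # vague prior on p
bf_tight = bayes_factor_01(500_900, 1_000_000, 500_000, 500_000)   # prior concentrated near 0.5
print(f"BF(H0/H1), flat prior:  {bf_flat:.1f}")   # strongly favors chance
print(f"BF(H0/H1), tight prior: {bf_tight:.2f}")  # favors a small real deviation
```

The same nominally significant data yield opposite verdicts depending on the prior - the Jeffreys-Lindley flavor of the point made above.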


It may be that the nonzero effects observed in the meta-analyses can be explained by something other than ESP, such as shortcomings in our understanding of randomness and independence. Nonetheless, there is an anomaly that needs an explanation. As I have argued elsewhere (Utts, 1987), research in parapsychology should receive more support from the scientific community. If ESP does not exist, there is little to be lost by erring in the direction of further research, which may in fact uncover other anomalies. If ESP does exist, there is much to be lost by not doing process-oriented research, and much to be gained by discovering how to enhance and apply these abilities to important world problems.

I won't comment. Take Utts' words for yourself. That way you can decide whether she has some grand bias/agenda you can use to dismiss the portion of the text that actually relates to this thread.




Thank you, mwc! This is quite a bit of information. For anyone else interested, it looks like Utts' entire paper is published at http://www.scribd.com/doc/36360447/Replication-and-Meta-Analysis-in-Parapsychology-Jessica-Utts.


