Additional related information may be found in Neuropsychopharmacology: The Fifth Generation of Progress.
The DSM-IV Classification and Psychopharmacology
Allen Frances, Avram H. Mack, Ruth Ross, and Michael B. First
During the past 25 years, the DSM system and psychopharmacology have "grown up" together and have had a strong influence upon one another. The psychopharmacological revolution required that there be a method of more systematic and reliable psychiatric diagnosis. This provided the major impetus for the development of the structured assessments and the research diagnostic criteria that were the immediate forerunners of DSM-III (30). In turn, the availability of well-defined psychiatric diagnoses stimulated the development of specific treatments and increasingly sophisticated psychopharmacological studies. We expect that DSM-IV will continue to influence, and future revisions to be influenced by, the next steps in the advance of psychopharmacology. This chapter discusses the development of DSM-IV and the ways in which several issues in psychiatric nosology (e.g., descriptive diagnosis, reliability, generalizability, validity, and the heuristic value of the current classification) interact with psychopharmacological research. We suggest that DSM-IV is likely to be a useful tool in psychopharmacological research, but one with numerous limitations that must be understood if this research effort is to have optimal value (see Genetic Influences in Drug Abuse).
The major innovation in the development of DSM-IV was the careful three-stage process of empirical review that informed all decisions (33). The three stages included (a) comprehensive and systematic reviews of the published literature, (b) reanalyses of already collected but previously unanalyzed data sets, and (c) field trials. Each stage was carefully documented and the results will be published in the detailed five-volume DSM-IV Sourcebook (5).
The Work Groups generated 150 literature reviews on questions most crucial to the development of DSM-IV. A standard format was used to ensure that these reviews would be methodical, objective, and comprehensive (17, 34). Each review began with an explicit statement of the issues and a discussion of their significance for clinical practice and research. This was followed by a summary of the literature gathered from as many sources as possible. Finally, there was a discussion of the advantages and disadvantages of the various possible options proposed for DSM-IV. The reviews were then carefully critiqued by Work Group members and advisors to ensure balance and cohesiveness. This process ensured that everyone involved in making decisions was working from a commonly accepted database, which facilitated agreement among individuals who sometimes began with widely differing positions and orientations. It was hoped that the DSM-IV decisions would reflect the conclusions of an ideal "consensus scholar" and not be unduly influenced by the preconceptions of the participants (9). In some cases, the literature reviews provided enough information to resolve questions. In others, they indicated the need for further investigation in the subsequent two stages of the review process. The specific options for change in criteria resulting from the literature reviews were presented to the Task Force and to the mental health field in the DSM-IV Options Book (3).
The data reanalyses, funded by the John D. and Catherine T. MacArthur Foundation, were designed to resolve a number of important diagnostic questions for which incomplete answers were available in the published literature. A method was developed for reanalyzing data previously collected in a variety of settings, but not yet analyzed in a way that was useful for answering the Work Group's questions (33). This enabled the Work Groups to develop and refine suggested new criteria that could then be studied in field trials.
We conducted twelve field trials to examine questions that the first two stages had been unable to resolve, which served to bridge the clinical research literature and clinical practice. Each field trial compared alternative DSM-III, DSM-III-R, ICD-10, and proposed DSM-IV options at 5 to 10 sites with approximately 100 subjects at each site. The field trials [sponsored by the National Institute of Mental Health (NIMH) in conjunction with the National Institute on Drug Abuse (NIDA) and the National Institute on Alcohol Abuse and Alcoholism (NIAAA)] studied: Antisocial Personality Disorder, Autism and related Pervasive Developmental Disorders, Disruptive Behavior Disorders, Insomnia, Major Depression and Dysthymia, Mixed Anxiety-Depression, Panic Disorder, Obsessive-Compulsive Disorder, Posttraumatic Stress Disorder, Schizophrenia, Somatization Disorder, and Substance Use Disorders (33).
The DSM-IV Task Force was very conservative in making changes from DSM-III-R. A major reason for this was to enable researchers to generalize from data gathered using the different DSMs. A conservative approach also reduces discontinuities in assessment and creates minimal disruption to studies in progress.
The first two stages (i.e., literature reviews and data reanalyses) depended heavily on the published and unpublished results of psychopharmacological studies. In many instances, the diagnostic information that was the most useful for the purposes of the literature reviews emerged from the diagnostic findings that have accumulated over the years as part of clinical trials. There were many advantages and also very serious limitations to the information thus obtained. It was extremely useful to have the extensive pool of data available from clinical trials, especially because the information generally had been carefully collected by skilled investigators under conditions that ensured very high reliability. The Work Groups also had to be cautious about generalizing from pharmacological trials to more general populations, because the results were most often drawn from patient samples after a highly selective screening process. There was a considerable effort to balance data obtained from pharmacological trials with data obtained from other types of clinical research and from epidemiological studies.
In a similar fashion, the reanalyses also used previously unpublished data drawn from many samples collected in pharmacological trials. These trials provided an extremely valuable pool of information and enabled the Work Groups to base deliberations on a much larger empirical base than would otherwise have been possible. As with the literature reviews, however, the Work Groups had to balance data reanalyses from pharmacological trials with data gathered in other clinical settings and from epidemiological samples to ensure generalizability.
The field trials provided a bridge between clinical research (including pharmacological trials) and more general clinical practice. They used methods that more closely approximated general practice than would the method of a typical, more rigorously controlled clinical trial (13). Consecutive patients were evaluated to eliminate the sampling bias inherent in the application of inclusionary and exclusionary criteria for treatment outcome studies. Patients were selected using a diversity of representative sites that sampled different socioeconomic, cultural, and ethnic groups in many different geographic locations and, in some instances, included randomly selected general community samples. Finally, reliability was not established as a precondition to beginning the formal portion of the study as is done routinely in pharmacological trials. Rather the field trial interviewers were given only limited training to approximate more closely the conditions that are obtained in general clinical practice.
Throughout the history of psychiatric nosology, there has been a back and forth alternation between systems that were based more on theories of etiology and those that were based more exclusively on descriptive observation. The ancient systems of psychiatric classification developed by Hippocrates, Galen, and Rhazes were anchored in etiological explanations (e.g., for Hippocrates, mental health or illness depended upon the balance or imbalance of the four humours: blood, phlegm, black bile, yellow bile) (24, 10). Modern psychopharmacology has continued the search for more meaningful humours, using vastly more powerful instruments of study. With the Renaissance came an increasing emphasis on descriptive classification. The English physician Sydenham (1624–1689) assumed that because nature is "uniform" and "consistent," the same symptoms and signs "that you would observe in the sickness of a Socrates you would observe in the sickness of a simpleton" (31). Sydenham's emphasis on careful descriptive observation influenced the taxonomies of Linnaeus and Boissier de Sauvages. The latter developed perhaps the most extensive psychiatric classification in history, listing 2400 different "species" of mental disorders. Descriptive methods were also used by the early nineteenth-century psychiatrists Pinel and Esquirol (18).
Explanatory, etiological models were soon offered to replace these superficial descriptive systems. Cabanis wrote that the excessive reliance on description would lead science to "lose itself in the multitude of facts gathered" (18). The physicians Gall and Broussais declared that to classify by symptoms was arbitrary and instead emphasized the value of brain dissection performed at autopsy (1). In this environment, the descriptive classification of Pinel and Esquirol was "now forgotten, [and] the neuropsychiatric perspective . . . took a leading position" (25). Many etiological models of mental illness have since been offered, ranging from phrenology to psychoanalytic theory, but more recently most have been based upon the attempt to determine the specific structure and function of the brain abnormalities that are closely associated with mental disorders.
Kraepelin's careful descriptive classification provided the inspiration, and much of the content, for the assertively descriptive approach used in DSM-III (7). Perhaps the most cogent (and still totally current) defense for the descriptive approach was offered by a contemporary of Kraepelin, the nineteenth-century American psychiatrist Pliny Earle, who said, "In the present state of our knowledge, no classification of insanity can be erected upon a pathological basis, for the simple reason that, with but slight exceptions, the pathology of the disease is unknown . . . we are forced to fall back upon the symptomatology of the disease" (19). The descriptive system has been extremely valuable both to psychiatric classification and to psychopharmacology because it increases reliability and promotes communication across all of the various psychiatric orientations (7). Nonetheless, the user of the DSM-IV must also understand that any descriptive approach is only an unsatisfactory and limited step along the way toward a system of classification that is based on etiological understanding.
DSM-IV provides no more, but also no less, than a descriptive heuristic for discovery of underlying pathogenesis, in the same way as descriptive systems in other disciplines have eventually suggested explanatory models (e.g., the role of Mendeleev's periodic table in chemistry and of Linnaeus's taxonomy of species in furthering the theory of evolution). New findings, many of them drawn from psychopharmacological research, will hopefully help us gradually to replace our current descriptive system with one that is based on a much deeper knowledge of underlying etiology. In effect, DSM-IV provides the useful descriptions that will hopefully facilitate our moving away from, and beyond, the descriptive approach (14).
Another, more immediate, limitation of the DSM-IV descriptive system arises from the way in which the criteria sets have been generated. For the most part, the definitions of the individual DSM-IV disorders are based upon the covariation of symptoms (descriptive validity) and without any clear gold standards for choosing items or setting thresholds based on other forms of validity. Many (if not most) of the DSM-IV disorders are heterogeneous, lack clear boundaries with near neighbors, and could as well have been defined with alternative items or thresholds that had essentially equal claims to validity. We must, therefore, avoid reifying the existing DSM-IV categories, thresholds, or definitional items. It is possible that with greater understanding, new disorders and new combinations will emerge and that many of the descriptively defined DSM-IV disorders will cease to stand on their own. Many DSM-IV categories that now appear as separate disorders may be reunited when eventually it is determined that they have a shared etiology or pathogenesis. Other DSM-IV descriptive categories may be divided further based on discoveries about their etiological heterogeneity. There is no reason to assume that the current descriptive classification follows nature with any degree of precision.
There is also the misunderstood issue of comorbidity. We cannot assume that so-called comorbid disorders necessarily have separate and independent pathogeneses. Many disorders that appear to be comorbid may instead be no more than the split descriptive parts of a more complex syndrome or may reflect a definitional artifact resulting from the fact that many items appear in more or less equivalent form in the definitions of more than one disorder. There are five factors inherent in the DSM system that enhance (and perhaps artifactually elevate) comorbidity: (a) the narrowly defined criteria sets, (b) the large number of distinct diagnoses, (c) the explicit diagnostic criteria, (d) the use of structured interviews, and (e) the removal of diagnostic hierarchies. Ever since DSM-III, our system of classification has been a splitter's dream. However, we can expect a gradual return to the lumping together of at least some descriptive categories once we know more about shared etiology (16, 12).
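The definitional artifact described above can be illustrated with a small sketch. The two criteria sets below are entirely hypothetical (the names `disorder_A` and `disorder_B`, their items, and their thresholds are invented for illustration and are not DSM-IV criteria). The sketch only shows how items shared between two definitions can, by themselves, produce two "comorbid" diagnoses in a system without diagnostic hierarchies.

```python
# Hypothetical criteria sets: (defining items, number of items required).
# Item names and thresholds are invented; they are NOT taken from DSM-IV.
CRITERIA = {
    "disorder_A": ({"insomnia", "fatigue", "poor_concentration", "low_mood", "anhedonia"}, 3),
    "disorder_B": ({"insomnia", "fatigue", "poor_concentration", "worry", "muscle_tension"}, 3),
}


def diagnoses_met(symptoms):
    """Return the sorted names of hypothetical disorders whose item threshold is reached."""
    return sorted(
        name
        for name, (items, threshold) in CRITERIA.items()
        if len(items & symptoms) >= threshold
    )


# Items that appear in both definitions.
shared_items = CRITERIA["disorder_A"][0] & CRITERIA["disorder_B"][0]

# A patient who reports only the shared items still meets both thresholds.
patient = {"insomnia", "fatigue", "poor_concentration"}
met = diagnoses_met(patient)
print(sorted(shared_items), met)
```

In this toy case the patient receives both diagnoses without reporting a single symptom specific to either one, which is the sense in which comorbidity can be an artifact of overlapping definitions and not evidence of two independent pathogeneses.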
Before DSM-III, the reliability of psychiatric diagnosis was limited by the lack of widely accepted and standardized diagnostic criteria and assessment instruments (8, 27, 29). Low reliability made it difficult to interpret (and impossible to generalize) the results of psychopharmacological trials. The development of explicit and reliable diagnostic criteria sets in the Feighner criteria (11), the Research Diagnostic Criteria (RDC) (28), and DSM-III (2) was a necessary prerequisite for the development of meaningful psychopharmacological research. The DSM-IV reliance upon carefully conducted field trials to select item sets with optimal demonstrable reliability represents one further step in the process of enhancing the reliability of psychiatric diagnosis.
Despite these achievements, we must be aware of certain limitations. There has been a considerable controversy regarding the claims made for the reliability that can be achieved using the DSM system. Kirk and Kutchins (23) have argued that the DSM-III field trials were performed and reported in an inconsistent manner that did not truly document the reliability of the DSM-III criteria sets. Moreover, they have correctly pointed out that the reliability achieved has been much greater when measured for the wider DSM sections (e.g., mood disorder or anxiety disorder) than for the more specific disorders contained in those sections (e.g., dysthymic disorder or generalized anxiety disorder).
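The difference between reliability at the level of a broad section and at the level of a specific disorder can be made concrete with a chance-corrected agreement statistic. The sketch below uses Cohen's kappa, which is an assumption on our part (the text above does not name a statistic), and the ratings and the `SECTION` mapping are invented for illustration only.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical category; kappa is formally
        # undefined, and returning 1.0 here is only a convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical mapping from specific disorder to broad section.
SECTION = {"MDD": "mood", "dysthymia": "mood", "GAD": "anxiety", "panic": "anxiety"}

# Invented ratings of the same eight patients by two interviewers.
rater_a = ["MDD", "MDD", "dysthymia", "dysthymia", "GAD", "GAD", "panic", "panic"]
rater_b = ["MDD", "dysthymia", "dysthymia", "MDD", "GAD", "panic", "panic", "panic"]

kappa_specific = cohen_kappa(rater_a, rater_b)
kappa_section = cohen_kappa(
    [SECTION[d] for d in rater_a],
    [SECTION[d] for d in rater_b],
)
print(kappa_specific, kappa_section)
```

With these invented ratings, every disagreement falls within a section, so agreement is perfect at the section level while kappa for the specific disorders is only 0.5. Real data would of course show a less extreme contrast.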
The Kirk and Kutchins critique is much more germane to reliability measured in general clinical practice (i.e., the conditions that were approximated in the DSM-III field trials) than to the reliability measured in psychopharmacological studies. In the carefully controlled, somewhat "hot house" environment of a typical treatment outcome study, highly selected patients are evaluated by expert and highly trained interviewers who are very familiar with the diagnosis being studied and with one another and are using systematic and standardized assessment instruments. Under these conditions, it is almost always possible within any given site and/or any given diagnosis to achieve satisfactory to excellent reliability using the DSM system.
The more difficult issue for psychopharmacology is the degree of generalizability of the reliability achieved in one site to multiple sites or to more general clinical practice. For multisite studies, it is necessary to provide cross-site training and reliability testing (often using videotapes) as a means of ensuring that the reliability in any given site is not merely home grown. This problem has generally received insufficient attention in the implementation and reporting of multisite trials and certainly should be built into every multisite design and report of methods. An even tougher question is whether results generated in the very specific conditions of a clinical trial are generalizable to the rough and tumble of everyday clinical practice. One possible implication of this issue is the importance of measuring reliability in health services research studies and in Phase IV clinical trials that can be conducted in a wide variety of settings that more closely approximate the conditions in which patients are routinely seen.
Whereas a low reliability certainly sets the ceiling on our ability to determine validity, a high reliability by no means implies that we are engaged in making valid, or even interesting, distinctions. Kendell points out that "reliability can be very high while validity remains trivial and in such a situation high reliability is of very limited value" (21). An excessive devotion to reliability may in fact distract us from the more fundamental and meaningful validity questions if we focus too exclusively on polishing our methodological lenses rather than looking through them. We must therefore move forward in applying our more or less reliable diagnostic system to those research questions that will further our understanding of etiology and pathogenesis.
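The sense in which reliability "sets the ceiling" on validity can be stated using the attenuation relation from classical test theory. This is a sketch under that theory's assumptions (uncorrelated measurement errors), and the symbols are ours: r_xx is the reliability of the diagnosis, r_yy the reliability of the external validator, and r_xy the correlation between them.

```latex
\[
  r_{xy}^{\text{observed}} \;=\; r_{xy}^{\text{true}} \sqrt{r_{xx}\, r_{yy}}
  \qquad\Longrightarrow\qquad
  \left| r_{xy}^{\text{observed}} \right| \;\le\; \sqrt{r_{xx}\, r_{yy}}
\]
```

For example, if a diagnosis has reliability 0.5 and the validator has reliability 0.8, the observed association cannot exceed the square root of 0.40, or about 0.63, however strong the underlying relationship. The converse does not hold: raising r_xx toward 1 raises only the ceiling and says nothing about the true association, which may remain trivial.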
There is also an urgent need to understand more about the performance characteristics, beyond the descriptive level, of the DSM-IV items, thresholds, and disorders. It must be admitted that, despite the accumulating literature, there are very few diagnoses for which there are meaningful gold standards of validity. We often have little evidence to determine what is, and what is not, predicted by the various descriptive diagnoses. In conducting the three-stage process of empirical review for DSM-IV, we became aware of how relatively few data were available to guide the choice between alternative options regarding the specific definitional items that should be chosen for the criteria sets or the possible alternative thresholds to determine the presence or absence of a disorder. There is an important lesson in this for psychopharmacological research. In designing projects, we should avoid giving too much weight to the DSM-IV system as it now stands and take a broader overview. Although the DSM-IV is extraordinarily useful, it certainly does not deserve reification and we must avoid premature closure. This suggests three precautions that might inform the design and implementation of clinical trials: (a) Researchers should cast a much wider net of items in their assessment batteries beyond the items contained in the DSM-IV criteria sets and should not assume that the DSM-IV items provide the only useful predictor variables (additional items may be drawn from the Associated Features section of the text for each disorder). (b) The prediction of outcome and determination of indications can be improved by assessing for the comorbid near-neighbor symptoms or disorders that may one day become known as part of the disorder within a more complex syndrome. (c) Finally, efforts at data analysis should begin by using the DSM-IV algorithms but should also proceed in an exploratory fashion to test different possible thresholds and different defining items. 
To the degree that psychopharmacological research is limited to the DSM-IV approach, there will be numerous delays in learning more about how to improve it.
The preparation of DSM-IV helped us to identify any number of gaps in our knowledge base and suggested many areas of future research. These are indicated in the various sections that comprise the DSM-IV Sourcebook (5). The proposals that were made for the possible inclusion of new diagnoses in DSM-IV are of particular relevance to the search for new indications that is so important in psychopharmacological research. The conservative mandate to the DSM-IV Task Force caused it to hold the line against the proliferation of new disorders (26). The Task Force accepted for official classification only a handful of the more than 100 suggested new diagnoses. Many of the new suggestions had been based on interesting clinical observation and/or preliminary research findings but were without sufficient empirical data on the descriptive characteristics and validity of the proposed categories to warrant inclusion. Appendix B of DSM-IV provides a list (see Table 1) of the Suggested New Diagnoses and Criteria Sets that did not yet have sufficient documentation to warrant inclusion in the classification but did show potential for further study.
Many of the suggested new categories are conditions that are subthreshold to existing DSM-IV categories. Including these conditions as official mental disorders would have had the effect of widening the definition of caseness and the sensitivity of the DSM-IV diagnostic system (thus decreasing false negatives) but at the cost of decreasing the specificity of the system and increasing false positives. Until there are many additional systematic studies to determine how the suggested subthreshold disorders respond to treatment, we cannot accurately measure the relative utility costs of false negatives versus those of false positives. It seems likely that the psychopharmacological community will benefit from studying at least some of these potential diagnoses and psychiatric classification will benefit from accumulated knowledge regarding treatment response. Some such studies are indeed in progress.
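The trade-off between sensitivity and specificity that follows from lowering a diagnostic threshold can be sketched numerically. The symptom counts below are invented, and the split into "impaired" and "not impaired" groups assumes an external standard of caseness that, as noted elsewhere in this chapter, is rarely available in practice.

```python
def sensitivity_specificity(impaired_counts, unimpaired_counts, threshold):
    """Classify as a case when symptom count >= threshold; return (sensitivity, specificity)."""
    true_pos = sum(c >= threshold for c in impaired_counts)
    true_neg = sum(c < threshold for c in unimpaired_counts)
    return true_pos / len(impaired_counts), true_neg / len(unimpaired_counts)


# Invented symptom counts for people who are, or are not, clinically impaired.
impaired = [7, 6, 5, 4, 3, 2]
unimpaired = [4, 3, 2, 1, 1, 0, 0, 0]

# A full-syndrome threshold versus a hypothetical subthreshold definition.
strict = sensitivity_specificity(impaired, unimpaired, threshold=5)
lenient = sensitivity_specificity(impaired, unimpaired, threshold=2)
print(strict, lenient)
```

In this toy sample the stricter threshold misses half of the impaired individuals but produces no false positives, whereas the lower threshold identifies all of them at the cost of labeling three of the eight unimpaired individuals as cases. Which error is costlier depends on treatment response data of the kind the text calls for.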
We summarize briefly the status of three proposed new categories (minor depression, brief recurrent depression, and mixed anxiety depression) that are particularly important to consider as potential new indications in clinical trials. The thresholds for severity and duration of major depression established in DSM-III and DSM-III-R were necessarily arbitrary and not based on any strong validation. It is therefore not surprising that a number of studies, particularly those performed in primary care and community settings, report that many patients who fall short of the DSM thresholds for mood disorder nonetheless exhibit clinically significant impairment as measured by functional disability and health care utilization. Three different types of subthreshold depression have been identified. Minor Depression is subthreshold in symptom severity to Major Depressive Disorder (22, 32). Brief Recurrent Depressive Disorder is subthreshold to Major Depressive Disorder with regard to duration and consists of many episodes in a year, each meeting the full symptom severity criteria of Major Depressive Disorder but lasting for only a few days (6). Mixed Anxiety Depression is characterized by a combination of dysphoric symptoms of anxiety and depression in individuals who fail to meet syndromal criteria for any specific anxiety disorder or depressive disorder (20).
There were several reasons that minor depressive disorder, brief recurrent depressive disorder, and mixed anxiety/depressive disorder were not included as official categories, but rather in the Appendix for Criteria Sets and Axes Provided for Further Study (Table 1): (a) Their inclusion might result in unnecessary treatment for the false positives. (b) Subthreshold categories may trivialize the construct of mental disorder and artificially inflate prevalence rates. (c) The evidence suggesting that subthreshold conditions are associated with significant impairment is difficult to interpret because the subthreshold diagnosis may have resulted from inadequate assessments. Indeed, at least some of the patients with subthreshold diagnoses might have met the criteria for major depression or an anxiety disorder were they diagnosed more carefully. (d) The research on these conditions is incomplete and there are virtually no treatment studies. On the other hand, it must be noted that the treatment implications of the proposed categories are unknown. By not recognizing a given category, we may be depriving patients of effective treatment, or we may be protecting them from unnecessary and ineffective treatment. The suggestions for these subthreshold categories raise the fundamental question of how best to define the boundary between psychopathology and normality.
DSM-III was an innovative system. Unlike DSM-II, it focused on descriptive diagnosis and provided explicit diagnostic criteria (7, 30). In many ways this aided, and was aided by, the knowledge derived from psychopharmacology. To avoid disrupting research, DSM-IV has chosen to be a conservative system and changes were made only when convincing empirical evidence could be marshaled (26). Pharmacological research will play an important role in gradually replacing the DSM descriptive system with one that is increasingly based on etiology. This will occur through studies that expand our knowledge of treatment response and underlying mechanisms. DSM-IV provides a necessary common language for current study and practice, but should not be subject to reification or promote premature closure. Our knowledge will be enhanced to the degree that we also study additional definitional items, thresholds, and diagnoses that have not been included in DSM-IV. The diagnostic system and psychopharmacology will continue to mature with one another.
published 2000