Neuropsychological Assessment of Patients with Alzheimer's Disease

Richard C. Mohs

INTRODUCTION

The aims of this chapter are to review the cognitive and behavioral abnormalities associated with Alzheimer's disease (AD), to review instruments used to measure those impairments, and to describe how those instruments can be used to evaluate treatments. During the past 20 years, and especially during the past 10 years, a great deal has been learned about the pathophysiology of AD and related dementias, and this knowledge has led to the development of many potential treatments for the cognitive impairments that are the hallmarks of AD (see Atypical Antipsychotic Drugs). In addition, there is substantial interest in using drugs that are already available but were designed for other indications (such as psychosis, depression, or anxiety) as therapeutic agents for the management of AD patients (see Biological Markers in Alzheimer's Disease). The efficacy of all of these agents is determined, in large part, by their effects on the behavior of AD patients, and many different instruments have been used to assess behavior of AD patients in clinical trials. The present review starts with a brief overview of the abnormal behaviors associated with AD and distinguishes the core cognitive abnormalities of AD, such as memory loss, dysphasia, and dyspraxia, from other symptoms such as agitation, depression, and anxiety, which are not invariably present. The roles of different assessment methods (including neuropsychologic tests, psychiatric rating scales, global clinical scales, and functional scales) in evaluating treatments for AD are discussed, with emphasis on the need for neuropsychologic assessment in trials of drugs designed to treat cognitive symptoms. Desirable characteristics of instruments for evaluating AD treatments are then introduced, followed by a discussion of specific neuropsychologic and other assessment instruments along with data on the extent to which they have the desirable reliability and validity characteristics.
The review indicates that existing instruments for evaluating cognitive symptoms of AD are adequate to evaluate most drugs proposed for the treatment of AD's cognitive symptoms, but areas of continuing controversy remain. Available instruments for evaluating psychiatric symptoms, functional capacity, and global clinical utility are less well developed, and considerable work needs to be done before these aspects of AD can be evaluated with confidence in clinical trials.

CLINICAL SYMPTOMS OF ALZHEIMER'S DISEASE

Cognitive Symptoms

Alzheimer's is a disease characterized by a progressive loss of cognitive functions including memory, language, praxis, judgment, and orientation (50). Modern diagnostic criteria for AD require that a patient be given a diagnosis of AD only if they have a progressive loss of memory and at least one other cognitive function sufficient to interfere with social or occupational functioning (2, 37). The terminology used to describe the cognitive functions lost in AD varies among clinical investigators, but there is general agreement on two points. The first is that the disorder is progressive (50), and the second is that, if a patient lives long enough after disease onset, all of the cognitive functions dependent upon the neocortex—particularly memory, language, praxis, and orientation—will be impaired (26, 32, 50). These defining deficits are referred to in this review as the core cognitive deficits of AD. Drugs that are designed to treat these core cognitive deficits are referred to as antidementia drugs.

Psychiatric Symptoms

Along with these cognitive abnormalities, however, AD patients have a variety of other symptoms and behaviors. Table 1, reproduced from Thal et al. (61), lists the symptoms and abnormal behaviors observed in a large group of AD patients, patients with vascular dementia, and patients with mixed AD and vascular dementia when they were first evaluated in a research clinic. As expected, virtually all patients with AD or vascular dementia had a prominent memory impairment, and many had clinically evident deficits in other cognitive areas. Many patients also had significant psychiatric symptoms such as depression, psychosis, agitation, and personality change. There is considerable controversy over the extent to which these symptoms are typical of AD and the way in which these symptoms change during the course of AD (26). What is evident, however, is that these symptoms, when present, are troublesome and produce excess disability in AD patients (60). In current clinical practice a variety of available psychotropic agents are used to treat these symptoms (see Biological Markers in Alzheimer's Disease). However, drugs that alleviate such symptoms but do not improve cognition are not antidementia drugs. They are used very differently from antidementia drugs in clinical practice, and new agents to treat psychiatric symptoms must be evaluated with procedures different from those used for antidementia drugs.

Functional Impairments

Both the cognitive and the psychiatric symptoms of AD lead to functional impairment manifested by poor performance in everyday activities of daily living (ADLs) such as feeding, dressing, holding a job, doing household chores, and managing money. These functional impairments rather than symptoms, per se, are often the most troubling aspects of AD because they result in loss of autonomy for the patient, increased need for care by family members or professional caregivers, and economic hardship (60). Consequently, it is important both in studies of antidementia drugs and in studies of drugs for the psychiatric symptoms of AD to consider the effects of those treatments on patient functioning. As indicated below, functional measures often serve as a useful adjunct to measures of symptom severity in trials of drug treatments for AD.

Approaches to Assessment

Given the wide variety of symptoms and behavioral abnormalities associated with AD, it is not surprising that many different types of assessments are used in clinical trials. Roughly, these assessments can be grouped into two categories, each with its own rationale and historical development. The two categories are illustrated in Table 2, adapted from a review by Yesavage et al. (67) of the clinical trials of vasodilators done prior to 1979. Although the vasodilators are no longer considered viable treatments for AD, the table indicates one of the major problems encountered in trying to evaluate studies of antidementia drugs conducted prior to 1979—namely, the fact that there was little agreement about how to assess the efficacy of such drugs. The table presents all of the outcome measures used in two or more clinical trials. No instrument was used consistently enough to allow comparisons across studies, and most of the commonly used instruments (e.g., the WAIS) were not designed to evaluate symptoms or behaviors associated with AD. Almost none of the instruments listed in Table 2 are still used in trials of treatments for AD, but the grouping of these assessments into those involving psychological tests and those involving psychiatric ratings illustrates the two general approaches to assessment still followed today. Performance-based neuropsychological tests in which the patient is actually required to perform specific tasks are generally used to assess drug effects on the core cognitive symptoms of AD. Clinician ratings based on interviews of the patients, caregivers, or others are used to assess drug effects on psychiatric symptoms and ADLs and to provide global clinical assessments. As indicated below, both of these approaches have important, but complementary, roles to play in the evaluation of treatments for AD.

NEUROPSYCHOLOGICAL ASSESSMENT IN DIAGNOSIS

Part of the diagnostic process for patients with AD or other dementia is to document that the patient has a cognitive impairment. Several neuropsychologic batteries have been shown to be useful for screening persons at risk for AD and for evaluating patients during a diagnostic work-up. However, neuropsychologic tests do not, by themselves, permit a diagnosis of AD or other dementia. Furthermore, the characteristics of a good screening test or diagnostic aid for dementia do not necessarily make it a good instrument for use in clinical trials. Most current diagnostic criteria for AD (37) require that a patient be evaluated with some standard mental status examination before a diagnosis of AD is made. Community screening studies (27) indicate that poor performance on a standard cognitive screening instrument may be the best single predictor of who is likely to develop a dementing illness in the near future. Clinical studies (64) indicate that specific neuropsychologic measures can distinguish even mildly demented persons from matched normal controls very reliably. The tests most sensitive to early AD are delayed recall memory measures (64), while tests of speeded psychomotor performance and tests of verbal ability also show decrements very early (57, 65). Many different kinds of dementia produce similar cognitive impairments, however, and therefore neuropsychologic tests are relatively poor at distinguishing different types of dementia in the absence of a complete clinical examination. The difficulty is exemplified by the data in Table 1 which indicate that vascular dementia and AD share most of the same cognitive abnormalities. Performance on neuropsychologic tests is also very much affected by education (24), age (16), cultural background (16, 24), illnesses other than AD (16), and situational factors, so that a poor score on a test must be examined in light of all these clinical variables before a diagnosis of dementia is made.

Thus for a dementia screening test it is important to have information about how these factors affect performance so that they can be factored into clinical decision-making. For outcome measures in clinical trials, this information is often less crucial because performance is compared between groups of patients (rather than between individuals) who are randomized to different treatment conditions. Other factors, including the availability of longitudinal data and the availability of alternate forms, are more important for determining the utility of treatment assessment instruments.

NEUROPSYCHOLOGICAL TESTS IN CLINICAL TRIALS

Clinical and Regulatory Issues

Neuropsychological tests are an essential part of the evaluation of antidementia drugs but are of secondary importance in evaluating drugs designed to treat the psychiatric manifestations of AD. In ordinary clinical practice the severity of cognitive impairment in AD or other dementia is best evaluated with neuropsychological tests. A formal assessment of cognitive functioning with some performance-based test is used both as part of a standard diagnostic evaluation (37) and to monitor progression of illness (18). Recently developed regulatory guidelines for the development of antidementia drugs mandate that an antidementia drug be superior to placebo on two types of measures: (i) a performance-based measure of cognitive function and (ii) an independent clinician-rated measure of global severity (10, 34). Originally developed in the United States following extensive discussion among many clinicians involved in clinical trials of antidementia drugs (1), this "dual outcome" strategy was adopted to ensure that any approved antidementia drug would improve the core cognitive symptoms of AD and that the magnitude of improvement would be large enough to be clinically significant (34). The pivotal studies used to demonstrate the efficacy of tacrine, the only drug yet approved as an antidementia drug in the United States, utilized outcome measures designed to satisfy this requirement (11, 14). Guidelines for the Community of European Nations (10), which are currently under review, have adopted this dual outcome strategy with some modifications, particularly in that they permit a wider variety of measures to be used to assess the clinical impact of a drug.

At present there are no formal regulatory guidelines either in the United States or in Europe for the development of drugs to treat the psychiatric symptoms associated with dementia. Presumably, neuropsychologic tests would play a lesser role in the evaluation of such drugs because their primary target symptoms would be agitation, depression, psychosis, and anxiety, all of which are traditionally evaluated with rating scales completed by a clinician following an interview (see Biological Markers in Alzheimer's Disease). It is reasonable, however, to expect that cognitive assessments would be included in trials of such agents to determine whether the drug improved, worsened, or had no effect on cognition.

Criteria for Neuropsychological Tests

To be used successfully in clinical trials of drugs for the treatment of dementia, a neuropsychologic test battery must meet certain criteria. In many respects, these criteria are similar to those which must be met by instruments used satisfactorily to evaluate other psychoactive agents such as neuroleptics, antidepressants, or drugs for movement disorders. One difference between the antidementia field and other areas of psychopharmacology, however, is that in many other areas, effective drugs were developed before the most commonly used assessment tools were accepted as valid measures of efficacy. This enabled instruments such as the Hamilton Depression Rating Scale (23) and the Brief Psychiatric Rating Scale (43) to be validated in studies with treatments for depression or psychosis, respectively, that were already accepted as effective. The first drug approved as effective in the treatment of AD was only approved in 1993, and this drug has relatively modest beneficial effects (11, 14). Consequently, another measure of validity had to be used to assess potential instruments. Virtually all clinicians and clinical follow-up studies agree that AD is a progressive condition, and any instrument that is a valid measure of clinical severity must reflect this change over time. As a result, longitudinal studies have played an important role in establishing the utility of instruments for evaluating patients with AD.

Following is a list of some properties that any instrument used to evaluate treatments for AD should have. The list expands upon the criteria originally proposed by Mohs et al. (40). Although the criteria are specifically designed for performance-based cognitive assessments, they are, with minor modifications, applicable to other types of assessments for AD as well. There are two reliability criteria:

1. The instrument should have high inter-rater reliability.

2. The instrument should have high retest reliability. In studies of cognitive instruments the interval for evaluating retest reliability is usually 1–4 weeks because a patient's cognitive status is not likely to change substantially in that period. However, the appropriate retest interval for psychiatric assessment instruments is probably shorter because these symptoms may fluctuate more quickly.

There are two practicality criteria:

3. The instrument should be brief enough to be completed in 1 hr or less.

4. Alternate, but equivalent, forms should be available so that patients can be tested repeatedly. This criterion applies only to cognitive tests.

There are three validity criteria:

5. The instrument should measure all of the major symptoms judged to be clinically important; and if instruments are to be used in pivotal trials, they should yield a single overall symptom severity score.

6. The instrument should be suitable for use in AD patients with a broad range of dementia severity because patients enter clinical trials with different baseline levels of dementia.

7. The instrument should measure increases in symptom severity known to occur as patients progress longitudinally. This criterion does not apply to instruments for psychiatric symptoms because the symptoms do not invariably worsen as the disease progresses.
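
As an illustration of the retest reliability criterion (criterion 2), the following is a minimal sketch of one common way to quantify agreement between two administrations of the same test: the Pearson correlation between first and second scores. The chapter does not specify which reliability statistic should be used, so the choice of Pearson's r here is an assumption, and the scores below are made-up numbers used only to show the calculation.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between paired scores x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical total scores for six patients tested twice,
# a few weeks apart (illustrative values, not real data).
first_visit = [12, 25, 31, 18, 40, 22]
retest_visit = [14, 24, 33, 17, 41, 25]

retest_reliability = pearson_r(first_visit, retest_visit)
print(round(retest_reliability, 2))
```

A coefficient near 1 indicates that patients keep nearly the same rank order and spacing across the two sessions. Other statistics, such as an intraclass correlation, are also used for this purpose and may be preferable when systematic shifts between sessions matter.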

Comparison of Available Instruments and Batteries

Table 3 presents a list of some of the neuropsychological test batteries commonly used to assess cognition in clinical trials of drugs for AD. The table also lists the seven criteria for such tests mentioned above and gives a rating of the extent to which each test satisfies the criteria. While it is hardly exhaustive, this list includes all of the tests used very frequently and also includes a variety of different types of tests so that the strengths and weaknesses of various approaches can be discussed.

1. The Mini Mental State Exam (MMSE; see ref. 17). This is an 11-item test with a total score of 0 (severe impairment) to 30 (no impairment). It is probably the most widely used screening instrument for dementia in the world (18), and it has been used in clinical trials. It includes very brief assessments of memory, language, praxis, and orientation. Major strengths of the MMSE are its coverage of a variety of relevant cognitive areas in a very brief test and the fact that a large amount of cross-sectional and longitudinal data are available. Weaknesses are that it is probably too brief, particularly in its assessment of memory, to be very sensitive, and, because there are no alternate forms, nonspecific carry-over effects make it difficult to use in trials where patients are assessed many times.

2. Blessed Test of Information, Memory, and Concentration (BIMC; see ref. 4). This is a short test which assesses primarily orientation and memory. The test was originally developed in Great Britain as a 37-point scale; most American investigators use adaptations with 27 items and a total score ranging from 0 (no impairment) to 33 (severe impairment). Strengths of this instrument are that it is very brief, that scores on the Blessed test have been correlated with both the neuropathologic (4) and neurochemical (45) abnormalities of AD, and that extensive longitudinal data are available (53). Principal weaknesses of the scale are that it does not cover all of the cognitive symptoms evident in AD patients, that it is so brief that it is relatively insensitive (54), and that there are no alternate forms to enable repeated administration in clinical trials.

3. Alzheimer's Disease Assessment Scale—Cognitive Portion (ADAS-Cog; see ref. 49). This is probably the most widely used cognitive assessment instrument in clinical trials of antidementia drugs done in the United States (11, 14) and has been recognized by the U.S. Food and Drug Administration (34). Although less well known outside the United States, this scale has been used extensively in Europe (7, 66), where it has been recognized as acceptable by the Commission of the European Communities (10), and in Japan (39). Table 4 presents a list of the items included in both the cognitive and noncognitive portions of this scale. The cognitive portion includes seven performance items and four clinician-rated items assessing memory, language, praxis, and orientation, with a total score ranging from 0 (no impairment) to 70 (severe impairment). The noncognitive portion includes 10 clinician-rated items assessing psychosis, agitation, depression, and other abnormalities. Strengths of the scale are its broad coverage of relevant cognitive domains, its widespread use, the availability of alternate forms for the memory tests, and the availability of extensive longitudinal data (54). Weaknesses are that it is somewhat long (approximately 45 min per administration) and that severely demented patients cannot be assessed with this instrument.

4. Syndrom Kurztest (SKT; see ref. 13). This is a timed test with 60 sec allowed for each of nine subtests. The tests are designed to assess memory, attention, naming, and object arrangement. The test was originally developed in Germany, where it has been administered to hundreds of persons (13). Recently, an English-language version was developed and used in a large clinical trial with AD patients (29). Strengths of the test are its brevity and inclusion of items specifically designed to measure attention. The principal weakness of the SKT is that the scale is suitable only for patients with very mild symptoms. There are few longitudinal data on the SKT, and the fact that it is a timed test suggests that performance on many items may be difficult to interpret.

5. Mattis Dementia Rating Scale (see ref. 8). This is an instrument with five subscales measuring attention, initiation and perseveration, conceptualization, construction, and memory. The items are administered in a stepwise fashion such that patients who make errors on simple items do not receive more complex ones. The usual time for administration is about 45 min, and the total scale is from 0 (no impairment) to 144 (severe impairment). Strengths of the scale are its broad coverage of relevant cognitive domains and its inclusion of items to assess attention. The instrument has no alternate forms for repeated testing. Although the instrument has been used to assess severity of demented patients in clinical studies (8) and in some longitudinal studies (51), the scale has not been widely used in treatment trials.

6. Neuropsychological Battery of the Consortium to Establish a Registry for Alzheimer's Disease (CERAD). The CERAD project was funded by the U.S. National Institute on Aging (NIA) with the aim of developing standardized methods for the clinical (42), neuropsychological (64, 65), neuropathological (38), and neuroradiological (12) evaluation of patients with AD. The CERAD neuropsychological battery consists of seven subtests including the MMSE and three others which are adapted from the ADAS-Cog. These tests assess memory, language, praxis, and orientation. Because the tests were designed to characterize patients along different dimensions, there is no established algorithm for calculating a single dementia severity score. Advantages of the CERAD battery are its broad coverage of cognitive domains, applicability to a broad range of dementia severity, the availability of extensive longitudinal data (41), and the utility of the battery for measuring symptoms in early AD (64, 65). Disadvantages are the lack of readily available alternate forms, the lack of an obvious summary measure, and extensive overlap with other instruments.

7. New York University Computerized Test Battery (NYU Battery; see ref. 15). This is a collection of 12 tests that have been adapted for administration by computer. The battery is quite long and measures memory, language, concept formation, psychomotor speed, and attention. The tests in this battery are primarily adaptations of tests used to measure the effects of aging and, as a result, are suitable for nondemented aged and mildly demented persons but not for moderate or severe AD patients. Some of the tests are designed to simulate real-world memory tasks such as telephone number recall, and others were designed as human analogs of tests used to evaluate drug effects in nonhuman primates. There is no standard method for calculating an overall dementia score, and few longitudinal data have been published (15). The primary value of the battery may be as a tool for assessment early in drug development when there is a need to investigate possible drug effects on a broad array of cognitive functions and where statistical concerns about multiple outcome measures are not critical.

8. Everyday Memory Battery (see ref. 31). This is a collection of 14 memory and attention tests which are administered by computer. The tests were selected to mimic real-world activities such as telephone dialing and remembering of faces. Like the NYU Battery, this battery is suitable for nondemented elderly and mildly impaired patients but not for moderate or severe AD. There is no summary dementia score, and the battery does not cover all of the domains impaired in AD. This battery may also have some utility in the early phases of drug development when highly selected patients with mild dementia can be assessed on a broad array of measures.

9. Wechsler Adult Intelligence Scale (WAIS; see ref. 63). Until recently, this test was often used to assess deficits in patients with AD (67). It consists of five verbal subtests including vocabulary and general information and four nonverbal subtests including block design and digit symbol substitution. The test is very widely used as a general measure of intelligence, so extensive population data are available. The test has almost none of the features desirable in an assessment instrument for AD. It does not cover the major cognitive symptoms of AD (particularly memory), it is not suitable for a broad range of demented patients, there are no alternate forms, and few longitudinal data are available in AD patients. Fortunately, few recent clinical studies have used tests such as this.

Longitudinal Data and Their Implications

Extensive longitudinal data have been collected on several of the tests listed above, particularly the MMSE (18, 41), the BIMC test (18, 53, 61), the ADAS-Cog (30, 54), and the CERAD battery (41). All of these tests have been shown to measure deterioration in AD patients followed over time and to be relatively insensitive to age-related cognitive changes in nondemented elderly persons. Several studies have attempted to identify demographic or clinical factors that might predict rate of cognitive decline. With a few exceptions (36), most studies have not found any relationship between rate of deterioration and gender, age of onset, or presence of family history of dementia (28, 51, 53). While there may be some tendency for early-onset patients to have a more severe language and praxis impairment relative to memory impairment (6), these differences at initial presentation do not alter the longitudinal course. An implication is that we have, at present, no rationale for stratifying patients at entry into clinical trials on the basis of demographic factors. While there is some evidence that psychosis early in the disease may be predictive of more rapid deterioration (56), most studies of antidementia drugs do not include psychotic patients.

With all of these instruments, the reliability of measured change increases with longer follow-up. This has implications for the design of clinical trials, particularly those in which the aim is not to determine the acute effects of a drug, but to determine the ability of the drug to alter the course of the illness. Relevant data from Stern et al. (53) for the BIMC illustrate this point very clearly. Their data were obtained in a study of 111 patients assessed every 6 months. Over a 12-month follow-up the mean change on the BIMC test was 4.1 points with a standard deviation of 4.1 points. Over a 6-month follow-up period the mean change was 2.2 points, with a standard deviation of 3.2 points (53). Power calculations were done to determine the sample sizes needed to detect the effect of a drug which slows the rate of deterioration by one-half relative to placebo. Regardless of other parameters, the sample size needed for such a study was over twice as large when drug and placebo patients were compared for 6 months than when they were compared for 12 months. This results from the fact that the mean change relative to the standard deviation of change increases dramatically the longer patients are followed. Similar points have been made using MMSE (41), CERAD (41), and ADAS-Cog (54) data. For clinical investigators this implies that they can perform shorter studies with larger sample sizes or longer studies with smaller sample sizes.
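
The sample-size comparison can be reproduced approximately from the BIMC figures quoted above. The sketch below uses the standard normal-approximation formula for comparing mean change between two equal-sized groups. The two-sided alpha of 0.05, the power of 0.80, and the assumption that the drug halves the mean change without altering its standard deviation are illustrative choices, not details taken from Stern et al. (53); the function name is likewise hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(mean_change, sd_change, slowing=0.5, alpha=0.05, power=0.80):
    """Approximate patients per arm needed to detect a drug that
    reduces mean deterioration by the fraction `slowing`."""
    delta = slowing * mean_change      # drug-placebo difference in mean change
    d = delta / sd_change              # standardized effect size
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

# BIMC change scores quoted in the text (mean, standard deviation).
n_12_months = n_per_group(4.1, 4.1)
n_6_months = n_per_group(2.2, 3.2)

print(n_12_months, n_6_months, round(n_6_months / n_12_months, 2))
# prints: 63 133 2.11
```

Because alpha and power enter both calculations through the same multiplier, the ratio of the two sample sizes depends only on the ratio of mean change to standard deviation at each follow-up length, which is why the 6-month trial needs roughly twice as many patients whatever values are chosen for those parameters.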

Annual rates of progression have been published for several of the commonly used instruments. Table 5 presents a summary of some of the rates along with measures of variability. This table is not an exhaustive list but does include nearly all of the studies with large enough samples to yield stable estimates. Because the CERAD battery does not have an obvious summary score, annual rates of change for three selected subtests are presented. In most instances the standard deviation of the annual change is equal to or greater than the mean annual change. For the ADAS-Cog the standard deviation is slightly less, possibly indicating greater sensitivity of this instrument. However, the instruments do not all cover the same range of symptom severity, with the BIMC and MMSE probably covering the smallest severity range. One direct comparison of the range of two instruments compared the ADAS-Cog and the BIMC. This study found that the ADAS-Cog was more sensitive both to mild dementia and to change in severe dementia than was the BIMC (54). One study found that in clinically demented persons with MMSE scores above 24, portions of the CERAD battery were still able to detect marked impairment (64). The greater length of the ADAS-Cog and the CERAD battery relative to the MMSE and the BIMC test enables them to have items covering a greater range of dementia severity.

Rate of change for these tests is not independent of baseline dementia severity. For the BIMC test the rate of change is linear over most of the scale's range (53), but the rate of change slows for severely demented persons who are reaching the limit of the scale. For the other tests there is evidence that rate of change is slower both for mild dementia and for severe dementia than for moderate dementia. This phenomenon is illustrated by the graph in Fig. 1 [adapted from Stern et al. (54)], which shows annual change in the ADAS-Cog as a function of baseline. Annual change was a curvilinear function of baseline such that expected annual change went from less than 5 points for mild patients to a maximum of nearly 13 points for baselines of 35–40 and again decreased to 5 for baselines of over 60. A similar phenomenon was observed in the CERAD study when MMSE scores were plotted as a function of overall dementia severity during the follow-up period (41). Expected annual MMSE change was approximately 2 points for patients whose mean MMSE level during follow-up was over 20, annual change was nearly 5 for patients whose overall level was 10, and annual change was approximately 2 for patients whose level was 5. What this means is that the expected amount of deterioration for patients in a clinical trial will depend heavily upon dementia severity at entry. Consequently, stratification by severity will be wise, and these scales may be most sensitive to change, not in the most mild patients but in moderately demented patients.
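
One way to implement stratification by severity at entry is permuted-block randomization within severity strata, sketched below. This is a generic technique rather than a procedure described in the chapter. The MMSE cutoffs, the stratum labels, the block size, and the function names are all hypothetical choices made for illustration.

```python
import random

def severity_stratum(mmse):
    """Assign a severity stratum from a baseline MMSE score.
    Cutoffs are hypothetical, chosen only for illustration."""
    if mmse > 20:
        return "mild"
    if mmse >= 10:
        return "moderate"
    return "severe"

def make_allocator(block_size=4, seed=0):
    """Return a function that randomizes patients to drug or placebo
    in balanced blocks within each severity stratum.
    block_size is assumed to be an even number."""
    rng = random.Random(seed)
    pending = {}  # stratum -> assignments left in the current block

    def assign(mmse):
        stratum = severity_stratum(mmse)
        block = pending.get(stratum)
        if not block:  # no block yet, or the current block is used up
            block = ["drug", "placebo"] * (block_size // 2)
            rng.shuffle(block)
            pending[stratum] = block
        return stratum, block.pop()

    return assign

assign = make_allocator(block_size=4, seed=1)
for baseline_mmse in (23, 12, 15, 7, 18, 11):
    print(baseline_mmse, *assign(baseline_mmse))
```

Within every completed block of four patients in a stratum, two receive drug and two receive placebo, so the treatment groups stay balanced on baseline severity and therefore on expected rate of deterioration.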

RELATIONSHIP TO OTHER OUTCOME MEASURES

Global Change and Staging Instruments

These are instruments which provide an assessment not only of the patient's cognitive status, but of the patient's overall clinical condition as well. Staging instruments such as the Clinical Dementia Rating Scale (CDR; see refs. 3 and 5) and the Global Deterioration Scale (GDS; see ref. 47) evaluate patients according to fixed external standards, whereas change instruments such as the Clinical Global Impression of Change (CGIC; see ref. 22) rate the patient relative to their own previous condition. In the evaluation of antidementia drugs, these instruments provide one way of assessing the clinical impact of proposed treatments that have been shown to improve cognitive symptoms as measured by neuropsychological tests (10, 34). In and of themselves, global instruments cannot be used to demonstrate that a drug has antidementia effects, because a variety of treatments that do not improve cognition might have globally beneficial effects in demented patients; this would include (a) neuroleptic drugs for agitated patients and (b) antidepressants for AD patients with depressive symptoms. There is considerable debate about whether staging instruments or change measures are best for this purpose and about the kinds of information that should be considered in assigning a global score. There is agreement that the global measure should be done independently of the neuropsychological test measure (10, 34).

When administered according to specified procedures, global scales can be highly reliable (5), although the procedures for administering most of them are not well specified. Scores on staging instruments correlate highly with performance-based cognitive measures (42) in AD patients, and they can also be used to document longitudinal change (3, 41). Few studies have evaluated the reliability and sensitivity of global change measures such as the CGIC in AD. In theory, change measures might be more sensitive than global staging instruments because they are, by definition, adjusted for baseline. Some trials of antidementia drugs last a year or more, however, so elaborate procedures for completing global change measures might be necessary to remind the rating clinician of the patient's baseline condition and to minimize the effect of intervening patients. Clearly, additional work is needed to evaluate the factors which make these instruments more or less reliable and more or less valid. Recent clinical trials have shown that the beneficial effects of even modestly effective cholinergic drugs (see Chapter 133) can be detected with unstructured global change instruments.

Functional Measures

These scales are designed to measure the patient's ability to perform activities of daily living (ADLs). The primary use of such measures in clinical trials of drugs for AD is to document the functional or clinical impact of either antidementia drugs or drugs used to manage psychiatric symptoms. The fact that ADLs are impaired in AD is well known, but at present no ADL instrument has been accepted as a valid measure of clinical significance for antidementia or other drugs (10, 34). This is probably because no ADL instrument has been shown both to capture all of the major functional impairments in AD patients and to measure progression of those impairments over a broad range of severity.

Most ADL instruments are derived, at least in part, from the work of Katz et al. (25), who identified and developed a scale to measure six basic ADLs (feeding, toileting, dressing, physical ambulation, bathing, and grooming) which represent biologically necessary activities common to all persons. Later functional assessment instruments designed for older persons include many of these basic ADLs, but also contain clinician ratings of cognitive symptoms such as memory impairment or language loss (21, 35), ratings of psychiatric symptoms such as personality change (4), or ratings of higher-level ADLs such as managing money, driving, and shopping (52). If a functional assessment instrument is to be used to provide independent validation of the effects of a drug designed to treat cognitive or psychiatric symptoms, then it would not be desirable to include items that also evaluate cognitive or psychiatric symptoms in the functional scale. While they may have some utility as global measures in some situations, functional scales with cognitive or psychiatric symptom items are not useful validators for clinical trials.

The distinction between universal, biologically necessary ADLs such as feeding and dressing and higher-level ADLs performed by most (but not all) individuals was formalized by Lawton and Brody (33). They developed two scales: (i) the Physical Self-Maintenance Scale (PSMS), which evaluates basic ADLs, and (ii) the Instrumental Activities of Daily Living Scale (IADLS), which evaluates higher-level activities that tend to be more specific for certain groups of individuals. The PSMS is similar to the original ADL scale of Katz et al. (25), whereas the IADLS includes items such as "shopping," "doing laundry," and "handling finances." Several studies have shown that these higher-level, instrumental activities are impaired in patients with AD (18, 52). A recent study of 104 patients followed for up to 8 years demonstrated the strengths and weaknesses of these two scales (19). While all of the PSMS items had high reliability, raters had substantial difficulty in rating many of the IADLS items, particularly for men, because many men had not performed these activities even before the onset of dementia (33). Thus, more universally applicable IADLS items would be desirable. The difficulty with using the PSMS alone is illustrated by the longitudinal data in Fig. 2, which presents the 12-month change scores plotted against the BIMC test scores obtained at the start of each follow-up interval. For follow-up intervals beginning when patients were mildly to moderately demented (i.e., BIMC scores < 25) there was usually little change on the PSMS, indicating that the PSMS would be relatively insensitive to change in most clinical trials. Scores on the IADLS began to change substantially even for follow-up intervals beginning when patients had mild dementia, but tended to change very little in moderate-to-severe patients, largely because of a ceiling on the scale.
Thus, to be sensitive over a broad range of disease severity, an ADL scale for use in clinical trials would have to include items to assess both basic and instrumental ADLs, with items designed to be less gender-specific than those of the IADLS.
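
The effect of a scale ceiling on measured change can be illustrated with a short sketch. The Python example below treats an observed score as the patient's underlying impairment clipped to the range of the scale, so that the same underlying decline yields less observed change as the patient nears the limit. The 0 to 8 range, the convention that higher scores mean greater impairment, and all of the numeric values are hypothetical and are not taken from the IADLS or from the study described above.

```python
def observed_score(true_impairment: float,
                   scale_min: float = 0.0,
                   scale_max: float = 8.0) -> float:
    """Clip underlying impairment to the scale range (hypothetical 0-8)."""
    return max(scale_min, min(scale_max, true_impairment))


def observed_change(true_start: float,
                    true_decline: float,
                    scale_min: float = 0.0,
                    scale_max: float = 8.0) -> float:
    """Change the scale can register for a given underlying decline."""
    end = observed_score(true_start + true_decline, scale_min, scale_max)
    start = observed_score(true_start, scale_min, scale_max)
    return end - start


if __name__ == "__main__":
    # The same underlying decline of 3 points at three starting severities.
    for start in (2.0, 7.0, 9.0):
        print(start, observed_change(start, 3.0))
```

In this sketch a patient starting well below the ceiling shows the full decline, a patient starting near it shows only part, and a patient already beyond it shows none, which is the pattern described for the IADLS in moderate-to-severe patients.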

Psychiatric Rating Scales

These scales are designed to assess symptoms such as depression, agitation, psychosis, anxiety, and personality change in patients with AD. While there is general agreement that such symptoms are of clinical significance in patients with AD, it is not yet clear how prevalent these symptoms are, how best to assess them, or how they are related to other aspects of AD. Because the different types of psychiatric symptoms are likely to be treated with different drugs, it will probably be necessary to have different items or instruments for each type of symptom. It is not possible to review here all of the issues involved in evaluating these symptoms, but a few general principles for selecting instruments should be mentioned. First, AD patients are, by definition, demented, so traditional rating scales which rely upon information from the patients themselves are generally not appropriate for them. Second, because these symptoms may differ in AD patients from those in other diagnostic groups, the assessments used should be shown to measure symptoms in AD patients specifically.

Depressive symptoms are common in AD patients (46), but the prevalence of major depressive disorder is probably quite low (20, 48). Some (44), but not all (55), studies find a relationship of depression to functional impairment in AD. Depressive symptoms are usually found to be more common in the early stages of AD (44), but they can appear in severe cases (20). Newer instruments for evaluating depression in demented patients, such as the NIMH Scale for Depression in Dementia (58), will probably enhance the validity of clinical trials of antidepressant drugs in demented patients, but additional data are needed to document the prevalence, natural history, and consequences of depression in AD patients.

Agitation is clearly a clinically significant problem in AD, and many demented patients are treated at least temporarily with psychotropic drugs in an effort to manage agitation (60). In general, agitation is more of a problem in advanced dementia (9), but the degree of agitation varies among patients and does not necessarily show the inexorable progression characteristic of cognitive impairment (68). Recently, some scales [such as the Cohen-Mansfield scale (9)] that are specifically designed to measure agitation or behavioral disturbance in AD have been developed. Only limited longitudinal data are available on these scales, so it is not certain that they are adequate measures for patients at all levels of severity or in all settings.

FUTURE DIRECTIONS

Present standards for evaluating the efficacy of antidementia drugs are likely to remain useful with only minor modifications for the foreseeable future. These standards require that an antidementia drug be shown to be superior to placebo on a performance-based assessment of cognitive function as well as on an independent measure of clinical efficacy. The available performance-based cognitive instruments, such as the ADAS-Cog and the MMSE, are adequate but not optimal for evaluating potential antidementia drugs. Modified versions of these instruments might be necessary for special populations such as very mild or very severe patients or for drugs that have effects primarily on attention, which is not assessed directly by these scales. More work needs to be done to determine the best procedures for administering global staging and change instruments and to determine how these instruments behave when used in longitudinal studies. Functional measures could ultimately be very useful for evaluating the "real world" impact of proposed treatments, but more research is needed to identify reliable and valid ways of measuring the key functional capacities lost in AD patients. Instrumentation for evaluating psychiatric symptoms, particularly agitation, needs to be developed further. Such instrumentation will facilitate the evaluation of drugs currently used to manage behavioral symptoms and will help speed the development of new, more effective agents.

published 2000