Research Article
Issue Date: May 01, 2014
Published Online: April 29, 2014
Updated: January 01, 2019
Systematic Review of Studies on Measurement Properties of Instruments for Adults Published in the American Journal of Occupational Therapy, 2009–2013
Author Affiliations
  • Hon K. Yuen, PhD, OTR/L, is Professor and Director of Research, Department of Occupational Therapy, School of Health Professions, University of Alabama at Birmingham, 1530 Third Avenue South, Birmingham, AL 35294; yuen@uab.edu
  • Sarah L. Austin, PhD, OTR/L, is Assistant Professor, Department of Occupational Therapy, Chicago State University, College of Health Sciences, Chicago, IL
American Journal of Occupational Therapy, May/June 2014, Vol. 68, e97-e106. https://doi.org/10.5014/ajot.2014.011171
Abstract

OBJECTIVE. We describe the methodological quality of recent studies on instrument development and testing published in the American Journal of Occupational Therapy (AJOT).

METHOD. We conducted a systematic review using the COnsensus-based Standards for the selection of health status Measurement INstruments (COSMIN) checklist to appraise 48 articles on measurement properties of assessments for adults published in AJOT between 2009 and 2013.

RESULTS. Most studies had adequate methodological quality in design and statistical analysis. Common methodological limitations included that methods used to examine internal consistency were not consistently linked to the theoretical constructs underpinning assessments; participants in some test–retest reliability studies were not stable during the interim period; and in several studies of reliability and convergent validity, sample sizes were inadequate.

CONCLUSION. AJOT’s dissemination of psychometric research evidence has made important contributions to moving the profession toward the American Occupational Therapy Association’s Centennial Vision. This study’s results provide a benchmark by which to evaluate future accomplishments.

In 2007, the American Occupational Therapy Association (AOTA) adopted its Centennial Vision affirming the profession’s goal that occupational therapy be recognized as a science-driven and evidence-based profession. Two years later, when Sharon A. Gutman became the editor-in-chief of the American Journal of Occupational Therapy (AJOT), she named a set of publication priorities designed to support the profession’s progress toward this vision. One of these priorities was to promote the publication of science-driven research evidence related to instrument development and testing (Harries, Gutman, & Polatajko, 2013). After this priority was articulated, the number of articles related to instrument development and testing published in AJOT sharply increased. An electronic literature search in the PubMed database showed that between 2004 and 2008, AJOT published 34 articles (9% of all AJOT articles) related to instrument development and testing. Then, in the 5 yr after Gutman named instrument development and testing as a publication priority (2009–2013), AJOT published 84 articles (18% of all AJOT articles) on this topic. This change constituted a 2.5-fold increase over the previous 5-yr period.
Given that research in the area of instrument development and testing in occupational therapy is vital to the survival and growth of the profession (Doucet & Gutman, 2013), it is important to examine not only the number of articles published but also their methodological quality. Hilton, Goloff, Altaras, and Josman (2013) conducted a comprehensive review of studies related to instrument development and testing for children and youth in AJOT between 2009 and 2012. Our study adds to the Centennial Vision series by systematically evaluating the methodological quality of studies on measurement properties of instruments in adult populations published in AJOT between 2009 and 2013.
This systematic review aimed to provide both (1) an evaluation of current scientific evidence on instrument development and testing related to occupational therapy disseminated through AJOT and (2) a benchmark to guide AJOT’s ongoing efforts to publish studies with high methodological quality in the area of measurement over the next 5 yr (i.e., 2014–2018).
Method
Literature Search Strategy
We conducted this systematic review according to the guidelines of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (Moher, Liberati, Tetzlaff, & Altman, 2009). The process began with electronic literature searches performed in the PubMed database for studies on instrument development and testing for adults published in AJOT between January 1, 2009, and December 31, 2013. Queries to identify all relevant articles on measurement properties of instruments were based on Boolean combinations of the following key search terms: (valid* [tiab] OR reliab* [tiab] OR reproducibility of results [MeSH] OR reproducib* [tiab] OR psychometrics [MeSH] OR internal consistency [tiab] OR validation studies [pt] OR coefficient of variation [tiab] OR ceiling effect [tiab] OR observer variation [MeSH] OR discriminative [tiab] OR precision [tiab]). These search terms have been shown to produce results that are both sensitive and precise for finding studies on measurement properties of instruments in PubMed (Terwee, Jansma, Riphagen, & de Vet, 2009).
Study Eligibility Criteria
Articles were included if they met the following criteria: (1) The study was published in AJOT between 2009 and 2013; (2) the article described the development or validation of one or two measurement instruments; and (3) the majority of the study participants were adults (i.e., ≥18 yr old), although children could constitute a minority of the participants in a study. Articles were excluded if (1) they did not contain a study with the specific aim of studying the measurement properties of an instrument; (2) they focused on the assessment process, modification of administering processes or response format, or the clinical utility or usability of an instrument; (3) they were case reports, editorials, or review articles; or (4) the instruments measured the responses of parents of children with special needs (i.e., had a pediatric focus).
Data Extraction and Quality Assessment
The first author (Hon K. Yuen) performed the data extraction. To minimize the risk for selection bias, we both independently screened the titles and abstracts of all the articles that had been retrieved to determine whether they met the eligibility criteria. If there was any doubt, we both performed an additional screening on the basis of the full text of the articles in question. Any disagreements were discussed until consensus was reached. The flow diagram in Figure 1 describes the process used to select articles for this study and the results of this literature search.
Figure 1.
Flow diagram of the selection process and study search result.
Note. AJOT = American Journal of Occupational Therapy.
We both have doctoral degrees concentrating on measurement, evaluation, and research methodology, and we used the COnsensus-based Standards for the selection of health status Measurement INstruments (COSMIN; Mokkink, Terwee, Patrick, et al., 2010a) checklist to determine the methodological quality of the selected studies. The COSMIN checklist consists of a number of boxes. Each box includes a set of rating criteria (i.e., items) that are relevant to studies of a specific measurement property. We followed the guidelines from the COSMIN manual to select the COSMIN boxes that corresponded to the purposes of each study described in each selected article and to score the items in each relevant box.
We both independently appraised the methodological quality of each study of measurement properties in the selected articles. This process was done in two steps. In Step 1, we each independently reviewed the research objectives or questions stated in each selected article to determine which measurement properties (i.e., COSMIN boxes) would be evaluated. In Step 2, we evaluated each article by rating each of the items in the COSMIN boxes that were identified in Step 1. We applied the response options of “yes,” “no,” or “not applicable/indeterminate” to rate each item (Terwee et al., 2012). When our ratings of specific COSMIN items differed, we discussed these items until we reached a consensus. We used an Excel spreadsheet to record our ratings and calculated the percentage of each rating option within each COSMIN box.
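The tallying step described above can be sketched in a few lines of code. The ratings below are purely hypothetical placeholders, not our actual COSMIN data; the sketch only shows how the percentage of each rating option within a box can be computed.

```python
from collections import Counter

# Hypothetical ratings of one COSMIN item across a set of appraised studies
# (illustrative only; not data from this review).
ratings = ["yes", "yes", "no", "yes", "not applicable/indeterminate", "no"]

counts = Counter(ratings)
percentages = {option: 100 * n / len(ratings) for option, n in counts.items()}
print(percentages)
```

In practice we performed this tabulation in an Excel spreadsheet; the logic is the same.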
Before scoring the selected articles, we studied and discussed both the COSMIN manual and published studies that applied the COSMIN to increase our familiarity with the tool and to clarify our understanding of the terminology and standards of the COSMIN checklist. We also practiced applying the checklist by independently scoring 3 randomly selected articles on measurement properties of instruments for children from AJOT between 2009 and 2013. As a result of this experience, we made some minor modifications to the COSMIN checklist and developed clarifications for some rating criteria. For example, when research staff administered performance-based tests or interviews, data sets were assumed to be complete, and we rated the criterion of “stated percentage of missing items” as “yes”; deletion of participants was acceptable as a description of how missing items were handled; and flaws noted in other COSMIN items were not marked again in the criterion of “important flaws in the design or methods.” Other modifications specific to a particular measurement property are noted in Supplemental Figures 1–6 (available online at http://otjournal.net; navigate to this article and click on “Supplemental”).
COSMIN Checklist
The COSMIN checklist was developed through an international Delphi consensus-based process as a standardized tool for evaluating and quantifying the methodological quality of studies of the measurement properties of health-related patient-reported outcomes (Mokkink, Terwee, Patrick, et al., 2010a). The quality standards described in COSMIN are also relevant to other health-related measurement instruments (Oftedal et al., 2012). The COSMIN checklist consists of boxes that can be used to rate nine different measurement properties, as follows: internal consistency (Box A, 11 items), reliability (Box B, 14 items), measurement error (Box C, 11 items), content validity (Box D, 5 items), structural validity (Box E, 7 items), hypothesis testing (Box F, 10 items), cross-cultural validity (Box G, 15 items), criterion validity (Box H, 7 items), and responsiveness (Box I, 18 items; Mokkink, Terwee, Patrick, et al., 2010b). The COSMIN boxes for reliability included items that could be used to evaluate studies of test–retest, interrater, intrarater, and other types of reliability. The hypothesis testing box could be applied to studies of convergent or discriminant validity, known-groups validity, and other related methods (Mokkink, Terwee, Patrick, et al., 2010b). Previous studies have demonstrated that the COSMIN checklist has adequate content validity and interrater reliability (Mokkink, Terwee, Gibbons, et al., 2010; Mokkink, Terwee, Patrick, et al., 2010a).
Results
Of the 475 articles published in AJOT between 2009 and 2013, 48 met the eligibility criteria for this study. We evaluated internal consistency in 20 articles, reliability in 20 articles (with test–retest reliability in 5; interrater reliability, including one case of interinstrument reliability, in 15; and intrarater reliability in 5), measurement error in 1, content validity in 6, structural validity in 10, hypothesis testing in 24 (with convergent validity in 11 and known-groups validity in 19), and criterion validity in 13. None of the articles addressed cross-cultural validity. Two selected articles included psychometric studies that the authors described as examining “sensitivity to change.” These studies were not included in this review because they did not match the COSMIN’s definition of responsiveness (Mokkink, Terwee, Patrick, et al., 2010b).
Table 1 shows which COSMIN boxes (i.e., measurement properties) were assigned to each selected article. Although most studies used classical test theory approaches, 11 studies involved the use of Rasch analysis to evaluate the psychometric properties of the instruments.
Table 1.
Measurement Properties of the Study Articles (N = 48)
Column headings: Study; Item Response Theory; Internal Consistency; Reliability (Test–Retest, Interrater, Intrarater); Measurement Error; Content Validity; Structural Validity; Hypothesis Testing (Convergent Validity, Known-Groups Validity); Criterion Validity. An x indicates the measurement properties evaluated in each study.
Bédard, Parkkari, Weaver, Riendeau, & Dahlquist (2010) xxxx
Canny, Thompson, & Wheeler (2009) xxx
Chang, Helfrich, & Coster (2013) xx
Chang, Ailey, Heller, & Chen (2013) xxx
Cheng & Cheng (2011) xx
Classen, Wang, Crizzle, Winter, & Lanford (2013) x
Classen, Wang, Winter, et al. (2013) x
Classen et al. (2012a) xxx
Classen et al. (2012b) xx
Classen et al. (2010) x
Classen et al. (2011) xx
Duquette et al. (2010) x
Eakman (2012) xxx
Engstrand, Krevers, & Kvist (2012) xx
Flinn, Pease, & Freimer (2012) xxxx
Gal, Ben Meir, & Katz (2013) xx
George & Crotty (2010) x
Hartman-Maeir, Harel, & Katz (2009) xxx
Hwang (2010) xxx
Hwang (2012) xxx
Hwang (2013) xx
Jang, Chern, & Lin (2009) xxxx
Katz, Averbuch, & Bar-Haim Erez (2012) xx
Katz, Bar-Haim Erez, Livni, & Averbuch (2012) xxx
Kay, Bundy, & Clemson (2009) x
King (2013) x
Lehman, Woodbury, & Velozo (2011) xx
Lewis, Fors, & Tharion (2010) xx
Lindstrom-Hazel, Kratt, & Bix (2009) x
Lyons, Li, Tosteson, Meehan, & Ahles (2010) xxx
Mennem, Warren, & Yuen (2012) xx
Merritt (2011) xxx
Morrison et al. (2013) xxx
Ownsworth et al. (2013) xxxx
Perlmutter (2013) xxx
Rieke & Anderson (2009) x
Rowe (2013) xx
Saban, Ornoy, Grotto, & Parush (2012) xxxxxx
Shechtman, Awadzi, Classen, Lanford, & Joo (2010) x
Shih, Rogers, Skidmore, Irrgang, & Holm (2009) xxx
Simmons, Griswold, & Berg (2010) xxx
Søndergaard & Fisher (2012) x
Stark, Somerville, & Morris (2010) xxxx
Su, Tsai, Su, Tang, & Tsai (2011) x
Toglia & Berg (2013) xx
Unsworth, Pallant, Russell, Germano, & Odell (2010) xxxxxx
Weiner, Toglia, & Berg (2012) x
Wong & Moskovitz (2010) xxx
COSMIN requires ratings of how authors reported and handled missing data in the criteria for studies on a wide range of measurement properties. In the set of articles included in this study, authors did not always provide the percentage of missing items or describe how they handled missing data or items within a scale. Most other results (described next) were related to COSMIN ratings for specific measurement properties.
Internal Consistency
Thirteen (65%) of the articles assessing internal consistency focused on instruments that were clearly based on measurement models in which all assessment items were theoretically based on a single underlying latent trait. The majority of these authors assessed unidimensionality before evaluating internal consistency, a strong practice that supports the interpretability of internal consistency statistics. Some authors could have strengthened their studies by making clearer links between the theoretical structure of their instruments and the methods that they used to assess the internal consistency of the scales and subscales.
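To make the link between theory and statistics concrete: Cronbach’s alpha (the most common internal consistency statistic) can be computed as below. The item scores are hypothetical, not data from any reviewed study, and the point of the sketch is the caveat in the comments: a high alpha by itself does not establish unidimensionality, which is why assessing dimensionality first is the stronger practice.

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a scale; items is a list of item-score lists,
    one list per item, all of equal length (one score per respondent).
    Note: alpha is only interpretable as internal consistency when the
    scale is unidimensional, which must be checked separately."""
    k = len(items)
    n = len(items[0])

    def sample_var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    # Total score per respondent across all items
    totals = [sum(item[i] for item in items) for i in range(n)]
    item_var_sum = sum(sample_var(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / sample_var(totals))

# Hypothetical 3-item scale, 4 respondents (illustrative only)
scores = [[1, 2, 3, 4], [2, 3, 4, 5], [1, 3, 3, 5]]
print(round(cronbach_alpha(scores), 3))
```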
Reliability
According to the COSMIN, a sample size of 50 is the minimum standard for studies that can support reliable conclusions, and in many cases, such as Rasch-based models, much larger samples are desirable (Terwee et al., 2012). Only 3 (15%) of the studies assessing reliability met the minimum criterion of including a sample size of 50, and 50% had sample sizes <30. Sample sizes <30 have a profound negative effect on the precision of estimates (Terwee et al., 2012). Also, some authors applied Pearson product–moment or Spearman rank order correlations instead of intraclass correlation coefficients for estimating reliability. A limitation of the Pearson and Spearman correlations is that they do not reflect systematic differences in scores (Lin, 2008).
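The limitation of Pearson’s r can be illustrated with a small sketch (hypothetical ratings, not data from any reviewed study): when one rater scores every participant a constant amount higher than another, Pearson’s r is a perfect 1.0, whereas an absolute-agreement intraclass correlation coefficient, ICC(A,1), penalizes the systematic difference.

```python
# Rater 2 scores every participant exactly 2 points higher than rater 1
# (hypothetical data for illustration).
r1 = [1, 2, 3, 4, 5]
r2 = [3, 4, 5, 6, 7]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def icc_a1(raters):
    """Two-way ICC(A,1): absolute agreement, single measures,
    computed from the mean squares of a two-way ANOVA."""
    k, n = len(raters), len(raters[0])
    grand = sum(map(sum, raters)) / (k * n)
    row_means = [sum(r[i] for r in raters) / k for i in range(n)]  # subjects
    col_means = [sum(r) / n for r in raters]                       # raters
    ss_total = sum((x - grand) ** 2 for r in raters for x in r)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

print(round(pearson(r1, r2), 3))   # perfect linear association despite the shift
print(round(icc_a1([r1, r2]), 3))  # agreement is penalized by the +2 shift
```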
Test–Retest Reliability.
The period between the administrations of assessments in studies assessing test–retest reliability varied from 1 wk to 3 mo. These time periods were assumed to be sufficiently long to minimize memory effects; however, in studies with long delays between test and retest administrations (e.g., a 3-mo interval), additional sources of variance may have been introduced. In addition, the interpretability of test–retest reliability results depends on the stability of study participants in relation to the construct to be measured during this interval between assessments. In several studies, certain participants’ characteristics were likely to change between test administrations because of natural recovery or because of interventions received during the interim period. Although instability in participants’ characteristics related to the assessment’s construct can lead to the underestimation of test–retest reliability, the use of participants without clinical conditions may have inflated reliability results in some selected studies (Lucas, Macaskill, Irwig, & Bogduk, 2010). The use of individuals from well populations also affects the interpretability of results because these samples do not reflect the characteristics of clients whom occupational therapists typically see in the clinic.
Interrater Reliability.
Most authors clearly stated that raters were blinded to the results produced by other raters. This evidence of independent administration was sufficiently described in most of the studies. Several studies in this set relied on therapists’ ratings of video recordings. Although this methodology can control for some of the challenges inherent in the assessment process, it is likely to overestimate the reliability of assessment tools as they are used in clinical settings. In 1 study, different types of raters assessed participants under different conditions, which may have led to an underestimation of interrater reliability because both the variability of raters and the variability of assessment conditions may have contributed error to the measurement process.
Intrarater Reliability.
The majority of the studies assessing intrarater reliability focused on research staff’s consistency in consecutive recordings of participants’ physical parameters. Although establishing this type of consistency is important, it is almost inevitable that the raters in such conditions will introduce expectation bias, which can lead to an underestimation of measurement error. One study was designed to assess convergent validity but mistakenly conceptualized or labeled it as intrarater reliability.
Content Validity
All 6 studies on instrument development included processes to assess potential items for their relevance to the construct to be measured and to the target population. Most also evaluated the items on the basis of the purpose of the instrument. The majority of these studies included an evaluation of the degree to which the set of potential items comprehensively represented all relevant aspects of the construct to be measured. Exemplary studies in this category provided clear justification for the number, range, and content of assessment items within a table of specifications. In several studies, users were asked to review potential items via cognitive interviewing or similar means, which reflected a strong practice.
Structural Validity
The majority of the studies assessing structural validity used approaches to test structural validity that were appropriate to the theoretical structure of the assessment. All 10 studies assessing structural validity included either a factor analysis or a Rasch-based approach to analyze dimensionality. Of the studies in this set, 80% had sample sizes that met the COSMIN’s quality criteria (i.e., samples ≥100 for factor analysis and Rasch models; Terwee et al., 2012).
Hypothesis Testing
Hypothesis testing provides critical evidence of construct validity. This type of study involves assessing the degree to which measurement instruments perform in ways that are consistent with hypotheses that are based on the assessment construct (Mokkink, Terwee, Patrick, et al., 2010b). The majority of the studies in this group (i.e., both convergent and known-groups validity) were clearly based on an a priori hypothesis. However, few of these hypotheses specified the expected direction or magnitude of correlations or mean differences. The authors of the COSMIN suggested that when researchers do not clearly and fully articulate their a priori hypothesis, they may be more likely to provide alternative explanations when interpreting low correlations or nonsignificant results instead of suggesting that the fault lies in the measure (Terwee et al., 2007). Several studies of convergent validity included evaluators who were blinded to the other measures, thus minimizing expectation bias and limiting biased ratings. Only 55% of the studies assessing convergent validity had a sample size ≥50. The statistical methods used in this set of studies consistently met the quality criteria in the COSMIN. However, several authors did not justify their selection of parametric statistics over nonparametric approaches.
Criterion Validity
The majority of the studies that assessed criterion validity adopted a reasonable gold standard or best available tool for comparison and conducted adequate statistical analysis, including computation of correlations, area under the receiver operating characteristic curve, or sensitivity and specificity. In several studies, the raters were not blinded to the results of the gold standard assessment when they administered the outcome assessment or vice versa. Also, in 1 study, scores of both the outcome measure and the criterion were derived from the same performance of the participants. This approach can severely inflate the correlation in studies of criterion validity.
Discussion
The results of our review indicate that the majority of the selected studies on measurement properties of instruments for adults published in AJOT between 2009 and 2013 had adequate methodological quality in both design and statistical analysis. Most of these studies focused on hypothesis testing, followed by internal consistency and reliability. Researchers used methodologies based on classical test theory, as well as a range of Rasch-based measurement models, which provided diverse sources of evidence. Specific strengths in methodology included the following: (1) Researchers assessed a wide range of measurement properties; (2) when exploring content validity during assessment development, researchers consistently assessed the relevance and representativeness of potential items; (3) in studies that examined reliability, most researchers stated that raters were blind to other raters’ assessment results; and (4) most studies that focused on hypothesis testing (known-groups validity) and structural validity had adequate sample sizes.
The majority of the authors who addressed missing data used participant deletion as their approach. This approach can lead to lower sample sizes with less statistical power and bias in parameter estimation. Only 1 article in this study described the use of a specific imputation strategy. Wider use of appropriate data imputation strategies could strengthen the quality of future studies. In addition, when researchers used complete data sets without missing data or items within a scale, a clear statement of this fact would remove ambiguity. Another area for improvement is sample size. Of the selected studies, small sample sizes were especially common in studies of reliability and hypothesis testing (convergent validity). Researchers can strengthen their studies by using a priori power calculations.
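As a minimal sketch of the contrast between the two approaches (with hypothetical scale data, not from any reviewed study), listwise deletion discards every participant with any missing item, whereas even a simple per-item mean imputation retains the full sample. Mean imputation is shown only because it is the simplest to illustrate; model-based strategies such as multiple imputation are generally preferable.

```python
# Hypothetical 3-item scale scores for 4 participants; None marks a missing item.
rows = [[4, 3, None], [2, 2, 3], [5, None, 4], [3, 4, 2]]
k = len(rows[0])

# Listwise deletion: only complete cases survive, shrinking the sample.
complete = [r for r in rows if None not in r]

# Simple per-item mean imputation: every participant is retained.
col_means = [
    sum(r[j] for r in rows if r[j] is not None)
    / sum(1 for r in rows if r[j] is not None)
    for j in range(k)
]
imputed = [
    [col_means[j] if r[j] is None else r[j] for j in range(k)] for r in rows
]

print(len(complete), len(imputed))  # deletion halves this (hypothetical) sample
```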
Future studies related to assessment development could be strengthened by including a description of an assessment specification or blueprint that is clearly linked to the theoretical underpinnings of the assessment’s construct. Defining the construct tested by an assessment, as well as the construct’s structure and intended use, is also a necessary foundation for studies of internal consistency and validity studies based on hypothesis testing. For example, if an assessment includes subscales that are not theoretically unidimensional, then total scores (based on a sum of subscale scores) should not be computed.
The time interval between test administrations is a critical element in the methodology of test–retest reliability studies. Researchers should make an effort to evaluate participants who are in a relatively stable state and use an interim period that is typically used for initial testing and retesting in a particular group of clients in clinical settings (U.S. Food and Drug Administration, Center for Drug Evaluation and Research, Center for Biologics Evaluation and Research, & Center for Devices and Radiological Health, 2006). Participants in stable states who also have relevant health conditions can add to the interpretability of results. For example, the variability of participants who experienced a stroke 5–10 yr ago could serve as a baseline when measuring the change of a particular construct after recovery and therapy in people who are experiencing the acute phase of stroke.
In assessing the reliability of assessment scores, researchers should work on developing methodologies that match clinical assessment conditions. These methodologies could involve (1) including study participants who represent clinical populations, (2) using live observation of performance in studies of interrater reliability, and (3) examining variability that results from administering the same assessment in different (clinical) settings. The use of assessment tools in clinical settings is likely to introduce many sources of measurement error; researchers could use methodologies based on generalizability theory or multifaceted Rasch models to identify the relative contributions of these sources of measurement error in assessment scores within one integrated analysis.
It is especially important for studies assessing validity to use both theory and prior research data to develop a priori hypotheses regarding direction and magnitude of associations or mean differences. When applying parametric statistics, researchers should either present evidence that their data meet parametric assumptions or apply alternate nonparametric options.
In the future, the range of measurement properties evaluated in AJOT could be broadened by including more studies that assess measurement error, cross-cultural validation of assessment instruments, and responsiveness studies that evaluate the ability of outcome measures to detect clinically meaningful changes in scores.
Study Limitations
Although the COSMIN is the most comprehensive tool available for evaluating the quality of studies focused on measurement properties (Mokkink, Terwee, Patrick, et al., 2010b), it only provides rating criteria for a set of traditional psychometric properties. The COSMIN does not provide quality criteria for studies that involve the development of normative scores, nor does it provide criteria for more innovative studies of such issues as the acceptability of assessments to clients; therefore, we did not include these studies in our review. In addition, we based our ratings strictly on information published in the articles and did not contact authors to request details or clarifications. Therefore, some elements that we rated as “unable to determine” may have been quite strong. Finally, the profession’s progress toward the Centennial Vision will be affected by both the quality and the content of occupational therapy assessments; however, the limited scope of this study did not include an examination of the degree to which the constructs represented by these assessments reflect the occupation-based and client-centered values of the profession of occupational therapy.
Conclusion
The dissemination of methodologically strong psychometric research supports the kind of evidence-based decision making that is necessary to enable occupational therapy to be recognized as a science-driven and evidence-based profession, as well as to achieve the broader goals of AOTA’s Centennial Vision. Naming the publication of studies related to instrument development and testing as one of AJOT’s priorities has clearly helped to support this vision. This study provides a benchmark for the evaluation of AOTA’s dissemination of psychometric research in the future. Further achievements in the production and use of psychometric research evidence will require collaboration and communication among researchers, theorists, psychometricians, clinicians, journal editors, and peer reviewers. We hope that the results of this study promote discussions of research methodologies in instrument development and testing in these communities of scholars and that these discussions will facilitate AJOT’s ability to exceed this benchmark within the next 5 yr.
References
American Occupational Therapy Association. (2007). AOTA’s Centennial Vision and executive summary. American Journal of Occupational Therapy, 61, 613–614. http://dx.doi.org/10.5014/ajot.61.6.613 [Article]
*Bédard, M. B., Parkkari, M., Weaver, B., Riendeau, J., & Dahlquist, M. (2010). Assessment of driving performance using a simulator protocol: Validity and reproducibility. American Journal of Occupational Therapy, 64, 336–340. http://dx.doi.org/10.5014/ajot.64.2.336 [Article] [PubMed]
*Bédard, M. B., Parkkari, M., Weaver, B., Riendeau, J., & Dahlquist, M. (2010). Assessment of driving performance using a simulator protocol: Validity and reproducibility. American Journal of Occupational Therapy, 64, 336–340. http://dx.doi.org/10.5014/ajot.64.2.336 [Article] [PubMed]×
*Canny, M. L., Thompson, J. M., & Wheeler, M. J. (2009). Reliability of the Box and Block Test of manual dexterity for use with patients with fibromyalgia. American Journal of Occupational Therapy, 63, 506–510. http://dx.doi.org/10.5014/ajot.63.4.506 [Article] [PubMed]
*Canny, M. L., Thompson, J. M., & Wheeler, M. J. (2009). Reliability of the Box and Block Test of manual dexterity for use with patients with fibromyalgia. American Journal of Occupational Therapy, 63, 506–510. http://dx.doi.org/10.5014/ajot.63.4.506 [Article] [PubMed]×
*Chang, F. H., Helfrich, C. A., & Coster, W. J. (2013). Psychometric properties of the Practical Skills Test (PST). American Journal of Occupational Therapy, 67, 246–253. http://dx.doi.org/10.5014/ajot.2013.006627 [Article] [PubMed]
*Chang, F. H., Helfrich, C. A., & Coster, W. J. (2013). Psychometric properties of the Practical Skills Test (PST). American Journal of Occupational Therapy, 67, 246–253. http://dx.doi.org/10.5014/ajot.2013.006627 [Article] [PubMed]×
*Chang, Y. C., Ailey, S. H., Heller, T., & Chen, M. D. (2013). Rasch analysis of the Mental Health Recovery Measure. American Journal of Occupational Therapy, 67, 469–477. http://dx.doi.org/10.5014/ajot.2013.007492 [Article] [PubMed]
*Chang, Y. C., Ailey, S. H., Heller, T., & Chen, M. D. (2013). Rasch analysis of the Mental Health Recovery Measure. American Journal of Occupational Therapy, 67, 469–477. http://dx.doi.org/10.5014/ajot.2013.007492 [Article] [PubMed]×
*Cheng, A. S., & Cheng, S. W. (2011). Use of job-specific functional capacity evaluation to predict the return to work of patients with a distal radius fracture. American Journal of Occupational Therapy, 65, 445–452. http://dx.doi.org/10.5014/ajot.2011.001057 [Article] [PubMed]
*Cheng, A. S., & Cheng, S. W. (2011). Use of job-specific functional capacity evaluation to predict the return to work of patients with a distal radius fracture. American Journal of Occupational Therapy, 65, 445–452. http://dx.doi.org/10.5014/ajot.2011.001057 [Article] [PubMed]×
*Classen, S., Wang, Y., Crizzle, A. M., Winter, S. M., & Lanford, D. N. (2013). Predicting older driver on-road performance via the Useful Field of View and Trails B. American Journal of Occupational Therapy, 67, 574–582. http://dx.doi.org/10.5014/ajot.2013.008136 [Article] [PubMed]
*Classen, S., Wang, Y., Crizzle, A. M., Winter, S. M., & Lanford, D. N. (2013). Predicting older driver on-road performance via the Useful Field of View and Trails B. American Journal of Occupational Therapy, 67, 574–582. http://dx.doi.org/10.5014/ajot.2013.008136 [Article] [PubMed]×
*Classen, S., Wang, Y., Winter, S. M., Velozo, C. A., Lanford, D. N., & Bédard, M. (2013). Concurrent criterion validity of the Safe Driving Behavior Measure: A predictor of on-road driving outcomes. American Journal of Occupational Therapy, 67, 108–116. http://dx.doi.org/10.5014/ajot.2013.005116 [Article] [PubMed]
*Classen, S., Wang, Y., Winter, S. M., Velozo, C. A., Lanford, D. N., & Bédard, M. (2013). Concurrent criterion validity of the Safe Driving Behavior Measure: A predictor of on-road driving outcomes. American Journal of Occupational Therapy, 67, 108–116. http://dx.doi.org/10.5014/ajot.2013.005116 [Article] [PubMed]×
*Classen, S., Wen, P. S., Velozo, C. A., Bédard, M., Winter, S. M., Brumback, B., & Lanford, D. N. (2012a). Psychometrics of the self-report Safe Driving Behavior Measure for older adults. American Journal of Occupational Therapy, 66, 233–241. http://dx.doi.org/10.5014/ajot.2012.001834 [Article]
*Classen, S., Wen, P. S., Velozo, C. A., Bédard, M., Winter, S. M., Brumback, B., & Lanford, D. N. (2012a). Psychometrics of the self-report Safe Driving Behavior Measure for older adults. American Journal of Occupational Therapy, 66, 233–241. http://dx.doi.org/10.5014/ajot.2012.001834 [Article] ×
*Classen, S., Wen, P. S., Velozo, C. A., Bédard, M., Winter, S. M., Brumback, B. A., & Lanford, D. N. (2012b). Rater reliability and rater effects of the Safe Driving Behavior Measure. American Journal of Occupational Therapy, 66, 69–77. http://dx.doi.org/10.5014/ajot.2012.002261 [Article]
*Classen, S., Wen, P. S., Velozo, C. A., Bédard, M., Winter, S. M., Brumback, B. A., & Lanford, D. N. (2012b). Rater reliability and rater effects of the Safe Driving Behavior Measure. American Journal of Occupational Therapy, 66, 69–77. http://dx.doi.org/10.5014/ajot.2012.002261 [Article] ×
*Classen, S., Winter, S. M., Velozo, C. A., Bédard, M., Lanford, D. N., Brumback, B., & Lutz, B. J. (2010). Item development and validity testing for a self- and proxy report: The Safe Driving Behavior Measure. American Journal of Occupational Therapy, 64, 296–305. http://dx.doi.org/10.5014/ajot.64.2.296 [Article] [PubMed]
*Classen, S., Winter, S. M., Velozo, C. A., Bédard, M., Lanford, D. N., Brumback, B., & Lutz, B. J. (2010). Item development and validity testing for a self- and proxy report: The Safe Driving Behavior Measure. American Journal of Occupational Therapy, 64, 296–305. http://dx.doi.org/10.5014/ajot.64.2.296 [Article] [PubMed]×
*Classen, S., Witter, D. P., Lanford, D. N., Okun, M. S., Rodriguez, R. L., Romrell, J., … Fernandez, H. H. (2011). Usefulness of screening tools for predicting driving performance in people with Parkinson’s disease. American Journal of Occupational Therapy, 65, 579–588. http://dx.doi.org/10.5014/ajot.2011.001073 [Article] [PubMed]
*Classen, S., Witter, D. P., Lanford, D. N., Okun, M. S., Rodriguez, R. L., Romrell, J., … Fernandez, H. H. (2011). Usefulness of screening tools for predicting driving performance in people with Parkinson’s disease. American Journal of Occupational Therapy, 65, 579–588. http://dx.doi.org/10.5014/ajot.2011.001073 [Article] [PubMed]×
Doucet, B. M., & Gutman, S. A. (2013). Quantifying function: The rest of the measurement story. American Journal of Occupational Therapy, 67, 7–9. http://dx.doi.org/10.5014/ajot.2013.007096 [Article] [PubMed]
Doucet, B. M., & Gutman, S. A. (2013). Quantifying function: The rest of the measurement story. American Journal of Occupational Therapy, 67, 7–9. http://dx.doi.org/10.5014/ajot.2013.007096 [Article] [PubMed]×
*Duquette, J., McKinley, P., Mazer, B., Gélinas, I., Vanier, M., Benoit, D., & Gresset, J. (2010). Impact of partial administration of the Cognitive Behavioral Driver’s Inventory on concurrent validity for people with brain injury. American Journal of Occupational Therapy, 64, 279–287. http://dx.doi.org/10.5014/ajot.64.2.279 [Article] [PubMed]
*Duquette, J., McKinley, P., Mazer, B., Gélinas, I., Vanier, M., Benoit, D., & Gresset, J. (2010). Impact of partial administration of the Cognitive Behavioral Driver’s Inventory on concurrent validity for people with brain injury. American Journal of Occupational Therapy, 64, 279–287. http://dx.doi.org/10.5014/ajot.64.2.279 [Article] [PubMed]×
*Eakman, A. M. (2012). Measurement characteristics of the Engagement in Meaningful Activities Survey in an age-diverse sample. American Journal of Occupational Therapy, 66, e20–e29. http://dx.doi.org/10.5014/ajot.2012.001867 [Article] [PubMed]
*Eakman, A. M. (2012). Measurement characteristics of the Engagement in Meaningful Activities Survey in an age-diverse sample. American Journal of Occupational Therapy, 66, e20–e29. http://dx.doi.org/10.5014/ajot.2012.001867 [Article] [PubMed]×
*Engstrand, C., Krevers, B., & Kvist, J. (2012). Interrater reliability in finger joint goniometer measurement in Dupuytren’s disease. American Journal of Occupational Therapy, 66, 98–103. http://dx.doi.org/10.5014/ajot.2012.001925 [Article] [PubMed]
*Engstrand, C., Krevers, B., & Kvist, J. (2012). Interrater reliability in finger joint goniometer measurement in Dupuytren’s disease. American Journal of Occupational Therapy, 66, 98–103. http://dx.doi.org/10.5014/ajot.2012.001925 [Article] [PubMed]×
*Flinn, S. R., Pease, W. S., & Freimer, M. L. (2012). Score reliability and construct validity of the Flinn Performance Screening Tool for adults with symptoms of carpal tunnel syndrome. American Journal of Occupational Therapy, 66, 330–337. http://dx.doi.org/10.5014/ajot.2012.000935 [Article] [PubMed]
*Flinn, S. R., Pease, W. S., & Freimer, M. L. (2012). Score reliability and construct validity of the Flinn Performance Screening Tool for adults with symptoms of carpal tunnel syndrome. American Journal of Occupational Therapy, 66, 330–337. http://dx.doi.org/10.5014/ajot.2012.000935 [Article] [PubMed]×
*Gal, E., Ben Meir, A., & Katz, N. (2013). Development and reliability of the Autism Work Skills Questionnaire (AWSQ). American Journal of Occupational Therapy, 67, e1–e5. http://dx.doi.org/10.5014/ajot.2013.005066 [Article] [PubMed]
*Gal, E., Ben Meir, A., & Katz, N. (2013). Development and reliability of the Autism Work Skills Questionnaire (AWSQ). American Journal of Occupational Therapy, 67, e1–e5. http://dx.doi.org/10.5014/ajot.2013.005066 [Article] [PubMed]×
*George, S., & Crotty, M. (2010). Establishing criterion validity of the Useful Field of View assessment and Stroke Drivers’ Screening Assessment: Comparison to the result of on-road assessment. American Journal of Occupational Therapy, 64, 114–122. http://dx.doi.org/10.5014/ajot.64.1.114 [Article] [PubMed]
*George, S., & Crotty, M. (2010). Establishing criterion validity of the Useful Field of View assessment and Stroke Drivers’ Screening Assessment: Comparison to the result of on-road assessment. American Journal of Occupational Therapy, 64, 114–122. http://dx.doi.org/10.5014/ajot.64.1.114 [Article] [PubMed]×
Harries, P. A., Gutman, S. A., & Polatajko, H. J. (2013). Reciprocal access agreements between BJOT, AJOT, and CJOT: New resources for occupational therapists around the world. American Journal of Occupational Therapy, 67, 138–139. http://dx.doi.org/10.5014/ajot.2013.672002 [Article] [PubMed]
Harries, P. A., Gutman, S. A., & Polatajko, H. J. (2013). Reciprocal access agreements between BJOT, AJOT, and CJOT: New resources for occupational therapists around the world. American Journal of Occupational Therapy, 67, 138–139. http://dx.doi.org/10.5014/ajot.2013.672002 [Article] [PubMed]×
*Hartman-Maeir, A., Harel, H., & Katz, N. (2009). Kettle Test—A brief measure of cognitive functional performance: Reliability and validity in stroke rehabilitation. American Journal of Occupational Therapy, 63, 592–599. http://dx.doi.org/10.5014/ajot.63.5.592 [Article] [PubMed]
*Hartman-Maeir, A., Harel, H., & Katz, N. (2009). Kettle Test—A brief measure of cognitive functional performance: Reliability and validity in stroke rehabilitation. American Journal of Occupational Therapy, 63, 592–599. http://dx.doi.org/10.5014/ajot.63.5.592 [Article] [PubMed]×
Hilton, C. L., Goloff, S. E., Altaras, O., & Josman, N. (2013). Review of instrument development and testing studies for children and youth. American Journal of Occupational Therapy, 67, e30–e54. http://dx.doi.org/10.5014/ajot.2013.007831 [Article] [PubMed]
Hilton, C. L., Goloff, S. E., Altaras, O., & Josman, N. (2013). Review of instrument development and testing studies for children and youth. American Journal of Occupational Therapy, 67, e30–e54. http://dx.doi.org/10.5014/ajot.2013.007831 [Article] [PubMed]×
*Hwang, J. E. (2010). Promoting healthy lifestyles with aging: Development and validation of the Health Enhancement Lifestyle Profile (HELP) using the Rasch measurement model. American Journal of Occupational Therapy, 64, 786–795. http://dx.doi.org/10.5014/ajot.2010.09088 [Article] [PubMed]
*Hwang, J. E. (2010). Promoting healthy lifestyles with aging: Development and validation of the Health Enhancement Lifestyle Profile (HELP) using the Rasch measurement model. American Journal of Occupational Therapy, 64, 786–795. http://dx.doi.org/10.5014/ajot.2010.09088 [Article] [PubMed]×
*Hwang, J. E. (2012). Development and validation of a 15-item lifestyle screening for community-dwelling older adults. American Journal of Occupational Therapy, 66, e98–e106. http://dx.doi.org/10.5014/ajot.2012.005181 [Article] [PubMed]
*Hwang, J. E. (2012). Development and validation of a 15-item lifestyle screening for community-dwelling older adults. American Journal of Occupational Therapy, 66, e98–e106. http://dx.doi.org/10.5014/ajot.2012.005181 [Article] [PubMed]×
*Hwang, J. E. (2013). Reliability of the Health Enhancement Lifestyle Profile–Screener (HELP–Screener). American Journal of Occupational Therapy, 67, e6–e10. http://dx.doi.org/10.5014/ajot.2013.005934 [Article] [PubMed]
*Hwang, J. E. (2013). Reliability of the Health Enhancement Lifestyle Profile–Screener (HELP–Screener). American Journal of Occupational Therapy, 67, e6–e10. http://dx.doi.org/10.5014/ajot.2013.005934 [Article] [PubMed]×
*Jang, Y., Chern, J. S., & Lin, K. C. (2009). Validity of the Loewenstein Occupational Therapy Cognitive Assessment in people with intellectual disabilities. American Journal of Occupational Therapy, 63, 414–422. http://dx.doi.org/10.5014/ajot.63.4.414 [Article] [PubMed]
*Jang, Y., Chern, J. S., & Lin, K. C. (2009). Validity of the Loewenstein Occupational Therapy Cognitive Assessment in people with intellectual disabilities. American Journal of Occupational Therapy, 63, 414–422. http://dx.doi.org/10.5014/ajot.63.4.414 [Article] [PubMed]×
*Katz, N., Averbuch, S., & Bar-Haim Erez, A. (2012). Dynamic Lowenstein Occupational Therapy Cognitive Assessment–Geriatric Version (DLOTCA–G): Assessing change in cognitive performance. American Journal of Occupational Therapy, 66, 311–319. http://dx.doi.org/10.5014/ajot.2012.002485 [Article] [PubMed]
*Katz, N., Averbuch, S., & Bar-Haim Erez, A. (2012). Dynamic Lowenstein Occupational Therapy Cognitive Assessment–Geriatric Version (DLOTCA–G): Assessing change in cognitive performance. American Journal of Occupational Therapy, 66, 311–319. http://dx.doi.org/10.5014/ajot.2012.002485 [Article] [PubMed]×
*Katz, N., Bar-Haim Erez, A., Livni, L., & Averbuch, S. (2012). Dynamic Lowenstein Occupational Therapy Cognitive Assessment: Evaluation of potential to change in cognitive performance. American Journal of Occupational Therapy, 66, 207–214. http://dx.doi.org/10.5014/ajot.2012.002469 [Article] [PubMed]
*Katz, N., Bar-Haim Erez, A., Livni, L., & Averbuch, S. (2012). Dynamic Lowenstein Occupational Therapy Cognitive Assessment: Evaluation of potential to change in cognitive performance. American Journal of Occupational Therapy, 66, 207–214. http://dx.doi.org/10.5014/ajot.2012.002469 [Article] [PubMed]×
*Kay, L. G., Bundy, A. C., & Clemson, L. (2009). Awareness of driving ability in senior drivers with neurological conditions. American Journal of Occupational Therapy, 63, 146–150. http://dx.doi.org/10.5014/ajot.63.2.146 [Article] [PubMed]
*Kay, L. G., Bundy, A. C., & Clemson, L. (2009). Awareness of driving ability in senior drivers with neurological conditions. American Journal of Occupational Therapy, 63, 146–150. http://dx.doi.org/10.5014/ajot.63.2.146 [Article] [PubMed]×
*King, T. I., 2nd. (2013). Interinstrument reliability of the Jamar electronic dynamometer and pinch gauge compared with the Jamar hydraulic dynamometer and B&L Engineering mechanical pinch gauge. American Journal of Occupational Therapy, 67, 480–483. http://dx.doi.org/10.5014/ajot.2013.007351 [Article] [PubMed]
*King, T. I., 2nd. (2013). Interinstrument reliability of the Jamar electronic dynamometer and pinch gauge compared with the Jamar hydraulic dynamometer and B&L Engineering mechanical pinch gauge. American Journal of Occupational Therapy, 67, 480–483. http://dx.doi.org/10.5014/ajot.2013.007351 [Article] [PubMed]×
*Lehman, L. A., Woodbury, M., & Velozo, C. A. (2011). Examination of the factor structure of the Disabilities of the Arm, Shoulder, and Hand Questionnaire. American Journal of Occupational Therapy, 65, 169–178. http://dx.doi.org/10.5014/ajot.2011.000794 [Article] [PubMed]
*Lehman, L. A., Woodbury, M., & Velozo, C. A. (2011). Examination of the factor structure of the Disabilities of the Arm, Shoulder, and Hand Questionnaire. American Journal of Occupational Therapy, 65, 169–178. http://dx.doi.org/10.5014/ajot.2011.000794 [Article] [PubMed]×
*Lewis, E., Fors, L., & Tharion, W. J. (2010). Interrater and intrarater reliability of finger goniometric measurements. American Journal of Occupational Therapy, 64, 555–561. http://dx.doi.org/10.5014/ajot.2010.09028 [Article] [PubMed]
*Lewis, E., Fors, L., & Tharion, W. J. (2010). Interrater and intrarater reliability of finger goniometric measurements. American Journal of Occupational Therapy, 64, 555–561. http://dx.doi.org/10.5014/ajot.2010.09028 [Article] [PubMed]×
Lin, L. (2008). Overview of agreement statistics for medical devices. Journal of Biopharmaceutical Statistics, 18, 126–144. http://dx.doi.org/10.1080/10543400701668290 [Article] [PubMed]
Lin, L. (2008). Overview of agreement statistics for medical devices. Journal of Biopharmaceutical Statistics, 18, 126–144. http://dx.doi.org/10.1080/10543400701668290 [Article] [PubMed]×
*Lindstrom-Hazel, D., Kratt, A., & Bix, L. (2009). Interrater reliability of students using hand and pinch dynamometers. American Journal of Occupational Therapy, 63, 193–197. http://dx.doi.org/10.5014/ajot.63.2.193 [Article] [PubMed]
*Lindstrom-Hazel, D., Kratt, A., & Bix, L. (2009). Interrater reliability of students using hand and pinch dynamometers. American Journal of Occupational Therapy, 63, 193–197. http://dx.doi.org/10.5014/ajot.63.2.193 [Article] [PubMed]×
Lucas, N. P., Macaskill, P., Irwig, L., & Bogduk, N. (2010). The development of a quality appraisal tool for studies of diagnostic reliability (QAREL). Journal of Clinical Epidemiology, 63, 854–861. http://dx.doi.org/10.1016/j.jclinepi.2009.10.002 [Article] [PubMed]
Lucas, N. P., Macaskill, P., Irwig, L., & Bogduk, N. (2010). The development of a quality appraisal tool for studies of diagnostic reliability (QAREL). Journal of Clinical Epidemiology, 63, 854–861. http://dx.doi.org/10.1016/j.jclinepi.2009.10.002 [Article] [PubMed]×
*Lyons, K. D., Li, Z., Tosteson, T. D., Meehan, K., & Ahles, T. A. (2010). Consistency and construct validity of the Activity Card Sort (modified) in measuring activity resumption after stem cell transplantation. American Journal of Occupational Therapy, 64, 562–569. http://dx.doi.org/10.5014/ajot.2010.09033 [Article] [PubMed]
*Lyons, K. D., Li, Z., Tosteson, T. D., Meehan, K., & Ahles, T. A. (2010). Consistency and construct validity of the Activity Card Sort (modified) in measuring activity resumption after stem cell transplantation. American Journal of Occupational Therapy, 64, 562–569. http://dx.doi.org/10.5014/ajot.2010.09033 [Article] [PubMed]×
*Mennem, T. A., Warren, M., & Yuen, H. K. (2012). Preliminary validation of a vision-dependent activities of daily living instrument on adults with homonymous hemianopia. American Journal of Occupational Therapy, 66, 478–482. http://dx.doi.org/10.5014/ajot.2012.004762 [Article] [PubMed]
*Mennem, T. A., Warren, M., & Yuen, H. K. (2012). Preliminary validation of a vision-dependent activities of daily living instrument on adults with homonymous hemianopia. American Journal of Occupational Therapy, 66, 478–482. http://dx.doi.org/10.5014/ajot.2012.004762 [Article] [PubMed]×
*Merritt, B. K. (2011). Validity of using the Assessment of Motor and Process Skills to determine the need for assistance. American Journal of Occupational Therapy, 65, 643–650. http://dx.doi.org/10.5014/ajot.2011.000547 [Article] [PubMed]
*Merritt, B. K. (2011). Validity of using the Assessment of Motor and Process Skills to determine the need for assistance. American Journal of Occupational Therapy, 65, 643–650. http://dx.doi.org/10.5014/ajot.2011.000547 [Article] [PubMed]×
Moher, D., Liberati, A., Tetzlaff, J., Altman, D. G.; PRISMA Group., (2009). Reprint—Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. Physical Therapy, 89, 873–880. [PubMed]
Moher, D., Liberati, A., Tetzlaff, J., Altman, D. G.; PRISMA Group., (2009). Reprint—Preferred reporting items for systematic reviews and meta-analyses: The PRISMA statement. Physical Therapy, 89, 873–880. [PubMed]×
Mokkink, L. B., Terwee, C. B., Gibbons, E., Stratford, P. W., Alonso, J., Patrick, D. L., … de Vet, H. C. (2010). Inter-rater agreement and reliability of the COSMIN (COnsensus-based Standards for the selection of health status Measurement INstruments) checklist. BMC Medical Research Methodology, 10, 82. http://dx.doi.org/10.1186/1471-2288-10-82 [Article] [PubMed]
Mokkink, L. B., Terwee, C. B., Gibbons, E., Stratford, P. W., Alonso, J., Patrick, D. L., … de Vet, H. C. (2010). Inter-rater agreement and reliability of the COSMIN (COnsensus-based Standards for the selection of health status Measurement INstruments) checklist. BMC Medical Research Methodology, 10, 82. http://dx.doi.org/10.1186/1471-2288-10-82 [Article] [PubMed]×
Mokkink, L. B., Terwee, C. B., Patrick, D. L., Alonso, J., Stratford, P. W., Knol, D. L., … de Vet, H. C. (2010a). The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: An international Delphi study. Quality of Life Research, 19, 539–549. http://dx.doi.org/10.1007/s11136-010-9606-8 [Article]
Mokkink, L. B., Terwee, C. B., Patrick, D. L., Alonso, J., Stratford, P. W., Knol, D. L., … de Vet, H. C. (2010a). The COSMIN checklist for assessing the methodological quality of studies on measurement properties of health status measurement instruments: An international Delphi study. Quality of Life Research, 19, 539–549. http://dx.doi.org/10.1007/s11136-010-9606-8 [Article] ×
Mokkink, L. B., Terwee, C. B., Patrick, D. L., Alonso, J., Stratford, P. W., Knol, D. L., … de Vet, H. C. (2010b). The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. Journal of Clinical Epidemiology, 63, 737–745. http://dx.doi.org/10.1016/j.jclinepi.2010.02.006 [Article]
Mokkink, L. B., Terwee, C. B., Patrick, D. L., Alonso, J., Stratford, P. W., Knol, D. L., … de Vet, H. C. (2010b). The COSMIN study reached international consensus on taxonomy, terminology, and definitions of measurement properties for health-related patient-reported outcomes. Journal of Clinical Epidemiology, 63, 737–745. http://dx.doi.org/10.1016/j.jclinepi.2010.02.006 [Article] ×
*Morrison, M. T., Giles, G. M., Ryan, J. D., Baum, C. M., Dromerick, A. W., Polatajko, H. J., & Edwards, D. F. (2013). Multiple Errands Test–Revised (MET–R): A performance-based measure of executive function in people with mild cerebrovascular accident. American Journal of Occupational Therapy, 67, 460–468. http://dx.doi.org/10.5014/ajot.2013.007880 [Article] [PubMed]
*Morrison, M. T., Giles, G. M., Ryan, J. D., Baum, C. M., Dromerick, A. W., Polatajko, H. J., & Edwards, D. F. (2013). Multiple Errands Test–Revised (MET–R): A performance-based measure of executive function in people with mild cerebrovascular accident. American Journal of Occupational Therapy, 67, 460–468. http://dx.doi.org/10.5014/ajot.2013.007880 [Article] [PubMed]×
Oftedal, S., Bell, K. L., Mitchell, L. E., Davies, P. S., Ware, R. S., & Boyd, R. N. (2012). A systematic review of the clinimetric properties of habitual physical activity measures in young children with a motor disability. International Journal of Pediatrics, 2012, 976425. http://dx.doi.org/10.1155/2012/976425 [Article] [PubMed]
Oftedal, S., Bell, K. L., Mitchell, L. E., Davies, P. S., Ware, R. S., & Boyd, R. N. (2012). A systematic review of the clinimetric properties of habitual physical activity measures in young children with a motor disability. International Journal of Pediatrics, 2012, 976425. http://dx.doi.org/10.1155/2012/976425 [Article] [PubMed]×
*Ownsworth, T., Stewart, E., Fleming, J., Griffin, J., Collier, A. M., & Schmidt, J. (2013). Development and preliminary psychometric evaluation of the Self-Perceptions in Rehabilitation Questionnaire (SPIRQ) for brain injury rehabilitation. American Journal of Occupational Therapy, 67, 336–344. http://dx.doi.org/10.5014/ajot.2013.007625 [Article] [PubMed]
*Ownsworth, T., Stewart, E., Fleming, J., Griffin, J., Collier, A. M., & Schmidt, J. (2013). Development and preliminary psychometric evaluation of the Self-Perceptions in Rehabilitation Questionnaire (SPIRQ) for brain injury rehabilitation. American Journal of Occupational Therapy, 67, 336–344. http://dx.doi.org/10.5014/ajot.2013.007625 [Article] [PubMed]×
*Perlmutter, M. S. (2013). A home lighting assessment for clients with low vision. American Journal of Occupational Therapy, 67, 674–682. http://dx.doi.org/10.5014/ajot.2013.006692 [Article] [PubMed]
*Perlmutter, M. S. (2013). A home lighting assessment for clients with low vision. American Journal of Occupational Therapy, 67, 674–682. http://dx.doi.org/10.5014/ajot.2013.006692 [Article] [PubMed]×
*Rieke, E. F., & Anderson, D. (2009). Adolescent/Adult Sensory Profile and obsessive–compulsive disorder. American Journal of Occupational Therapy, 63, 138–145. http://dx.doi.org/10.5014/ajot.63.2.138 [Article] [PubMed]
*Rieke, E. F., & Anderson, D. (2009). Adolescent/Adult Sensory Profile and obsessive–compulsive disorder. American Journal of Occupational Therapy, 63, 138–145. http://dx.doi.org/10.5014/ajot.63.2.138 [Article] [PubMed]×
*Rowe, V. (2013). The Functional Test for the Hemiparetic Upper Extremity normative database. American Journal of Occupational Therapy, 67, 717–721. http://dx.doi.org/10.5014/ajot.2013.008797 [Article] [PubMed]
*Rowe, V. (2013). The Functional Test for the Hemiparetic Upper Extremity normative database. American Journal of Occupational Therapy, 67, 717–721. http://dx.doi.org/10.5014/ajot.2013.008797 [Article] [PubMed]×
*Saban, M. T., Ornoy, A., Grotto, I., & Parush, S. (2012). Adolescents and Adults Coordination Questionnaire: Development and psychometric properties. American Journal of Occupational Therapy, 66, 406–413. http://dx.doi.org/10.5014/ajot.2012.003251 [Article] [PubMed]
*Saban, M. T., Ornoy, A., Grotto, I., & Parush, S. (2012). Adolescents and Adults Coordination Questionnaire: Development and psychometric properties. American Journal of Occupational Therapy, 66, 406–413. http://dx.doi.org/10.5014/ajot.2012.003251 [Article] [PubMed]×
*Shechtman, O., Awadzi, K. D., Classen, S., Lanford, D. N., & Joo, Y. (2010). Validity and critical driving errors of on-road assessment for older drivers. American Journal of Occupational Therapy, 64, 242–251. http://dx.doi.org/10.5014/ajot.64.2.242 [Article] [PubMed]
*Shechtman, O., Awadzi, K. D., Classen, S., Lanford, D. N., & Joo, Y. (2010). Validity and critical driving errors of on-road assessment for older drivers. American Journal of Occupational Therapy, 64, 242–251. http://dx.doi.org/10.5014/ajot.64.2.242 [Article] [PubMed]×
*Shih, M. M., Rogers, J. C., Skidmore, E. R., Irrgang, J. J., & Holm, M. B. (2009). Measuring stroke survivors’ functional status independence: Five perspectives. American Journal of Occupational Therapy, 63, 600–608. http://dx.doi.org/10.5014/ajot.63.5.600 [Article] [PubMed]
*Shih, M. M., Rogers, J. C., Skidmore, E. R., Irrgang, J. J., & Holm, M. B. (2009). Measuring stroke survivors’ functional status independence: Five perspectives. American Journal of Occupational Therapy, 63, 600–608. http://dx.doi.org/10.5014/ajot.63.5.600 [Article] [PubMed]×
*Simmons, C. D., Griswold, L. A., & Berg, B. (2010). Evaluation of social interaction during occupational engagement. American Journal of Occupational Therapy, 64, 10–17. http://dx.doi.org/10.5014/ajot.64.1.10 [Article] [PubMed]
*Simmons, C. D., Griswold, L. A., & Berg, B. (2010). Evaluation of social interaction during occupational engagement. American Journal of Occupational Therapy, 64, 10–17. http://dx.doi.org/10.5014/ajot.64.1.10 [Article] [PubMed]×
*Søndergaard, M., & Fisher, A. G. (2012). Sensitivity of the evaluation of social interaction measures among people with and without neurologic or psychiatric disorders. American Journal of Occupational Therapy, 66, 356–362. http://dx.doi.org/10.5014/ajot.2012.003582 [Article] [PubMed]
*Søndergaard, M., & Fisher, A. G. (2012). Sensitivity of the evaluation of social interaction measures among people with and without neurologic or psychiatric disorders. American Journal of Occupational Therapy, 66, 356–362. http://dx.doi.org/10.5014/ajot.2012.003582 [Article] [PubMed]×
*Stark, S. L., Somerville, E. K., & Morris, J. C. (2010). In-Home Occupational Performance Evaluation (I–HOPE). American Journal of Occupational Therapy, 64, 580–589. http://dx.doi.org/10.5014/ajot.2010.08065 [Article] [PubMed]
*Stark, S. L., Somerville, E. K., & Morris, J. C. (2010). In-Home Occupational Performance Evaluation (I–HOPE). American Journal of Occupational Therapy, 64, 580–589. http://dx.doi.org/10.5014/ajot.2010.08065 [Article] [PubMed]×
*Su, C. Y., Tsai, P. C., Su, W. L., Tang, T. C., & Tsai, A. Y. (2011). Cognitive profile difference between Allen Cognitive Levels 4 and 5 in schizophrenia. American Journal of Occupational Therapy, 65, 453–461. http://dx.doi.org/10.5014/ajot.2011.000711 [Article] [PubMed]
*Su, C. Y., Tsai, P. C., Su, W. L., Tang, T. C., & Tsai, A. Y. (2011). Cognitive profile difference between Allen Cognitive Levels 4 and 5 in schizophrenia. American Journal of Occupational Therapy, 65, 453–461. http://dx.doi.org/10.5014/ajot.2011.000711 [Article] [PubMed]×
Terwee, C. B., Bot, S. D., de Boer, M. R., van der Windt, D. A., Knol, D. L., Dekker, J., … de Vet, H. C. (2007). Quality criteria were proposed for measurement properties of health status questionnaires. Journal of Clinical Epidemiology, 60, 34–42. http://dx.doi.org/10.1016/j.jclinepi.2006.03.012 [Article] [PubMed]
Terwee, C. B., Bot, S. D., de Boer, M. R., van der Windt, D. A., Knol, D. L., Dekker, J., … de Vet, H. C. (2007). Quality criteria were proposed for measurement properties of health status questionnaires. Journal of Clinical Epidemiology, 60, 34–42. http://dx.doi.org/10.1016/j.jclinepi.2006.03.012 [Article] [PubMed]×
Terwee, C. B., Jansma, E. P., Riphagen, I. I., & de Vet, H. C. (2009). Development of a methodological PubMed search filter for finding studies on measurement properties of measurement instruments. Quality of Life Research, 18, 1115–1123. http://dx.doi.org/10.1007/s11136-009-9528-5 [Article] [PubMed]
Terwee, C. B., Jansma, E. P., Riphagen, I. I., & de Vet, H. C. (2009). Development of a methodological PubMed search filter for finding studies on measurement properties of measurement instruments. Quality of Life Research, 18, 1115–1123. http://dx.doi.org/10.1007/s11136-009-9528-5 [Article] [PubMed]×
Terwee, C. B., Mokkink, L. B., Knol, D. L., Ostelo, R. W., Bouter, L. M., & de Vet, H. C. (2012). Rating the methodological quality in systematic reviews of studies on measurement properties: A scoring system for the COSMIN checklist. Quality of Life Research, 21, 651–657. http://dx.doi.org/10.1007/s11136-011-9960-1 [Article] [PubMed]
Terwee, C. B., Mokkink, L. B., Knol, D. L., Ostelo, R. W., Bouter, L. M., & de Vet, H. C. (2012). Rating the methodological quality in systematic reviews of studies on measurement properties: A scoring system for the COSMIN checklist. Quality of Life Research, 21, 651–657. http://dx.doi.org/10.1007/s11136-011-9960-1 [Article] [PubMed]×
*Toglia, J., & Berg, C. (2013). Performance-based measure of executive function: Comparison of community and at-risk youth. American Journal of Occupational Therapy, 67, 515–523. http://dx.doi.org/10.5014/ajot.2013.008482 [Article] [PubMed]
*Unsworth, C. A., Pallant, J. F., Russell, K. J., Germano, C., & Odell, M. (2010). Validation of a test of road law and road craft knowledge with older or functionally impaired drivers. American Journal of Occupational Therapy, 64, 306–315. http://dx.doi.org/10.5014/ajot.64.2.306 [Article] [PubMed]
U.S. Food and Drug Administration, Center for Drug Evaluation and Research, Center for Biologics Evaluation and Research, & Center for Devices and Radiological Health. (2006). Guidance for industry: Patient-reported outcome measures: Use in medical product development to support labeling claims: Draft guidance. Health and Quality of Life Outcomes, 4, 79. http://dx.doi.org/10.1186/1477-7525-4-79 [Article] [PubMed]
*Weiner, N. W., Toglia, J., & Berg, C. (2012). Weekly Calendar Planning Activity (WCPA): A performance-based assessment of executive function piloted with at-risk adolescents. American Journal of Occupational Therapy, 66, 699–708. http://dx.doi.org/10.5014/ajot.2012.004754 [Article] [PubMed]
*Wong, C. K., & Moskovitz, N. (2010). New assessment of forearm strength: Reliability and validity. American Journal of Occupational Therapy, 64, 809–813. http://dx.doi.org/10.5014/ajot.2010.09140 [Article] [PubMed]
*Indicates studies that were systematically reviewed for this article.
Figure 1.
Flow diagram of the selection process and study search result.
Note. AJOT = American Journal of Occupational Therapy.
Table 1.
Measurement Properties of the Study Articles (N = 48)
Columns: Study | Item Response Theory | Measurement Property: Internal Consistency; Reliability (Test–Retest, Interrater, Intrarater); Measurement Error; Content Validity; Structural Validity; Hypothesis Testing (Convergent Validity, Known-Groups Validity); Criterion Validity
Bédard, Parkkari, Weaver, Riendeau, & Dahlquist (2010) xxxx
Canny, Thompson, & Wheeler (2009) xxx
Chang, Helfrich, & Coster (2013) xx
Chang, Ailey, Heller, & Chen (2013) xxx
Cheng & Cheng (2011) xx
Classen, Wang, Crizzle, Winter, & Lanford (2013) x
Classen, Wang, Winter, et al. (2013) x
Classen et al. (2012a) xxx
Classen et al. (2012b) xx
Classen et al. (2010) x
Classen et al. (2011) xx
Duquette et al. (2010) x
Eakman (2012) xxx
Engstrand, Krevers, & Kvist (2012) xx
Flinn, Pease, & Freimer (2012) xxxx
Gal, Ben Meir, & Katz (2013) xx
George & Crotty (2010) x
Hartman-Maeir, Harel, & Katz (2009) xxx
Hwang (2010) xxx
Hwang (2012) xxx
Hwang (2013) xx
Jang, Chern, & Lin (2009) xxxx
Katz, Averbuch, & Bar-Haim Erez (2012) xx
Katz, Bar-Haim Erez, Livni, & Averbuch (2012) xxx
Kay, Bundy, & Clemson (2009) x
King (2013) x
Lehman, Woodbury, & Velozo (2011) xx
Lewis, Fors, & Tharion (2010) xx
Lindstrom-Hazel, Kratt, & Bix (2009) x
Lyons, Li, Tosteson, Meehan, & Ahles (2010) xxx
Mennem, Warren, & Yuen (2012) xx
Merritt (2011) xxx
Morrison et al. (2013) xxx
Ownsworth et al. (2013) xxxx
Perlmutter (2013) xxx
Rieke & Anderson (2009) x
Rowe (2013) xx
Saban, Ornoy, Grotto, & Parush (2012) xxxxxx
Shechtman, Awadzi, Classen, Lanford, & Joo (2010) x
Shih, Rogers, Skidmore, Irrgang, & Holm (2009) xxx
Simmons, Griswold, & Berg (2010) xxx
Søndergaard & Fisher (2012) x
Stark, Somerville, & Morris (2010) xxxx
Su, Tsai, Su, Tang, & Tsai (2011) x
Toglia & Berg (2013) xx
Unsworth, Pallant, Russell, Germano, & Odell (2010) xxxxxx
Weiner, Toglia, & Berg (2012) x
Wong & Moskovitz (2010) xxx