What type of validity refers to the extent to which the results of a study can be generalized across time?

Note to EPSY 5601 Students: An understanding of the difference between population and ecological validity is sufficient. Mastery of the subcategories for each is not necessary for this course.

External Validity
(Generalizability)
To whom can the results of the study be applied?

There are two types of study validity: internal (more applicable with experimental research) and external. This section covers external validity.

External validity involves the extent to which the results of a study can be generalized (applied) beyond the sample. In other words, can you apply what you found in your study to other people (population validity) or settings (ecological validity)? A study of fifth graders in a rural school that found one method of teaching spelling was superior to another may not be applicable to third graders (population) in an urban school (ecological).

Threats to External Validity

Population Validity: the extent to which the results of a study can be generalized from the specific sample that was studied to a larger group of subjects.

  1. The extent to which one can generalize from the study sample to a defined population.
    If the sample is drawn from an accessible population, rather than the target population, generalizing the research results from the accessible population to the target population is risky.
  2. The extent to which personological variables interact with treatment effects.

    If the study is an experiment, it may be possible that different results might be found with students at different grades (a personological variable).

Ecological Validity: the extent to which the results of an experiment can be generalized from the set of environmental conditions created by the researcher to other settings and conditions.

  1. Explicit description of the experimental treatment (not sufficiently described for others to replicate)
    If the researcher fails to adequately describe how he or she conducted a study, it is difficult to determine whether the results are applicable to other settings.
  2. Multiple-treatment interference (catalyst effect)
    If a researcher were to apply several treatments, it is difficult to determine how well each of the treatments would work individually. It might be that only the combination of the treatments is effective.
  3. Hawthorne effect (attention causes differences)
    Subjects perform differently because they know they are being studied. “…External validity of the experiment is jeopardized because the findings might not generalize to a situation in which researchers or others who were involved in the research are not present” (Gall, Borg, & Gall, 1996, p. 475).
  4. Novelty and disruption effect (anything different makes a difference)
    A treatment may work because it is novel and the subjects respond to the uniqueness, rather than the actual treatment. The opposite may also occur: the treatment may not work because it is unique, but given time for the subjects to adjust to it, it might have worked.
  5. Experimenter effect (it only works with this experimenter)
    The treatment might have worked because of the person implementing it. Given a different person, the treatment might not work at all.
  6. Pretest sensitization (pretest sets the stage)
    A treatment might only work if a pretest is given. Because they have taken a pretest, the subjects may be more sensitive to the treatment. Had they not taken a pretest, the treatment would not have worked.
  7. Posttest sensitization (posttest helps treatment “fall into place”)
    The posttest can become a learning experience. “For example, the posttest might cause certain ideas presented during the treatment to ‘fall into place’ ” (p. 477). If the subjects had not taken a posttest, the treatment would not have worked.
  8. Interaction of history and treatment effect (…to everything there is a time…)
    Not only should researchers be cautious about generalizing to other populations, they should also be cautious about generalizing to a different time period. As time passes, the conditions under which treatments work change.
  9. Measurement of the dependent variable (maybe only works with M/C tests)
    A treatment may only be evident with certain types of measurements. A teaching method may produce superior results when its effectiveness is tested with an essay test, but show no differences when the effectiveness is measured with a multiple-choice test.
  10. Interaction of time of measurement and treatment effect (it takes a while for the treatment to kick in)
    It may be that the treatment effect does not occur until several weeks after the end of the treatment. In this situation, a posttest at the end of the treatment would show no impact, but a posttest a month later might show an impact.

Bracht, G. H., & Glass, G. V. (1968). The external validity of experiments. American Educational Research Journal, 5, 437-474.
Gall, M. D., Borg, W. R., & Gall, J. P. (1996). Educational research: An introduction. White Plains, NY: Longman.

Del Siegle, Ph.D.
Neag School of Education – University of Connecticut

www.delsiegle.com

By Dr. Saul McLeod, published 2013

What is the meaning of validity in research?

The concept of validity was formulated by Kelley (1927, p. 14), who stated that a test is valid if it measures what it claims to measure.

For example, a test of intelligence should measure intelligence and not something else (such as memory).

A distinction can be made between internal and external validity. These types of validity are relevant to evaluating the validity of a research study or procedure.

What is internal and external validity in research?

Internal validity refers to whether the effects observed in a study are due to the manipulation of the independent variable and not some other factor.

In other words, it establishes that there is a causal relationship between the independent and dependent variables.

Internal validity can be improved by controlling extraneous variables, using standardized instructions, counterbalancing, and eliminating demand characteristics and investigator effects.
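As an illustration of counterbalancing, the sketch below (hypothetical condition names and participant IDs, not taken from the article) alternates the order in which participants receive two conditions so that order effects do not systematically favor either condition:

```python
# Minimal counterbalancing sketch (hypothetical conditions and participants).
# Alternating the order of conditions across participants spreads practice and
# fatigue effects evenly, so neither condition is systematically advantaged.
from itertools import permutations

conditions = ["treatment", "control"]       # hypothetical conditions
orders = list(permutations(conditions))     # all possible presentation orders

participants = [f"P{i:02d}" for i in range(1, 9)]
schedule = {p: orders[i % len(orders)] for i, p in enumerate(participants)}

for participant, order in schedule.items():
    print(participant, "->", " then ".join(order))
```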

External validity refers to the extent to which the results of a study can be generalized to other settings (ecological validity), other people (population validity) and over time (historical validity).

External validity can be improved by setting experiments in a more natural setting and using random sampling to select participants.
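A minimal sketch of simple random sampling (the sampling frame below is hypothetical) shows the idea of giving every member of the accessible population an equal chance of being selected:

```python
# Simple random sampling sketch (hypothetical sampling frame of 500 students).
import random

population = [f"student_{i}" for i in range(1, 501)]
random.seed(42)                           # fixed seed only so the example is reproducible
sample = random.sample(population, k=30)  # draw 30 participants without replacement
print(sample[:5])
```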

Assessing the Validity of a Test

There are two main categories of validity used to assess the validity of a test (i.e., questionnaire, interview, IQ test, etc.): content and criterion.


What is face validity in research?

Face validity is simply whether the test appears (at face value) to measure what it claims to. This is the least sophisticated measure of validity.

Tests wherein the purpose is clear, even to naïve respondents, are said to have high face validity. Accordingly, tests wherein the purpose is unclear have low face validity (Nevo, 1985).

A direct measurement of face validity is obtained by asking people to rate the validity of a test as it appears to them. These raters could use a Likert scale to assess face validity. For example:

  1. the test is extremely suitable for a given purpose;
  2. the test is very suitable for that purpose;
  3. the test is adequate;
  4. the test is inadequate;
  5. the test is irrelevant and therefore unsuitable.

It is important to select suitable people to rate a test (e.g., questionnaire, interview, IQ test, etc.). For example, individuals who actually take the test would be well placed to judge its face validity.

Also, people who work with the test could offer their opinion (e.g., employers, university administrators). Finally, the researcher could use members of the general public with an interest in the test (e.g., parents of testees, politicians, teachers, etc.).

The face validity of a test can be considered a robust construct only if a reasonable level of agreement exists among raters.
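As a rough sketch of checking that agreement (the ratings below are made up and use the 1-5 scale listed above, where lower numbers mean higher perceived suitability), one could summarize the raters' judgments and their spread:

```python
# Hypothetical face-validity ratings on the 1-5 scale above
# (1 = extremely suitable ... 5 = irrelevant and therefore unsuitable).
from statistics import mean, stdev

ratings = [1, 2, 1, 2, 2, 1, 3, 2]   # one rating per rater (made-up data)

print(f"mean rating: {mean(ratings):.2f}")   # closer to 1 suggests higher face validity
print(f"spread (SD): {stdev(ratings):.2f}")  # a small SD suggests the raters agree
share = sum(r <= 2 for r in ratings) / len(ratings)
print(f"proportion rating the test at least 'very suitable': {share:.0%}")
```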

It should be noted that the term face validity should be avoided when the rating is done by experts, as content validity is more appropriate in that case.

Having face validity does not mean that a test really measures what the researcher intends to measure, only that, in the judgment of raters, it appears to do so. Consequently, it is a crude and basic measure of validity.

A test item such as 'I have recently thought of killing myself' has obvious face validity as an item measuring suicidal cognitions, and may be useful when measuring symptoms of depression.

However, items on tests with clear face validity are more vulnerable to social desirability bias. Individuals may manipulate their responses to deny or hide problems, or exaggerate behaviors to present a positive image of themselves.

It is possible for a test item to lack face validity but still have general validity and measure what it claims to measure. This is good because it reduces demand characteristics and makes it harder for respondents to manipulate their answers.

For example, the test item 'I believe in the second coming of Christ' would lack face validity as a measure of depression (as the purpose of the item is unclear).

This item appeared on the first version of The Minnesota Multiphasic Personality Inventory (MMPI) and loaded on the depression scale.

Because most of the original normative sample of the MMPI were good Christians, only a depressed Christian would think Christ is not coming back. Thus, for this particular religious sample the item does have general validity, but not face validity.

What is construct validity in research?

Construct validity was formulated by Cronbach and Meehl (1955). This type of validity refers to the extent to which a test captures a specific theoretical construct or trait, and it overlaps with some of the other aspects of validity.

Construct validity does not concern the simple, factual question of whether a test measures an attribute.

Instead it is about the complex question of whether test score interpretations are consistent with a nomological network involving theoretical and observational terms (Cronbach & Meehl, 1955).

To test for construct validity it must be demonstrated that the phenomenon being measured actually exists. So, the construct validity of a test for intelligence, for example, is dependent on a model or theory of intelligence.

Construct validity entails demonstrating the power of such a construct to explain a network of research findings and to predict further relationships.

The more evidence a researcher can demonstrate for a test's construct validity the better. However, there is no single method of determining the construct validity of a test.

Instead, different methods and approaches are combined to present the overall construct validity of a test. For example, factor analysis and correlational methods can be used.
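For example, one simple correlational check (a minimal sketch with made-up scores, not a full construct validation) is whether a new test correlates strongly with another measure of the same construct and weakly with a conceptually unrelated variable:

```python
# Convergent/discriminant correlation sketch with made-up scores.
import numpy as np

rng = np.random.default_rng(0)
true_ability = rng.normal(100, 15, size=50)            # latent trait (simulated)

new_test  = true_ability + rng.normal(0, 5, size=50)   # new intelligence test
other_iq  = true_ability + rng.normal(0, 5, size=50)   # established measure of the same construct
shoe_size = rng.normal(40, 3, size=50)                 # conceptually unrelated variable

convergent   = np.corrcoef(new_test, other_iq)[0, 1]   # expected to be high
discriminant = np.corrcoef(new_test, shoe_size)[0, 1]  # expected to be near zero

print(f"convergent r = {convergent:.2f}, discriminant r = {discriminant:.2f}")
```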

What is concurrent validity in research?

This is the degree to which a test corresponds to an external criterion that is known concurrently (i.e., occurring at the same time).

If the new test is validated by a comparison with a currently existing criterion, we have concurrent validity.

Very often, a new IQ or personality test might be compared with an older but similar test known to have good validity already.
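A minimal sketch of that comparison (hypothetical paired scores, summarized with a Pearson correlation) might look like this:

```python
# Concurrent validity sketch: correlate scores on a new test with scores on an
# established test taken at roughly the same time. Data are hypothetical.
from statistics import correlation   # Pearson's r, available in Python 3.10+

new_test_scores    = [98, 105, 110, 92, 121, 134, 88, 101, 115, 107]
established_scores = [101, 108, 112, 95, 118, 130, 90, 99, 117, 104]

r = correlation(new_test_scores, established_scores)
print(f"concurrent validity coefficient r = {r:.2f}")  # a high r supports concurrent validity
```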

What is predictive validity in research?

This is the degree to which a test accurately predicts a criterion that will occur in the future.

For example, a prediction may be made on the basis of a new intelligence test that high scorers at age 12 will be more likely to obtain university degrees several years later. If the prediction is borne out, then the test has predictive validity.
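As a sketch of that idea (entirely hypothetical data), the point-biserial correlation between age-12 test scores and whether each person later obtained a degree gives one simple index of predictive validity:

```python
# Predictive validity sketch: correlate age-12 test scores with a later binary
# outcome (obtained a university degree or not). Data are hypothetical.
from scipy.stats import pointbiserialr

scores_age_12 = [95, 128, 110, 87, 132, 101, 118, 124, 90, 140]
got_degree    = [0,   1,   1,  0,   1,   0,   1,   1,  0,   1]   # 1 = obtained a degree

r, p = pointbiserialr(got_degree, scores_age_12)
print(f"predictive validity r = {r:.2f} (p = {p:.3f})")
```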

APA Style References

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281-302.

Hathaway, S. R., & McKinley, J. C. (1943). Manual for the Minnesota Multiphasic Personality Inventory. New York: Psychological Corporation.

Kelley, T. L. (1927). Interpretation of educational measurements. New York: Macmillan.

Nevo, B. (1985). Face validity revisited. Journal of Educational Measurement, 22(4), 287-293.

How to reference this article:

McLeod, S. A. (2013). What is validity? Simply Psychology. www.simplypsychology.org/validity.html
