Features of determining the validity of a pedagogical test

Validity is one of the basic criteria by which psychodiagnostics judges the quality of tests and techniques, and it is closely related to the concept of reliability. It is used to find out how well a technique measures exactly what it is intended to measure: the better the technique reflects the quality under study, the greater its validity.

The question of validity arises first while the material is being developed, and then again after a test or technique has been applied, when it is necessary to find out whether the measured degree of expression of a personality characteristic corresponds to the method used to measure it.

Validity is expressed as the correlation between the results of a test or technique and other characteristics that are also studied; it can also be argued comprehensively, using different techniques and criteria. Several types of validity are distinguished: conceptual, construct, criterion and content validity, each with specific methods for establishing its degree. Sometimes a reliability check is a mandatory requirement for psychodiagnostic methods that are in doubt.

For psychological research to have real value, it must be both valid and reliable. Reliability allows the experimenter to be confident that the measured value is very close to the true value. Validity is important because it indicates that what is being studied is exactly what the experimenter intends. It is important to note that validity implies reliability, but reliability does not imply validity: reliable values may not be valid, but valid ones must be reliable. This is the essence of successful research and testing.

Validity in psychology

In psychology, the concept of validity refers to the experimenter’s confidence that he measured exactly what he wanted using a certain technique, and shows the degree of consistency between the results and the technique itself relative to the tasks set. A valid measurement is one that measures exactly what it was designed to measure. For example, a technique aimed at determining temperament should measure precisely temperament, and not something else.

Validity is a very important aspect of experimental psychology: it is an indicator that ensures the reliability of the results, and it is often where the most problems arise. A perfect experiment must have impeccable validity, that is, it must demonstrate that the experimental effect is caused by changes in the independent variable, and it must be completely consistent with reality. The results of such an experiment can be generalized without restrictions. When we speak of the degree of validity, we mean the extent to which the results correspond to the objectives.

Validity testing is carried out in three ways.

Content validity assessment determines how well the content of the methodology corresponds to the reality in which the property under study is expressed. It also includes a component called obvious, or face, validity, which characterizes the degree to which the test meets the expectations of those being assessed. In most methodologies it is considered very important that the participant sees an obvious connection between the content of the assessment procedure and the reality of the object being assessed.

Construct validity assessment establishes the degree to which the test actually measures the constructs that are specified and scientifically grounded.

There are two approaches to construct validity. The first is convergent validation, which checks the expected relationship between the results of a technique and the results of other techniques that measure the same properties. If some characteristic can be measured by several methods, a rational solution is to conduct experiments with at least two of them: a high positive correlation between their results supports the claim that the technique is valid.

Convergent validation thus checks that test scores vary as expected. The second approach is discriminant validation: the technique should not measure characteristics with which, in theory, there should be no correlation.
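The two checks can be illustrated with a small calculation. The following is only a sketch under stated assumptions: the scores are invented for illustration, and the names `new_test`, `established` and `unrelated` are hypothetical. It computes the Pearson correlation of a new technique with an accepted measure of the same property (expected to be high) and with a measure of a theoretically unrelated property (expected to be near zero).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores of eight subjects (illustrative data, not from a real study).
new_test = [12, 15, 9, 20, 17, 11, 14, 18]      # technique being validated
established = [14, 16, 10, 21, 19, 12, 13, 20]  # accepted measure of the same property
unrelated = [5, 8, 7, 7, 4, 6, 5, 6]            # measure of a theoretically unrelated property

# Convergent check: the two measures of the same property should agree.
r_convergent = pearson(new_test, established)
# Discriminant check: the new technique should not track the unrelated property.
r_discriminant = pearson(new_test, unrelated)

print(f"convergent r = {r_convergent:.2f}")
print(f"discriminant r = {r_discriminant:.2f}")
```

With such data a high convergent coefficient together with a discriminant coefficient close to zero would count in favor of construct validity; what counts as "high" or "close to zero" in practice depends on the field and the sample.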

Validity testing can also be criterion-based: using statistical methods, it determines the degree to which the results correspond to predetermined external criteria. Such criteria can be direct measures, methods independent of the results, or socially and organizationally significant performance indicators. Criterion validity also includes predictive validity, which is used when there is a need to predict behavior. If the forecast is borne out over time, the technique is predictively valid.

Content validity

Content validity requires that every item, task, or question belonging to a particular domain has an equal chance of being included in the test. It assesses the consistency of the test content (tasks, questions) with the measured area of behavior. In one procedure, tests compiled by two development teams are administered to a sample of subjects. Test reliability is then calculated by splitting the items into two parts, which yields a content validity index.
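The splitting of items into two parts can be sketched as a split-half calculation. This is an illustration under assumptions, not the procedure of any particular test: the response matrix is invented, the odd/even split is one of several possible ways to divide the items, and the Spearman-Brown correction (a standard formula, r_full = 2r / (1 + r)) is added here to estimate the coefficient for the full-length test.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical item matrix: one row per subject, one column per item (1 = correct).
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

# Split the items into two halves: items 1, 3, 5 and items 2, 4, 6.
half_a = [sum(row[0::2]) for row in responses]
half_b = [sum(row[1::2]) for row in responses]

# Correlation between the two half-test scores.
r_half = pearson(half_a, half_b)
# Spearman-Brown correction: estimated coefficient for the full-length test.
r_full = 2 * r_half / (1 + r_half)

print(f"half-test correlation = {r_half:.2f}")
print(f"corrected coefficient = {r_full:.2f}")
```

The correction is needed because each half contains only half of the items, and a shorter test generally gives a lower coefficient than the full one.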

The validity of a test

A test is a standardized task whose application yields data about a person's psychophysiological state and personal properties, as well as his knowledge, abilities and skills.

Validity and reliability of tests are two indicators that determine their quality.

The validity of a test is the degree to which the test corresponds to the quality, characteristic or psychological property it is used to determine.

The validity of a test is an indicator of its effectiveness and applicability to the measurement of the required characteristic. The highest quality tests have 80% validity. When validating, it should be taken into account that the quality of the results depends on the number of subjects and their characteristics. As a result, one and the same test can turn out to be highly reliable and yet completely invalid.

There are several approaches to determining the validity of a test.

When measuring a complex psychological phenomenon that has a hierarchical structure and cannot be studied using just one test, construct validity is used. It determines the accuracy of the study of complex, structured psychological phenomena and personality traits measured through testing.

Criterion validity shows how well a test identifies the psychological phenomenon under study at the present moment and predicts its characteristics in the future. To establish it, the results obtained during testing are correlated with the degree of development of the measured quality in practice, assessed through specific abilities in a certain activity. If the validity coefficient of the test is at least 0.2, the use of such a test is considered justified.

Content validity shows how well the test covers the domain of psychological constructs it is meant to measure, and demonstrates the completeness of the set of measured indicators.

Predictive validity is a criterion by which one can predict how the quality under study will develop in the future. This criterion of test quality is very valuable from a practical point of view, but difficulties may arise, since it leaves out of account the uneven development of this quality in different people.

Test reliability is a criterion that measures the consistency of test results across repeated studies. It is determined by retesting after a certain amount of time and calculating the correlation coefficient between the results of the first and second testing. It is also important to take into account the features of the test procedure itself and the socio-psychological structure of the sample: the same test can have different reliability depending on the gender, age, and social status of the subjects. Reliability estimates can therefore contain inaccuracies and errors that arise from the research process itself, so ways are being sought to reduce the influence of such factors on testing. A test can be considered reliable if its reliability coefficient is 0.8-0.9.
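The retest procedure can be sketched as follows. The scores are invented for illustration, and the function name `retest_reliability` and the default threshold argument are assumptions introduced here; the threshold of 0.8 is the lower bound of the range named above.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def retest_reliability(first, second, threshold=0.8):
    """Return the retest coefficient and whether it reaches the threshold."""
    r = pearson(first, second)
    return r, r >= threshold


# Hypothetical scores of the same eight subjects on two administrations.
first_testing = [23, 30, 18, 27, 35, 21, 29, 32]
second_testing = [25, 29, 17, 28, 34, 23, 27, 33]

r_retest, is_reliable = retest_reliability(first_testing, second_testing)
print(f"retest coefficient = {r_retest:.2f}, reliable: {is_reliable}")
```

Because the coefficient depends on the sample, the same calculation on a group of a different age or social composition may give a different value, which is the point made above about the structure of the sample.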

The validity and reliability of tests are very important because they define the test as a measuring instrument. When reliability and validity are unknown, the test is considered unsuitable for use.

There is also an ethical side to measuring reliability and validity. It is especially important when test results bear on decisions that affect people's lives. Some people are hired while others are screened out; some students are admitted to educational institutions while others must first finish their studies; some are given a psychiatric diagnosis and treatment while others are judged healthy. All such decisions are made on the basis of an assessment of behavior or special abilities. For example, if a person looking for a job has to take a test whose scores are decisive for hiring, and then finds out that the test was not sufficiently valid and reliable, he will be very disappointed.

External validity

*determines to what extent the results obtained in the experiment correspond to the life situation that served as the “prototype” for the experiment.

*In addition, external validity characterizes the possibility of generalizing, that is, transferring the results obtained in the experiment to the entire class of life situations to which the prototype belongs, and to any others.

It must be said that external validity is of particular importance at the empirical stage of the development of a science. In principle, experiments are possible that do not correspond to any real life situation but serve only to test hypotheses whose source is a developed theory. In advanced sciences, researchers tend to avoid a “direct closure” of the experimental result onto reality, since it is clear that the experiment is built on the requirements of the theory being tested, and not on the requirements of correspondence with reality. The modeling of some conditions, for example in experiments on sensory deprivation or the development of classical conditioned reflexes, does not correspond to any real life situation, provided that by reality we mean what has been, and not what could potentially be. Therefore, the multi-page discussions of such a reputable author as Gottsdanker about “full correspondence experiments” or “reality-enhancing experiments” seem far-fetched and archaic.

But the importance of “external validity” for an experiment cannot be denied, given the general state of psychological science, and not the “cutting edge” of psychological theory.

External validity is sometimes interpreted as a characteristic of an experiment that determines the transferability (generalization) of the results obtained to different times, places, conditions and groups of people (or animals).

However, the possibility of transfer is a consequence of two reasons:

1) the correspondence of the experimental conditions to the prototype life situation (“representativeness” of the experiment);

2) the typicality of the prototype situation itself for reality (“representativeness” of the situation).

The situation chosen for modeling in an experiment may be completely unrepresentative from the point of view of the life of the group of subjects participating in the experiment, or it may be rare and atypical.

External validity, as Gottsdanker defines it, affects primarily the reliability of the conclusions that the results of a real experiment provide in comparison with a full correspondence experiment. To achieve high external validity, the levels of the additional variables in the experiment must correspond to their levels in reality. An experiment that lacks external validity is considered invalid. Let us add that it is invalid if the source of the hypothesis is reality or ordinary knowledge, and not theory. An experiment that does not correspond to reality may have perfect internal and operational validity. It is another matter that its results cannot be transferred directly to reality without taking into account the influence on the dependent variable of the additional variables as well as the independent one.

It is obvious that achieving complete external validity is impossible in principle, therefore any “pure” analytical study is externally invalid. However, it is recommended to take into account as much as possible the influence of additional variables on the experimental effect, since it is not known when a theory will be built to explain them, and the data may have to be used in practice.

Researchers working in applied fields are especially concerned about the external validity of experiments:

*clinical psychology,

*pedagogical and

*organizational psychology.

This is understandable, because to solve their everyday problems they more often have to resort to experiments that imitate reality. In fact, the historical debate between supporters of the laboratory experiment and the “natural experiment” was a reflection of the different methodological approaches of specialists involved in fundamental or applied psychology. Currently, factors influencing external validity are considered to be irreducible features of the experiment that distinguish it from the real situation. Campbell equates external validity, the representativeness of the experiment, and the generalizability of its results. He classifies factors that threaten external validity as primarily effects associated with the characteristics of the research object:

*learnability,

*availability of memory,

*ability to react emotionally to situations.

Campbell names the main reasons for the violation of external validity:

1. The effect of testing is a decrease or increase in the susceptibility of subjects to experimental influence under the influence of testing. For example, preliminary control of students' knowledge can increase their interest in new educational material. Since the population is not subject to preliminary testing, the results for it may not be representative.

2. Conditions for conducting the study. They cause the subject's reaction to the experiment. Consequently, its data cannot be transferred to individuals who did not take part in the experiment, that is, to the entire general population except for the experimental sample.

3. Interaction of selection factors and the content of experimental influence. Its consequences are artifacts (in experiments with volunteers or with subjects participating under duress).

4. Interference of experimental influences. The subjects have memory and the ability to learn. If an experiment consists of several series, the first influences do not pass without a trace and affect the appearance of effects from subsequent influences.

Most of the reasons for the violation of external validity are associated with the characteristics of a psychological experiment conducted with human participants, which differentiate psychological research from an experiment carried out by specialists in other natural sciences.

Solomon was the first to draw attention to the interaction between the testing procedure and the content of experimental influence, in 1949, in a study of schoolchildren: preliminary testing reduced the effectiveness of learning. A study of social attitudes showed that preliminary testing influenced the individual’s attitudes and susceptibility to persuasion, while in Hovland’s experiments it, on the contrary, weakened the persuasive effect of films.

The more unusual the testing procedure and the more similar in content the experimental intervention is to the test, the greater the effect. To avoid pretest effects, Campbell recommends using experimental designs with no pretest groups.
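One well-known design that includes groups without a pretest is the Solomon four-group design, in which subjects are randomly divided among four combinations of pretest and treatment. The sketch below shows only the random assignment step; the participant labels, the function name `assign_solomon_groups` and the fixed seed are assumptions made for illustration.

```python
import random

# The four cells of the design: (pretest status, experimental condition).
GROUPS = [
    ("pretest", "treatment"),
    ("pretest", "control"),
    ("no pretest", "treatment"),
    ("no pretest", "control"),
]


def assign_solomon_groups(participants, seed=0):
    """Randomly assign participants to the four groups in (nearly) equal numbers."""
    rng = random.Random(seed)  # fixed seed only so the example is repeatable
    shuffled = list(participants)
    rng.shuffle(shuffled)
    assignment = {group: [] for group in GROUPS}
    # Deal the shuffled participants out to the groups in turn.
    for index, person in enumerate(shuffled):
        assignment[GROUPS[index % len(GROUPS)]].append(person)
    return assignment


# Hypothetical participant labels S01..S20.
participants = [f"S{i:02d}" for i in range(1, 21)]
groups = assign_solomon_groups(participants)

for (pretest, condition), members in groups.items():
    print(f"{pretest:10s} {condition:9s} {members}")
```

Comparing the treated groups with and without a pretest then shows whether the pretest itself changed the subjects' susceptibility to the experimental influence.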

The interaction of selection factors and experimental influence is, as already noted, connected with the non-random participation of subjects in the experiment. The reaction can be of two types:

*willingness of volunteers to be “exposed” and

*refusal, negative reaction of those who are forced to participate in the experiment.

For example, it may be that only intellectually gifted people agree to participate in learning research. Dropout of subjects during an experiment can also be caused by the experimental influence: individuals who fail in achievement motivation tasks may refuse to participate in subsequent series.

Naturally, the one factor that is almost impossible to eliminate is the “reaction to the experiment.” Let us note once again that the problem of internal validity is solvable in principle, since it is possible to select appropriate procedures for planning an experiment and for mathematical processing of the results to ensure a given level of their reliability.

According to Campbell, the problem of external validity as the representativeness of an experiment in relation to reality is insoluble, since induction, i.e. generalization, can never be completely objective.

The problem of external validity as the adequacy of the experimental situation to its prototype life situation is also insoluble by logical and mathematical means: it requires the involvement of the entire body of scientific psychological knowledge to describe the situation as a whole.

The validity of a methodology

The validity of a technique is the correspondence between what the technique actually studies and what it is intended to study.

For example, if a psychological technique based on informed self-report is used to study a personality quality that the person himself cannot truly assess, such a technique will not be valid.

In most cases, the answers that the subject gives to questions about the presence or absence of development of this quality in him can express how the subject himself perceives himself, or how he would like to be in the eyes of other people.

Validity is also a basic requirement for psychological methods for studying psychological constructs. There are many types of validity, and there is as yet no consensus on how to name them or on which specific types a technique must satisfy. If a technique turns out to be invalid externally or internally, it is not recommended to use it. There are two approaches to method validation.

The theoretical approach shows how truly the methodology measures exactly the quality that the researcher has conceived and intends to measure. This is demonstrated by comparison with related indicators and with indicators with which no connection should exist. Therefore, to confirm theoretical validity, it is necessary to establish the degree of connection with a related technique (convergent validity) and the absence of such a connection with techniques that have a different theoretical basis (discriminant validity).

The assessment of a technique's validity can be quantitative or qualitative. The pragmatic approach evaluates the effectiveness and practical significance of the technique; to implement it, an independent external criterion is used as an indicator of how the quality manifests itself in everyday life. Such a criterion can be, for example, academic performance (for achievement methods and intelligence tests), subjective assessments (for personality methods), or specific abilities such as drawing and modeling (for methods measuring special characteristics).

Four types of external criteria are distinguished for establishing validity: performance criteria, such as the number of tasks completed or the time spent on training; subjective criteria, obtained through questionnaires, interviews or surveys; physiological criteria, such as heart rate, blood pressure and physical symptoms; and criteria of chance, used when the goal is related to or influenced by a certain event or circumstance.

When choosing a research methodology, it is of theoretical and practical importance to determine the scope of the characteristics being studied, as an important component of validity. The information contained in the name of a technique is almost never sufficient to judge the scope of its application: it is only a name, and much more is always hidden beneath it. A good example is the proofreading test. Here the scope of the properties being studied includes concentration, stability and the psychomotor speed of processes. The technique provides an assessment of the severity of these qualities in a person, correlates well with values obtained by other methods, and has good validity. At the same time, the values obtained from the proofreading test are also influenced by other factors, with respect to which the technique is nonspecific. If the proofreading test is used to measure those factors, its validity will be low.

Thus, by determining the scope of application of a methodology, validity reflects how well founded the research results are. The fewer the accompanying factors that influence the results, the higher the reliability of the estimates obtained with the methodology. The reliability of the results is also determined by the set of measured properties, their importance in diagnosing complex activities, and how fully the material of the methodology reflects the subject of measurement. For example, to meet the requirements of validity and reliability, a methodology intended for professional selection must analyze a wide range of indicators that are most important for success in the profession.
