Wednesday, 1 February 2017

Reliability and Validity of a Test

For a test to be regarded as appropriate for measuring anything, it must possess certain desirable psychometric qualities.
What is of interest to a test constructor at this stage is the mean score for all the groups and the distribution of scores, which are used to determine the consistency of the test scores. In view of this, Brown (1983) opines that the consistency of the test scores must be established, especially the equivalent-form reliability if more than one form of the test exists. However, the reliability of a test can also be established by using either the split-half or the test-retest method.
Brown further states that the validity of the test, particularly the content validity as well as the criterion-related validity, must also be determined.
When constructing or selecting test items and other evaluation instruments, the most important question is: to what extent will the interpretation of the scores be appropriate, meaningful, and useful for the intended application of the results?
A test, generally, is an instrument made up of items that can be used to obtain information about student or individual behavior. Tests and other evaluation instruments serve a variety of uses in the school. For example, achievement tests might be used for selection, placement, diagnosis, and certification of mastery, while aptitude tests might be used for predicting success in future learning activities or occupations.
Regardless of the type of instrument used or how the results are to be used, however, all such measurements should possess certain characteristics: validity, reliability, and usability.
These psychometric properties are discussed below.
VALIDITY
Validity is the extent to which a test measures what it is supposed to measure. It also relates to the purpose of the test: if the purpose of the test is to measure students’ achievement, it should be used for that purpose only.
Types of Validity
1.    Face validity
2.    Content validity
3.    Predictive validity
4.    Concurrent validity
5.    Construct validity
FACE VALIDITY: Face validity deals with the appearance of the test, that is, whether the test looks like it serves its stated purpose. The evidence of face validity is the recognisability of a test in terms of what it seeks to measure. For example, a mathematics test should be recognisable by its numbers and other mathematical signs and symbols, such that it cannot be confused with an English test or a chemistry test. If this is the case, the test has face validity.
CONTENT VALIDITY: This is concerned with the extent to which the test items cover a given content area of the curriculum. In establishing the content validity of a test, the compiler of the test requires a test blueprint or table of specifications and, more importantly, the consensus of expert judgment in the area being tested.
PREDICTIVE VALIDITY: A test possesses predictive validity if it is capable of predicting the future outcomes or performance of testees. For instance, the University Matriculation Examination (UME) has predictive validity if it is capable of predicting the performance of students who eventually enter the university, such that their UME scores correlate with their academic performance during their university programme.
CONCURRENT VALIDITY: This is the extent to which two similar tests relate in terms of scores when given to the same group of students; that is, the two forms of the similar test should be given to the students within the same period. The scores on the two tests are correlated to obtain a coefficient; if the coefficient is significant, the first test has concurrent validity with the second. For example, the first test may come from the teacher who taught the students previously, while the second comes from a new teacher who wants to verify their performance; if the two sets of scores are strongly related, this is evidence of concurrent validity.
The concurrent validity of a test is established by correlating the scores of the test that is being constructed with the scores of an already standardized test. The higher the coefficient of correlation, the higher the level of concurrent validity.
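As a simple, hedged illustration (with invented scores), the Python sketch below computes such a validity coefficient; the same calculation, applied to an aptitude score and a later criterion measure, would give a predictive validity coefficient of the kind described above.

from scipy.stats import pearsonr

# Hypothetical scores of eight students on a newly constructed test
# and on an already standardized test covering the same content.
new_test = [55, 62, 48, 70, 66, 59, 73, 51]
standardized_test = [58, 65, 50, 72, 63, 61, 75, 54]

# The Pearson correlation between the two sets of scores is the
# concurrent validity coefficient of the new test.
r, p_value = pearsonr(new_test, standardized_test)
print(f"validity coefficient r = {r:.2f} (p = {p_value:.3f})")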
CONSTRUCT VALIDITY: This is the extent to which a test verifies the evidence of a psychological variable, such as interest or emotion, among others. For example, a researcher investigating students’ interest may use a questionnaire to find out whether students have more interest in a particular subject; if the questionnaire yields evidence of that interest, it has construct validity. Questionnaires are commonly used here because most psychological variables are abstract.
Whenever we wish to interpret test performance in terms of some psychological trait or quality, we are concerned with a construct. A construct is a psychological quality that we assume exists in order to explain some aspect of behavior. Mathematical reasoning is a construct, and so are intelligence, creativity, reading comprehension, and such personality characteristics as sociability, honesty, and anxiety.

NATURE OF VALIDITY
When using the term validity in relation to testing and evaluation, there are a number of cautions to be borne in mind:
1.    Validity refers to the appropriateness of the interpretation of the results of a test or evaluation instrument for a given group of individuals, and not to the instrument itself. We sometimes speak of the “validity of a test” for the sake of convenience, but it is more correct to speak of the validity of the interpretation to be made from the results.
2.    Validity is a matter of degree; it does not exist on an all-or-none basis. Consequently, we should avoid thinking of evaluation results as valid or invalid. Validity is best considered in terms of categories that specify degree, such as high validity, moderate validity, and low validity.
3.    Validity is always specific to some particular use. No test is valid for all purposes. For example, the results of an arithmetic test may have a high degree of validity for indicating computational skill, a low degree of validity for indicating arithmetical reasoning, a moderate degree of validity for predicting success in future mathematics courses, and no validity for predicting success in art or music. Thus, when appraising or describing validity, it is necessary to consider the use to be made of the results. Evaluation results are never just valid; they have a different degree of validity for each particular interpretation to be made.
4.    Validity is a unitary concept. The conceptual nature of validity has typically been described for the testing profession in a set of standards prepared by a joint committee made up of members from three professional organizations that are especially concerned with educational and psychological testing. Validity is viewed as a unitary concept based on various kinds of evidence; the three basic ways of accumulating evidence to support the validity of an interpretation are content, criterion-related, and construct evidence.

RELIABILITY
A question that may be asked about a measuring instrument or a test is: how reliable is the test or instrument? In this case, we are not asking what it measures, but how accurately it measures what it does measure. How accurately will the scores be reproduced if we measure the testee again? Putting the answers to these questions in summary form, one could define the reliability of a test as the consistency of the test in measuring what it is meant to measure on several occasions (Mukherjee, 1978). Stanley and Hopkins (1972) define reliability as the extent to which a test is consistent in measuring whatever it does measure; it connotes dependability, stability, and relative freedom from errors of measurement. Thorndike and Hagen (1977) see the reliability of a test as the accuracy or precision with which a measure based on one sample of test tasks at a single point in time represents performance on a different sample of the same kind of tasks, or at a different point in time, or both. They further suggest that this accuracy may be expressed by a reliability coefficient.
Reliability refers to the consistency of test results. If we obtain quite similar scores when the same test is administered to the same group on two different occasions, we can conclude that our results have a high degree of reliability from one occasion to the other. Similarly, if different teachers independently rate the same pupils on the same instrument (test) and obtain similar ratings, we can conclude that the results have a high degree of reliability from one rater to another.
There are various ways of determining the reliability coefficient of a test. The most important methods for establishing the reliability of objective or multiple-choice achievement tests are:
1.    Split-half method (internal consistency)
2.    Test-retest (coefficient of stability)
3.    Parallel form (coefficient of equivalence)
4.    Cronbach’s Alpha (commonly used for instruments or questionnaires with multiple response options; see the sketch after this list)
5.    Kuder-Richardson formulas 20 & 21 (internal consistency)
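As an illustrative aside on item 4, the following minimal Python sketch computes Cronbach’s alpha from an invented matrix of questionnaire responses; Kuder-Richardson formula 20 is the special case of the same calculation for items scored 0 or 1.

import numpy as np

# Hypothetical responses of six students to a four-item questionnaire,
# each item scored on a 1-5 scale (illustrative data only).
scores = np.array([
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 4, 5, 5],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
    [3, 2, 3, 3],
])

k = scores.shape[1]                          # number of items
item_vars = scores.var(axis=0, ddof=1)       # variance of each item
total_var = scores.sum(axis=1).var(ddof=1)   # variance of the total scores

# Cronbach's alpha: (k / (k - 1)) * (1 - sum of item variances / total variance)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")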

Split-half method: Here a single test is administered once to a group of testees, and the items are then divided into two halves, usually the odd-numbered and the even-numbered items. This produces two scores for each testee which, when correlated, give a measure of internal consistency; the resulting coefficient is called the half-test reliability. To obtain the reliability of the full test, the Spearman-Brown formula is used to step up the half-test coefficient.
As noted, the preceding reliability coefficient is determined by correlating the scores on the two half-tests. To estimate the reliability of scores based on the full-length test, the Spearman-Brown formula is usually applied:
Reliability of full test = (2 × reliability of ½ test) / (1 + reliability of ½ test)

The simplicity of the formula can be seen in the following example, in which the correlation coefficient between the test’s two halves is .60:

Reliability of full test = (2 × .60) / (1 + .60) = 1.20 / 1.60 = .75
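The whole split-half procedure, with invented half-test scores, might be sketched in Python as follows:

import numpy as np

def spearman_brown(half_reliability):
    # Step up a half-test correlation to an estimate for the full-length test.
    return (2 * half_reliability) / (1 + half_reliability)

# Hypothetical scores of six testees on the odd- and even-numbered halves of one test.
odd_half = np.array([12, 8, 15, 9, 14, 10])
even_half = np.array([11, 9, 14, 8, 13, 11])

half_r = np.corrcoef(odd_half, even_half)[0, 1]   # half-test reliability
print(f"half-test r = {half_r:.2f}, full-test reliability = {spearman_brown(half_r):.2f}")
print(f"{spearman_brown(0.60):.2f}")              # 0.75, matching the worked example above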

Parallel form (coefficient of equivalence): Here two similar (parallel) forms of a test are administered to the testees within a given period, and the scores from both forms are correlated to obtain a coefficient; this coefficient is called a measure of equivalence. Put another way, if two or more parallel forms of a test have been produced in such a way that it seems likely that scores on the alternate forms will be equivalent, and if each individual in the group is given both forms of the test, then the correlation between scores on the two forms provides an estimate of the test’s reliability (Tuckman, 1978).
Test-retest: The test-retest method is one of the oldest and, at least at first glance, one of the most sensible methods of estimating the reliability of test scores. In this method, a single test is administered twice, on two occasions, to the same individuals, and the two sets of scores are correlated to obtain the coefficient of stability.
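With invented scores from the two administrations, the coefficient of stability can be computed as follows (an illustrative sketch only):

import numpy as np

# Hypothetical scores of the same six testees on two administrations of one test.
first_occasion = [40, 55, 62, 35, 70, 48]
second_occasion = [42, 53, 60, 38, 68, 50]

stability = np.corrcoef(first_occasion, second_occasion)[0, 1]
print(f"test-retest reliability (coefficient of stability) = {stability:.2f}")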
