For test items to be regarded as appropriate for measuring anything, the test must possess certain desirable psychometric qualities.
What is of interest to a test constructor at this stage is the mean score for all the groups and the distribution of scores, which indicate the consistency of the test scores. In view of this, Brown (1983) opines that the consistency of the test scores must be established, especially the equivalent-form reliability if more than one form of the test items exists. Reliability of a test can, however, also be established by using either the split-half or the test-retest method.
Brown further states that the validity of the test, particularly the content validity as well as the criterion-related validity, must also be determined.
When constructing or selecting test items and other evaluation instruments, the most important question is: to what extent will the interpretation of the scores be appropriate, meaningful, and useful for the intended application of the results?
A test, generally, is a valid and reliable instrument made up of items that can be used to obtain information about student or individual behavior. Tests and other evaluation instruments serve a variety of uses in the school. For example, achievement tests might be used for selection, placement, diagnosis, and certification of mastery, while aptitude tests might be used for predicting success in future learning activities or occupations.
Regardless of the type of instrument used or how the results are to be used, however, all measurement instruments should possess certain characteristics: validity, reliability, and usability.
The psychometric properties are as follows:
VALIDITY
Validity is the extent to which a test measures what it is supposed to measure. Validity also refers to the purpose of the test: if the purpose of the test is to measure students’ achievement, it should be used for that purpose only.
Types of Validity
1. Face validity
2. Content validity
3. Predictive validity
4. Concurrent validity
5. Construct validity
FACE VALIDITY: Face validity deals with the appearance of the test, that is, whether the test looks like what it is meant to measure. The evidence of face validity is the recognition of a test in terms of what it seeks to measure. For example, a mathematics test should be recognizable by its numbers and other mathematical signs and symbols, such that it cannot be confused with an English test or a chemistry test. If this is the case, the test has face validity.
CONTENT VALIDITY: This is concerned with the extent to which the test items cover a given content of a curriculum area. In establishing the content validity of a test, the compiler of the test requires the use of a test blueprint or table of specification and, more importantly, the consensus of expert judgment in the area that is being tested.
PREDICTIVE VALIDITY: A test possesses predictive validity if it is capable of predicting the future outcome or performance of testees. For instance, the University Matriculation Examination (UME) has predictive validity if it is capable of predicting the performance of students who eventually enter the university, such that their UME scores correlate with their academic performance during their university programme.
CONCURRENT VALIDITY: This is the extent to which two similar tests relate in terms of scores when given to the same group of students; that is, the two forms of the similar test should be given to the students within the same period. The scores on the two tests are correlated to obtain a coefficient; if the coefficient is significant, the first test has concurrent validity with the second. For example, the first test may be from the teacher who taught the students previously, while the second is from a new teacher who wants to verify their performance; when the scores are related, this is evidence of concurrent validity.
The concurrent validity of a test is established by correlating the scores of the test that is being constructed with the scores of an already standardized test. The higher the coefficient of correlation, the higher the level of concurrent validity.
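As a minimal sketch of this correlation procedure (the score lists below are hypothetical, not from the original text), the coefficient can be computed in Python:

```python
# A minimal sketch of estimating concurrent validity (hypothetical data):
# correlate scores on a newly constructed test with scores of the same
# students on an already standardized test.
from statistics import mean, stdev

new_test = [45, 60, 55, 70, 65, 50, 75, 58]       # scores on the new test
standard_test = [48, 62, 53, 72, 68, 47, 78, 60]  # scores on the standardized test

def pearson_r(x, y):
    """Pearson correlation coefficient between two lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

print(f"Concurrent validity coefficient: {pearson_r(new_test, standard_test):.2f}")
```

The same computation, applied to different pairs of score lists, yields the predictive validity coefficient (for example, UME scores against later university results), the test-retest coefficient of stability, and the parallel-form coefficient of equivalence discussed later in this section.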
CONSTRUCT VALIDITY: This is the extent to which a test verifies the evidence of a psychological variable, such as interest or emotion, among others. A researcher investigating the interest of students may want to find out, through the use of a questionnaire, whether students have interest in a particular subject; when there is evidence that students have interest, the questionnaire shows construct validity. Questionnaires are relied upon because most psychological variables are abstract.
Whenever we
wish to interpret test performance in terms of some psychological trait or
quality, we are concerned with a construct. A construct is a psychological quality that we assume exists in order to explain some aspect of behavior. Mathematical reasoning is a construct, and so are intelligence, creativity,
reading comprehension and such personality characteristics as sociability,
honesty and anxiety.
NATURE OF VALIDITY
When using the term validity in relation to testing and evaluation, there are a number of cautions to be borne in mind:
1. Validity refers to the
appropriateness of the interpretation of the result of a test or evaluation
instrument for a given group of individuals and not to the instrument itself.
We sometimes speak of the “validity of a test” for the sake of convenience, but
it is more correct to speak of the validity of the interpretation to be made
from the result.
2. Validity is a matter of degree; it does not exist on an all-or-none basis. Consequently, we should avoid thinking of evaluation results as valid or invalid. Validity is best considered in
terms of categories that specify degree, such as high validity, moderate
validity and low validity.
3. Validity is always specific to some
particular use. No test is valid for all purposes. For example, the results of an
arithmetic test may have a high degree of validity for indicating computational
skill, a low degree of validity for indicating arithmetical reasoning, a moderate
degree of validity for predicting success in future mathematics courses, and no
validity for predicting success in art or music. Thus, when appraising or
describing validity, it is necessary to consider the use to be made of the
results. Evaluation results are never just valid; they have a different degree
of validity for each particular interpretation to be made.
4. Validity is a unitary concept; the conceptual nature of validity has typically been described for the testing profession in a set of standards prepared by a joint committee made up of members from three professional organizations that are especially concerned with educational and psychological testing. Validity is viewed as a unitary concept based on various kinds of evidence. The three basic ways of accumulating evidence to support the validity of an interpretation are content, criterion-related, and construct evidence.
RELIABILITY
A question that may be asked about a measuring instrument or a test is: how reliable is the test or instrument? In this case, we are not asking what it measures, but how accurately it measures what it does measure. How accurately will the result be reproduced if we measure the testee again? Putting the answers to the above questions in summary form, one could define the reliability of a test as the consistency of the test in measuring what it is meant to measure on several occasions (Mukherjee, 1978). Stanley and Hopkins (1972) define reliability as the extent to which a test is consistent in measuring whatever it does measure: its dependability, stability, and relative freedom from errors of measurement. Thorndike and Hagen (1977) see the reliability of a test as the accuracy or precision with which a measure based on one sample of test tasks at a single point in time represents performance on a different sample of the same kind of tasks, at a different point in time, or both. They further suggest that this accuracy may be expressed by a reliability coefficient.
Reliability
refers to the consistency of test results. If we obtain quite similar scores when the same test is administered to the same group on two different
occasions, we can conclude that our results have a high degree of reliability
from one occasion to the other. Similarly, if different teachers independently
rate the same pupils on the same instrument (test) and obtain similar ratings,
we can conclude that the results have a high degree of reliability from one
rater to another.
There are various ways of determining the reliability coefficient of a test. The most important methods for establishing the reliability of objective or multiple-choice achievement tests are:
1. Split-half method (internal consistency)
2. Test-retest (coefficient of stability)
3. Parallel form (coefficient of equivalence)
4. Cronbach’s Alpha (commonly used for instruments or questionnaires with multiple response options; a computational sketch follows this list)
5. Kuder–Richardson formulas 20 & 21 (internal consistency)
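As a rough computational sketch of item 4 above (not part of the original text), Cronbach’s alpha can be computed from an item-response matrix; the responses below are hypothetical, and for dichotomous (0/1) items the same computation reduces to Kuder–Richardson formula 20.

```python
# A minimal sketch of Cronbach's alpha (hypothetical data):
# alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
from statistics import variance

# Rows are testees, columns are items (e.g. responses on a 1-5 scale).
responses = [
    [4, 5, 3, 4],
    [2, 3, 2, 3],
    [5, 4, 4, 5],
    [3, 3, 2, 2],
    [4, 4, 5, 4],
]

k = len(responses[0])                     # number of items
items = list(zip(*responses))             # one column of scores per item
totals = [sum(row) for row in responses]  # each testee's total score

item_variance_sum = sum(variance(item) for item in items)
alpha = (k / (k - 1)) * (1 - item_variance_sum / variance(totals))
print(f"Cronbach's alpha: {alpha:.2f}")
```

A coefficient close to 1 indicates that the items are internally consistent; values below about .70 are commonly taken to signal weak consistency.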
Split-half method: A single test is administered once to a group of testees, and the items of the test are dichotomized into odd- and even-numbered items. This produces two scores for each testee, one on the odd-numbered items and one on the even-numbered items, which, when correlated, give a coefficient called the half-test reliability and serve as a measure of internal consistency. To obtain the full-test reliability, the Spearman-Brown formula is used to step up the half-test coefficient.
As noted, the preceding reliability coefficient is determined by correlating the scores on the two half-tests. To estimate the reliability of scores based on the full-length test, the Spearman-Brown formula is usually applied:

Reliability on full test = (2 × Reliability on ½ test) / (1 + Reliability on ½ test)
The simplicity of the formula can be seen in the following example, in which the correlation coefficient between the test’s two halves is .60:

Reliability on full test = (2 × .60) / (1 + .60) = 1.20 / 1.60 = .75
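A minimal sketch of the whole split-half procedure, assuming a hypothetical matrix of dichotomously scored items, might look as follows:

```python
# A minimal sketch of split-half reliability with the Spearman-Brown
# step-up (hypothetical data: rows are testees, columns are items
# scored 1 for correct, 0 for incorrect).
from statistics import mean, stdev

scores = [
    [1, 0, 1, 1, 0, 1, 1, 0],
    [1, 1, 1, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 0, 1, 0],
]

# Split the items (not the testees) into odd- and even-numbered halves.
odd_totals = [sum(row[0::2]) for row in scores]   # items 1, 3, 5, 7
even_totals = [sum(row[1::2]) for row in scores]  # items 2, 4, 6, 8

def pearson_r(x, y):
    """Pearson correlation coefficient between two lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

half_r = pearson_r(odd_totals, even_totals)  # half-test reliability
full_r = (2 * half_r) / (1 + half_r)         # Spearman-Brown step-up
print(f"Half-test r: {half_r:.2f}; full-test reliability: {full_r:.2f}")
```

Note that it is the items, not the testees, that are split into halves, and the step-up applied at the end is exactly the formula worked above.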
Parallel form (coefficient of equivalence): Two similar tests are administered to testees within a given period, and the scores from both tests are correlated to obtain a coefficient; this coefficient is called a measure of equivalence. Put another way, if two or more parallel forms of a test have been produced in such a way that it seems likely that scores on these alternate forms will be equivalent, and if each individual in the group is given both forms of the test, then the correlation between scores on the two forms produces an estimate of the test’s reliability (Tuckman, 1978).
Test-retest (coefficient of stability): The test-retest method is one of the oldest and, at least at first glance, one of the most sensible methods of estimating the reliability of test scores. In this method, a single test is administered twice, on two occasions, to the same individuals, and the two sets of scores are correlated to obtain the coefficient of stability.