This oversimplified example should not be taken to be realistic. Suppose a psychologist proposes a theory that there are two kinds of intelligence, which we shall call "verbal intelligence" and "mathematical intelligence". Evidence for the theory is sought in the examination scores of 1,000 students in each of 10 different academic fields. If a student is chosen randomly from a large population, then the student's 10 scores are random variables. The psychologist's theory may say that, among students with a particular level of "verbal intelligence" and a particular level of "mathematical intelligence", the average score in each of the 10 subjects is a certain number times the level of "verbal intelligence" plus a certain number times the level of "mathematical intelligence", i.e., it is a linear combination of those two "factors". The numbers by which the two "intelligences" are multiplied are posited by the theory to be the same for all students, and are called "factor loadings". For example, the theory may hold that the average student's aptitude in the science of omphalology is
10 × (the student's verbal intelligence) + 6 × (the student's mathematical intelligence).
The numbers 10 and 6 would be the factor loadings associated with the field of omphalology. Other academic subjects would have factor loadings other than 10 and 6. Two students having identical degrees of verbal intelligence and identical degrees of mathematical intelligence may nevertheless have different aptitudes in omphalology, or in any other subject, because individual aptitudes differ from average aptitudes. That difference is the "error", an unfortunate misnomer in statistics that means the amount by which an individual differs from what is average (see errors and residuals in statistics).
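The distinction between the average aptitude and an individual's aptitude can be sketched in a few lines of Python. This is only an illustration: it assumes that 10 is the loading on verbal intelligence and 6 the loading on mathematical intelligence, and the normally distributed error with standard deviation `error_sd` is an assumption made here, not something the theory specifies.

```python
import random

# Factor loadings for omphalology from the example (assumed order:
# 10 for verbal intelligence, 6 for mathematical intelligence).
VERBAL_LOADING = 10.0
MATH_LOADING = 6.0


def average_aptitude(verbal, mathematical):
    """Average aptitude of all students sharing these two intelligence levels."""
    return VERBAL_LOADING * verbal + MATH_LOADING * mathematical


def individual_aptitude(verbal, mathematical, rng, error_sd=1.0):
    """One student's aptitude: the average plus that student's own "error".

    The normal distribution and error_sd are illustrative assumptions.
    """
    return average_aptitude(verbal, mathematical) + rng.gauss(0.0, error_sd)


rng = random.Random(0)

# Two students with identical intelligence levels ...
a = individual_aptitude(1.5, 0.5, rng)
b = individual_aptitude(1.5, 0.5, rng)

# ... share the same average (10 * 1.5 + 6 * 0.5 = 18) but differ individually.
print(average_aptitude(1.5, 0.5), a, b)
```

The two printed individual values differ from 18 and from each other; those differences are the "errors" in the sense described above.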
The observable data that go into factor analysis would be 10 scores of each of the 1000 students, a total of 10,000 numbers. The factor loadings and levels of the two kinds of intelligence of each student must be inferred from the data. Indeed, even the number of factors (two, in this example) must be inferred from the data.
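In the example above, for i = 1, ..., 1,000 the ith student's scores are

$$x_{k,i} = \mu_k + \ell_{k,1} v_i + \ell_{k,2} m_i + \varepsilon_{k,i}, \qquad k = 1, \ldots, 10,$$

where
$x_{k,i}$ is the ith student's score in the kth subject,
$\mu_k$ is the average score in the kth subject,
$v_i$ is the ith student's "verbal intelligence",
$m_i$ is the ith student's "mathematical intelligence",
$\ell_{k,1}$ and $\ell_{k,2}$ are the factor loadings of the kth subject on verbal and mathematical intelligence respectively, and
$\varepsilon_{k,i}$ is the "error": the difference between the ith student's score in the kth subject and the average score in that subject of all students whose levels of verbal and mathematical intelligence equal those of the ith student.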
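In matrix notation, we have

$$X = \mu \mathbf{1}^{\mathsf{T}} + L F + \varepsilon,$$

where
$X$ is the 10 × 1,000 matrix of observable scores (one row per subject, one column per student),
$\mu$ is the 10 × 1 column vector of subject averages, and $\mathbf{1}^{\mathsf{T}}$ is a 1 × 1,000 row of ones that repeats $\mu$ for every student,
$L$ is the 10 × 2 matrix of factor loadings,
$F$ is the 2 × 1,000 matrix whose ith column holds the ith student's verbal and mathematical intelligence, and
$\varepsilon$ is the 10 × 1,000 matrix of "errors".
Only $X$ is observed; $\mu$, $L$, $F$ and $\varepsilon$ are not.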
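To make the shapes concrete, the following sketch generates artificial data in which each student's column of X equals μ plus L times that student's column of F, plus an error column. All parameter values (`mu`, `L`, `error_sd`) are invented for illustration, and drawing the factors and errors from normal distributions is an assumption of this sketch, not part of the example.

```python
import random

N_SUBJECTS, N_STUDENTS, N_FACTORS = 10, 1000, 2
rng = random.Random(42)

# Hypothetical parameter values, invented purely for illustration.
mu = [50.0 + k for k in range(N_SUBJECTS)]  # subject averages
L = [[rng.uniform(2, 10) for _ in range(N_FACTORS)]  # factor loadings (10 x 2)
     for _ in range(N_SUBJECTS)]
error_sd = [rng.uniform(1, 5) for _ in range(N_SUBJECTS)]  # unequal error spreads

# F is 2 x 1000: row 0 holds verbal, row 1 mathematical intelligence.
# Standard-normal factors are an assumption of this sketch.
F = [[rng.gauss(0, 1) for _ in range(N_STUDENTS)] for _ in range(N_FACTORS)]

# X[k][i] = mu[k] + sum_j L[k][j] * F[j][i] + error for subject k, student i
X = [
    [
        mu[k]
        + sum(L[k][j] * F[j][i] for j in range(N_FACTORS))
        + rng.gauss(0, error_sd[k])
        for i in range(N_STUDENTS)
    ]
    for k in range(N_SUBJECTS)
]

# Rows, columns, and total count of observed numbers.
print(len(X), len(X[0]), sum(len(row) for row in X))  # prints: 10 1000 10000
```

In a real analysis only `X` would be available; `mu`, `L`, `F` and the errors are exactly the quantities that have to be inferred.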
Now observe that doubling the scale on which "verbal intelligence" (the first component in each column of F) is measured, while simultaneously halving the factor loadings for verbal intelligence, makes no difference to the model. Thus, no generality is lost by assuming that the standard deviation of verbal intelligence is 1. Likewise for mathematical intelligence. Moreover, for similar reasons, no generality is lost by assuming the two factors are uncorrelated with each other. The "errors" ε are taken to be independent of each other. The variances of the "errors" associated with the 10 different subjects are not assumed to be equal.
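The rescaling argument can be checked numerically. The sketch below uses small, made-up matrices (3 subjects, 4 students) rather than the full 10 × 1,000 example; `matmul` is a helper written here for the purpose, not a library function.

```python
import math


def matmul(A, B):
    """Plain-Python matrix product of A (n x m) and B (m x p)."""
    return [
        [sum(A[r][j] * B[j][c] for j in range(len(B))) for c in range(len(B[0]))]
        for r in range(len(A))
    ]


# Hypothetical loadings: 3 subjects (rows) on 2 factors (columns).
L = [[10.0, 6.0], [4.0, 9.0], [7.0, 3.0]]
# Hypothetical factor levels: row 0 verbal, row 1 mathematical; 4 students.
F = [[1.2, -0.5, 0.3, 2.0], [0.7, 1.1, -1.4, 0.0]]

# Double the scale of verbal intelligence and halve the verbal loadings.
F_rescaled = [[2.0 * v for v in F[0]], F[1]]
L_rescaled = [[0.5 * row[0], row[1]] for row in L]

original = matmul(L, F)
rescaled = matmul(L_rescaled, F_rescaled)

# The product L F, and hence the model for the scores, is unchanged.
unchanged = all(
    math.isclose(x, y, abs_tol=1e-12)
    for row_o, row_r in zip(original, rescaled)
    for x, y in zip(row_o, row_r)
)
print(unchanged)  # prints: True
```

Because the data cannot distinguish between the two parameterizations, fixing the standard deviation of each factor at 1 simply picks one representative from each family of equivalent models.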
The values of the loadings L, the averages μ, and the variances of the "errors" ε must be estimated from the observed data X. How that estimation is carried out is not yet covered in this article.