Monday, 09 February 2009

Factor analysis (sari saraswati_15407073)

Factor analysis
1. Introduction
Factor analysis is a mathematical tool which can be used to examine a wide range of data sets. It has been used in disciplines as diverse as chemistry, sociology, economics, psychology and the analysis of the performance of racehorses. This tutorial is designed to provide a basic understanding of the principles underlying factor analysis. The focus of the tutorial is the analysis of a 'factor space' or 'data space'. It was written to introduce the undergraduate chemistry major to the basic concept of a 'data space' and to demonstrate how factor analysis can be used to study a 'data space'. As an aid to conceptualization, a geometric approach is used wherever possible, and the actual linear algebra involved is illustrated.
http://www.chem.duke.edu/~clochmul/tutor1/factucmp.html
2. Factor Analysis
Richard B. Darlington
Factor analysis includes both component analysis and common factor analysis. More than other statistical techniques, factor analysis has suffered from confusion concerning its very purpose. This affects my presentation in two ways. First, I devote a long section to describing what factor analysis does before examining in later sections how it does it. Second, I have decided to reverse the usual order of presentation. Component analysis is simpler, and most discussions present it first. However, I believe common factor analysis comes closer to solving the problems most researchers actually want to solve. Thus learning component analysis first may actually interfere with understanding what those problems are. Therefore component analysis is introduced only quite late in this chapter.
What Factor Analysis Can and Can't Do
I assume you have scores on a number of variables--anywhere from 3 to several hundred variables, but most often between 10 and 100. Actually you need only the correlation or covariance matrix, not the actual scores. The purpose of factor analysis is to discover simple patterns in the relationships among the variables. In particular, it seeks to discover whether the observed variables can be explained largely or entirely in terms of a much smaller number of variables called factors.
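As a concrete, entirely hypothetical illustration of working from a correlation matrix alone, the sketch below eigendecomposes a small made-up 4-variable correlation matrix with NumPy. Counting the eigenvalues greater than 1 is one common rule of thumb (the Kaiser criterion) for how many factors are worth extracting; the matrix values here are invented for illustration only.

```python
import numpy as np

# Hypothetical 4-variable correlation matrix (illustrative values only):
# variables 1-2 form one correlated pair, variables 3-4 another.
R = np.array([
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
])

# The eigenvalues of R hint at how many factors are worth keeping;
# the Kaiser rule of thumb retains those greater than 1.
eigenvalues = np.linalg.eigvalsh(R)[::-1]  # sorted largest first
n_factors = int(np.sum(eigenvalues > 1.0))
print(eigenvalues, n_factors)
```

For this matrix two eigenvalues exceed 1, matching the two correlated pairs built into it.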
Some Examples of Factor-Analysis Problems
1. Factor analysis was invented nearly 100 years ago by psychologist Charles Spearman, who hypothesized that the enormous variety of tests of mental ability--measures of mathematical skill, vocabulary, other verbal skills, artistic skills, logical reasoning ability, etc.--could all be explained by one underlying "factor" of general intelligence that he called g. He hypothesized that if g could be measured and you could select a subpopulation of people with the same score on g, in that subpopulation you would find no correlations among any tests of mental ability. In other words, he hypothesized that g was the only factor common to all those measures.
It was an interesting idea, but it turned out to be wrong. Today the College Board testing service operates a system based on the idea that there are at least three important factors of mental ability--verbal, mathematical, and logical abilities--and most psychologists agree that many other factors could be identified as well.
2. Consider various measures of the activity of the autonomic nervous system--heart rate, blood pressure, etc. Psychologists have wanted to know whether, except for random fluctuation, all those measures move up and down together--the "activation" hypothesis. Or do groups of autonomic measures move up and down together, but separate from other groups? Or are all the measures largely independent? An unpublished analysis of mine found that in one data set, at any rate, the data fitted the activation hypothesis quite well.
3. Suppose many species of animal (rats, mice, birds, frogs, etc.) are trained that food will appear at a certain spot whenever a noise--any kind of noise--comes from that spot. You could then tell whether they could detect a particular sound by seeing whether they turn in that direction when the sound appears. Then if you studied many sounds and many species, you might want to know on how many different dimensions of hearing acuity the species vary. One hypothesis would be that they vary on just three dimensions--the ability to detect high-frequency sounds, ability to detect low-frequency sounds, and ability to detect intermediate sounds. On the other hand, species might differ in their auditory capabilities on more than just these three dimensions. For instance, some species might be better at detecting sharp click-like sounds while others are better at detecting continuous hiss-like sounds.
4. Suppose each of 500 people, who are all familiar with different kinds of automobiles, rates each of 20 automobile models on the question, "How much would you like to own that kind of automobile?" We could usefully ask about the number of dimensions on which the ratings differ. A one-factor theory would posit that people simply give the highest ratings to the most expensive models. A two-factor theory would posit that some people are most attracted to sporty models while others are most attracted to luxurious models. Three-factor and four-factor theories might add safety and reliability. Or instead of automobiles you might choose to study attitudes concerning foods, political policies, political candidates, or many other kinds of objects.
5. Rubenstein (1986) studied the nature of curiosity by analyzing the agreements of junior-high-school students with a large battery of statements such as "I like to figure out how machinery works" or "I like to try new kinds of food." A factor analysis identified seven factors: three measuring enjoyment of problem-solving, learning, and reading; three measuring interests in natural sciences, art and music, and new experiences in general; and one indicating a relatively low interest in money.
The Goal: Understanding of Causes
Many statistical methods are used to study the relation between independent and dependent variables. Factor analysis is different; it is used to study the patterns of relationship among many dependent variables, with the goal of discovering something about the nature of the independent variables that affect them, even though those independent variables were not measured directly. Thus answers obtained by factor analysis are necessarily more hypothetical and tentative than is true when independent variables are observed directly. The inferred independent variables are called factors. A typical factor analysis suggests answers to four major questions:
1. How many different factors are needed to explain the pattern of relationships among these variables?
2. What is the nature of those factors?
3. How well do the hypothesized factors explain the observed data?
4. How much purely random or unique variance does each observed variable include?
I illustrate these questions later.
Is Factor Analysis Objective?
The concept of heuristics is useful in understanding a property of factor analysis which confuses many people. Several scientists may apply factor analysis to similar or even identical sets of measures, and one may come up with 3 factors while another comes up with 6 and another comes up with 10. This lack of agreement has tended to discredit all uses of factor analysis. But if three travel writers wrote travel guides to the United States, and one divided the country into 3 regions, another into 6, and another into 10, would we say that they contradicted each other? Of course not; the various writers are just using convenient ways of organizing a topic, not claiming to represent the only correct way of doing so. Factor analysts reaching different conclusions contradict each other only if they all claim absolute theories, not heuristics. The fewer factors the simpler the theory; the more factors the better the theory fits the data. Different workers may make different choices in balancing simplicity against fit.
A similar balancing problem arises in regression and analysis of variance, but it generally doesn't prevent different workers from reaching nearly or exactly the same conclusions. After all, if two workers apply an analysis of variance to the same data, and both workers drop out the terms not significant at the .05 level, then both will report exactly the same effects. However, the situation in factor analysis is very different. For reasons explained later, there is no significance test in component analysis that will test a hypothesis about the number of factors, as that hypothesis is ordinarily understood. In common factor analysis there is such a test, but its usefulness is limited by the fact that it frequently yields more factors than can be satisfactorily interpreted. Thus a worker who wants to report only interpretable factors is still left without an objective test.
A similar issue arises in identifying the nature of the factors. Two workers may each identify 6 factors, but the two sets of factors may differ--perhaps substantially. The travel-writer analogy is useful here too; two writers might each divide the US into 6 regions, but define the regions very differently.
Another geographical analogy may be more parallel to factor analysis, since it involves computer programs designed to maximize some quantifiable objective. Computer programs are sometimes used to divide a state into congressional districts which are geographically contiguous, nearly equal in population, and perhaps homogeneous on dimensions of ethnicity or other factors. Two different district-creating programs might come up with very different answers, though both answers are reasonable. This analogy is in a sense too good; we believe that factor analysis programs usually don't yield answers as different from each other as district-creating programs do.
Factor Analysis Versus Clustering and Multidimensional Scaling
Another challenge to factor analysis has come from the use of competing techniques such as cluster analysis and multidimensional scaling. While factor analysis is typically applied to a correlation matrix, those other methods can be applied to any sort of matrix of similarity measures, such as ratings of the similarity of faces. But unlike factor analysis, those methods cannot cope with certain unique properties of correlation matrices, such as reflections of variables. For instance, if you reflect or reverse the scoring direction of a measure of "introversion", so that high scores indicate "extroversion" instead of introversion, then you reverse the signs of all that variable's correlations: -.36 becomes +.36, +.42 becomes -.42, and so on. Such reflections would completely change the output of a cluster analysis or multidimensional scaling, while factor analysis would recognize the reflections for what they are; the reflections would change the signs of the "factor loadings" of any reflected variables, but would not change anything else in the factor analysis output.
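This sign-flipping behavior is easy to verify directly. The sketch below, using NumPy and simulated scores (the variable names and data are invented for illustration), reverses the scoring direction of one variable and confirms that its correlation with another variable changes sign but not magnitude.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical measures; 'intro' stands in for an introversion score.
intro = rng.normal(size=500)
other = 0.5 * intro + rng.normal(size=500)
data = np.column_stack([intro, other])

r_before = np.corrcoef(data, rowvar=False)[0, 1]

# Reflect the introversion measure (score it as extroversion instead):
data[:, 0] = -data[:, 0]
r_after = np.corrcoef(data, rowvar=False)[0, 1]

# Reflection flips the sign of every correlation involving that variable,
# leaving the magnitude unchanged.
assert abs(r_before + r_after) < 1e-12
```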
Another advantage of factor analysis over these other methods is that factor analysis can recognize certain properties of correlations. For instance, if variables A and B each correlate .7 with variable C, and correlate .49 with each other, factor analysis can recognize that A and B correlate zero when C is held constant, because .7 × .7 = .49. Multidimensional scaling and cluster analysis have no ability to recognize such relationships, since the correlations are treated merely as generic "similarity measures" rather than as correlations.
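The arithmetic behind that claim is the standard partial-correlation formula; plugging in the values from the example gives a partial correlation of essentially zero:

```python
import math

# Correlations from the example in the text:
r_ac = 0.7   # A with C
r_bc = 0.7   # B with C
r_ab = 0.49  # A with B, equal to r_ac * r_bc

# Partial correlation of A and B holding C constant:
#   r_AB.C = (r_AB - r_AC * r_BC) / sqrt((1 - r_AC^2) * (1 - r_BC^2))
r_ab_given_c = (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))
print(r_ab_given_c)  # approximately 0
```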
We are not saying these other methods should never be applied to correlation matrices; sometimes they yield insights not available through factor analysis. But they have definitely not made factor analysis obsolete. The next section touches on this point.
Factors "Differentiating" Variables Versus Factors "Underlying" Variables
When someone says casually that a set of variables seems to reflect "just one factor", there are several things they might mean that have nothing to do with factor analysis. If we word statements more carefully, it turns out that the phrase "just one factor differentiates these variables" can mean several different things, none of which corresponds to the factor analytic conclusion that "just one factor underlies these variables".
One possible meaning of the phrase about "differentiating" is that a set of variables all correlate highly with each other but differ in their means. A rather similar meaning can arise in a different case. Consider several tests A, B, C, D which test the same broadly-conceived mental ability, but which increase in difficulty in the order listed. Then the highest correlations among the tests may be between adjacent items in this list (rAB, rBC and rCD) while the lowest correlation is between items at the opposite ends of the list (rAD). Someone who observed this pattern in the correlations among the items might well say the tests "can be put in a simple order" or "differ in just one factor", but that conclusion has nothing to do with factor analysis. This set of tests would not contain just one common factor.
A third case of this sort may arise if variable A affects B, which affects C, which affects D, and those are the only effects linking these variables. Once again, the highest correlations would be rAB, rBC and rCD while the lowest correlation would be rAD. Someone might use the same phrases just quoted to describe this pattern of correlations; again it has nothing to do with factor analysis.
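A quick simulation makes this third case concrete. In the sketch below (simulated data with assumed unit-variance noise; the setup is invented for illustration), A drives B, B drives C, and C drives D, and the correlation between the chain's endpoints comes out lowest, just as described:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# A affects B, B affects C, C affects D; no other links among them.
A = rng.normal(size=n)
B = A + rng.normal(size=n)
C = B + rng.normal(size=n)
D = C + rng.normal(size=n)

R = np.corrcoef([A, B, C, D])
r_ab, r_bc, r_cd, r_ad = R[0, 1], R[1, 2], R[2, 3], R[0, 3]

# Adjacent variables correlate most strongly; the chain's
# endpoints correlate least.
assert r_ad < min(r_ab, r_bc, r_cd)
```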
A fourth case is in a way a special case of all the previous cases: a perfect Guttman scale. A set of dichotomous items fits a Guttman scale if the items can be arranged so that a negative response to any item implies a negative response to all subsequent items while a positive response to any item implies a positive response to all previous items. For a trivial example consider the items
• Are you above 5 feet 2 inches in height?
• Are you above 5 feet 4 inches in height?
• Are you above 5 feet 6 inches in height?
• Etc.
To be consistent, a person answering negatively to any of these items must answer negatively to all later items, and a positive answer implies that all previous answers must be positive. For a nontrivial example consider the following questionnaire items:
• Should our nation lower tariff barriers with nation B?
• Should our two central banks issue a single currency?
• Should our armies become one?
• Should we fuse with nation B, becoming one nation?
If it turned out that these items formed a perfect Guttman scale, it would be easier to describe people's attitudes about "nation B" than if they didn't. When a set of items does form a Guttman scale, interestingly, it does not imply that factor analysis would discover a single common factor. A Guttman scale implies that one factor differentiates a set of items (e.g., "favorableness toward cooperation with nation B"), not that one factor underlies those items.
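Checking whether a set of ordered dichotomous responses is consistent with a Guttman scale is straightforward. The helper below (a hypothetical function name, coding a positive answer as 1 and a negative answer as 0, with items ordered from easiest to hardest to endorse) returns True only when no positive answer follows a negative one:

```python
def fits_guttman_scale(responses):
    """Return True if, once a negative answer (0) appears in the
    ordered item list, every later answer is also negative."""
    seen_negative = False
    for answer in responses:
        if answer == 0:
            seen_negative = True
        elif seen_negative:  # a positive answer after a negative one
            return False
    return True

# Items ordered easiest-to-hardest (tariffs ... full union):
assert fits_guttman_scale([1, 1, 0, 0])      # consistent pattern
assert not fits_guttman_scale([1, 0, 1, 0])  # inconsistent pattern
```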
Applying multidimensional scaling to a correlation matrix could discover all these simple patterns of differences among variables. Thus multidimensional scaling seeks factors which differentiate variables while factor analysis looks for the factors which underlie the variables. Scaling may sometimes find simplicity where factor analysis finds none, and factor analysis may find simplicity where scaling finds none.

Factor analysis is a statistical method used to describe variability among observed variables in terms of fewer unobserved variables called factors. The observed variables are modeled as linear combinations of the factors, plus "error" terms. The information gained about the interdependencies can be used later to reduce the set of variables in a dataset. Factor analysis originated in psychometrics, and is used in behavioral sciences, social sciences, marketing, product management, operations research, and other applied sciences that deal with large quantities of data.
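The model described above can be sketched in a few lines. In the simulation below (the loadings and sample size are invented for illustration), three observed variables are each generated as a loading times a single latent factor plus independent error; under such a one-factor model, the correlation between any two observed variables is approximately the product of their loadings.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# One latent factor, three observed variables with chosen loadings.
loadings = np.array([0.9, 0.8, 0.6])
factor = rng.normal(size=n)
noise_sd = np.sqrt(1 - loadings**2)  # gives unit-variance observed scores
X = np.outer(factor, loadings) + rng.normal(size=(n, 3)) * noise_sd

R = np.corrcoef(X, rowvar=False)

# Under a one-factor model, r_ij is approximately loading_i * loading_j.
assert abs(R[0, 1] - 0.9 * 0.8) < 0.01
assert abs(R[0, 2] - 0.9 * 0.6) < 0.01
```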
Factor analysis is often confused with principal components analysis. The two methods are related, but distinct, though factor analysis becomes essentially equivalent to principal components analysis if the "errors" in the factor analysis model (see below) are assumed to all have the same variance.
http://en.wikipedia.org/wiki/Factor_analysis
