Profile Analysis

Multivariate Analysis
Profile Analysis

Consider the graph below which shows the profiles for two groups. The question that we try to answer in profile analysis is whether or not the profiles are equivalent, i.e., are the profiles piecewise parallel. Then, if it is true that the profiles are parallel, are there group differences, i.e., is the separation of the profiles significant. Finally, if it is true that the profiles are parallel, are the profiles flat, i.e., are values of the dependent variables the same. Please note that when doing profile analysis you need the different dependent variables to be commensurable, i.e., expressed in comparable units.

Contrast the graph above with the one below in which the profiles are clearly not the same.

Profile analysis is really just a kind of repeated measures mixed models analysis. There are some tricks to doing the analysis such that we can test the piecewise parallelness. Here is a summary of the various test that are performed in a profile analysis.

We begin by testing for parallelism. If the profiles are not piecewise parallel then the remaining tests do not make a whole lot of sense.

We begin with a standard one-way manova followed by a test that use a transformation of the dependent variables defined by the transformation matrix c1.

                    1  -1   0   0
             c1  =  0   1  -1   0
                    0   0   1  -1

2. Test of Levels (Group differences)

If we have parallelism than we can test whether there is a difference in the groups. We do this with another transformation matrix, c2.

             c1  =  1   1   1   1

3. Test of Flatness

Lastly, we can test whether the profiles are flat by using the c1 transformation against the intercept term.

By using different transformations of the dependent variables we can test hypotheses about trend across the dependent variables. Here are the transformation matrices using coefficients for orthogonal polynomials for linear (o1) and nonlinear (o2) effects.

                  o1  =  -3  -1   1   3
            
            
                  o2  =   1  -1  -1   1
                         -1   3  -3   1

Here is an example of a simple profile analysis with four groups and three dependent variables.

input id y1 y2 y3 grp 
    1   19   20   18     1 
    2   20   21   19     1 
    3   19   22   22     1 
    4   18   19   21     1 
    5   16   18   20     1 
    6   17   22   19     1 
    7   20   19   20     1 
    8   15   19   19     1 
    9   12   14   12     2 
   10   15   15   17     2 
   11   15   17   15     2 
   12   13   14   14     2 
   13   14   16   13     2 
   14   15   14   17     3 
   15   13   14   15     3 
   16   12   15   15     3 
   17   12   13   13     3 
   18    8    9   10     4 
   19   10   10   12     4 
   20   11   10   10     4 
   21   11    7   12     4 
end

tabstat y1 y2 y3, by(grp)

Summary statistics: mean
  by categories of: grp 

     grp |        y1        y2        y3
---------+------------------------------
       1 |        18        20     19.75
       2 |      13.8      15.2      14.2
       3 |        13        14        15
       4 |        10         9        11
---------+------------------------------
   Total |  14.52381  15.61905  15.85714
----------------------------------------

/* profile plot by individual -- findit profileplot */

profileplot y1 y2 y3, by(id) legend(off) nosymbols ytitle(score)



/* preliminary one-way manova */
manova y1 y2 y3 = grp

                           Number of obs =      21

                           W = Wilks' lambda      L = Lawley-Hotelling trace
                           P = Pillai's trace     R = Roy's largest root

                  Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
              -----------+--------------------------------------------------
                     grp | W   0.0479      3     9.0    36.7    10.12 0.0000 a
                         | P   1.1609            9.0    51.0     3.58 0.0016 a
                         | L  15.6417            9.0    41.0    23.75 0.0000 a
                         | R  15.3753            3.0    17.0    87.13 0.0000 u
                         |--------------------------------------------------
                Residual |                17
              -----------+--------------------------------------------------
                   Total |                20
              --------------------------------------------------------------
                           e = exact, a = approximate, u = upper bound on F

/* test of parallelism */
mat c1 = (1,-1,0\0,1,-1)
manovatest grp, ytrans(c1)

 Transformations of the dependent variables
 (1)    y1 - y2
 (2)    y2 - y3

                           W = Wilks' lambda      L = Lawley-Hotelling trace
                           P = Pillai's trace     R = Roy's largest root

                  Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
              -----------+--------------------------------------------------
                     grp | W   0.5633      3     6.0    32.0     1.77 0.1364 e
                         | P   0.4873            6.0    34.0     1.83 0.1234 a
                         | L   0.6853            6.0    30.0     1.71 0.1522 a
                         | R   0.5088            3.0    17.0     2.88 0.0662 u
                         |--------------------------------------------------
                Residual |                17
              --------------------------------------------------------------
                           e = exact, a = approximate, u = upper bound on F

/* test of levels */
mat c2 = (1,1,1)
manovatest grp, ytrans(c2)

 Transformation of the dependent variables
 (1)    y1 + y2 + y3

                           W = Wilks' lambda      L = Lawley-Hotelling trace
                           P = Pillai's trace     R = Roy's largest root

                  Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
              -----------+--------------------------------------------------
                     grp | W   0.0740      3     3.0    17.0    70.93 0.0000 e
                         | P   0.9260            3.0    17.0    70.93 0.0000 e
                         | L  12.5165            3.0    17.0    70.93 0.0000 e
                         | R  12.5165            3.0    17.0    70.93 0.0000 e
                         |--------------------------------------------------
                Residual |                17
              --------------------------------------------------------------
                           e = exact, a = approximate, u = upper bound on

/* test of flatness */
mat xm = (1,0,0,0,0)        /* used to select the constant only */
manovatest, test(xm) ytrans(c1)

 Transformations of the dependent variables
 (1)    y1 - y2
 (2)    y2 - y3

 Test constraint
 (1)    _cons = 0

                           W = Wilks' lambda      L = Lawley-Hotelling trace
                           P = Pillai's trace     R = Roy's largest root

                  Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
              -----------+--------------------------------------------------
              manovatest | W   0.7927      1     2.0    16.0     2.09 0.1559 e
                         | P   0.2073            2.0    16.0     2.09 0.1559 e
                         | L   0.2615            2.0    16.0     2.09 0.1559 e
                         | R   0.2615            2.0    16.0     2.09 0.1559 e
                         |--------------------------------------------------
                Residual |                17
              --------------------------------------------------------------
                           e = exact, a = approximate, u = upper bound on F


/* orthogonal polynomials */

/* linear */

mat o1 = (-1,0,1)
manovatest grp, ytrans(o1)

 Transformation of the dependent variables
 (1)    - y1 + y3

                           W = Wilks' lambda      L = Lawley-Hotelling trace
                           P = Pillai's trace     R = Roy's largest root

                  Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
              -----------+--------------------------------------------------
                     grp | W   0.8487      3     3.0    17.0     1.01 0.4126 e
                         | P   0.1513            3.0    17.0     1.01 0.4126 e
                         | L   0.1782            3.0    17.0     1.01 0.4126 e
                         | R   0.1782            3.0    17.0     1.01 0.4126 e
                         |--------------------------------------------------
                Residual |                17
              --------------------------------------------------------------
                           e = exact, a = approximate, u = upper bound on F
/* quadratic */

mat o2 = (1,-2,1)
manovatest grp, ytrans(o2)

 Transformation of the dependent variables
 (1)    y1 - 2 y2 + y3

                           W = Wilks' lambda      L = Lawley-Hotelling trace
                           P = Pillai's trace     R = Roy's largest root

                  Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
              -----------+--------------------------------------------------
                     grp | W   0.6678      3     3.0    17.0     2.82 0.0702 e
                         | P   0.3322            3.0    17.0     2.82 0.0702 e
                         | L   0.4974            3.0    17.0     2.82 0.0702 e
                         | R   0.4974            3.0    17.0     2.82 0.0702 e
                         |--------------------------------------------------
                Residual |                17
              --------------------------------------------------------------
                           e = exact, a = approximate, u = upper bound on F


/* profile plot by groups  --  findit profileplot */

profileplot y1 y2 y3, by(grp) xtitle(Variable) ytitle(Value)

Example 2

Here is an example in which there is no parallelism. It is based on the hsb2 dataset.

use hsb2, clear

/* preliminary one-way manova */
manova read write math science socst = prog

                           Number of obs =     200

                           W = Wilks' lambda      L = Lawley-Hotelling trace
                           P = Pillai's trace     R = Roy's largest root

                  Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
              -----------+--------------------------------------------------
                    prog | W   0.6682      2    10.0   386.0     8.62 0.0000 e
                         | P   0.3413           10.0   388.0     7.98 0.0000 a
                         | L   0.4826           10.0   384.0     9.27 0.0000 a
                         | R   0.4514            5.0   194.0    17.51 0.0000 u
                         |--------------------------------------------------
                Residual |               197
              -----------+--------------------------------------------------
                   Total |               199
              --------------------------------------------------------------
                           e = exact, a = approximate, u = upper bound on F

/* test of parallelism */
mat c1 = (1,-1,0,0,0\0,1,-1,0,0\0,0,1,-1,0\0,0,0,1,-1)

manovatest prog, ytrans(c1)

 Transformations of the dependent variables
 (1)    read - write
 (2)    write - math
 (3)    math - science
 (4)    science - socst

                           W = Wilks' lambda      L = Lawley-Hotelling trace
                           P = Pillai's trace     R = Roy's largest root

                  Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
              -----------+--------------------------------------------------
                    prog | W   0.8983      2     8.0   388.0     2.67 0.0072 e
                         | P   0.1025            8.0   390.0     2.63 0.0081 a
                         | L   0.1125            8.0   386.0     2.71 0.0064 a
                         | R   0.1047            4.0   195.0     5.10 0.0006 u
                         |--------------------------------------------------
                Residual |               197
              --------------------------------------------------------------
                           e = exact, a = approximate, u = upper bound on F

/* profile plot -- findit profileplot */

profileplot read write math science socst, by(prog) xtitle(Variable) ytitle(Mean Score)

The lack of parallelism is most likely because the top line goes down and then up for the last two variables while in the bottom two lines the pattern is up then down.
Depression Example
In a study of post-partum depression, two groups of women were studied over a four month period. The two groups were the placebo treatment group and the estrogen treatment group.
use http://www.philender.com/courses/data/deprofile, clear tabstat dep1 dep2 dep3 dep4, by(group) Summary statistics: mean by categories of: group group | dep1 dep2 dep3 dep4 ---------+---------------------------------------- placebo | 16.58824 14.97294 14.12882 12.27471 estrogen | 12.9825 11.17286 8.924643 8.827857 ---------+---------------------------------------- Total | 14.34467 12.60844 10.89067 10.13 -------------------------------------------------- /* profile plot by individual */ reshape long dep, i(subj) j(v) (note: j = 1 2 3 4) Data wide -> long ----------------------------------------------------------------------------- Number of obs. 45 -> 180 Number of variables 6 -> 4 j variable (4 values) -> v xij variables: dep1 dep2 ... dep4 -> dep ----------------------------------------------------------------------------- /* placebo group */ twoway line dep v if group==0 /* estrogen group */ twoway line dep v if group==1 reshape wide (note: j = 1 2 3 4) Data long -> wide ----------------------------------------------------------------------------- Number of obs. 180 -> 45 Number of variables 4 -> 6 j variable (4 values) v -> (dropped) xij variables: dep -> dep1 dep2 ... dep4 ----------------------------------------------------------------------------- /* preliminary one-way manova */ manova dep1 dep2 dep3 dep4 = group Number of obs = 45 W = Wilks' lambda L = Lawley-Hotelling trace P = Pillai's trace R = Roy's largest root Source | Statistic df F(df1, df2) = F Prob>F -----------+-------------------------------------------------- group | W 0.7623 1 4.0 40.0 3.12 0.0253 e | P 0.2377 4.0 40.0 3.12 0.0253 e | L 0.3117 4.0 40.0 3.12 0.0253 e | R 0.3117 4.0 40.0 3.12 0.0253 e |-------------------------------------------------- Residual | 43 -----------+-------------------------------------------------- Total | 44 -------------------------------------------------------------- e = exact, a = approximate, u = upper bound on F /* test of parallelism */ mat c1 = (1,-1,0,0\0,1,-1,0\0,0,1,-1) manovatest group, ytrans(c1) Transformations of the dependent variables (1) dep1 - dep2 (2) dep2 - dep3 (3) dep3 - dep4 W = Wilks' lambda L = Lawley-Hotelling trace P = Pillai's trace R = Roy's largest root Source | Statistic df F(df1, df2) = F Prob>F -----------+-------------------------------------------------- group | W 0.9095 1 3.0 41.0 1.36 0.2684 e | P 0.0905 3.0 41.0 1.36 0.2684 e | L 0.0995 3.0 41.0 1.36 0.2684 e | R 0.0995 3.0 41.0 1.36 0.2684 e |-------------------------------------------------- Residual | 43 -------------------------------------------------------------- e = exact, a = approximate, u = upper bound on F /* test of levels */ mat c2 = (1,1,1,1) manovatest group, ytrans(c2) Transformation of the dependent variables (1) dep1 + dep2 + dep3 + dep4 W = Wilks' lambda L = Lawley-Hotelling trace P = Pillai's trace R = Roy's largest root Source | Statistic df F(df1, df2) = F Prob>F -----------+-------------------------------------------------- group | W 0.8448 1 1.0 43.0 7.90 0.0074 e | P 0.1552 1.0 43.0 7.90 0.0074 e | L 0.1837 1.0 43.0 7.90 0.0074 e | R 0.1837 1.0 43.0 7.90 0.0074 e |-------------------------------------------------- Residual | 43 -------------------------------------------------------------- e = exact, a = approximate, u = upper bound on F /* test of flatness */ mat xm = (1,0,0) /* used to select the constant only */ manovatest, test(xm) ytrans(c1) Transformations of the dependent variables (1) dep1 - dep2 (2) dep2 - dep3 (3) dep3 - dep4 Test constraint (1) _cons = 0 W = Wilks' lambda L = Lawley-Hotelling trace P = Pillai's trace R = Roy's largest root Source | Statistic df F(df1, df2) = F Prob>F -----------+-------------------------------------------------- manovatest | W 0.6104 1 3.0 41.0 8.72 0.0001 e | P 0.3896 3.0 41.0 8.72 0.0001 e | L 0.6383 3.0 41.0 8.72 0.0001 e | R 0.6383 3.0 41.0 8.72 0.0001 e |-------------------------------------------------- Residual | 43 -------------------------------------------------------------- e = exact, a = approximate, u = upper bound on F /* profile plot by groups */ tabstat dep1 dep2 dep3 dep4, by(group) Summary statistics: mean by categories of: group group | dep1 dep2 dep3 dep4 ---------+---------------------------------------- placebo | 16.58824 14.97294 14.12882 12.27471 estrogen | 12.9825 11.17286 8.924643 8.827857 ---------+---------------------------------------- Total | 14.34467 12.60844 10.89067 10.13 -------------------------------------------------- /* findit profileplot */ profileplot dep1 dep2 dep3 dep4, by(group) xtitle(Month) ytitle(Depression)

Multivariate Course Page

Phil Ender, 1nov07, 3dec04

Multivariate Analysis Profile Analysis

Example 2

Depression Example

Multivariate Analysis
Profile Analysis