Multivariate Analysis
Profile Analysis


Consider the graph below, which shows the profiles for two groups. Profile analysis asks three questions in sequence. First, are the profiles piecewise parallel, i.e., does each segment of one profile have the same slope as the corresponding segment of the other? Second, if the profiles are parallel, do the groups differ in level, i.e., is the separation between the profiles significant? Third, if the profiles are parallel, are they flat, i.e., are the means of the dependent variables the same? Note that profile analysis requires the dependent variables to be commensurable, i.e., expressed in comparable units.

Contrast the graph above with the one below in which the profiles are clearly not the same.

Profile analysis is really just a kind of repeated measures mixed models analysis. The trick is to set up the analysis so that we can test piecewise parallelism. Here is a summary of the various tests that are performed in a profile analysis.

By using different transformations of the dependent variables we can test hypotheses about trend across the dependent variables. Here are the transformation matrices for four dependent variables, using orthogonal polynomial coefficients for the linear effect (o1) and the nonlinear effects (o2, whose rows are the quadratic and cubic coefficients).
                  o1  =  -3  -1   1   3

                  o2  =   1  -1  -1   1
                         -1   3  -3   1
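
A quick way to confirm that a set of polynomial coefficients is mutually orthogonal is to multiply the coefficient matrix by its transpose; every off-diagonal element of the product should be zero. The sketch below stacks the three rows shown above into one matrix. The matrix names P and G are illustrative and not part of the original analysis.

```stata
/* rows: linear, quadratic, cubic coefficients for four measures */
mat P = (-3,-1,1,3 \ 1,-1,-1,1 \ -1,3,-3,1)

/* cross-products of the rows; off-diagonal zeros indicate orthogonality */
mat G = P*P'
mat list G
```

The diagonal of G holds the sums of squared coefficients (20, 4, and 20), which matter only if the contrasts are to be normalized.
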
Here is an example of a simple profile analysis with four groups and three dependent variables.
input id y1 y2 y3 grp 
    1   19   20   18     1 
    2   20   21   19     1 
    3   19   22   22     1 
    4   18   19   21     1 
    5   16   18   20     1 
    6   17   22   19     1 
    7   20   19   20     1 
    8   15   19   19     1 
    9   12   14   12     2 
   10   15   15   17     2 
   11   15   17   15     2 
   12   13   14   14     2 
   13   14   16   13     2 
   14   15   14   17     3 
   15   13   14   15     3 
   16   12   15   15     3 
   17   12   13   13     3 
   18    8    9   10     4 
   19   10   10   12     4 
   20   11   10   10     4 
   21   11    7   12     4 
end

tabstat y1 y2 y3, by(grp)

Summary statistics: mean
  by categories of: grp 

     grp |        y1        y2        y3
---------+------------------------------
       1 |        18        20     19.75
       2 |      13.8      15.2      14.2
       3 |        13        14        15
       4 |        10         9        11
---------+------------------------------
   Total |  14.52381  15.61905  15.85714
----------------------------------------

/* profile plot by individual -- findit profileplot */

profileplot y1 y2 y3, by(id) legend(off) nosymbols ytitle(score)



/* preliminary one-way manova */
manova y1 y2 y3 = grp

                           Number of obs =      21

                           W = Wilks' lambda      L = Lawley-Hotelling trace
                           P = Pillai's trace     R = Roy's largest root

                  Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
              -----------+--------------------------------------------------
                     grp | W   0.0479      3     9.0    36.7    10.12 0.0000 a
                         | P   1.1609            9.0    51.0     3.58 0.0016 a
                         | L  15.6417            9.0    41.0    23.75 0.0000 a
                         | R  15.3753            3.0    17.0    87.13 0.0000 u
                         |--------------------------------------------------
                Residual |                17
              -----------+--------------------------------------------------
                   Total |                20
              --------------------------------------------------------------
                           e = exact, a = approximate, u = upper bound on F

/* test of parallelism */
mat c1 = (1,-1,0\0,1,-1)
manovatest grp, ytrans(c1)

 Transformations of the dependent variables
 (1)    y1 - y2
 (2)    y2 - y3

                           W = Wilks' lambda      L = Lawley-Hotelling trace
                           P = Pillai's trace     R = Roy's largest root

                  Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
              -----------+--------------------------------------------------
                     grp | W   0.5633      3     6.0    32.0     1.77 0.1364 e
                         | P   0.4873            6.0    34.0     1.83 0.1234 a
                         | L   0.6853            6.0    30.0     1.71 0.1522 a
                         | R   0.5088            3.0    17.0     2.88 0.0662 u
                         |--------------------------------------------------
                Residual |                17
              --------------------------------------------------------------
                           e = exact, a = approximate, u = upper bound on F
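
The parallelism test is equivalent to a one-way MANOVA on the differences between adjacent dependent variables, which can make the logic easier to see. The following is a minimal sketch of that equivalence; the variable names d12 and d23 are made up for illustration. Because a new manova replaces the stored estimation results, the original model is refit at the end so that later manovatest commands still refer to it.

```stata
/* difference scores for adjacent dependent variables (hypothetical names) */
generate d12 = y1 - y2
generate d23 = y2 - y3

/* the grp effect here should reproduce the parallelism test above */
manova d12 d23 = grp

/* clean up and restore the original model for later manovatest commands */
drop d12 d23
quietly manova y1 y2 y3 = grp
```
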

/* test of levels */
mat c2 = (1,1,1)
manovatest grp, ytrans(c2)

 Transformation of the dependent variables
 (1)    y1 + y2 + y3

                           W = Wilks' lambda      L = Lawley-Hotelling trace
                           P = Pillai's trace     R = Roy's largest root

                  Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
              -----------+--------------------------------------------------
                     grp | W   0.0740      3     3.0    17.0    70.93 0.0000 e
                         | P   0.9260            3.0    17.0    70.93 0.0000 e
                         | L  12.5165            3.0    17.0    70.93 0.0000 e
                         | R  12.5165            3.0    17.0    70.93 0.0000 e
                         |--------------------------------------------------
                Residual |                17
              --------------------------------------------------------------
                           e = exact, a = approximate, u = upper bound on F
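
With a single transformed variable, the test of levels reduces to a univariate one-way ANOVA on the sum of the dependent variables, which is why all four multivariate statistics give the same F. A minimal sketch, assuming the data from this example are still in memory; the variable name ytot is hypothetical.

```stata
/* sum of the three dependent variables (hypothetical name) */
generate ytot = y1 + y2 + y3

/* one-way ANOVA; the F for grp should agree with the test of levels above */
anova ytot grp

/* clean up and restore the original model for later manovatest commands */
drop ytot
quietly manova y1 y2 y3 = grp
```
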

/* test of flatness */
mat xm = (1,0,0,0,0)        /* used to select the constant only */
manovatest, test(xm) ytrans(c1)

 Transformations of the dependent variables
 (1)    y1 - y2
 (2)    y2 - y3

 Test constraint
 (1)    _cons = 0

                           W = Wilks' lambda      L = Lawley-Hotelling trace
                           P = Pillai's trace     R = Roy's largest root

                  Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
              -----------+--------------------------------------------------
              manovatest | W   0.7927      1     2.0    16.0     2.09 0.1559 e
                         | P   0.2073            2.0    16.0     2.09 0.1559 e
                         | L   0.2615            2.0    16.0     2.09 0.1559 e
                         | R   0.2615            2.0    16.0     2.09 0.1559 e
                         |--------------------------------------------------
                Residual |                17
              --------------------------------------------------------------
                           e = exact, a = approximate, u = upper bound on F
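
The vector xm picks out a column of the design matrix by position, so it helps to confirm where the constant sits before relying on this test. manovatest has a showorder option that lists the columns. The position of _cons may differ between Stata releases, so treat the check below as a precaution rather than a required step.

```stata
/* refit quietly so the listing refers to the model used above */
quietly manova y1 y2 y3 = grp

/* list the design-matrix columns; xm needs its 1 in the _cons position */
manovatest, showorder
```

If _cons turns out to be listed last rather than first, the 1 in xm would move to the last position, for example (0,0,0,0,1) for this model.
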


/* orthogonal polynomials */

/* linear */

mat o1 = (-1,0,1)
manovatest grp, ytrans(o1)

 Transformation of the dependent variables
 (1)    - y1 + y3

                           W = Wilks' lambda      L = Lawley-Hotelling trace
                           P = Pillai's trace     R = Roy's largest root

                  Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
              -----------+--------------------------------------------------
                     grp | W   0.8487      3     3.0    17.0     1.01 0.4126 e
                         | P   0.1513            3.0    17.0     1.01 0.4126 e
                         | L   0.1782            3.0    17.0     1.01 0.4126 e
                         | R   0.1782            3.0    17.0     1.01 0.4126 e
                         |--------------------------------------------------
                Residual |                17
              --------------------------------------------------------------
                           e = exact, a = approximate, u = upper bound on F

/* quadratic */

mat o2 = (1,-2,1)
manovatest grp, ytrans(o2)

 Transformation of the dependent variables
 (1)    y1 - 2 y2 + y3

                           W = Wilks' lambda      L = Lawley-Hotelling trace
                           P = Pillai's trace     R = Roy's largest root

                  Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
              -----------+--------------------------------------------------
                     grp | W   0.6678      3     3.0    17.0     2.82 0.0702 e
                         | P   0.3322            3.0    17.0     2.82 0.0702 e
                         | L   0.4974            3.0    17.0     2.82 0.0702 e
                         | R   0.4974            3.0    17.0     2.82 0.0702 e
                         |--------------------------------------------------
                Residual |                17
              --------------------------------------------------------------
                           e = exact, a = approximate, u = upper bound on F
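
The linear and quadratic contrasts together span the same space as the two adjacent-difference contrasts in c1, so testing them jointly should give the same multivariate result as the test of parallelism. A short sketch, using the illustrative matrix name o12 and assuming the manova on y1 y2 y3 is still the most recent estimation:

```stata
/* stack the linear and quadratic contrasts into one transformation */
mat o12 = (-1,0,1 \ 1,-2,1)

/* joint test; should match the parallelism test (Wilks' lambda 0.5633) */
manovatest grp, ytrans(o12)
```
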


/* profile plot by groups  --  findit profileplot */

profileplot y1 y2 y3, by(grp) xtitle(Variable) ytitle(Value)


Example 2

Here is an example in which there is no parallelism. It is based on the hsb2 dataset.
use hsb2, clear

/* preliminary one-way manova */
manova read write math science socst = prog

                           Number of obs =     200

                           W = Wilks' lambda      L = Lawley-Hotelling trace
                           P = Pillai's trace     R = Roy's largest root

                  Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
              -----------+--------------------------------------------------
                    prog | W   0.6682      2    10.0   386.0     8.62 0.0000 e
                         | P   0.3413           10.0   388.0     7.98 0.0000 a
                         | L   0.4826           10.0   384.0     9.27 0.0000 a
                         | R   0.4514            5.0   194.0    17.51 0.0000 u
                         |--------------------------------------------------
                Residual |               197
              -----------+--------------------------------------------------
                   Total |               199
              --------------------------------------------------------------
                           e = exact, a = approximate, u = upper bound on F

/* test of parallelism */
mat c1 = (1,-1,0,0,0\0,1,-1,0,0\0,0,1,-1,0\0,0,0,1,-1)

manovatest prog, ytrans(c1)

 Transformations of the dependent variables
 (1)    read - write
 (2)    write - math
 (3)    math - science
 (4)    science - socst

                           W = Wilks' lambda      L = Lawley-Hotelling trace
                           P = Pillai's trace     R = Roy's largest root

                  Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
              -----------+--------------------------------------------------
                    prog | W   0.8983      2     8.0   388.0     2.67 0.0072 e
                         | P   0.1025            8.0   390.0     2.63 0.0081 a
                         | L   0.1125            8.0   386.0     2.71 0.0064 a
                         | R   0.1047            4.0   195.0     5.10 0.0006 u
                         |--------------------------------------------------
                Residual |               197
              --------------------------------------------------------------
                           e = exact, a = approximate, u = upper bound on F

/* profile plot -- findit profileplot */

profileplot read write math science socst, by(prog) xtitle(Variable) ytitle(Mean Score)


The lack of parallelism most likely arises over the last two variables (science and socst): the top profile goes down and then up, while the bottom two profiles go up and then down.
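
One way to check which part of the profile drives the lack of parallelism is to test each adjacent pair of variables separately. The sketch below does this for the four segments. The matrix names s1 through s4 are illustrative, and no correction for multiple testing is applied.

```stata
/* refit quietly in case another estimation command has been run since */
quietly manova read write math science socst = prog

/* one contrast per segment of the profile (hypothetical names) */
mat s1 = (1,-1,0,0,0)
mat s2 = (0,1,-1,0,0)
mat s3 = (0,0,1,-1,0)
mat s4 = (0,0,0,1,-1)

manovatest prog, ytrans(s1)   /* read    - write   */
manovatest prog, ytrans(s2)   /* write   - math    */
manovatest prog, ytrans(s3)   /* math    - science */
manovatest prog, ytrans(s4)   /* science - socst   */
```
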

Depression Example

A study of post-partum depression followed two groups of women over a four-month period: a placebo treatment group and an estrogen treatment group.
use http://www.philender.com/courses/data/deprofile, clear

tabstat dep1 dep2 dep3 dep4, by(group)

Summary statistics: mean
  by categories of: group 

   group |      dep1      dep2      dep3      dep4
---------+----------------------------------------
 placebo |  16.58824  14.97294  14.12882  12.27471
estrogen |   12.9825  11.17286  8.924643  8.827857
---------+----------------------------------------
   Total |  14.34467  12.60844  10.89067     10.13
--------------------------------------------------

/* profile plot by individual */

reshape long dep, i(subj) j(v)
(note: j = 1 2 3 4)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                       45   ->     180
Number of variables                   6   ->       4
j variable (4 values)                     ->   v
xij variables:
                     dep1 dep2 ... dep4   ->   dep
-----------------------------------------------------------------------------

/* placebo group */

twoway line dep v if group==0, connect(ascending)   /* break the line between subjects */



/* estrogen group */

twoway line dep v if group==1, connect(ascending)   /* break the line between subjects */



reshape wide
(note: j = 1 2 3 4)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                      180   ->      45
Number of variables                   4   ->       6
j variable (4 values)                 v   ->   (dropped)
xij variables:
                                    dep   ->   dep1 dep2 ... dep4
-----------------------------------------------------------------------------

/* preliminary one-way manova */
manova dep1 dep2 dep3 dep4 = group

                           Number of obs =      45

                           W = Wilks' lambda      L = Lawley-Hotelling trace
                           P = Pillai's trace     R = Roy's largest root

                  Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
              -----------+--------------------------------------------------
                   group | W   0.7623      1     4.0    40.0     3.12 0.0253 e
                         | P   0.2377            4.0    40.0     3.12 0.0253 e
                         | L   0.3117            4.0    40.0     3.12 0.0253 e
                         | R   0.3117            4.0    40.0     3.12 0.0253 e
                         |--------------------------------------------------
                Residual |                43
              -----------+--------------------------------------------------
                   Total |                44
              --------------------------------------------------------------
                           e = exact, a = approximate, u = upper bound on F

/* test of parallelism */
mat c1 = (1,-1,0,0\0,1,-1,0\0,0,1,-1)
manovatest group, ytrans(c1)

 Transformations of the dependent variables
 (1)    dep1 - dep2
 (2)    dep2 - dep3
 (3)    dep3 - dep4

                           W = Wilks' lambda      L = Lawley-Hotelling trace
                           P = Pillai's trace     R = Roy's largest root

                  Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
              -----------+--------------------------------------------------
                   group | W   0.9095      1     3.0    41.0     1.36 0.2684 e
                         | P   0.0905            3.0    41.0     1.36 0.2684 e
                         | L   0.0995            3.0    41.0     1.36 0.2684 e
                         | R   0.0995            3.0    41.0     1.36 0.2684 e
                         |--------------------------------------------------
                Residual |                43
              --------------------------------------------------------------
                           e = exact, a = approximate, u = upper bound on F

/* test of levels */
mat c2 = (1,1,1,1)
manovatest group, ytrans(c2)

 Transformation of the dependent variables
 (1)    dep1 + dep2 + dep3 + dep4

                           W = Wilks' lambda      L = Lawley-Hotelling trace
                           P = Pillai's trace     R = Roy's largest root

                  Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
              -----------+--------------------------------------------------
                   group | W   0.8448      1     1.0    43.0     7.90 0.0074 e
                         | P   0.1552            1.0    43.0     7.90 0.0074 e
                         | L   0.1837            1.0    43.0     7.90 0.0074 e
                         | R   0.1837            1.0    43.0     7.90 0.0074 e
                         |--------------------------------------------------
                Residual |                43
              --------------------------------------------------------------
                           e = exact, a = approximate, u = upper bound on F
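
With only two groups, the test of levels is the same as an equal-variance two-sample t test on the sum of the four depression scores; the square of the t statistic equals the F reported above. A minimal sketch, assuming the data are in wide form; the variable name deptot is hypothetical.

```stata
/* total depression score across the four months (hypothetical name) */
generate deptot = dep1 + dep2 + dep3 + dep4

/* equal-variance two-sample t test; t squared should equal F(1,43) above */
ttest deptot, by(group)

drop deptot
```
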

/* test of flatness */
mat xm = (1,0,0)        /* used to select the constant only */
manovatest, test(xm) ytrans(c1)

 Transformations of the dependent variables
 (1)    dep1 - dep2
 (2)    dep2 - dep3
 (3)    dep3 - dep4

 Test constraint
 (1)    _cons = 0

                           W = Wilks' lambda      L = Lawley-Hotelling trace
                           P = Pillai's trace     R = Roy's largest root

                  Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
              -----------+--------------------------------------------------
              manovatest | W   0.6104      1     3.0    41.0     8.72 0.0001 e
                         | P   0.3896            3.0    41.0     8.72 0.0001 e
                         | L   0.6383            3.0    41.0     8.72 0.0001 e
                         | R   0.6383            3.0    41.0     8.72 0.0001 e
                         |--------------------------------------------------
                Residual |                43
              --------------------------------------------------------------
                           e = exact, a = approximate, u = upper bound on F


/* profile plot by groups */

tabstat dep1 dep2 dep3 dep4, by(group)

Summary statistics: mean
  by categories of: group 

   group |      dep1      dep2      dep3      dep4
---------+----------------------------------------
 placebo |  16.58824  14.97294  14.12882  12.27471
estrogen |   12.9825  11.17286  8.924643  8.827857
---------+----------------------------------------
   Total |  14.34467  12.60844  10.89067     10.13
--------------------------------------------------

/* findit profileplot */

profileplot dep1 dep2 dep3 dep4, by(group) xtitle(Month) ytitle(Depression)


Phil Ender, 1nov07, 3dec04