Topics: Linear Mixed Model, Random Effect, Facet Plot in R
Linear mixed models (LMMs), also called hierarchical linear models (HLMs), can be seen as groundwork for generalized linear models.
The structure of a data set may imply that outcomes are correlated. For example, longitudinal observations, i.e. responses from the same subject over time, cannot be treated as independent. As a result, the independence assumption of ordinary linear regression fails.
Correlated outcomes provide less information than independent outcomes. However, population elements are often grouped into aggregates, and we often have information on both the individual elements and the aggregated groups.
For example, when students are grouped by class or teacher, we obtain measurements at each level. Students are the level-one observational units, while teachers/classes are the level-two observational units. If we are interested in a response variable such as test scores, we may want to examine the effects of both student characteristics, like ethnicity and sex, and teacher features, like teaching quality.
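The claim that correlated outcomes carry less information can be checked with a small simulation. The sketch below uses base R only; the group sizes and standard deviations are made-up values, not taken from any real data set. It compares the variance of a sample mean under clustered sampling with the variance under independent sampling of the same size.

```r
# Simulation sketch: within-group correlation inflates the variance of a mean.
# All numeric settings below are hypothetical.
set.seed(1)

n_groups <- 20   # e.g. classes (level-two units)
n_per    <- 10   # students per class (level-one units)
sd_group <- 1    # between-class standard deviation
sd_resid <- 1    # within-class standard deviation

one_mean_clustered <- function() {
  u <- rnorm(n_groups, sd = sd_group)   # one shared effect per class
  y <- rep(u, each = n_per) + rnorm(n_groups * n_per, sd = sd_resid)
  mean(y)
}

one_mean_independent <- function() {
  # same marginal variance per observation, but no shared class effect
  y <- rnorm(n_groups * n_per, sd = sqrt(sd_group^2 + sd_resid^2))
  mean(y)
}

var_clustered   <- var(replicate(5000, one_mean_clustered()))
var_independent <- var(replicate(5000, one_mean_independent()))

icc           <- sd_group^2 / (sd_group^2 + sd_resid^2)  # intraclass correlation
design_effect <- 1 + (n_per - 1) * icc                   # theoretical inflation

c(empirical_ratio = var_clustered / var_independent,
  design_effect   = design_effect)
```

With these settings the theoretical design effect is 1 + 9 × 0.5 = 5.5, so 200 clustered observations are worth roughly as much as 36 independent ones when estimating the overall mean. The empirical ratio should land close to that value.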
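Random effects: parameters that are treated as random variables in the regression model and that vary across groups. Random effects can be variables that were measured opportunistically and whose between-group variation needs to be accounted for. In other words, they are the random terms in the regression equation that represent group differences. Note that the grouping variable only helps us describe between-group differences in the explanatory variables more finely; it is not necessarily one of the variables used to explain the response Y. For example, if we want to know whether students' IQ and sex have a significant effect on their scores, grouping by grade or by subject can describe the relationship between IQ, sex and scores more accurately, even though we do not treat grade or subject as explanatory variables for the scores.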
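Fixed effects: parameters that are the same for every group and are used to explain the response variable Y. They are the fixed structural terms of the regression equation, whose effect sizes do not change from group to group. In the example above, IQ and sex are the structural terms, and their coefficients are called fixed effects.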
Mixed-effects model: A model including both fixed and random effects as its parameters.
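The fixed and random effects discussed here differ slightly from the fixed-effects and random-effects models of econometrics. In econometrics, those two models apply to panel data that include a time dimension, whereas this post emphasizes the multilevel nested structure of the sample, such as "individual -> small group -> large group".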
Note that linear mixed models are robust to violations of some of the assumptions.
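(I can't resist complaining about the usual Chinese rendering of "robustness", 魯棒性: it is a purely phonetic transliteration that tells you nothing about the meaning.)
Assumption 3 means that groups are independent of each other, while observations within a group may be correlated. Taking the student scores example again, we require schools to be mutually independent, but students within the same school will be related to some degree through the quality of the teaching staff, the facilities, the academic atmosphere and so on, so we cannot treat them as independent.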
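In matrix notation, the linear mixed model is usually written as follows. This is the standard form, assumed here to match the symbols defined in the next sentence:

$$
\boldsymbol{Y} = X\boldsymbol{\beta} + Z\boldsymbol{b} + \epsilon
$$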
where $\boldsymbol{b} \sim N(0, \phi_{\theta})$ and $\epsilon \sim N(0, \gamma_{\theta})$, and $X$ and $Z$ are the model matrices for the fixed effects ($\boldsymbol{\beta}$) and the random effects ($\boldsymbol{b}$) respectively.
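For a single observation $j$ in group $i$, the same idea can be written conditionally on the group's random effect. This is a standard formulation; $\sigma^2$ is a symbol introduced here, as an assumption, for the constant residual variance:

$$
Y_{ij} \mid U_i \sim N(\mu_{ij}, \sigma^2)
$$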
where $\mu_{ij} = \boldsymbol{X_{ij}\beta} + U_i$.
$U_i$ are the random effects and $\boldsymbol{X_{ij}\beta}$ are the fixed effects, as in the linear regression model. Note that, given a random effect, the outcome variable $Y_{ij}$ follows a normal distribution with constant variance and a mean that depends on the value of that random effect.
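To make the random-intercept structure concrete, here is a minimal sketch that simulates data from $\mu_{ij} = X_{ij}\beta + U_i$ and then fits the model. It assumes the `lme4` package is installed. The student/class setting, the variable names and all "true" parameter values are hypothetical choices for illustration.

```r
library(lme4)  # assumed to be installed; provides lmer()

set.seed(42)
n_class   <- 30   # hypothetical number of classes (groups i)
n_student <- 20   # hypothetical students per class (units j)

class_idx <- rep(seq_len(n_class), each = n_student)
iq        <- rnorm(n_class * n_student, mean = 100, sd = 15)

beta0   <- 20     # fixed intercept (assumed true value)
beta1   <- 0.5    # fixed slope for IQ (assumed true value)
sigma_u <- 5      # sd of the random intercepts U_i
sigma_e <- 3      # residual sd, constant across groups

u     <- rnorm(n_class, sd = sigma_u)          # U_i: one draw per class
mu    <- beta0 + beta1 * iq + u[class_idx]     # mu_ij = X_ij beta + U_i
score <- rnorm(length(mu), mean = mu, sd = sigma_e)

d <- data.frame(score = score, iq = iq, class_id = factor(class_idx))

# (1 | class_id) requests a random intercept for each class
fit <- lmer(score ~ iq + (1 | class_id), data = d)

fixef(fit)                          # estimates of beta0 and beta1
vc <- as.data.frame(VarCorr(fit))   # variance components (class, residual)
vc
```

The `sdcor` column of `vc` should come out near the simulated `sigma_u` and `sigma_e`, and the `iq` coefficient near `beta1`. That is a quick sanity check that the formula encodes the intended structure.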
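I recommend a careful read of the unified summary by @包寒吳霜. Its treatment of hierarchical linear models (HLM) agrees with this post, but it lays out the framework of the whole regression family more systematically. Pay particular attention to random versus fixed slopes and intercepts: random slopes and random intercepts are the main components of random effects. This post does not go over them again (but they are crucial!).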
For hierarchical data, facet plots are good tools to show the relationships conditional on one or more variables based on the grouping units.
They make differences in level between groups easy to compare visually.
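A facet plot of this kind might be drawn as in the sketch below, which assumes the `ggplot2` package is installed. The data are simulated, and the class labels, variable names and parameter values are all hypothetical.

```r
library(ggplot2)  # assumed to be installed

set.seed(7)
n_class   <- 6    # hypothetical number of classes
n_student <- 25   # hypothetical students per class

class_idx <- rep(seq_len(n_class), each = n_student)
iq        <- rnorm(n_class * n_student, mean = 100, sd = 15)
u         <- rnorm(n_class, sd = 8)   # class-level shifts in score
score     <- 20 + 0.5 * iq + u[class_idx] + rnorm(length(iq), sd = 4)

d <- data.frame(
  class = factor(paste("Class", class_idx)),
  iq    = iq,
  score = score
)

p <- ggplot(d, aes(x = iq, y = score)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # one line per panel
  facet_wrap(~ class) +                                      # one panel per class
  labs(x = "IQ", y = "Test score")

print(p)
```

Because `facet_wrap()` uses shared axes by default, vertical shifts between panels can be read directly as differences in group level.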
Think about the following thought experiment.
Our vocal pitch varies with the situation, because we convey different degrees of politeness to different audiences. We classify situations as formal (more polite) or informal (less polite) and want to know whether this difference affects vocal pitch. Three males and three females (labelled F1, F2, F3, M3, M4, M7) take part in the experiment. Each of them speaks in both types of situation (7 of each), and their vocal frequencies are recorded.
In this case, the independence assumption fails, since frequencies recorded from the same person in different situations are certainly correlated.
The data is grouped by subject, not just by gender. Even though females usually have higher vocal pitches than males, there is still individual variation within the gender groups.
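One way to model this design is a random intercept per subject, with politeness and gender as fixed effects. The sketch below assumes `lme4` is installed and uses simulated stand-in data, since the actual recordings are not reproduced here. The column names, the effect sizes and the reading of "7 each" as 7 scenarios per attitude are all assumptions.

```r
library(lme4)  # assumed to be installed

set.seed(2020)
subjects <- c("F1", "F2", "F3", "M3", "M4", "M7")

# 6 subjects x 7 scenarios x 2 attitudes = 84 rows
d <- expand.grid(
  subject  = subjects,
  scenario = 1:7,
  attitude = c("informal", "formal"),
  stringsAsFactors = FALSE
)
d$gender <- ifelse(substr(d$subject, 1, 1) == "F", "female", "male")

# Simulated (not real) pitch values in Hz; every number here is an assumption
subj_shift <- setNames(rnorm(length(subjects), sd = 25), subjects)
d$frequency <- 130 +
  100 * (d$gender == "female") -       # assumed gap between genders
  20 * (d$attitude == "formal") +      # assumed effect of politeness
  unname(subj_shift[d$subject]) +      # subject-specific random intercept
  rnorm(nrow(d), sd = 20)              # residual noise

d$attitude <- factor(d$attitude, levels = c("informal", "formal"))

# Fixed effects: attitude and gender; random intercept: subject
fit <- lmer(frequency ~ attitude + gender + (1 | subject), data = d)
summary(fit)
```

The `(1 | subject)` term is what absorbs the individual variation within each gender group, so the attitude effect is judged against within-subject noise rather than against differences between speakers.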
References:
'Beyond Multiple Linear Regression' Chapter 7
Bolton (2020), STA303 Lecture Week 5 [Linear Mixed Models], University of Toronto