Linear Mixed Models (LMMs)

Topics: Linear Mixed Model, Random Effect, Facet Plot in R


Independence... Or Not?

LMMs, also called hierarchical linear models (HLMs), can be viewed as preparation for studying generalized linear models.

The structure of a data set may imply that outcomes are correlated. For example, longitudinal observations, i.e. responses from the same subject over time, cannot be treated as independent observations. As a result, the independence assumption of ordinary linear regression fails.

Correlated outcomes provide less information than the same number of independent outcomes. However, population elements are often grouped into aggregates, and we frequently have information on both the individual elements and the aggregated groups.

For example, when students are grouped by class or teacher, we obtain measurements at each level. Students are the level-one observational units, while teachers/classes are the level-two observational units. If we are interested in a response variable such as test scores, we may want to examine the effects of both student characteristics, such as ethnicity and sex, and teacher characteristics, such as teaching quality.
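To make "less information" concrete: for n observations with common variance and a common pairwise correlation rho, the variance of their mean is inflated by the factor 1 + (n - 1) * rho relative to the independent case. The sketch below turns that standard result into an effective sample size. The function name and the example values are illustrative assumptions, not taken from the source, and the equal-correlation structure is itself a simplifying assumption.

```python
def effective_sample_size(n, rho):
    """Effective number of independent observations.

    Assumes n observations with equal variance and the same pairwise
    correlation rho (an exchangeable structure, assumed for illustration).
    Var(mean) = sigma^2 / n * (1 + (n - 1) * rho), so the n correlated
    observations are worth n / (1 + (n - 1) * rho) independent ones.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("this sketch only handles 0 <= rho <= 1")
    return n / (1.0 + (n - 1) * rho)


# Ten repeated measurements on one subject with pairwise correlation 0.5
# (hypothetical values) carry the information of fewer than two
# independent measurements.
print(round(effective_sample_size(10, 0.5), 2))
```

With rho = 0 the function returns n, and with rho = 1 it returns 1: perfectly correlated repeats add nothing beyond the first observation.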

Random Effect and Fixed Effect

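Random effects: parameters that are treated as random variables in the regression model and vary across groups. Random effects can be variables that were opportunistically measured and whose between-group variation needs to be accounted for. They are the random terms in the regression equation that represent differences between groups. Note that the grouping variable only helps us describe between-group differences more finely; it need not itself be one of the variables explaining the response Y. For example, if we want to know whether students' IQ and sex significantly affect their grades, grouping by grade level or by subject lets us characterize the relationship of IQ and sex with grades more accurately, even though we do not treat grade level or subject as explanatory variables for grades.

Fixed effects: parameters that are the same in every group and are used to explain the response variable Y. They are the fixed structural terms of the regression equation, whose effect sizes do not change from group to group. In the example above, IQ and sex are the structural terms, and their coefficients are the fixed effects.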

Mixed-effects model: A model including both fixed and random effects as its parameters.
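As a sketch of how the three definitions fit together, the student example can be written as a random-intercept model. The notation is an assumption made for illustration: i indexes the group (grade level or subject), j indexes the student within the group, and the variance symbols are not from the source.

```latex
\begin{aligned}
\text{score}_{ij} &= \beta_0 + \beta_1\,\text{IQ}_{ij} + \beta_2\,\text{sex}_{ij} + U_i + \epsilon_{ij},\\
U_i &\sim N(0, \sigma_u^2), \qquad \epsilon_{ij} \sim N(0, \tau^2).
\end{aligned}
```

Here the coefficients \beta_0, \beta_1, \beta_2 are the fixed effects and U_i is the random effect. The grouping variable enters only through U_i, which is why it is not counted among the explanatory variables.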

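The fixed and random effects discussed here differ slightly from the fixed-effects and random-effects models of econometrics. In econometrics, those two models apply to panel data that include a time dimension, whereas this article emphasizes the multilevel nested structure of the sample data, such as "individual -> small group -> large group".

For the econometric concepts, see: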

Assumptions of LMMs:

  1. The response variable is continuous.
  2. Both the random effects and within-unit residual errors follow normal distributions with constant variance.
  3. Groups are independent of one another, but observations within a group are not assumed to be independent.

Note that linear mixed models are robust to violations of some of the assumptions.

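I really can't resist griping about the usual Chinese translation of "robustness", haha: what is 魯棒性 even supposed to mean?

Assumption 3 says that groups are independent of one another while observations within a group may be correlated. Returning to the student-grades example, we require schools to be independent of each other, but the performance of students within the same school will be more or less related through factors such as the quality of the school's teaching staff, its facilities, and its academic culture, so those students cannot be assumed independent.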
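Assumption 3 can be illustrated on simulated data. The sketch below assumes that every group (say, a school) receives one shared normal shift and every observation adds its own independent noise. All names and numeric values are hypothetical choices for illustration. It then compares the correlation between two members of the same group with the correlation between members of different groups.

```python
import random


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_groups(n_groups, sd_group, sd_noise, seed=0):
    """Draw two observations per group: shared group shift + own noise."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_groups):
        shift = rng.gauss(0.0, sd_group)          # shared by the whole group
        first.append(shift + rng.gauss(0.0, sd_noise))
        second.append(shift + rng.gauss(0.0, sd_noise))
    return first, second


first, second = simulate_groups(20000, sd_group=2.0, sd_noise=1.0)

# Same group: both observations contain the same shift, so they correlate.
within = pearson(first, second)
# Different groups: pair each group's first member with the next group's.
between = pearson(first[:-1], first[1:])

print("within-group correlation :", round(within, 2))
print("between-group correlation:", round(between, 2))
```

Under these assumed standard deviations the within-group correlation should come out near 2^2 / (2^2 + 1^2) = 0.8, while the between-group correlation should be close to zero.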

Formulations of LMMs

  • LMMs can be expressed in matrix form: $\boldsymbol{y} = \boldsymbol{X\beta} + \boldsymbol{Zb} + \epsilon$

where $\boldsymbol{b} \sim N(0, \phi_{\theta})$ and $\epsilon \sim N(0, \gamma_{\theta})$, and $\boldsymbol{X}$ and $\boldsymbol{Z}$ are the model matrices for the fixed effects ($\boldsymbol{\beta}$) and the random effects ($\boldsymbol{b}$) respectively.

  • Another expression: $Y_{ij} \mid U_i \sim N(\mu_{ij}, \tau^2)$

where $\mu_{ij} = \boldsymbol{X_{ij}\beta} + U_i$.

The $U_i$ are random effects and $\boldsymbol{X_{ij}\beta}$ are the fixed effects, as in the linear regression model. Note that, conditional on the random effect, the outcome variable $Y_{ij}$ follows a normal distribution with constant variance $\tau^2$ and a mean that depends on the value of the random effect.
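Both formulations lead to the same covariance structure for the outcomes. The sketch below works this out for the simplest case, a random-intercept model. It assumes that the random effects and the residual errors are independent of each other, that the random-effect covariance is var_b times the identity, and that the residual covariance is var_eps times the identity. Under those assumptions the marginal covariance of y is Z (var_b I) Z^T + var_eps I. These assumptions, the helper names, and the numbers are illustrative and not taken from the source.

```python
def random_intercept_design(groups):
    """Indicator matrix Z: one row per observation, one column per group."""
    labels = sorted(set(groups))
    return [[1.0 if g == lab else 0.0 for lab in labels] for g in groups]


def marginal_covariance(Z, var_b, var_eps):
    """Z (var_b * I) Z^T + var_eps * I, computed entry by entry."""
    n = len(Z)
    V = []
    for i in range(n):
        row = []
        for j in range(n):
            # 1.0 if observations i and j share a group, otherwise 0.0
            shared = sum(zi * zj for zi, zj in zip(Z[i], Z[j]))
            row.append(var_b * shared + (var_eps if i == j else 0.0))
        V.append(row)
    return V


groups = ["A", "A", "B", "B", "B"]   # hypothetical grouping of 5 observations
Z = random_intercept_design(groups)
V = marginal_covariance(Z, var_b=4.0, var_eps=1.0)

for row in V:
    print(row)

# Correlation between two observations in the same group.
icc = V[0][1] / V[0][0]
print("within-group correlation:", icc)
```

The printed matrix is block diagonal: observations in the same group share the covariance var_b, and observations in different groups have covariance zero, which is the structure described by assumption 3.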

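I recommend a careful read of @包寒吳霜's unifying summary, whose treatment of hierarchical linear models (HLM) is consistent with this article but lays out the overall framework of regression more concisely. Pay particular attention to random versus fixed slopes and intercepts: random slopes and random intercepts are the main components of random effects, and this article does not go into them further (but they are crucial!):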

A small case study

For hierarchical data, facet plots are a good way to show relationships conditional on one or more variables, with one panel per grouping unit.

They make differences in level between groups easy to compare visually.

Consider the following thought experiment.

Our vocal pitch changes across scenarios in which we convey different degrees of politeness to different audiences. We classify such situations as formal (more polite) or informal (less polite) and want to know whether this difference has any impact on vocal pitch. Three males and three females (labelled F1, F2, F3, M3, M4, M7) take part in the experiment. They speak in both types of situation (7 each), and their vocal frequencies in the different situations are recorded.

In this case, the independence assumption fails, since frequencies recorded from the same person in different situations are almost certainly correlated.

The data is grouped by subject, not just by gender. Even though females usually have higher vocal pitches than males, there is still individual variation within the gender groups.
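The grouping can be sketched with simulated data. The block below assumes one reading of the design (six subjects, two situation types, seven scenarios per type) and draws frequencies from a random-intercept model. Every numeric value (baselines, standard deviations, the size and direction of the formal/informal shift) is a hypothetical placeholder, not a result from the experiment. It then computes the per-subject, per-condition means that each panel of a facet plot by subject would display.

```python
import random
from collections import defaultdict

SUBJECTS = ["F1", "F2", "F3", "M3", "M4", "M7"]
CONDITIONS = ["informal", "formal"]


def simulate_pitch(seed=1):
    """Simulate (subject, condition, scenario, frequency) rows.

    All numbers are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for subject in SUBJECTS:
        base = 250.0 if subject.startswith("F") else 130.0  # assumed baseline (Hz)
        u = rng.gauss(0.0, 25.0)          # subject-specific random intercept
        for condition in CONDITIONS:
            shift = -20.0 if condition == "formal" else 0.0  # assumed fixed effect
            for scenario in range(1, 8):  # 7 scenarios per condition
                freq = base + u + shift + rng.gauss(0.0, 15.0)
                rows.append((subject, condition, scenario, freq))
    return rows


def facet_means(rows):
    """Mean frequency for every (subject, condition) cell."""
    cells = defaultdict(lambda: [0.0, 0])
    for subject, condition, _, freq in rows:
        cell = cells[(subject, condition)]
        cell[0] += freq
        cell[1] += 1
    return {key: total / count for key, (total, count) in cells.items()}


rows = simulate_pitch()
means = facet_means(rows)

# One line per subject, mirroring one facet panel per subject.
for subject in SUBJECTS:
    print(subject, {c: round(means[(subject, c)], 1) for c in CONDITIONS})
```

Because each subject keeps the same random intercept across all 14 recordings, subjects of the same gender still sit at different overall levels, which is the individual variation a by-subject facet plot is meant to reveal.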

References:

'Beyond Multiple Linear Regression' Chapter 7

Bolton (2020), STA303 Lecture Week 5 [Linear Mixed Models], University of Toronto
