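As most readers will know, survival analysis methods fall into three families:
1. Nonparametric methods (the life-table method and the product-limit method, i.e. Kaplan-Meier)
2. Semiparametric methods (Cox proportional hazards regression)
3. Parametric methods
The first two are already familiar, so I won't go over them again.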
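Papers from the last few years mention Parametric Survival Models in many places. When I was teaching myself, I really could not find good, approachable material on them, so I had no choice but to work through another English-language textbook.
It is truly excellent self-study material and greatly improved my understanding of survival analysis. Highly recommended! What follows is nothing more than my reading notes.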
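1. Event: the event of interest occurs (death, relapse, progression, etc.); coded as 1 in status
2. Censoring: the survival time is not known exactly; coded as 0 in status
Types of censoring: right-censored, left-censored, interval-censored
In general, all censoring can be viewed as interval censoring: right censoring at time t corresponds to the interval (t, ∞) and left censoring to the interval (0, t), so both are special cases.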
3. Survivor function, denoted by S(t)
[Figures: the theoretical survivor function; the survivor function as actually observed]
4. Hazard function, denoted by h(t)
A hazard function h(t) gives the instantaneous potential at time t for getting an event (like death or some disease of interest), given survival up to time t. Because of the given sign here, the hazard function is sometimes called a conditional failure rate. We obtain a probability per unit time, which is no longer a probability but a rate.
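To make items 1 to 3 concrete, here is a minimal sketch of how an observed survivor function can be computed from (time, status) pairs with the product-limit (Kaplan-Meier) estimator mentioned at the top. The function name `kaplan_meier` and the six data points are made up for illustration; they are not from the book.

```python
from collections import Counter

def kaplan_meier(times, status):
    """Return [(t, S(t))] at each distinct event time.

    status uses the coding from the notes: 1 = event, 0 = censored.
    """
    # Number of events observed at each distinct event time.
    events = Counter(t for t, s in zip(times, status) if s == 1)
    surv = 1.0
    curve = []
    for t in sorted(events):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # Multiply by the conditional probability of surviving past t.
        surv *= 1.0 - events[t] / at_risk
        curve.append((t, surv))
    return curve

# Hypothetical data: six subjects, two of them censored (status 0).
times = [2, 3, 3, 5, 8, 9]
status = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, status)
for t, s in curve:
    print(t, round(s, 4))
```

The estimate only drops at event times; censored subjects simply leave the risk set, which is why the observed curve is a step function rather than a smooth one.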
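The survivor function and the hazard function are the two most important functions in survival analysis. They are related as follows, so knowing either one is enough to derive the other.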
[Figures: the relationship between the survivor function and the hazard function; properties of the hazard function]
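For reference, the standard textbook forms of the hazard definition and of the two-way relationship are written out here, with T denoting the random survival time (a symbol these notes do not otherwise define):

```latex
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}

S(t) = \exp\!\left( -\int_0^t h(u)\, du \right)

h(t) = -\frac{dS(t)/dt}{S(t)}
```

Two properties follow directly: h(t) is never negative, and because it is a rate rather than a probability it has no upper bound.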
The key terms are the survivor function S(t) and the hazard function h(t), which are in essence opposed concepts, in that the survivor function focuses on surviving whereas the hazard function focuses on failing, given survival up to a certain time point. Here "failing" means the event occurring (death, relapse, progression, etc.).
When the hazard function is constant, we say that the survival model is exponential. This term follows from the relationship between the survivor function and the hazard function.
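Why a constant hazard gives an exponential model can be checked numerically. The sketch below integrates a hazard function and exponentiates the negative result, which is the standard relationship S(t) = exp(-∫h). The helper name `survival_from_hazard` and the rate 0.2 are my own choices for illustration.

```python
import math

def survival_from_hazard(hazard, t, steps=10_000):
    """S(t) = exp(-integral from 0 to t of h(u) du), via the trapezoid rule."""
    dt = t / steps
    total = 0.0
    for i in range(steps):
        a, b = i * dt, (i + 1) * dt
        total += 0.5 * (hazard(a) + hazard(b)) * dt
    return math.exp(-total)

lam = 0.2  # hypothetical constant hazard (events per unit time)
s_numeric = survival_from_hazard(lambda u: lam, 5.0)
s_closed = math.exp(-lam * 5.0)  # exponential survivor function
print(s_numeric, s_closed)
```

With h(t) = λ the integral is simply λt, so S(t) = exp(-λt), which is exactly the exponential survivor function.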
The hazard function may be used to identify a specific model form, such as an exponential, a Weibull, or a lognormal curve, that fits one's data.
It is also the vehicle by which mathematical modeling of survival data is carried out; that is, the survival model is usually written in terms of the hazard function.
That is, the bold X represents a collection (sometimes called a "vector") of predictor variables that is being modeled to predict an individual's hazard. These X's are time-independent.
It is possible, nevertheless, to consider X's which do involve t. Such X's are called time-dependent variables. If time-dependent variables are considered, the Cox model form may still be used, but such a model no longer satisfies the PH assumption, and is called the extended Cox model.
The PH assumption requires that the HR is constant over time, or equivalently, that the hazard for one individual is proportional to the hazard for any other individual, where the proportionality constant is independent of time.
The final expression for the hazard ratio therefore involves the estimated coefficients β̂i and the values of X* and X for each variable. However, because the baseline hazard has canceled out, the final expression does not involve time t. In other words, to obtain the HR we only need to estimate the coefficients βi; the baseline hazard never has to be estimated.
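The cancellation is easy to see in a small sketch. The coefficients, covariate values, and the baseline hazard below are all invented for illustration; the point is only that the ratio of two Cox-form hazards equals exp(Σ βi (X*i − Xi)) at every time t, whatever baseline is plugged in.

```python
import math

def hazard(t, x, betas, baseline):
    """Cox-form hazard: baseline hazard at t times exp(sum of beta_i * x_i)."""
    return baseline(t) * math.exp(sum(b * xi for b, xi in zip(betas, x)))

def hazard_ratio(x_star, x, betas):
    """HR comparing covariate pattern x_star to x; the baseline has canceled."""
    return math.exp(sum(b * (xs - xi) for b, xs, xi in zip(betas, x_star, x)))

betas = [0.7, -0.03]   # hypothetical estimated coefficients
x_star = [1, 60]       # e.g. exposed, age 60 (made-up covariates)
x = [0, 60]            # e.g. unexposed, age 60

def baseline(t):
    # Arbitrary made-up baseline hazard; any positive function works here.
    return 0.01 * t

ratios = [hazard(t, x_star, betas, baseline) / hazard(t, x, betas, baseline)
          for t in (1, 5, 10)]
hr = hazard_ratio(x_star, x, betas)
print(ratios, hr)
```

Every entry of `ratios` matches `hr`, which is exp(0.7) here because the two patterns differ only in the first covariate.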
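A key reason the Cox model is so widely used is that, even without specifying the baseline hazard, it gives reasonably good estimates of the regression coefficients, the hazard ratios of interest, and adjusted survival curves across a wide variety of data situations. In other words, the Cox PH model is a "robust" model, so its results will closely approximate the results from the correct parametric model.
Thus, when in doubt, as is typically the case, the Cox model will give reliable enough results so that it is a "safe" choice of model, and the user does not need to worry about whether the wrong parametric model is chosen.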
A parametric survival model is one in which survival time (the outcome) is assumed to follow a known distribution.
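As a minimal sketch of what "assumed to follow a known distribution" buys you, take the exponential model with right censoring. Each event contributes log f(t) = log λ − λt to the log-likelihood and each censored time contributes log S(t) = −λt, so the maximum-likelihood estimate has the closed form (number of events) / (total follow-up time). The function names and the data are made up for illustration.

```python
import math

def exp_loglik(lam, times, status):
    """Log-likelihood of an exponential model with right censoring.

    Events (status 1) contribute log(lam) - lam * t;
    censored observations (status 0) contribute -lam * t.
    """
    return sum(s * math.log(lam) - lam * t for t, s in zip(times, status))

def exponential_mle(times, status):
    """Closed-form MLE of the rate: events divided by total time at risk."""
    return sum(status) / sum(times)

# Hypothetical data: 1 = event, 0 = right-censored.
times = [2, 3, 3, 5, 8, 9]
status = [1, 1, 0, 1, 0, 1]

lam_hat = exponential_mle(times, status)
print(lam_hat, exp_loglik(lam_hat, times, status))
```

Once λ is estimated, the whole survivor function S(t) = exp(-λt) is known, which is the practical difference from a Cox model, where the baseline is left unspecified.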
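Examples of distributions that are commonly used for survival time are the Weibull, the exponential (a special case of the Weibull), the log-logistic, the lognormal, and the generalized gamma.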
The Cox proportional hazards model, by contrast, is not a fully parametric model. Rather it is a semi-parametric model because even if the regression parameters (the betas) are known, the distribution of the outcome remains unknown. The baseline survival (or hazard) function is not specified in a Cox model.
The underlying assumption for AFT models is that the effect of covariates is multiplicative (proportional) with respect to survival time, whereas for PH models the underlying assumption is that the effect of covariates is multiplicative with respect to the hazard.
Many parametric models are accelerated failure time (AFT) models rather than PH models. The exponential and Weibull distributions can accommodate both the PH and AFT assumptions.
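That the Weibull fits both assumptions can be verified with a small sketch. It assumes the common parameterization S(t) = exp(-λ t^p) with hazard h(t) = λ p t^(p−1), two groups sharing the same shape p, and the convention S1(t) = S2(γt). The parameter values are invented.

```python
import math

def weibull_surv(t, lam, p):
    """Weibull survivor function S(t) = exp(-lam * t**p)."""
    return math.exp(-lam * t ** p)

def weibull_haz(t, lam, p):
    """Weibull hazard h(t) = lam * p * t**(p - 1)."""
    return lam * p * t ** (p - 1)

p = 1.5                   # shared shape parameter (hypothetical)
lam1, lam2 = 0.10, 0.05   # group 1 = control, group 2 = exposed (hypothetical)

# PH: the hazard ratio (group 2 vs group 1) is the same at every time point.
hrs = [weibull_haz(t, lam2, p) / weibull_haz(t, lam1, p) for t in (1.0, 2.0, 5.0)]

# AFT: S1(t) = S2(gamma * t) holds with gamma = (lam1 / lam2) ** (1 / p).
gamma = (lam1 / lam2) ** (1 / p)
pairs = [(weibull_surv(t, lam1, p), weibull_surv(gamma * t, lam2, p))
         for t in (1.0, 2.0, 5.0)]
print(hrs, gamma, pairs)
```

Here the hazard ratio is λ2/λ1 = 0.5 throughout, and the same pair of curves is also related by a single time-stretching factor γ, so both descriptions hold at once.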
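The book gives an example here to illustrate what the AFT assumption means.
Suppose humans live 7 times as long as dogs. In AFT terms, we can say that the probability of a dog surviving past 10 years equals the probability of a human surviving past 70 years. Likewise, the probability of a dog surviving past 6 years equals the probability of a human surviving past 42 years.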
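In code the dog example is just a rescaling of the time axis. The human survival curve below is entirely made up (I picked a Weibull-like shape for convenience); only the relationship between dog and human curves matters.

```python
import math

def human_surv(t):
    """Hypothetical human survivor function, invented for illustration."""
    return math.exp(-(t / 80.0) ** 4)

def dog_surv(t):
    """AFT relationship from the example: one dog year counts as 7 human years."""
    return human_surv(7 * t)

print(dog_surv(10), human_surv(70))
print(dog_surv(6), human_surv(42))
```

Whatever shape the human curve has, the dog curve is the same curve compressed by a factor of 7 along the time axis.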
In the book's comparison of smokers and nonsmokers, γ is a constant called the acceleration factor comparing smokers to nonsmokers. In a regression framework, the acceleration factor γ could be parameterized as exp(α) where α is a parameter to be estimated from the data.
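The direction of interpretation differs from the HR: HR > 1 means the exposure is a risk factor, whereas γ > 1 means the exposure is protective. For example, let S1(t) = S2(γt), with S1 the control group, S2 the exposed group, and γ = 2. Then the probability that the control group survives past 10 years equals the probability that the exposed group survives past 20 years, so the exposure is protective.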
The acceleration factor is a ratio of survival times corresponding to any fixed value of S(t). For AFT models, this ratio of survival times is assumed constant for all fixed values of S(t).
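A sketch of this "ratio of survival times" reading, using a log-logistic model S(t) = 1 / (1 + λ t^p). The log-logistic is my own choice of example because it satisfies the AFT assumption but not the PH assumption; the parameter values and the bisection helper `time_at` are made up for illustration.

```python
def loglogistic_surv(t, lam, p):
    """Log-logistic survivor function S(t) = 1 / (1 + lam * t**p)."""
    return 1.0 / (1.0 + lam * t ** p)

def loglogistic_haz(t, lam, p):
    """Log-logistic hazard h(t) = lam * p * t**(p-1) / (1 + lam * t**p)."""
    return lam * p * t ** (p - 1) / (1.0 + lam * t ** p)

def time_at(q, surv, lo=0.0, hi=1e6, iters=200):
    """Bisection for the time t at which a decreasing surv(t) equals q."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if surv(mid) > q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

p = 2.0
lam1, lam2 = 0.04, 0.01   # group 1 = control, group 2 = exposed (hypothetical)

def s1(t):
    return loglogistic_surv(t, lam1, p)

def s2(t):
    return loglogistic_surv(t, lam2, p)

# Ratio of survival times (exposed / control) at several fixed values of S(t).
ratios = [time_at(q, s2) / time_at(q, s1) for q in (0.75, 0.5, 0.25)]

# Hazard ratio (exposed / control) at several time points.
hrs = [loglogistic_haz(t, lam2, p) / loglogistic_haz(t, lam1, p)
       for t in (1.0, 5.0, 20.0)]
print(ratios, hrs)
```

The time ratio is the same at every survival level (γ = 2 with these values), while the hazard ratio drifts with t, so this pair of curves is AFT but not PH.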
That's it for today. Reading this much statistics is making me go bald.