Chapter 7 Endogeneity

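The literature on endogeneity is vast and touches on many issues in causal analysis. A large share of it studies policy effects, that is, treatment effects; see 許育進 and 賴宗志 (2018) for a survey. This chapter concentrates on the traditional endogeneity problem in panel data. Difference-in-differences, which uses panel data but belongs to the policy-evaluation literature, is not covered here, because that topic deserves a book of its own on its literature and methods.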

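7.1 Principle: What Is Endogeneity?

Put most simply, endogeneity arises when the dependent variable and an explanatory variable affect each other. For example, income affects consumption, and consumption also affects income. Put even more simply, it is a chicken-and-egg problem. Eq. (7.1) illustrates it.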

\[ \tag{7.1} y = a+b_{1}X_{1}+b_{2}X_{2}+e \]
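Because the error term \(e\) is a component of \(y\), econometricians define an endogenous variable as follows:

A variable that is correlated with the error term is called an endogenous variable.

A model has endogeneity when any of its explanatory variables is endogenous. When endogeneity is present, traditional estimators such as LS, GLS, and WLS are no longer unbiased, nor are they consistent.

Briefly, there are two reasons why an explanatory variable \(X\) may be correlated with the error \(e\):
(1) The explanatory variable is itself endogenous, that is, \(X\) is also determined by \(y\).
(2) The explanatory variable is measured with error (measurement error).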
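The consequence of a regressor being correlated with the error can be illustrated with a small simulation. This is only a sketch on made-up data: the variable names, the sample size, and the coefficients 0.8 and 0.6 are assumptions chosen for illustration, not values from any study cited in this chapter.

```r
# Simulated illustration: x is correlated with the error e, so LS is biased.
set.seed(123)
n <- 5000
z <- rnorm(n)                        # exogenous driver of x (hypothetical)
e <- rnorm(n)                        # structural error
x <- 0.8 * z + 0.6 * e + rnorm(n)    # x depends on e, hence x is endogenous
y <- 1 + 2 * x + e                   # the true slope is 2

b_ols  <- unname(coef(lm(y ~ x))["x"])
cov_xe <- cov(x, e)                  # clearly non-zero by construction

cat("LS slope:", round(b_ols, 3), " cov(x, e):", round(cov_xe, 3), "\n")
```

In this design the LS slope settles noticeably above the true value of 2, and increasing n does not remove the gap, which is what the loss of consistency means in practice.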

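Much as the Q transformation removes individual means, one traditional remedy in regression is two-stage least squares (2SLS) with instrumental variables, which removes the endogeneity. The method works as follows.
Given the regression in matrix form, \(y=Xb+e\), the LS estimator is \(b_{LS}=(X'X)^{-1}X'y\).

Let \(Z\) be the matrix of instruments. The 2SLS estimator is then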

\[ b_{2SLS}={{({X}'Z{{({Z}'Z)}^{-1}}{Z}'X)}^{-1}}{X}'Z{{({Z}'Z)}^{-1}}{Z}'y \]

The variance of \(b_{2SLS}\) is

\[ Var({{b}_{2SLS}})={{S}^{2}}{{({X}'Z{{({Z}'Z)}^{-1}}{Z}'X)}^{-1}} \]
where \({S}^{2}\) is the sample estimate of the error variance.
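The two formulas above can be computed directly with matrix algebra. The sketch below does so on simulated data; the instruments z1 and z2, the coefficients, and the degrees-of-freedom correction n - k used for \(S^2\) are assumptions made for this illustration.

```r
# 2SLS by the matrix formulas, on simulated data (all names are hypothetical).
set.seed(123)
n  <- 5000
z1 <- rnorm(n)
z2 <- rnorm(n)
e  <- rnorm(n)
x  <- 0.8 * z1 + 0.5 * z2 + 0.6 * e + rnorm(n)   # endogenous regressor
y  <- 1 + 2 * x + e                              # true slope is 2

X <- cbind(1, x)          # regressors (with intercept)
Z <- cbind(1, z1, z2)     # instruments (the intercept instruments itself)

# Order condition: at least as many instruments as parameters
stopifnot(ncol(Z) >= ncol(X))

# First stage: P_Z X = Z (Z'Z)^{-1} Z'X
PZX <- Z %*% solve(crossprod(Z), crossprod(Z, X))

# b_2SLS = (X' P_Z X)^{-1} X' P_Z y
b_2sls <- solve(crossprod(PZX, X), crossprod(PZX, y))

# Var(b_2SLS) = S^2 (X' P_Z X)^{-1}, with S^2 from the structural residuals
res  <- y - X %*% b_2sls
s2   <- sum(res^2) / (n - ncol(X))
V    <- s2 * solve(crossprod(PZX, X))
se   <- sqrt(diag(V))

print(cbind(estimate = drop(b_2sls), std.error = se))
```

Note that the residuals use the original X, not the first-stage fitted values; using the fitted values would give the wrong \(S^2\).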
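Notes on 2SLS

Order condition: the number of instruments must not be smaller than the number of parameters to be estimated.
2SLS is used the same way in pure time-series regressions, cross-sectional regressions, and panel data.
Apart from declaring the instruments, the usage is identical to the panel estimation described earlier.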

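For panel data, within the 2SLS framework, the EC2SLS estimator of Baltagi (1981b), which Baltagi (2006) applies to crime data, is widely used; the other common choice is the estimator of Hausman and Taylor (1981). The sections below introduce these two estimators according to the type of endogeneity each one addresses.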

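7.2 Error-Component 2SLS (EC-2SLS)

We start from a panel data model, Eqs. (7.2)-(7.4):
\[ \tag{7.2} {y_{it}}=a+{b_1}{x_{1,it}}+{b_2}{x_{2,it}}+{{\mu }_{i}}+{e_{it}} \]
\[ \tag{7.3} x_{1,it}=\theta_{0}+ \theta_1x_{2,it}+ \nu_{it} \]
\[ \tag{7.4} cov({e_{it}} ,\nu_{it}) \ne 0 \]
In this type of endogeneity, the explanatory variables \(x_1\) and \(x_2\) of the original equation are collinear, and their linear combination is correlated with the error. That is, \({b_1}{x_{1,it}}+{b_2}{x_{2,it}}\) is correlated with \(e_{it}\), as Eq. (7.4) implies. The classic example in the literature is the study of crime rates in North Carolina by Cornwell and Trumbull (1994), which uses the following equation.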

\[ \tag{7.5} {y_{it}}=a+{b_1}{x_{1,it}}+{b_2}{x_{2,it}}+\cdots +{b_{10}}{x_{10,it}}+{u_{it}} \]
The variables are defined as follows.
y = crime rate (crimes/population)
x1 = probability of arrest (arrests/offenses)
x2 = probability of conviction, given arrest
x3 = probability of a prison sentence, given conviction
x4 = sanction severity (average prison sentence in days)
x5 = ability of the police force to detect crime (police per capita)
x6 = population density (POP/area)
x7 = percentage of young males (males/POP, 15 < age < 24)
x8 = percentage minority (non-white/total)
x9 = 1 if western or central county; 0 otherwise
x10 = 1 if SMSA, POP > 50000; 0 otherwise

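Variables x1 to x5 relate to the capacity of the public sector and also reflect how effectively resources are allocated. When estimating Eq. (7.5), Cornwell and Trumbull (1994) took into account the linear relationship between x1 (probability of arrest) and x5 (police capacity to detect crime) and therefore applied 2SLS under a fixed-effects specification. The estimates, however, show that the coefficients on the three key variables x1, x2, and x3 are not statistically significant. Take x2, the probability of conviction given arrest: its coefficient is expected to be negative, meaning that a higher probability should lower the crime rate. Statistical insignificance would mean that the crime rate is unrelated to the public sector's enforcement capacity, which seems to imply that resources could be reduced. If x5, measured as police per capita, has no effect, then budgets could be cut and the police force downsized.
Baltagi (2006) points out that the problem with this study is that it accounts only for collinearity and ignores the endogeneity among y, x1, and x5. Baltagi therefore used the EC-2SLS estimator, which resolved the problem. EC-2SLS first requires classifying the explanatory variables as endogenous or exogenous, as follows: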

\[ \tag{7.6} y_{it}={\mathbf{Y}_{it}}\gamma +{\mathbf{X}_{1,it}}{\mathbf{b}}+{\mu_{i}}+e_{it} \]
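The terms are defined as follows:
\(\mathbf{Y}_{it}\) is the matrix of endogenous explanatory variables; because they are endogenous, they are allowed to be correlated with the error.
\({\mathbf{X}_{1,it}}\) is the matrix of the remaining, exogenous explanatory variables.
\(\mathbf{IV}_{it}\) is the matrix of instruments.

The equation above can therefore be rewritten as

\[ \tag{7.7} {y_{it}}={{\mathbf{Y}}_{it}}\gamma +{{\mathbf{X}}_{1,it}}\mathbf{b}+{{\mu }_{i}}+{e_{it}}={{\mathbf{Z}}_{it}}\delta +{{\mu }_{i}}+{e_{it}} \]
where \({{\mathbf{Z}}_{it}}=[{{\mathbf{Y}}_{it}},{{\mathbf{X}}_{1,it}}]\) collects the regressors and \({{\mathbf{X}}_{it}}=[{{\mathbf{X}}_{1,it}},\mathbf{IV}_{it}]\) collects the exogenous variables available as instruments.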

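The EC-2SLS of Baltagi (1981b) first obtains the fixed-effects (within) estimate of Eq. (7.7) by 2SLS, then the between estimate of Eq. (7.7) by 2SLS, and finally takes a weighted average of the two. The literature offers more than five estimators for this problem, but most of them are very similar: they build on the three basic panel data estimators and generalize 2SLS. Another well-known one is Balestra and Varadharajan-Krishnakumar (1987), whose treatment is close to that of Baltagi (1981b). It is usually labeled G2SLS in the literature, which emphasizes its use of GLS within the two-stage estimation. Although these estimators differ in their theoretical properties, they differ little in empirical work, so this book does not cover them further. Interested readers should consult the original papers.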
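One compact way to write the weighted average described above is sketched below. The symbols \(W_1\) and \(W_2\) are placeholder names introduced here for the weighting matrices, which depend on the estimated variance components; their exact form is in Baltagi (1981b) and is not reproduced here.

\[ \hat{\delta}_{EC2SLS} = W_1\,\hat{\delta}_{W,2SLS} + W_2\,\hat{\delta}_{B,2SLS}, \qquad W_1 + W_2 = I \]

Here \(\hat{\delta}_{W,2SLS}\) is the 2SLS estimate from the within-transformed version of Eq. (7.7), and \(\hat{\delta}_{B,2SLS}\) is the 2SLS estimate from the between (individual-means) version.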

7.2.1 R Lab

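Step 1. Load the data and build the panel data frame

library(plm)
# Read the North Carolina crime data (file path as used in this book)
temp1 = read.csv("data/crime.csv")
# Declare the panel structure: county is the individual index, year the time index
mydata1.pd= pdata.frame(temp1, index = c("county","year"))
head(mydata1.pd)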

##      county year    crmrte   prbarr  prbconv  prbpris avgsen     polpc  density
## 1-81      1   81 0.0398849 0.289696 0.402062 0.472222   5.61 0.0017868 2.307159
## 1-82      1   82 0.0383449 0.338111 0.433005 0.506993   5.59 0.0017666 2.330254
## 1-83      1   83 0.0303048 0.330449 0.525703 0.479705   5.80 0.0018358 2.341801
## 1-84      1   84 0.0347259 0.362525 0.604706 0.520104   6.89 0.0018859 2.346420
## 1-85      1   85 0.0365730 0.325395 0.578723 0.497059   6.55 0.0019244 2.364896
## 1-86      1   86 0.0347524 0.326062 0.512324 0.439863   6.90 0.0018952 2.385681
##         taxpc  region smsa  pctmin     wcon      wtuc     wtrd     wfir
## 1-81 25.69763 central   no 20.2187 206.4803  333.6209 182.3330 272.4492
## 1-82 24.87425 central   no 20.2187 212.7542  369.2964 189.5414 300.8788
## 1-83 26.45144 central   no 20.2187 219.7802 1394.8030 196.6395 309.9696
## 1-84 26.84235 central   no 20.2187 223.4238  398.8604 200.5629 350.0863
## 1-85 28.14034 central   no 20.2187 243.7562  358.7830 206.8827 383.0707
## 1-86 29.74098 central   no 20.2187 257.9139  369.5465 218.5165 409.8842
##          wser   wmfg   wfed   wsta   wloc       mix   pctymle
## 1-81 215.7335 229.12 409.37 236.24 231.47 0.0999179 0.0876968
## 1-82 231.5767 240.33 419.70 253.88 236.79 0.1030491 0.0863767
## 1-83 240.1568 269.70 438.85 250.36 248.58 0.0806787 0.0850909
## 1-84 252.4477 281.74 459.17 261.93 264.38 0.0785035 0.0838333
## 1-85 261.0861 298.88 490.43 281.44 288.58 0.0932486 0.0823065
## 1-86 269.6129 322.65 478.67 286.91 306.70 0.0973228 0.0800806

#pdim(mydata1.pd)

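Step 2. Build the formula.

Define the regression formula. The formula is split into two parts by |. :
The part before |. is the complete regression equation.
The part after |. lists the changes to the instrument set:
variables preceded by “-” are the problematic (endogenous) variables that need instruments, and variables preceded by “+” are the instruments. In the second stage, the instruments stand in for the two problematic variables.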
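The “-” and “+” bookkeeping can be mimicked with base R's update() on a shortened version of the regression. This is only an analogy for what the dot shorthand expresses, not plm's internal code, and the reduced list of regressors is chosen here for brevity.

```r
# Analogy with base R: start from all regressors, drop the endogenous ones,
# add the outside instruments. The four-regressor formula is a shortened,
# illustrative version of the crime equation.
reg  <- log(crmrte) ~ log(prbarr) + log(polpc) + log(prbconv) + log(density)
inst <- update(reg, . ~ . - log(prbarr) - log(polpc) + log(taxpc) + log(mix))

# Term labels of the resulting instrument set
inst_terms <- attr(terms(inst), "term.labels")
print(inst_terms)
```

The exogenous regressors stay in the instrument set, the two endogenous ones leave it, and the two outside instruments enter it.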

myformulaIV=as.formula(log(crmrte) ~ log(prbarr) + log(polpc) + log(prbconv) + log(prbpris) + log(avgsen) +
  log(density) + log(wcon) + log(wtuc) + log(wtrd) + log(wfir) + log(wser) + log(wmfg) +
  log(wfed) + log(wsta) + log(wloc) + log(pctymle) + log(pctmin) + region + smsa + factor(year) |
  . - log(prbarr) - log(polpc) + log(taxpc) + log(mix))

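Step 3. Estimation.
The argument inst.method = “” inside the function selects the instrumental-variable transformation. Two options of the plm package are relevant here:
“baltagi” is Baltagi's two-stage method; it is declared explicitly below because the default of inst.method is “bvk”.
“bvk” is the method of Balestra and Varadharajan-Krishnakumar (1987) discussed above.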

out_IV=plm(myformulaIV , data = mydata1.pd, model = "random",inst.method = "baltagi")
summary(out_IV)

## Oneway (individual) effect Random Effect Model 
##    (Swamy-Arora's transformation)
## Instrumental variable estimation
##    (Baltagi's transformation)
## 
## Call:
## plm(formula = myformulaIV, data = mydata1.pd, model = "random", 
##     inst.method = "baltagi")
## 
## Balanced Panel: n = 90, T = 7, N = 630
## 
## Effects:
##                   var std.dev share
## idiosyncratic 0.02227 0.14923 0.326
## individual    0.04604 0.21456 0.674
## theta: 0.7458
## 
## Residuals:
##      Min.   1st Qu.    Median   3rd Qu.      Max. 
## -4.997164 -0.465637  0.027153  0.512779  3.917220 
## 
## Coefficients:
##                  Estimate Std. Error z-value  Pr(>|z|)    
## (Intercept)    -1.1476553  1.2889537 -0.8904  0.373263    
## log(prbarr)    -0.4129201  0.0974056 -4.2392 2.243e-05 ***
## log(polpc)      0.4347568  0.0896981  4.8469 1.254e-06 ***
## log(prbconv)   -0.3228859  0.0535539 -6.0292 1.648e-09 ***
## log(prbpris)   -0.1863204  0.0419391 -4.4426 8.886e-06 ***
## log(avgsen)    -0.0101739  0.0270229 -0.3765  0.706551    
## log(density)    0.4290337  0.0548511  7.8218 5.208e-15 ***
## log(wcon)      -0.0074746  0.0395773 -0.1889  0.850202    
## log(wtuc)       0.0454430  0.0197925  2.2960  0.021678 *  
## log(wtrd)      -0.0081453  0.0413823 -0.1968  0.843960    
## log(wfir)      -0.0036394  0.0289236 -0.1258  0.899867    
## log(wser)       0.0056112  0.0201257  0.2788  0.780393    
## log(wmfg)      -0.2041324  0.0804418 -2.5376  0.011160 *  
## log(wfed)      -0.1635333  0.1594522 -1.0256  0.305083    
## log(wsta)      -0.0540400  0.1056774 -0.5114  0.609094    
## log(wloc)       0.1630405  0.1196368  1.3628  0.172947    
## log(pctymle)   -0.1080968  0.1397015 -0.7738  0.439067    
## log(pctmin)     0.1890388  0.0415013  4.5550 5.238e-06 ***
## regionother     0.1940408  0.0598277  3.2433  0.001181 ** 
## regionwest     -0.0327993  0.0887663 -0.3695  0.711754    
## smsayes        -0.2251624  0.1156369 -1.9471  0.051517 .  
## factor(year)82  0.0107457  0.0257968  0.4166  0.677006    
## factor(year)83 -0.0837924  0.0307088 -2.7286  0.006360 ** 
## factor(year)84 -0.1034973  0.0370886 -2.7905  0.005262 ** 
## factor(year)85 -0.0956959  0.0494505 -1.9352  0.052968 .  
## factor(year)86 -0.0688930  0.0595961 -1.1560  0.247681    
## factor(year)87 -0.0314024  0.0705204 -0.4453  0.656106    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    30.168
## Residual Sum of Squares: 544.47
## R-Squared:      0.59845
## Adj. R-Squared: 0.58114
## Chisq: 575.685 on 26 DF, p-value: < 2.22e-16

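The regression formula and the estimates call for a few remarks.
1. Once an instrumental-variable method is declared, we cannot request a two-way model with effect = “twoways”; a mixed approach is needed. The individual effects across N are estimated as random effects (model = “random”), while the period effects across T are supplied through the data as fixed effects, which is what factor(year) in the formula does.
2. To treat the whole model with fixed effects, change the declaration to model = “within”. Note, however, that our equation contains several time-invariant dummy variables, such as region and smsa, and the \(Q\) transformation of the fixed-effects specification removes all of them.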
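The second remark is easy to verify by hand. The sketch below applies the within transformation (subtracting each individual's mean) to a toy panel; the three counties, the dummy, and the numbers are invented for illustration.

```r
# Within (Q) transformation on a toy panel: 3 individuals, 4 periods each.
id          <- rep(1:3, each = 4)
region_west <- rep(c(1, 0, 1), each = 4)          # time-invariant dummy
x           <- c(2, 3, 5, 4,  1, 1, 2, 4,  7, 6, 8, 9)  # time-varying regressor

# Subtract the individual mean from every observation
demean_by <- function(v, g) v - ave(v, g)

q_region <- demean_by(region_west, id)
q_x      <- demean_by(x, id)

print(q_region)   # all zeros: the dummy carries no within variation
print(q_x)        # non-zero deviations remain
```

Because the transformed dummy is identically zero, its coefficient cannot be estimated under model = “within”.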

7.3 Hausman and Taylor (1981)

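The next type of endogeneity arises between the unobserved individual effect \(\mu\) and the explanatory variables, as in the following equation:
\[ \tag{7.8} {{y}_{it}}=a+{{b}_{1}}{{x}_{1,it}}+{{b}_{2}}{{x}_{2,it}}+{{\mu }_{i}}+{{e}_{it}} \]
That is, \(\mu_i\) and \(x_{it}\) are correlated. Early work on this type of endogeneity took one of two extreme positions: (1) no explanatory variable is correlated with the individual effect; (2) all explanatory variables are correlated with the individual effect \(\mu\). For the latter case, Mundlak (1978) showed that fixed effects is the best estimator.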
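The second extreme can be illustrated with a simulation in which the regressor is correlated with \(\mu_i\). The sketch compares pooled LS with the within (fixed-effects) estimator; the panel dimensions and the coefficient 0.7 linking x to \(\mu_i\) are assumptions made for this illustration, and pooled LS stands in for any estimator that ignores the correlation.

```r
# Simulated panel where x is correlated with the individual effect mu_i.
set.seed(42)
N  <- 500                       # individuals
Tn <- 5                         # periods per individual
id <- rep(seq_len(N), each = Tn)

mu <- rnorm(N)                                  # unobserved individual effect
x  <- 0.7 * mu[id] + rnorm(N * Tn)              # x depends on mu_i
y  <- 1 + 2 * x + mu[id] + rnorm(N * Tn)        # true slope is 2

# Pooled LS ignores mu_i and absorbs it into the error
b_pooled <- unname(coef(lm(y ~ x))["x"])

# Within estimator: demean y and x by individual, then regress
yw <- y - ave(y, id)
xw <- x - ave(x, id)
b_within <- unname(coef(lm(yw ~ xw - 1))["xw"])

cat("pooled:", round(b_pooled, 3), " within:", round(b_within, 3), "\n")
```

The within estimator stays close to 2 because demeaning removes \(\mu_i\), while pooled LS is pulled upward by the correlation.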
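Rather than all or nothing, Hausman and Taylor (1981) consider the case in which only some of the explanatory variables are endogenous and propose a way to handle it. Because the mathematics of the original paper is rather involved, we begin with an empirical example, the classic wage equation of Baltagi and Khanti-Akom (1990):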
\[ {{y}_{it}}=a+{{b}_{1}}{{x}_{1,it}}+{{b}_{2}}{{x}_{2,it}}+\cdots +{{b}_{12}}{{x}_{12,it}}+{{\mu }_{i}}+{{e}_{it}} \]

The variables are defined as follows.
y= log(wage)
x1= weeks worked
x2=1, if the individual resides in the South (RESIDENCE)
x3=1, if the individual resides in the SMSA (RESIDENCE)
x4=1, if the individual is married (MARITAL STATUS)
x5=years of full-time work experience
x6=the square of x5
x7=1, blue-collar worker (OCCUPATION)
x8=1, if the individual works in the manufacturing industry (INDUSTRY)
x9=1, the individual’s wage is set by Union contract (UNION COVERAGE)
x10=1, if the individual is female (SEX)
x11=1, if the individual is black (RACE)
x12=years of education
\(\mu_i\) = unobserved cross-sectional characteristics, such as ability

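Wages reflect the discounted value of job performance. In this regression, a large part of the conditional expectation of wages is determined by observable explanatory variables; among them, x5 measures accumulated work experience and x12, education, measures investment in human capital. A person's job performance, however, also depends on innate ability, which cannot be observed directly: a sense of responsibility, drive, attitude, and so on. In panel data, this factor is treated as the individual effect \(\mu_i\). Yet innate ability and education are highly correlated: people with higher ability are better able to pursue more education, because they are also more willing to invest in themselves, and more education in turn helps raise ability. This creates a correlation, that is, endogeneity, between the unobserved factor \(\mu_i\) and the explanatory variable x12. A similar relationship arises between the unobserved factor \(\mu_i\) and the explanatory variable x5.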

The algebraic structure of the Hausman and Taylor (1981) estimator is as follows.

\[ \tag{7.9} {{y}_{it}}={{\mathbf{X}}_{it}}\mathbf{b}+{{\mathbf{Z}}_{i}}\mathbf{\delta }+{{\mu }_{i}}+{{e}_{it}} \]
where
\({{P}_{A}}=A{{\left( {A}'A \right)}^{-1}}{A}'\),
\({{\hat{\delta }}_{2SLS}}={{\left( {Z}'{{P}_{A}}Z \right)}^{-1}}{Z}'{{P}_{A}}\hat{d}\), with \(A=[X_1,Z_1]\) in this step,
\({{\hat{d}}_{i}}={{\bar{y}}_{i.}}-{{{\bar{X}}'}_{i.}}{{\tilde{b}}_{W}}\),
\(\tilde{\sigma }_{\varepsilon }^{2}=\frac{{\tilde{y}}'{{{\bar{P}}}_{{\tilde{X}}}}\tilde{y}}{N\left( T-1 \right)}\),
\(\tilde{y}=Qy,\text{ }\tilde{X}=QX,\text{ }{{\bar{P}}_{A}}=I-{{P}_{A}}\),
\(\tilde{\sigma }_{1}^{2}=\frac{{{\left( {{y}_{it}}-{{X}_{it}}{{{\tilde{b}}}_{W}}-{{Z}_{i}}{{{\hat{\delta }}}_{2SLS}} \right)}^{\prime }}P\left( {{y}_{it}}-{{X}_{it}}{{{\tilde{b}}}_{W}}-{{Z}_{i}}{{{\hat{\delta }}}_{2SLS}} \right)}{N}\).
Here \({\tilde{b}}_{W}\) is the within estimator of \(\mathbf{b}\), \(Q\) is the within transformation, and \(P\) is the matrix that averages the observations of each individual over time.

The Hausman and Taylor estimator is the result of applying 2SLS to
\({{\hat{\Omega }}^{-1/2}}{{y}_{it}}={{\hat{\Omega }}^{-1/2}}{{X}_{it}}b+{{\hat{\Omega }}^{-1/2}}{{Z}_{i}}\delta +{{\hat{\Omega }}^{-1/2}}{{u}_{it}}\)

with the instrument matrix \({{A}_{HT}}=\left[ \tilde{X},{{{\bar{X}}}_{1}},{{Z}_{1}} \right]\), where \(u_{it}=\mu_i+e_{it}\). The estimator is

\[ \left( \begin{matrix} {\hat{b}} \\ {\hat{\delta }} \\ \end{matrix} \right)={{\left[ \left( \begin{matrix} {{X}^{*}}^{\prime } \\ {{Z}^{*}}^{\prime } \\ \end{matrix} \right){{P}_{A}}\left( {{X}^{*}},{{Z}^{*}} \right) \right]}^{-1}}\left( \begin{matrix} {{X}^{*}}^{\prime } \\ {{Z}^{*}}^{\prime } \\ \end{matrix} \right){{P}_{A}}{{y}^{*}} \]

Here \(y^*\), \(X^*\), and \(Z^*\) denote the variables premultiplied by \({\hat{\Omega }}^{-1/2}\), and \(P_A\) is the orthogonal projection onto \({{A}_{HT}}=\left[ \tilde{X},{{{\bar{X}}}_{1}},{{Z}_{1}} \right]\).
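To make the instrument matrix concrete, the sketch below builds \(A_{HT}\) and its projection on a small simulated panel. The data, the single variable in each block, and the omission of an intercept are simplifications assumed here; the code shows only how the pieces fit together, not a full Hausman-Taylor estimation.

```r
# Building A_HT = [Q X, P X1, Z1] and its projection on simulated data.
set.seed(1)
N  <- 50
Tn <- 4
id <- rep(seq_len(N), each = Tn)

x1 <- rnorm(N * Tn)                  # time-varying exogenous (block X1)
x2 <- rnorm(N * Tn)                  # time-varying endogenous (block X2)
z1 <- rbinom(N, 1, 0.5)[id]          # time-invariant exogenous (block Z1)

X       <- cbind(x1, x2)
X_bar   <- apply(X, 2, function(col) ave(col, id))   # P X: individual means
X_tilde <- X - X_bar                                 # Q X: within deviations

# Instruments: within deviations of all X, individual means of X1, and Z1
A_HT <- cbind(X_tilde, X_bar[, "x1"], z1)

# Orthogonal projection P_A = A (A'A)^{-1} A'
P_A <- A_HT %*% solve(crossprod(A_HT), t(A_HT))

# Within deviations are orthogonal to anything time-invariant
ortho <- max(abs(crossprod(X_tilde, z1)))
cat("max |X_tilde' z1| =", ortho, "\n")
```

The last line shows why the within deviations of even the endogenous block X2 can serve as instruments: they are orthogonal to every time-invariant quantity, including \(\mu_i\).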

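We use the wage example to explain this elaborate design. Rewrite the equation as

\[ {{y}_{it}}={{\mathbf{X}}_{it}}\mathbf{b}+{{\mathbf{Z}}_{i}}\mathbf{\delta }+{{\mu }_{i}}+{{e}_{it}} \]
In this equation, \(X=[X_1, X_2]\) and \(Z=[Z_1, Z_2]\); each consists of a block of exogenous variables and a block of endogenous variables.

1. \([X_1, Z_1]\) are the exogenous components, uncorrelated with \(\mu\).
2. \([X_2, Z_2]\) are the endogenous components, correlated with \(\mu\) but uncorrelated with the error \(e\).

With the Hausman-Taylor estimator, a careful classification of the variables is all that is needed to estimate the parameters of interest. We explain the remaining details with the R code.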

7.3.1 R Lab

Step 1. Load the data

library(plm)
temp2 = read.csv("data/Wages.csv")
mydata2.pd= pdata.frame(temp2, index = c("ID","year"))
head(mydata2.pd)

##        ID year exp wks bluecol ind south smsa married  sex union ed black
## 1-1976  1 1976   3  32      no   0   yes   no     yes male    no  9    no
## 1-1977  1 1977   4  43      no   0   yes   no     yes male    no  9    no
## 1-1978  1 1978   5  40      no   0   yes   no     yes male    no  9    no
## 1-1979  1 1979   6  39      no   0   yes   no     yes male    no  9    no
## 1-1980  1 1980   7  42      no   1   yes   no     yes male    no  9    no
## 1-1981  1 1981   8  35      no   1   yes   no     yes male    no  9    no
##          lwage
## 1-1976 5.56068
## 1-1977 5.72031
## 1-1978 5.99645
## 1-1979 5.99645
## 1-1980 6.06146
## 1-1981 6.17379

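Step 2. Build the formula.
The formula is split into three parts by |: the first part is the complete regression equation, and the parts after it classify the variables used as instruments. Details follow below.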

myFormulaHT=as.formula(lwage ~ wks + south + smsa + married + exp + I(exp ^ 2) + bluecol + ind + union + sex + black + ed |
bluecol + south + smsa + ind + sex + black |
wks + married + union + exp + I(exp ^ 2))

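Let us first look at the Hausman-Taylor classification:
X1=[bluecol, south, smsa, ind]: time-varying exogenous variables
X2=[exp, exp2, wks, married, union]: time-varying endogenous variables
Z1=[sex, black]: time-invariant exogenous variables
Z2=[ed]: time-invariant endogenous variable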

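This classification is the hallmark of Hausman-Taylor: no outside instruments are required, only a careful classification of the explanatory variables. In the R code we therefore need to understand how the classification is declared. The key lies in what appears before and after the vertical bars |, which is how the two groups of time-invariant variables in \(Z\) are told apart. Consider the formula declared above:

The part right after the first bar lists all the exogenous variables, time-varying and time-invariant together. The classification works as follows:
1. The part before the first bar ends with the three time-invariant variables: sex, black, ed.
2. Two of these three also appear after the first bar:
(a) sex, black: time-invariant exogenous variables.
(b) The remaining one, ed, appears only before the first bar: it is the time-invariant endogenous variable.
3. The part after the second bar lists the time-varying endogenous variables X2, and the time-varying variables after the first bar form X1.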
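Before estimating, it is worth checking that the classification can identify the model. Hausman and Taylor's order condition requires at least as many time-varying exogenous variables (X1) as time-invariant endogenous variables (Z2). The sketch below records the classification of the wage equation as plain character vectors, which are bookkeeping objects introduced here and not something plm requires, and checks the condition.

```r
# Bookkeeping for the Hausman-Taylor classification of the wage equation.
X1 <- c("bluecol", "south", "smsa", "ind")               # time-varying, exogenous
X2 <- c("wks", "married", "union", "exp", "I(exp^2)")    # time-varying, endogenous
Z1 <- c("sex", "black")                                  # time-invariant, exogenous
Z2 <- c("ed")                                            # time-invariant, endogenous

k1 <- length(X1)   # instruments supplied by the individual means of X1
g2 <- length(Z2)   # time-invariant endogenous variables to be instrumented

identified <- k1 >= g2
cat("k1 =", k1, " g2 =", g2, " order condition met:", identified, "\n")

# The four blocks should cover all 12 regressors exactly once
all_vars <- c(X1, X2, Z1, Z2)
cat("regressors classified:", length(all_vars),
    " duplicates:", anyDuplicated(all_vars) > 0, "\n")
```

Here k1 exceeds g2, so the model is overidentified and the Hausman-Taylor estimator is expected to differ from, and be more efficient than, the within estimator.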

Step 3. Estimation

out_ht=plm(myFormulaHT, data = mydata2.pd, random.method = "ht", model = "random", inst.method = "baltagi")
summary(out_ht)

## Oneway (individual) effect Random Effect Model 
##    (Hausman-Taylor's transformation)
## Instrumental variable estimation
##    (Baltagi's transformation)
## 
## Call:
## plm(formula = myFormulaHT, data = mydata2.pd, model = "random", 
##     random.method = "ht", inst.method = "baltagi")
## 
## Balanced Panel: n = 595, T = 7, N = 4165
## 
## Effects:
##                   var std.dev share
## idiosyncratic 0.02304 0.15180 0.025
## individual    0.88699 0.94180 0.975
## theta: 0.9392
## 
## Residuals:
##       Min.    1st Qu.     Median    3rd Qu.       Max. 
## -12.643736  -0.466002   0.043285   0.524739  13.340263 
## 
## Coefficients:
##                Estimate  Std. Error z-value  Pr(>|z|)    
## (Intercept)  2.7818e+00  3.0765e-01  9.0422 < 2.2e-16 ***
## wks          8.3740e-04  5.9973e-04  1.3963   0.16263    
## southyes     7.4398e-03  3.1955e-02  0.2328   0.81590    
## smsayes     -4.1833e-02  1.8958e-02 -2.2066   0.02734 *  
## marriedyes  -2.9851e-02  1.8980e-02 -1.5728   0.11578    
## exp          1.1313e-01  2.4710e-03 45.7851 < 2.2e-16 ***
## I(exp^2)    -4.1886e-04  5.4598e-05 -7.6718 1.696e-14 ***
## bluecolyes  -2.0705e-02  1.3781e-02 -1.5024   0.13299    
## ind          1.3604e-02  1.5237e-02  0.8928   0.37196    
## unionyes     3.2771e-02  1.4908e-02  2.1982   0.02794 *  
## sexmale      1.3092e-01  1.2666e-01  1.0337   0.30129    
## blackyes    -2.8575e-01  1.5570e-01 -1.8352   0.06647 .  
## ed           1.3794e-01  2.1248e-02  6.4919 8.474e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    243.04
## Residual Sum of Squares: 4163.6
## R-Squared:      0.60945
## Adj. R-Squared: 0.60833
## Chisq: 6891.87 on 12 DF, p-value: < 2.22e-16

References

Balestra, Pietro, and Jayalakshmi Varadharajan-Krishnakumar. 1987. “Full Information Estimations of a System of Simultaneous Equations with Error Component Structure.” Econometric Theory 3 (2): 223–46. http://www.jstor.org/stable/3532463.

Baltagi, Badi H. 1981b. “Simultaneous Equations with Error Components.” Journal of Econometrics 17 (2): 189–200. https://doi.org/10.1016/0304-4076(81)90026-9.

Baltagi, Badi H. 2006. “Estimating an Economic Model of Crime Using Panel Data from North Carolina.” Journal of Applied Econometrics 21 (4): 543–47. https://doi.org/10.1002/jae.861.

Baltagi, Badi H., and Sophon Khanti-Akom. 1990. “On Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental Variables Estimators.” Journal of Applied Econometrics 5 (4): 401–6. https://doi.org/10.1002/jae.3950050408.

Cornwell, Christopher, and William N. Trumbull. 1994. “Estimating the Economic Model of Crime with Panel Data.” The Review of Economics and Statistics 76 (2): 360–66. http://www.jstor.org/stable/2109893.

Hausman, Jerry A., and William E. Taylor. 1981. “Panel Data and Unobservable Individual Effects.” Journal of Econometrics 16 (1): 155. https://doi.org/10.1016/0304-4076(81)90085-3.

Mundlak, Yair. 1978. “On the Pooling of Time Series and Cross Section Data.” Econometrica 46 (1): 69–85. http://www.jstor.org/stable/1913646.

許育進, and 賴宗志. 2018. “處理效果文獻回顧.” 經濟論文叢刊 46 (4): 501–21. https://doi.org/10.6277/TER.201812_46(4).0001.
