This is a Markdown demo of a legal analytics study, “Predict Child Custody Judge Case by xgBoost.” The main points are based on our research paper, “The Application of Artificial Intelligence and Legal Analytics: Focused on Decisions Regarding Child Custody” (2019), NTU Law Journal (TSSCI), 48(4). Please see the paper for details. Original Chinese citation: 「人工智慧與法律資料分析之方法與應用:以單獨親權酌定裁判的預測模型為例」,『台大法學論叢』,48卷4期。

1. Prepare Data

1-1 Data Source

We use 690 custody cases from 2013 to 2016, which were tagged by specialists. We need to convert the types of some variables and then transform the data into a sparse matrix with the “sparse.model.matrix” function. The matrix includes all the features we use to predict case results.
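The type conversion and the matrix step can be sketched as follows. This is a minimal illustration on a made-up four-row data frame, not the study's data: the column names are borrowed from the importance table in section 2-2, while the values, the factor coding, and the formula are assumptions.

```r
library(Matrix)  # provides sparse.model.matrix()

# Hypothetical toy rows; the real data set has 690 tagged cases.
child <- data.frame(
  caregiver        = c("father", "mother", "mother", "father"),
  childNum         = c(1, 2, 1, 3),
  factorForFather  = c(1, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Convert the character column to a factor so that it is dummy-encoded.
child$caregiver <- factor(child$caregiver)

# Build the sparse feature matrix; "- 1" drops the intercept column.
features <- sparse.model.matrix(factorForFather ~ . - 1, data = child)
label    <- child$factorForFather

print(dim(features))
print(colnames(features))
```

The outcome column stays out of the feature matrix because it sits on the left-hand side of the formula; it is kept separately as the label.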

1-2 Define the Result Factor

Please note that the outcome we focus on is the judge's decision, which is stored in the variable “factorForFather”.

1-3 Sampling and K-folding

We use 80% of the full data as the training set and the remaining 20% as the test set. We then apply 10-fold cross-validation.
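A minimal base-R sketch of such a split is shown below. The seed and the random assignment scheme are assumptions (the original sampling code is not shown), and the folds are drawn from the training rows, which is one common reading of the step.

```r
set.seed(42)  # arbitrary seed, only so that this sketch is reproducible

n <- 690                                         # number of cases in the study
train_idx <- sample(n, size = round(0.8 * n))    # 552 training rows
test_idx  <- setdiff(seq_len(n), train_idx)      # remaining 138 test rows

# Assign each training row to one of 10 folds of nearly equal size.
fold_id <- sample(rep(1:10, length.out = length(train_idx)))
folds   <- split(train_idx, fold_id)

print(c(train = length(train_idx), test = length(test_idx)))
print(lengths(folds))
```

With 690 cases this gives 552 training rows and 138 test rows, which agrees with the 111 + 27 test cases reported in the ROC output of section 2-3.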

2. Build Model

2-1 Set Up XGBoost Parameters

## [1]  train-auc:0.963357+0.008568 test-auc:0.934646+0.039506 
## Multiple eval metrics are present. Will use test_auc for early stopping.
## Will train until test_auc hasn't improved in 100 rounds.
## 
## [51] train-auc:1.000000+0.000000 test-auc:0.985876+0.008631 
## [101]    train-auc:1.000000+0.000000 test-auc:0.986330+0.007577 
## [151]    train-auc:1.000000+0.000000 test-auc:0.985846+0.007902 
## Stopping. Best iteration:
## [61] train-auc:1.000000+0.000000 test-auc:0.986846+0.007312
## [1]  train-error:0.038647 
## [51] train-error:0.000000 
## [61] train-error:0.000000
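
A log of this shape typically comes from a cross-validation run with `xgb.cv` followed by a final fit with `xgb.train`. The sketch below shows one way such a run could be set up with the xgboost R package. The learning parameters (`eta`, `max_depth`) and the synthetic data are assumptions, since the original settings are not shown; only the 10 folds, the AUC metric, the 100-round early stopping, the 50-round print interval, and the 61 final rounds are read off the log.

```r
# Assumed parameter list; only the AUC metric is visible in the log.
params <- list(
  objective   = "binary:logistic",  # binary outcome: factorForFather is 0 or 1
  eval_metric = "auc",
  eta         = 0.1,                # hypothetical learning rate
  max_depth   = 4                   # hypothetical tree depth
)

# The training part runs only if the xgboost package is installed.
if (requireNamespace("xgboost", quietly = TRUE)) {
  library(xgboost)
  set.seed(1)

  # Synthetic stand-in for the training matrix (200 rows, 5 features).
  x <- matrix(rnorm(200 * 5), ncol = 5)
  y <- as.numeric(x[, 1] + rnorm(200) > 0)
  dtrain <- xgb.DMatrix(data = x, label = y)

  # 10-fold cross-validation; stop after 100 rounds without a better test AUC.
  cv <- xgb.cv(params = params, data = dtrain, nrounds = 1000, nfold = 10,
               early_stopping_rounds = 100, print_every_n = 50)

  # Refit on the whole training set; 61 is the best iteration in the log above.
  model <- xgb.train(params = params, data = dtrain, nrounds = 61)
}
```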

2-2 Display Importance of the Factors

Now we visualize the importance of the factors with a bar chart. A longer bar indicates a more important factor. The x-axis shows each factor's share of the total importance, and these shares sum to one.

##                 Feature         Gain        Cover   Frequency
##  1:           caregiver 0.3217237839 0.1045417681 0.084848485
##  2:           childWill 0.2616556523 0.1529756631 0.088888889
##  3:         interaction 0.1553694038 0.1890666268 0.101010101
##  4:     socialWorkerRep 0.0655308966 0.0704793535 0.062626263
##  5:                  ID 0.0503301832 0.1011276654 0.276767677
##  6:       supportSystem 0.0385168865 0.0881428593 0.072727273
##  7:         parentMoral 0.0158465131 0.0519858047 0.016161616
##  8:       parentEconomy 0.0134708329 0.0282823373 0.070707071
##  9:            childNum 0.0110963558 0.0200304276 0.044444444
## 10:            careTime 0.0110836183 0.0119224002 0.024242424
## 11:     careEnvironment 0.0102072561 0.0088009807 0.012121212
## 12:       undueBehavior 0.0100726979 0.0601366926 0.014141414
## 13:    currentResidence 0.0092678520 0.0167552879 0.034343434
## 14:      friendlyParent 0.0087021082 0.0136962952 0.020202020
## 15:            childSex 0.0069179862 0.0141437688 0.034343434
## 16:            carePlan 0.0067913664 0.0329968118 0.022222222
## 17:        parentHealth 0.0019311949 0.0021137343 0.004040404
## 18:     parentEducation 0.0008744753 0.0305807560 0.010101010
## 19:   otherRelationship 0.0003176229 0.0012492993 0.002020202
## 20: parentUnderstanding 0.0002933139 0.0009714673 0.004040404
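
The claim that the shares sum to one can be checked directly against the Gain column. The snippet below copies that column from the table, so the numbers are the document's own; the bar chart call is a plain base-R stand-in and not necessarily the plotting function used for the original chart.

```r
# Gain column copied from the importance table above.
gain <- c(
  caregiver = 0.3217237839, childWill = 0.2616556523,
  interaction = 0.1553694038, socialWorkerRep = 0.0655308966,
  ID = 0.0503301832, supportSystem = 0.0385168865,
  parentMoral = 0.0158465131, parentEconomy = 0.0134708329,
  childNum = 0.0110963558, careTime = 0.0110836183,
  careEnvironment = 0.0102072561, undueBehavior = 0.0100726979,
  currentResidence = 0.0092678520, friendlyParent = 0.0087021082,
  childSex = 0.0069179862, carePlan = 0.0067913664,
  parentHealth = 0.0019311949, parentEducation = 0.0008744753,
  otherRelationship = 0.0003176229, parentUnderstanding = 0.0002933139
)

total_gain <- sum(gain)       # equals 1 up to rounding
top3_share <- sum(gain[1:3])  # combined share of the three largest factors

print(round(total_gain, 6))
print(round(top3_share, 4))

# Horizontal bar chart, largest factor on top (drawn only when interactive).
if (interactive()) {
  barplot(rev(gain), horiz = TRUE, las = 1, xlab = "Gain (share of total)")
}
```

The three largest factors (caregiver, childWill, interaction) together account for about 74% of the total gain.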

2-3 Predict the Results I

Data: xgb.preds2 in 111 controls (test.child[, factorForFather] 0) < 27 cases (test.child[, factorForFather] 1). Area under the curve: 0.9923

## XGB Area under the curve 0.9961315
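
The area under the ROC curve equals the probability that a randomly chosen case (label 1) receives a higher score than a randomly chosen control (label 0). The base-R sketch below computes it through the rank-sum identity. The function name `auc_rank` and the toy scores are hypothetical and are not the study's predictions.

```r
# AUC via the rank-sum (Mann-Whitney) identity.
auc_rank <- function(labels, scores) {
  r  <- rank(scores)           # average ranks, so ties are handled
  n1 <- sum(labels == 1)       # number of cases (label 1)
  n0 <- sum(labels == 0)       # number of controls (label 0)
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Hypothetical toy example: 2 controls and 2 cases.
labels <- c(0, 0, 1, 1)
scores <- c(0.10, 0.40, 0.35, 0.80)

print(auc_rank(labels, scores))  # 0.75
```

Of the four control-case pairs in the toy example, three are ordered correctly, which gives 0.75.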

2-4 Predict the Results II: MLmetrics

We provide five indicators to evaluate the model.

##       y_pred
## y_true  0  1
##      0 47  0
##      1  2 20
## [1] "AUC: 0.996131528046422"
## [1] "Accuracy: 0.971014492753623"
## [1] "F1-score: 0.952380952380952"
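
The accuracy and F1-score can be recomputed by hand from the confusion matrix. The sketch below uses base R only, rather than the MLmetrics calls themselves, and treats class 1 as the positive class. That choice is an inference: it is the one that reproduces the printed F1-score.

```r
# Confusion matrix from the output above: rows are y_true, columns are y_pred.
cm <- matrix(c(47,  0,
                2, 20),
             nrow = 2, byrow = TRUE,
             dimnames = list(y_true = c("0", "1"), y_pred = c("0", "1")))

tp <- cm["1", "1"]  # true positives
fp <- cm["0", "1"]  # false positives
fn <- cm["1", "0"]  # false negatives

accuracy  <- sum(diag(cm)) / sum(cm)  # (47 + 20) / 69
precision <- tp / (tp + fp)           # 20 / 20
recall    <- tp / (tp + fn)           # 20 / 22
f1        <- 2 * precision * recall / (precision + recall)

print(accuracy)
print(f1)
```

Both errors are class-1 cases predicted as class 0, so precision for class 1 is 1 and the F1-score is driven entirely by recall.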

3. A Detailed Look at Each Case

We can see different lines of reasoning in different cases. A red bar shows an advantage for “MOTHER”, and a blue bar an advantage for “FATHER”. In every case, we can see that the judge gives each factor a weight and then puts them together to reach a decision.
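
One way to read such a chart: in a boosted-tree model with a logistic objective, per-factor contributions are usually added on the log-odds scale, and the sum is then converted to a probability. The sketch below illustrates only that arithmetic. The contribution values, the baseline, and the 0.5 cut-off are hypothetical and are not taken from either figure; it also assumes that positive values favor the father (factorForFather = 1).

```r
# Hypothetical log-odds contributions; positive values favor the father.
contrib <- c(baseline = -0.9, caregiver = -1.4, childWill = -1.1,
             interaction = -0.8, supportSystem = 0.6)

log_odds <- sum(contrib)      # total evidence on the log-odds scale
p_father <- plogis(log_odds)  # logistic function: 1 / (1 + exp(-x))

# Bar colours as in the figures: blue for the father, red for the mother.
bar_color <- ifelse(contrib > 0, "blue", "red")

# Assumed decision rule: predict the father when the probability exceeds 0.5.
predicted <- ifelse(p_father > 0.5, "FATHER", "MOTHER")

print(bar_color)
print(round(p_father, 3))
print(predicted)
```

In this made-up example a single blue factor is outweighed by three red ones, so the predicted outcome goes to the mother.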

Figure 1 is a typical “mother-win” case. The mother has the advantage on three main factors: “interaction”, “childWill”, and “caregiver”. Although “supportSystem”, which often favors the wealthier parent, goes to the father, the mother ultimately won. The remaining factors carry only a little weight in this case.

3-1 Figure 1. A Mother-win Case: Shilin District Court, Year 101, Hun No. 185 judgment (士林地院101,婚,185判決).

3-2 Figure 2. A Father-win Case: Taoyuan District Court, Year 102, Hun No. 294 judgment (桃園地院102,婚,294判決).

Figure 2 shows a different situation: “caregiver”, “childWill”, “interaction”, and “social worker report” all strongly favor the father. There is little doubt that, in this case, the father is the better parent for the child. The blue portion clearly outweighs the red. The prediction therefore goes to the father, which matches the actual result.