This is a markdown demo of a legal analytics study, “Predict Child Custody Judge Case by xgBoost.” The main points are based on our research paper, “The Application of Artificial Intelligence and Legal Analytics: Focused on Decisions Regarding Child Custody” (2019), NTU Law Journal (TSSCI), 48(4). Please see the paper for details. Chinese citation: 「人工智慧與法律資料分析之方法與應用:以單獨親權酌定裁判的預測模型為例」,『台大法學論叢』,48卷4期。
We used 690 custody cases from 2013 to 2016, which were tagged by specialists. We need to change the types of some variables and transform the data into a matrix with the “sparse.model.matrix” function. The matrix includes all the features we use to predict case results.
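As a rough illustration of what this transformation does, the sketch below one-hot encodes categorical records into a sparse indicator structure in plain Python. It is not the demo's own code: the function name `sparse_one_hot`, the category values, and the three toy rows are hypothetical, and R's `sparse.model.matrix` also applies contrast coding, which is not reproduced here.

```python
def sparse_one_hot(rows, columns):
    """Encode categorical rows as a sparse indicator matrix.

    Returns (feature_names, entries), where entries maps
    (row_index, column_index) -> 1 for every non-zero cell.
    """
    feature_names = []   # column labels such as "caregiver=mother"
    column_index = {}    # label -> column position
    entries = {}         # only the non-zero cells are stored
    for i, row in enumerate(rows):
        for col in columns:
            label = f"{col}={row[col]}"
            if label not in column_index:
                column_index[label] = len(feature_names)
                feature_names.append(label)
            entries[(i, column_index[label])] = 1
    return feature_names, entries


# Hypothetical toy records; the real data set has 690 tagged rows.
toy_rows = [
    {"caregiver": "mother", "childWill": "mother"},
    {"caregiver": "father", "childWill": "none"},
    {"caregiver": "mother", "childWill": "father"},
]
feature_names, entries = sparse_one_hot(toy_rows, ["caregiver", "childWill"])
print(feature_names)
print(sorted(entries))
```

Storing only the non-zero cells is what makes the representation sparse: each row contributes one entry per categorical variable, however many levels that variable has.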
Please note that we focus on the “result of judge”, the outcome variable, which is called “factorForFather”.
We use 80% of the full data as the training set and the remaining 20% as the test set. Then we run 10-fold cross-validation on the training set.
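The split and the folds can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the shuffling, the seed, and the function names `split_indices` and `k_fold` are hypothetical and not taken from the demo. With 690 records, an 80/20 split gives 552 training rows and 138 test rows.

```python
import random


def split_indices(n_rows, train_fraction=0.8, seed=0):
    """Shuffle row indices and cut them into train and test parts."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # the seed is an arbitrary choice
    cut = round(n_rows * train_fraction)
    return indices[:cut], indices[cut:]


def k_fold(indices, k=10):
    """Yield (training, validation) index lists, one pair per fold."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


train_idx, test_idx = split_indices(690)
fold_sizes = [len(validation) for _, validation in k_fold(train_idx)]
print(len(train_idx), len(test_idx))
print(fold_sizes)
```

Each fold serves once as the validation part while the other nine are used for fitting, which is why cross-validation logs report a mean and a standard deviation per round.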
## [1] train-auc:0.963357+0.008568 test-auc:0.934646+0.039506
## Multiple eval metrics are present. Will use test_auc for early stopping.
## Will train until test_auc hasn't improved in 100 rounds.
##
## [51] train-auc:1.000000+0.000000 test-auc:0.985876+0.008631
## [101] train-auc:1.000000+0.000000 test-auc:0.986330+0.007577
## [151] train-auc:1.000000+0.000000 test-auc:0.985846+0.007902
## Stopping. Best iteration:
## [61] train-auc:1.000000+0.000000 test-auc:0.986846+0.007312
## [1] train-error:0.038647
## [51] train-error:0.000000
## [61] train-error:0.000000
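The cross-validation log above reports a best iteration of 61 under a rule that stops once the test AUC has not improved for 100 rounds. The patience rule itself can be sketched as below; the function name and the toy score sequence are hypothetical, and the actual library tracks more state than this.

```python
def early_stopping_round(scores, patience):
    """Return (best_round, stop_round) for a higher-is-better metric.

    Training stops at the first round that is `patience` rounds past
    the best round without any improvement.
    """
    best_round, best_score = 0, float("-inf")
    for rnd, score in enumerate(scores, start=1):
        if score > best_score:
            best_round, best_score = rnd, score
        elif rnd - best_round >= patience:
            return best_round, rnd
    return best_round, len(scores)


# Hypothetical per-round validation AUC values, not taken from the log.
toy_scores = [0.90, 0.93, 0.95, 0.94, 0.95, 0.94, 0.93]
best_round, stop_round = early_stopping_round(toy_scores, patience=3)
print(best_round, stop_round)
```

The final model is then trained up to the best round only, which fits the training log ending at round 61.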
Now we visualize the importance of the factors with a bar chart. A longer bar indicates a more important factor. The x-axis shows each factor's share of the total importance; the shares sum to one.
## Feature Gain Cover Frequency
## 1: caregiver 0.3217237839 0.1045417681 0.084848485
## 2: childWill 0.2616556523 0.1529756631 0.088888889
## 3: interaction 0.1553694038 0.1890666268 0.101010101
## 4: socialWorkerRep 0.0655308966 0.0704793535 0.062626263
## 5: ID 0.0503301832 0.1011276654 0.276767677
## 6: supportSystem 0.0385168865 0.0881428593 0.072727273
## 7: parentMoral 0.0158465131 0.0519858047 0.016161616
## 8: parentEconomy 0.0134708329 0.0282823373 0.070707071
## 9: childNum 0.0110963558 0.0200304276 0.044444444
## 10: careTime 0.0110836183 0.0119224002 0.024242424
## 11: careEnvironment 0.0102072561 0.0088009807 0.012121212
## 12: undueBehavior 0.0100726979 0.0601366926 0.014141414
## 13: currentResidence 0.0092678520 0.0167552879 0.034343434
## 14: friendlyParent 0.0087021082 0.0136962952 0.020202020
## 15: childSex 0.0069179862 0.0141437688 0.034343434
## 16: carePlan 0.0067913664 0.0329968118 0.022222222
## 17: parentHealth 0.0019311949 0.0021137343 0.004040404
## 18: parentEducation 0.0008744753 0.0305807560 0.010101010
## 19: otherRelationship 0.0003176229 0.0012492993 0.002020202
## 20: parentUnderstanding 0.0002933139 0.0009714673 0.004040404
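As a quick check on the table above, the Gain column can be summed directly. The short Python sketch below copies the 20 Gain values from the table, confirms that they add up to one, and computes the combined share of the three leading factors; only the variable names are introduced here.

```python
# Gain values copied from the importance table above.
gain = {
    "caregiver": 0.3217237839,
    "childWill": 0.2616556523,
    "interaction": 0.1553694038,
    "socialWorkerRep": 0.0655308966,
    "ID": 0.0503301832,
    "supportSystem": 0.0385168865,
    "parentMoral": 0.0158465131,
    "parentEconomy": 0.0134708329,
    "childNum": 0.0110963558,
    "careTime": 0.0110836183,
    "careEnvironment": 0.0102072561,
    "undueBehavior": 0.0100726979,
    "currentResidence": 0.0092678520,
    "friendlyParent": 0.0087021082,
    "childSex": 0.0069179862,
    "carePlan": 0.0067913664,
    "parentHealth": 0.0019311949,
    "parentEducation": 0.0008744753,
    "otherRelationship": 0.0003176229,
    "parentUnderstanding": 0.0002933139,
}

total_gain = sum(gain.values())
top_three = sorted(gain.items(), key=lambda item: item[1], reverse=True)[:3]
top_three_share = sum(value for _, value in top_three)

print(f"total gain: {total_gain:.6f}")
print(f"top three: {[name for name, _ in top_three]}")
print(f"top-three share: {top_three_share:.4f}")
```

In this run, “caregiver”, “childWill”, and “interaction” together account for roughly 74% of the total gain.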
Data: xgb.preds2 in 111 controls (test.child[, factorForFather] 0) < 27 cases (test.child[, factorForFather] 1). Area under the curve: 0.9923
## XGB Area under the curve 0.9961315
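The area under the ROC curve reported above can be read as the probability that a randomly chosen case (factorForFather = 1) receives a higher score than a randomly chosen control (factorForFather = 0). The Python sketch below computes it by direct pairwise comparison; the function name and the toy labels and scores are hypothetical and are not the model's predictions.

```python
def pairwise_auc(labels, scores):
    """AUC as the share of (case, control) pairs ranked correctly.

    Ties between a case and a control count as half a correct pair.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    correct = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                correct += 1.0
            elif case_score == control_score:
                correct += 0.5
    return correct / (len(cases) * len(controls))


# Hypothetical toy data: three controls and two cases.
toy_labels = [0, 0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.30]
toy_auc = pairwise_auc(toy_labels, toy_scores)
print(round(toy_auc, 4))
```

A value near 1, as in the output above, means that almost every case is scored above almost every control.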
We provide five indicators to measure the model.
## y_pred
## y_true 0 1
## 0 47 0
## 1 2 20
## [1] "AUC: 0.996131528046422"
## [1] "Accuracy: 0.971014492753623"
## [1] "F1-score: 0.952380952380952"
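The accuracy and F1-score above follow directly from the confusion matrix. The worked example below recomputes them in Python, assuming that class 1 is treated as the positive class; the variable names are introduced here for illustration.

```python
# Cells of the confusion matrix above (rows: y_true, columns: y_pred).
true_negative, false_positive = 47, 0
false_negative, true_positive = 2, 20

total = true_negative + false_positive + false_negative + true_positive
accuracy = (true_positive + true_negative) / total

# Precision and recall for class 1 (assumed to be the positive class).
precision = true_positive / (true_positive + false_positive)
recall = true_positive / (true_positive + false_negative)
f1_score = 2 * precision * recall / (precision + recall)

print(f"accuracy: {accuracy:.6f}")
print(f"precision: {precision:.6f}, recall: {recall:.6f}")
print(f"F1-score: {f1_score:.6f}")
```

Accuracy is 67/69 and the F1-score is 40/42, which match the reported 0.9710 and 0.9524. The two errors are both class-1 cases predicted as class 0, so precision stays at 1 while recall drops to 20/22.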
We can see different reasoning paths in different cases. Red bars show an advantage to “MOTHER”, and blue bars to “FATHER”. In every case, the judge gives each factor a weight, then puts them together to decide.
Figure 1 is a typical “Mother-win” case. The mother takes the three main factors: “interaction”, “childWill”, and “caregiver”. Although “supportSystem”, which often favors the richer parent, goes to the father, the mother won in the end. The remaining factors carry only a little weight in this case.
Figure 2 shows another situation: “caregiver”, “childWill”, “interaction”, and “socialWorkerRep” (the social worker report) all strongly favor the father. There is little doubt that the father is the better parent for the child in this case. The blue part clearly leads the red part by a wide margin. The prediction favors the father, and so does the actual result.
Figure 3 (idx63) shows a somewhat ambiguous case. “caregiver” significantly favors the father, as do “interaction”, “careTime”, “parentEconomy”, and “currentResidence”. But other factors, such as “socialWorkerRep”, “supportSystem”, and the remaining small-weight factors, favor the mother. The red part regains ground bit by bit until the two sides are almost even on the total score. The final prediction is 0.12 to the father, which is also the actual judgment.
The prediction ends with a very slight advantage for the father, and so does the actual result.
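The additive reading of these charts can be sketched as follows: each factor adds or subtracts a contribution, and the running total determines the prediction. This Python sketch assumes the contributions are on a log-odds scale and are mapped to a probability by the logistic function, which the text above does not state; the baseline, the contribution values, and the function names are all hypothetical.

```python
import math


def waterfall_steps(baseline, contributions):
    """Return the running total after each factor is added."""
    running = baseline
    steps = [("baseline", running)]
    for name, value in contributions.items():
        running += value
        steps.append((name, running))
    return steps


def probability_for_father(baseline, contributions):
    """Map the summed log-odds to a probability (assumed logistic link)."""
    log_odds = baseline + sum(contributions.values())
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical values: positive favors the father, negative the mother.
toy_baseline = -1.0
toy_contributions = {"caregiver": 1.2, "interaction": 0.6, "supportSystem": -0.5}

steps = waterfall_steps(toy_baseline, toy_contributions)
probability = probability_for_father(toy_baseline, toy_contributions)
for name, running_total in steps:
    print(f"{name:>14}: {running_total:+.2f}")
print(f"probability for father: {probability:.3f}")
```

In a waterfall chart, each bar corresponds to one step of this running total, so a close case such as Figure 3 shows bars of both colors that nearly cancel out.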
The remarkable advances in artificial intelligence influence human lives in almost every aspect, including business and academic research. For example, it has become increasingly common to use machine learning techniques to analyze and categorize texts and to predict outcomes, which can assist humans in making more accurate decisions. This research attempts to explore the possibility of applying artificial intelligence approaches to legal studies. Firstly, this study introduces recent developments in artificial intelligence and the basic concepts of machine learning. Secondly, it explains how machine learning algorithms can be used to better predict legal outcomes. To demonstrate the strength of predictions, this article applies gradient boosting to analyze decisions related to child custody in Taiwan. We collected 448 cases from 2012 through 2014, involving 690 children whose parents were both Taiwanese and willing to take custody, and in which the Taiwanese district court granted one parent sole custody. It is found that among the factors enumerated in Article 1055-1 of the Taiwan Civil Code, the three most important ones that judges consider are primary caregiver (gain=0.356), wishes of the child (gain=0.267), and parent-child interaction (gain=0.152). In terms of outcome predictions, the accuracy of the model is 95.7% and the F1 score is 0.927. The model built by gradient boosting can also be applied to individual cases: it is able to reveal the factors behind the machine's prediction for a given case and how much each of them weighed. By visualizing these through waterfall charts, we may gain a better understanding of the criteria inside the machine's "mind". This clearly illustrates which factors judges in Taiwan consider important in custody disputes, and to what extent. In addition, this effective predictive model can help improve the predictability and certainty of law.
Based on this, divorce lawyers can preliminarily assess their clients' chances of winning divorce lawsuits and propose the optimal dispute resolution strategy. The informational asymmetries that lead to wasteful expenditure on litigation may be reduced. In the long run, legal analytics can improve the accessibility and affordability of information about legal rights and responsibilities, which will enhance public trust and confidence in the judicial system.
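## [1] "case 1 childID:28, 101,婚,185, Having considered the opinions in the home-visit report cited above, the court finds that the plaintiff is qualified to care for the minor child alone in terms of financial capacity, parent-child relationship, and parenting ability, and that the plaintiff also has a strong wish to take custody. Moreover, the daily life and schooling of the minor child 鄭宇傑 have been looked after by the plaintiff since early childhood; the plaintiff has long been the child's primary caregiver and naturally knows the child and the child's needs better than the defendant does. The emotional attachment between the plaintiff and the child is close and their interaction is good; an abrupt change in the child's living environment might leave the child unable to grow and develop, physically and mentally, in a stable environment. As for the defendant, although the defendant also has the financial capacity to raise the child, there are doubts about the defendant's ability to care for the minor child's daily life, and the defendant's emotional control still needs improvement; compared with the plaintiff, the defendant can hardly be expected to fulfill the duty of custody. In addition, 鄭宇傑 stated a wish to live with the plaintiff both during the social worker's visit and at the hearing before this court, and that wish should be given due respect. In sum, this court finds that the rights and obligations regarding 鄭宇傑 shall be exercised or borne by the plaintiff alone"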
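## [1] "case 2 childID:123, 102,婚,294, Having considered the above circumstances and the content of the home-visit report, the court finds that the defendant can indeed hardly be regarded as able to fulfill the duty of caring for the eldest son and eldest daughter, whereas the plaintiff has the willingness and ability to take custody and a stable support system; moreover, the eldest son and eldest daughter are very close to the plaintiff's parents, and nothing inappropriate has been found. In view of all these circumstances, the court finds that, as to the exercise and burden of the rights and obligations regarding the parties' minor children 葉東霖 and 葉彥儀, if these are assumed by the plaintiff"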
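## [1] "case 3 childID:581, 102,婚,171, Having considered the child's wishes and current schooling, and taking into account the minor child's present education and living situation, both parties' financial capacity and character, the possibility of support from relatives, the family environment, and the home-visit report, and further noting that the interaction between the parties is unlikely to improve in the short term: if joint custody were adopted, important matters such as the minor child's school enrollment, household registration, or medical care would require joint decisions or co-signed consent, which would inevitably lead to renewed disputes and affect the child's feelings. Therefore, to avoid further disputes between the parties in the future, and in view of all these circumstances, the court finds that having the defendant exercise or bear the rights and obligations regarding the minor child is in the child's interests."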
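## [1] "case 4 childID:468, 103,婚,608, Both parties are willing to take custody of and care for the children, both are capable of custody, and each has family members who can help with childcare. However, the two minor children have all along had 林益正 and 林益正's father as their primary caregivers and have formed a close attachment to 林益正 and 林益正's family. The minor child 林郁安 also told the social worker of a negative impression of 呂昭儀 and expressed a wish to live with 林益正, which shows that the minor children are used to the living environment 林益正 provides and have built a close attachment to 林益正 and 林益正's parents. Based on the best interests of the child under the 'primary caregiver principle', the 'least change principle', and the 'principle of respecting the child's wishes', and further considering both parties' financial capacity, custodial ability and willingness, the possibility of support from relatives, the family environment, the current custody arrangement, and all other circumstances, the court finds that the rights and obligations regarding the parties' minor children 林郁安 and 林彥丞 shall be exercised or borne by 林益正"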