
Working Paper

CARTER C. PRICE, LEONARDO BUENO, KARA JIA, SABAHAT ZAFAR, GEORGE ZUO

Tax Code Analysis Tool

Applying Machine Learning to Map the Tax Code

RAND Education and Labor

WR-A4129-1

August 2025

Prepared for the Diller-Von Furstenberg Family Foundation

This working paper has been approved for circulation by RAND Education and Labor. Unless otherwise indicated, working papers can be quoted and cited without permission of the author, provided the source is clearly referred to as a working paper. RAND's publications do not necessarily reflect the opinions of its research clients and sponsors. RAND® is a registered trademark. Learn more at .


For more information on this publication, visit /t/WRA4129-1.

About RAND

RAND is a research organization that develops solutions to public policy challenges to help make communities throughout the world safer and more secure, healthier and more prosperous. RAND is nonprofit, nonpartisan, and committed to the public interest. To learn more about RAND, visit .

Research Integrity

Our mission to help improve policy and decision making through research and analysis is enabled through our core values of quality and objectivity and our unwavering commitment to the highest level of integrity and ethical behavior. To help ensure our research and analysis are rigorous, objective, and nonpartisan, we subject our research publications to a robust and exacting quality-assurance process; avoid both the appearance and reality of financial and other conflicts of interest through staff training, project screening, and a policy of mandatory disclosure; and pursue transparency in our research engagements through our commitment to the open publication of our research findings and recommendations, disclosure of the source of funding of published research, and policies to ensure intellectual independence. For more information, visit /about/research-integrity.


Published by the RAND Corporation, Santa Monica, Calif.

© 2025 RAND Corporation

RAND® is a registered trademark.

Limited Print and Electronic Distribution Rights

This publication and trademark(s) contained herein are protected by law. This representation of RAND intellectual property is provided for noncommercial use only. Unauthorized posting of this publication online is prohibited; linking directly to its webpage on is encouraged. Permission is required from RAND to reproduce, or reuse in another form, any of its research products for commercial purposes. For information on reprint and reuse permissions, visit /about/publishing/permissions.

About This Working Paper


In this working paper, we describe a tool that we created by applying natural language processing methods to produce a graph database of legal text. The tool, called the Tax Code Analysis Tool (CAT), is a comprehensive graph-database map of Title 26 of the U.S. Code, which covers internal revenue. CAT connects various entities (such as individuals, married couples, partnerships, or corporations) and tax concepts (such as wages, deductions, or credits) and links them to their specific sections in the U.S. federal tax code. This enables users to compare the treatment of different entities and concepts, track their interactions, and investigate the potential implications of proposed legislative changes. Additionally, CAT can be connected to other datasets, such as Internal Revenue Service data on tax returns or survey data on the U.S. population, to provide additional information about the relative impacts of different concepts and sections.

With this working paper, we seek to obtain preliminary feedback on CAT from experts in tax policy and related areas. In its 2025 form, CAT is intended for use by researchers with database query language experience and those who are interested in textual analysis of the U.S. tax code. We are further developing CAT to make it accessible to users without database query language experience. Future versions of the tool might be of interest to the broader community of tax accountants, lawyers, and policymakers interested in studying the ramifications of U.S. federal fiscal policy.

About RAND Education and Labor

This work was conducted within RAND Education and Labor, a division of RAND that conducts research on early childhood through postsecondary education programs, workforce development, and programs and policies affecting workers, entrepreneurship, and financial literacy and decision making. For more information, visit /education-and-labor or email educationandlabor@ .

Funding

This research was sponsored by the Diller-Von Furstenberg Family Foundation.

Acknowledgments

We thank the many staff and colleagues at RAND who helped us usher this project forward. In particular, we would like to thank Philip Armour, Elvira Loredo, and Karlyn Stanley for their thoughtful and thorough reviews of this work.

Summary


In this working paper, we describe our methods to construct a graph database that we call the Tax Code Analysis Tool (CAT). The purpose of this working paper is to obtain feedback from experts in tax policy and related areas on this first iteration of CAT.

CAT is a large graph database that links the text of Title 26 of the U.S. Code to the entities and concepts it covers.1 Title 26 specifies the income taxes for individuals and businesses, along with estate, gift, employment, and excise taxes, and has more than 5,500 sections. This complexity makes it a natural target for modern text analysis methods, including large language models (LLMs).

Approach

CAT graphically describes the relationships between different parts of the law (e.g., a chapter, section, or subsection), the entities those parts relate to, and the relevant concepts included in each part. We built this tool by applying natural language processing to map the connections between tax code sections. Specifically, the two types of connections are references (when one part references another part) and hierarchical relationships (when one part contains another, such as a subsection within a section or a section within a chapter).

We used an LLM to examine each section and extract keywords, including the entities (e.g., individuals, partnerships, corporations) and the concepts (e.g., income, deduction, employee, or credit).2 The entities and concepts are linked to the part of the law in which they were referenced. We then manually pruned the lists of entities and concepts to create a consistent and coherent list. We intend to release CAT as open-source software in the future.

Uses

We demonstrate the following four use cases for this iteration of CAT:

1. identifying the importance of sections of the tax code or tax concepts using the number and nature of connections3

2. mapping the connections of a concept, entity, or index to other concepts, entities, and indexes in the tax code to understand how they relate

3. comparing the treatment of different tax concepts or entities4

4. linking the tax code to the fiscal importance of different concepts.

These are only a few of the possible applications for the tool, and we expect that users will find other uses.

1 U.S. Code, Title 26, Internal Revenue Code.

2 Specifically, we used GPT-4o.

3 Importance can mean the number of citations in other sections or the number of sections that include a concept or entity. For example, this measure can be used to identify which sections of the tax code are most commonly referenced by other sections.
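As a rough illustration of use cases 1 and 3, the minimal Python sketch below scores nodes by degree (the number of edges touching them) and compares two indexes by the overlap of their neighborhoods, using Jaccard similarity. The edge list and node names (sec_A, sec_B, and so on) are hypothetical stand-ins rather than CAT data, and the Jaccard measure is our illustrative choice; in practice, CAT is queried in Cypher rather than Python.

```python
from collections import Counter

# Hypothetical (node, relationship, node) triples; names are illustrative, not CAT data.
EDGES = [
    ("deduction", "REFERENCED_IN", "sec_A"),
    ("income", "REFERENCED_IN", "sec_A"),
    ("income", "REFERENCED_IN", "sec_B"),
    ("credit", "REFERENCED_IN", "sec_B"),
    ("sec_B", "REFERENCES", "sec_A"),
]


def degree(edges):
    """Count the edges touching each node, ignoring edge direction."""
    counts = Counter()
    for src, _rel, dst in edges:
        counts[src] += 1
        counts[dst] += 1
    return counts


def neighbors(edges, node):
    """Return the set of nodes that share an edge with `node`."""
    found = set()
    for src, _rel, dst in edges:
        if src == node:
            found.add(dst)
        elif dst == node:
            found.add(src)
    return found


def jaccard(a, b):
    """Share of the combined neighborhood that two nodes have in common (0 to 1)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


print(degree(EDGES).most_common(3))
print(jaccard(neighbors(EDGES, "sec_A"), neighbors(EDGES, "sec_B")))  # 0.2
```

Here sec_A and sec_B each touch three edges, but they share only the concept income, so their similarity is low. The same pattern, applied to the subgraphs around two real indexes, is one way to frame a comparison such as 501(c)(3) versus 501(c)(4).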

Future Work

The paths for future work include the following:

• Refine and organize the concepts and entities. At the time of this writing, the tool does not capture some relationships between different types of concepts and entities. For example, C corporations and S corporations are both types of corporations. Similarly, corporations and partnerships can both be types of businesses. Hierarchical relationships between entities and concepts could therefore provide helpful information for tax code modeling.

• Map the tax code to Internal Revenue Service (IRS) forms. Because IRS forms are a key mechanism for collecting and reporting information, it would be helpful to map the tax code to the forms. This would improve the reliability of the data connections that are described in Chapter 4.

• Link the tax code to individual survey data. In addition to IRS data, such sources as the American Community Survey, the Current Population Survey, and the Survey of Consumer Finances capture representative information about the U.S. population. By linking the tax code to these datasets, we could describe the universe of all people who could participate in some programs or receive a certain tax credit. Additionally, we could use the connection to the tax data to refine the estimates in the survey data.

• Link the tax code to firm data. The tax code also applies to businesses, so it would be useful to link the tax code to such datasets as the American Business Survey and the Small Business Credit Survey to better understand how different parts of the tax code relate to firms.

• Develop a graphical user interface. The existing tool is not structured for a general audience and requires some familiarity with the Cypher query language. In future iterations, a larger audience could perform queries through a graphical user interface.

• Add other legal sources. The U.S. Code is not the sole source of tax policy; such sources as the Federal Register, court orders, and policy directives could also be included in the code analysis tool. These would be added as additional classes of nodes and linked to the concepts, entities, and indexes in the same way as in the existing tool.

• Include other revenue sources. Although Title 26 covers many sources of federal revenue, it does not cover all sources. Revenue from excise taxes and tariffs could be added to this tool to provide a fuller picture of federal revenue.

4 We demonstrate this with a comparison of the tax treatment of 501(c)(3) nonprofit corporations and that of 501(c)(4) nonprofit corporations.


• Compare with foreign tax codes. Because multinational corporations and other entities might seek to game tax systems, a comparison of the U.S. tax code with foreign tax codes could help identify areas that could be targeted to reduce tax avoidance.

• Inform models of taxation. By mapping out all the connections in the tax code, this type of tool can capture nuances missed by simple models of taxation. CAT can also be used to inform models of proposed changes to the tax code by tracking the first-order changes and how those propagate to other parts of the tax code.
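Tracking how a first-order change propagates can be framed as a breadth-first search over citation edges: if an index is amended, every index that cites it deserves a second look, and so on outward. The sketch below illustrates that framing under stated assumptions; the section names and citation map are hypothetical, and a real propagation analysis would also need the hierarchy edges and legal judgment about which citations matter.

```python
from collections import deque

# Hypothetical citation map: each index -> the indexes its text cites.
REFERENCES = {
    "sec_X": ["sec_Y"],
    "sec_Z": ["sec_X"],
    "sec_W": ["sec_Z", "sec_Y"],
}


def referenced_in(references):
    """Invert the citation map so each index maps to the indexes that cite it."""
    inverse = {}
    for citing, cited_list in references.items():
        for cited in cited_list:
            inverse.setdefault(cited, set()).add(citing)
    return inverse


def affected_by(changed, references, max_hops=None):
    """Breadth-first search for indexes that cite `changed`, directly or indirectly.

    Returns {index: hops}; hops == 1 marks a direct (first-order) citation.
    """
    inverse = referenced_in(references)
    hops = {changed: 0}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        if max_hops is not None and hops[node] >= max_hops:
            continue  # do not expand beyond the requested radius
        for citer in inverse.get(node, ()):
            if citer not in hops:
                hops[citer] = hops[node] + 1
                queue.append(citer)
    del hops[changed]
    return hops


print(affected_by("sec_Y", REFERENCES))
```

In this toy map, amending sec_Y flags sec_X and sec_W as first-order effects and sec_Z as a second-order effect, because sec_Z cites sec_X. Setting max_hops=1 restricts the result to first-order changes.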

Contents

About This Working Paper
Summary
Figures and Tables
Chapter 1. Introduction
  Approach
  Limitations
  Organization of This Working Paper
Chapter 2. Data Source and Approach
  Title 26 of the U.S. Code
  Mapping the Relationships in the Tax Code
  Keyword Identification and Sorting
  Example
Chapter 3. Analysis Using the Tax Code Analysis Tool
  Identifying the Key Sections of the Tax Code
  Comparing Sections of the Tax Code
Chapter 4. Connection to IRS Data
Chapter 5. Conclusion
  Future Work
Appendix A. Complexity
  Complexity Measure 1: Node Centrality Measures
  Complexity Measure 2: Node Degree Distribution
  Complexity Measures 3, 4, and 5: Degree and Node Type Assortativity and Modularity
  Implications
Appendix B. Mapping the Tax Code Analysis Tool to IRS Forms
Abbreviations
References

Figures and Tables

Figures
Figure 2.1. Co-Occurrence Matrix Heatmap for the Top 24 Concept Terms
Figure 2.2. Section 3126 with Concepts, Entities, and Indexes Highlighted
Figure 2.3. Graph of the Section 3126 Branch
Figure 3.1. Subgraphs for 501(c)(3) and 501(c)(4) from the Tax Code Analysis Tool
Figure 4.2. Breakout of Corporate Income Tax Items
Figure 4.3. Tax Code Analysis Tool Graph of Adoption
Figure A.1. Node Degree Distribution
Figure A.2. Node Degree Probability Distribution and Node Degree Complementary Cumulative Distribution
Figure A.3. Log-Scale Degree Distribution, by Node Type
Figure B.1. Frequency Radar of Sections Reported in IRS Forms Linked to S-Corporations
Figure B.2. Frequency and Amount Against Sections of Title 26

Tables
Table 2.1. Summary of Title 26, Internal Revenue Code
Table 2.2. Most-Frequently Cited Entities in Title 26
Table A.1. Top Five Nodes, by Degree Centrality
Table A.2. Top Five Nodes, by Closeness Centrality
Table B.1. Corporation Tax Returns: Estimated Number of Active Returns for Tax Years 2018–2021
Table B.2. Sample Table Showing Details of Lines, Sections, and Quantification Terms

Chapter 1. Introduction

Title 26 of the U.S. Code is the Internal Revenue Code, which contains more than 5,500 sections and is more than 7,000 pages long.5 This title details income taxes for individuals and businesses. In fiscal year 2023, the Internal Revenue Service (IRS) raised about $3 trillion from these taxes.

Given the complexity of the tax code, it is challenging to determine how changes in one section might affect other areas. For example, when the Revenue Act of 1978 added Section 401(k), the intent was to adjust how deferred compensation was taxed, not to create a new savings vehicle that now holds many trillions of dollars in retirement savings for U.S. workers.6 As the tax code has grown, its complexity and the risk of unintended consequences have grown as well.

In this working paper, we describe a tool that we created to analyze the U.S. tax code as a graph so that its connections can be explored more naturally. A graph is a mathematical structure in which nodes are connected by edges. In our case, the nodes can be portions of the law (indexes), entities (the people and organizations described in the U.S. Code), and concepts (other keywords). The edges represent relationships between the nodes, such as contains (for when an entity or concept is contained within an index) or references (for when one index contains a reference to another index). By applying a graph structure to the U.S. tax code, we can marshal the tools of network analysis to quantify the code's complexity, find connections, and create a structure to which we can connect additional information.
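To make this vocabulary concrete, here is a minimal Python sketch of the model exactly as this paragraph describes it: three node kinds and two edge kinds, contains and references. The class and function names are hypothetical; CAT itself stores the graph in Neo4j, and Chapter 2 describes a richer set of relationship types.

```python
from dataclasses import dataclass
from enum import Enum


class NodeType(Enum):
    INDEX = "index"      # a portion of the law: chapter, section, subsection, ...
    ENTITY = "entity"    # a person or organization described in the Code
    CONCEPT = "concept"  # any other keyword, e.g., "deduction"


@dataclass(frozen=True)
class Node:
    name: str
    kind: NodeType


@dataclass(frozen=True)
class Edge:
    source: Node
    relation: str  # "contains" or "references"
    target: Node


def make_edge(source, relation, target):
    """Create an edge, checking that its endpoints fit the relation."""
    if relation == "contains":
        # An index contains an entity or a concept.
        ok = source.kind is NodeType.INDEX and target.kind is not NodeType.INDEX
    elif relation == "references":
        # One index contains a reference to another index.
        ok = source.kind is NodeType.INDEX and target.kind is NodeType.INDEX
    else:
        raise ValueError(f"unknown relation: {relation!r}")
    if not ok:
        raise ValueError(f"{relation!r} cannot link {source.kind.value} to {target.kind.value}")
    return Edge(source, relation, target)


sec_a = Node("sec_A", NodeType.INDEX)  # hypothetical index
employer = Node("employer", NodeType.ENTITY)
print(make_edge(sec_a, "contains", employer))
```

Enforcing the endpoint types at construction time is one way to keep a hand-built graph consistent with the schema; a graph database can enforce similar rules through its loading scripts.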

Approach

This graph database, called the Tax Code Analysis Tool (CAT), can be used to describe the relationships between different parts of the law (e.g., a chapter, section, or subsection), the entities those parts relate to, and the relevant concepts included in each part. We built this tool by first applying natural language processing to map the connections between tax code sections. Specifically, the two types of connections are references (when one part references another) and hierarchical relationships (when one part contains another, such as a subsection within a section or a section within a chapter).
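Detecting references means finding where the text of one part cites another. Chapter 2 reports that the authors used an LLM for this step; the sketch below instead uses a simple regular expression, a swapped-in rule-based technique included only to illustrate the task. The pattern is our assumption: it covers citations of the form "section 401(k)" and misses relative citations such as "paragraph (2)" or "this chapter".

```python
import re

# Assumed citation form: "section" or "sections", a number with an optional
# capital-letter suffix, then any run of parenthesized subdivisions.
SECTION_REF = re.compile(r"\b[Ss]ections?\s+(\d+[A-Z]?(?:\([a-zA-Z0-9]+\))*)")


def extract_section_refs(text):
    """Return the section citations found in `text`, in order of appearance."""
    return SECTION_REF.findall(text)


sample = "as defined in section 401(k) and Section 1221(a)(1)"
print(extract_section_refs(sample))  # ['401(k)', '1221(a)(1)']
```

The gap between what such a pattern catches and what the statute actually cites is one reason a language model can be attractive for this step.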

Next, we used a large language model (LLM) to examine each section and extract keywords, including the entities (e.g., individuals, partnerships, corporations) and the concepts (e.g., income, deduction, employee, or credit).7 The entities and concepts are linked to the part of the law in which they were referenced. We then manually pruned the lists of entities and concepts to create a consistent and coherent list.

5 Office of the Law Revision Counsel (OLRC), “United States Code,” webpage, undated.

6 Kathleen Elkins, “A Brief History of the 401(k), Which Changed How Americans Retire,” CNBC, January 4, 2017.

Finally, we can link these entities and concepts to other datasets, including survey data describing the population of the entities and IRS and other data sources about the concepts. In this iteration, we have linked CAT only to some IRS data (discussed in Chapter 4), but we intend to use additional datasets in the future. The result is a large graph that links the text of the U.S. Code to the entities and concepts it covers and to data about those entities and concepts.

Limitations

In this working paper, we describe our construction of a tool with a graph structure based on Title 26 of the U.S. Code. Naturally, there are limitations, both in the tool's scope and in our approach to constructing it.

As of 2025, CAT does not include sections of other titles that reference Title 26, although it does include references within Title 26 to sections in other titles. Thus, the graph might be missing some relevant nodes and edges. A tool built on the full U.S. Code would address this concern and is one of the long-term goals of the broader RAND Budget Initiative.

Similarly, statutes are only part of what is relevant in determining outcomes, such as revenue, tax incidence, or other policy-relevant details. Regulations and other policy documents determine the interpretation of the code. The state of the economy, including the distribution of income across individuals and firms, is relevant to determining revenue. Likewise, the degree of compliance, enforcement, and resources for IRS oversight are also relevant.

We have included some IRS data at this stage. In future versions of CAT, we hope to include other data sources and add the contents of the Federal Register.

We anticipate continuously developing CAT for some time, so we plan to address several of these limitations in future versions of the tool and of this working paper.

Organization of This Working Paper

In Chapter 2, we describe the methodology for constructing CAT in more detail. In Chapter 3, we present some use cases for the code analysis tool. In Chapter 4, we describe use cases that become possible when CAT is connected to IRS data. In Chapter 5, we describe our existing plans for future improvements to CAT.

This working paper also includes two technical appendixes that are intended for readers with some background in graph theory and network analysis. In Appendix A, we discuss the network analysis measures. In Appendix B, we demonstrate an approach for mapping the IRS forms for S corporations with the tool.

7 Specifically, we used GPT-4o.

Chapter 2. Data Source and Approach

In this chapter, we describe the text of Title 26, the process of building CAT, an overview of the analysis, and the additional data sources used in the analysis. The primary text for this analysis is Title 26 (Internal Revenue Code) of the U.S. Code, as sourced from the OLRC.8

Title 26 of the U.S. Code

Title 26 is composed of 11 subtitles (A through K), chapters numbered from 1 to 100, and sections numbered from 1 to 9834. Table 2.1 summarizes Title 26 in numerical order of its chapters, subchapters, and sections. The numbering of the chapters and sections is not continuous because the structure of the U.S. tax code has been amended over time, leaving discontinuities in chapter and section numbering. Title 26 has been amended numerous times since its inception, and the numbers of repealed sections and subsections are sometimes left unused. Moreover, unused section numbers are occasionally left empty on purpose to accommodate future amendments or the addition of new sections or subsections. This does not affect the structure of our work, but it does mean that we must pay careful attention to the numbering when developing a mapping.
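One practical consequence of this numbering is that section identifiers cannot be sorted as plain strings ("10" would sort before "2") or as plain integers ("5000D" is not an integer). The sketch below shows one possible sort key and a gap finder; the identifier pattern is an assumption that covers forms like "61", "50A", and "5000D" but not hyphenated identifiers.

```python
import re

# Assumed identifier form: digits plus an optional run of capital letters
# ("61", "50A", "5000D"). Hyphenated forms such as "1400Z-2" would need a
# wider pattern.
_SECTION_ID = re.compile(r"(\d+)([A-Z]*)")


def section_sort_key(section_id):
    """Order identifiers numerically, then by letter suffix."""
    match = _SECTION_ID.fullmatch(section_id)
    if match is None:
        raise ValueError(f"unrecognized section id: {section_id!r}")
    number, suffix = match.groups()
    return int(number), suffix


def missing_numbers(section_ids):
    """List integers absent between the smallest and largest section numbers."""
    present = {section_sort_key(s)[0] for s in section_ids}
    return [n for n in range(min(present), max(present) + 1) if n not in present]


ids = ["10", "5000D", "2", "5000", "5000A"]
print(sorted(ids))                        # ['10', '2', '5000', '5000A', '5000D']
print(sorted(ids, key=section_sort_key))  # ['2', '10', '5000', '5000A', '5000D']
```

A gap reported by missing_numbers does not by itself say whether a number was repealed or reserved; that distinction still has to come from the text.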

Table 2.1. Summary of Title 26, Internal Revenue Code

Subtitle                                              Chapters   Subchapters   Sections Included
A—Income Taxes                                        1–6        26            1–1564
B—Estate and Gift Taxes                               11–15      13            2001–2801
C—Employment Taxes                                    21–25      9             3101–3512
D—Miscellaneous Excise Taxes                          31–50A     33            4001–5000D
E—Alcohol, Tobacco, and Certain Other Excise Taxes    51–55      21            5001–5891
F—Procedure and Administration                        61–80      45            6001–7874
G—The Joint Committee on Taxation                     91–92      0             8001–8023
H—Financing of Presidential Election Campaigns        95–96      0             9001–9042
I—Trust Fund Code                                     98         2             9500–9602
J—Coal Industry Health Benefits                       99         4             9702–9722
K—Group Health Plan Requirements                      100        3             9801–9834

SOURCE: Features information from OLRC, undated; U.S. Code, Title 26. These data were retrieved March 10, 2025, and may no longer be accurate.

8 OLRC, undated.

Mapping the Relationships in the Tax Code

To generate CAT, we wrote a Python script that extracts text from the tax code and structures it in JavaScript Object Notation (JSON) to preserve the hierarchical structure (e.g., title, subtitle, chapter). We then modeled this tax code in a Neo4j graph database. There are three kinds of nodes in this graph database:

• entity: a subject who would typically be affected by the tax code (e.g., secretary, farming business)
• concept: an abstract term (e.g., tax exception, loan)
• index: a line or header in the tax code (e.g., chapter, section).

We describe the specific approach for producing the set of concepts and entities in the next section of this chapter.
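Because the JSON preserves the hierarchy, the parent-child links between indexes (described below as INCLUDES and PART_OF) can be read directly off the nesting. The sketch below walks a small nested document depth-first; the field names id and children, and the sample content, are assumptions for illustration rather than CAT's actual schema.

```python
import json

# Hypothetical JSON shaped like the hierarchy described above; the field
# names "id" and "children" are assumptions, not CAT's actual schema.
SAMPLE = json.loads("""
{"id": "title 26", "children": [
  {"id": "subtitle A", "children": [
    {"id": "chapter 1", "children": [
      {"id": "section 1", "children": []},
      {"id": "section 2", "children": []}
    ]}
  ]}
]}
""")


def hierarchy_edges(node, edges=None):
    """Depth-first walk emitting INCLUDES (parent to child) and PART_OF (child to parent)."""
    if edges is None:
        edges = []
    for child in node.get("children", []):
        edges.append((node["id"], "INCLUDES", child["id"]))
        edges.append((child["id"], "PART_OF", node["id"]))
        hierarchy_edges(child, edges)
    return edges


for edge in hierarchy_edges(SAMPLE):
    print(edge)
```

Only direct parent-child pairs are emitted, so a title is not linked straight to its chapters; multi-level containment can be recovered later by following the chain.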

Using a script, we extracted entities and concepts in the tax code and mapped relationships between those entities or concepts and the tax code line (index) in which they appear. Concepts and entities can have two kinds of relationships with index nodes: DEFINED_IN (the entity or concept is defined in that index) or REFERENCED_IN (the entity or concept is referenced in that index). The relationships between indexes are INCLUDES (an index is one level directly above another index, such as a chapter including a subchapter), PART_OF (an index is one level directly below another index, such as a subsection being part of a section), REFERENCES (an index references another index in the text), and REFERENCED_IN (an index is referenced in the text of another index). To obtain the INCLUDES/PART_OF relationships, we used the JSON format of the tax code to determine the hierarchy and relationships between indexes. To obtain the REFERENCES/REFERENCED_IN relationships, we used an LLM to extract all instances in which an index is mentioned in a tax code line. Every relationship is reciprocal. For example, if there is a REFERENCED_IN relationship going from node A to node B, then there is a REFERENCES relationship going from node B to node A. This makes the graph effectively undirected, which makes it easier to write graph queries that traverse the graph in any direction. More details about the approach, including the specific prompts, can be found in Appendix A.
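The reciprocity rule can be sketched as a closure over the edge set. This minimal Python illustration covers only the index-to-index relationships, whose reciprocal pairs the text names explicitly; the reciprocal of DEFINED_IN, and of REFERENCED_IN when it links an entity or concept to an index, is not stated, so those are left out. Node names are hypothetical.

```python
# Reciprocal pairs for relationships between indexes, per the text.
INVERSE = {
    "INCLUDES": "PART_OF",
    "PART_OF": "INCLUDES",
    "REFERENCES": "REFERENCED_IN",
    "REFERENCED_IN": "REFERENCES",
}


def add_reciprocals(edges):
    """Close a set of (source, relation, target) edges under reciprocity.

    For every (A, r, B), also include (B, INVERSE[r], A).
    """
    closed = set(edges)
    for src, rel, dst in edges:
        closed.add((dst, INVERSE[rel], src))
    return closed


one_way = {("sec_B", "REFERENCES", "sec_A"), ("chapter_1", "INCLUDES", "sec_A")}
for edge in sorted(add_reciprocals(one_way)):
    print(edge)
```

Storing both directions is one design choice. Cypher patterns written without an arrow, such as (a)-[:REFERENCES]-(b), also match a relationship regardless of its stored direction.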

The graph has 27,694 nodes and, when considered as an undirected graph, 297,341 edges. If the tax code were fully connected, it would have 766,929,942 edges. However, only about 0.04 percent of the possible edges exist in the graph, indicating that the existence of a connection between two nodes carries substantial information.9
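These figures can be reproduced with a few lines of arithmetic. One observation, offered tentatively: 766,929,942 equals n(n - 1) for n = 27,694, which counts ordered node pairs (possible directed edges). Dividing the undirected edge count by that number gives the 0.04 percent quoted above, whereas the conventional density of an undirected graph divides by n(n - 1)/2 and gives roughly 0.08 percent. Either way, the graph is very sparse.

```python
def possible_edges(n, directed):
    """Maximum number of edges among n nodes, excluding self-loops."""
    return n * (n - 1) if directed else n * (n - 1) // 2


def density(n_nodes, n_edges, directed):
    """Fraction of the possible edges that are present."""
    return n_edges / possible_edges(n_nodes, directed)


N_NODES, N_EDGES = 27_694, 297_341  # figures reported in the text

print(possible_edges(N_NODES, directed=True))   # 766929942
print(possible_edges(N_NODES, directed=False))  # 383464971
print(f"{N_EDGES / possible_edges(N_NODES, directed=True):.3%}")  # ratio used in the text
print(f"{density(N_NODES, N_EDGES, directed=False):.3%}")  # undirected density
```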

Keyword Identification and Sorting

We extracted entities and concepts from the text of each section. By entity, we mean all categories of nouns classifying individuals, corporations, businesses, and government entities mentioned in a section. By concept, we mean key terms that relate to the purpose of the section; such terms include credit, deduction, and income.10 We used GPT-4o with chain-of-thought prompting to extract the entities and concepts from Title 26. These lists were cleaned by merging duplicates (e.g., “loss” and “losses” or “Rule” and “rule”).
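The text does not say exactly how duplicates were merged, so the following is only a sketch of one simple approach, under an assumed heuristic: lowercase each term, strip a regular English plural, and sum the counts of variants that collapse to the same form. The function names and rules are hypothetical.

```python
from collections import Counter


def normalize(term):
    """Crude canonical form: lowercase, then strip a regular English plural.

    Assumed heuristic only: it maps "Losses" to "loss" and "Rules" to "rule",
    but it would mangle words such as "basis" or "cases", so a real pipeline
    would need a lemmatizer or a hand-checked exception list.
    """
    t = term.strip().lower()
    if t.endswith("es") and t[:-2].endswith(("s", "x", "ch", "sh")):
        return t[:-2]  # losses -> loss, taxes -> tax
    if t.endswith("s") and not t.endswith("ss"):
        return t[:-1]  # credits -> credit, employees -> employee
    return t


def merge_duplicates(raw_counts):
    """Sum the counts of surface variants that share a normalized form."""
    merged = Counter()
    for term, count in raw_counts.items():
        merged[normalize(term)] += count
    return merged


print(merge_duplicates({"loss": 3, "losses": 2, "Rule": 1, "rule": 4, "Rules": 1}))
```

The known failure cases are why automatic merging is usually paired with the kind of manual review described in this section.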

For entities, we manually reviewed the entity list produced by the LLM for every level of each section, from major sections down to clauses, and listed all the entities used there. We found more than 3,300 distinct entities, mentioned more than 12,500 times in total. The most frequently mentioned entity was “Secretary,” referred to 1,308 times,11 followed by “Taxpayer” and “Individual” (see Table 2.2).

Table 2.2. Most-Frequently Cited Entities in Title 26

Entity                            Number of Mentions
Secretary                         1,308
Taxpayer                          706
Individual                        265
United States                     236
Corporation                       234
Partnership(s)                    219
Person                            211
Employer                          173
Trust                             162
Employee                          120
Shareholder(s)                    114
Foreign corporation(s)            106
Tax court                         104
S corporation                     102
Partner(s)                        91
State                             85
Estate                            83
Spouse                            74
Related person(s)                 69
Controlled foreign corporation    67
Domestic corporation              67
Beneficiary                       63
Participant(s)                    59

9 Here we use information in the sense of information theory. Because the vast majority of possible edges are not present, the presence of an edge conveys information.

10 The distinction between entities and concepts is the ability to make decisions. Entities, such as individuals and firms, make decisions that have tax implications. Entities will always be nouns, whereas concepts can be any part of speech.

11 In future versions, we intend to disambiguate “Secretary” to specify which Secretary. For example, we could distinguish references to the Secretary of the Treasury from references to the Secretary of Commerce.

We extracted concepts using the same general approach that we applied to entities. We extracted 18,133 potential concepts, of which 14,787 appeared only once in the corpus. We then cleaned this list by merging duplicates created by capitalization, pluralization, or other linguistic variations. To quantify the meaningful relationships between the extracted concepts, we developed a co-occurrence matrix across multiple terms, in which each cell value is the frequency with which the corresponding terms appear together in the tax code. We filtered the dataset to identify the rows in which the most-frequently c
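The co-occurrence matrix described above can be sketched as follows. The text gives only a brief description, so this version rests on an assumption: two terms co-occur when they appear in the same section. Off-diagonal cells count such sections, and the diagonal counts the sections containing each term. The sample sections are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(sections, vocabulary):
    """Build a symmetric term-by-term co-occurrence matrix.

    `sections` is an iterable of sets of concept terms, one set per section
    (an assumed unit of co-occurrence). Off-diagonal cells count the sections
    containing both terms; diagonal cells count sections containing the term.
    """
    vocab = sorted(vocabulary)
    keep = set(vocab)
    pair_counts = Counter()
    term_counts = Counter()
    for terms in sections:
        present = sorted(set(terms) & keep)
        term_counts.update(present)
        pair_counts.update(combinations(present, 2))
    matrix = {a: {b: 0 for b in vocab} for a in vocab}
    for term in vocab:
        matrix[term][term] = term_counts[term]
    for (a, b), n in pair_counts.items():
        matrix[a][b] = matrix[b][a] = n
    return matrix


SECTIONS = [  # hypothetical concept sets for three sections
    {"income", "deduction"},
    {"income", "credit", "deduction"},
    {"credit"},
]
m = cooccurrence(SECTIONS, {"income", "deduction", "credit"})
print(m["income"]["deduction"])  # 2
```

Restricting the vocabulary to the most frequent terms (for instance, with Counter.most_common) would yield a compact matrix of the kind plotted as a heatmap in Figure 2.1.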
