Working Paper
CARTER C. PRICE, LEONARDO BUENO, KARA JIA, SABAHAT ZAFAR, GEORGE ZUO
Tax Code Analysis Tool
Applying Machine Learning to Map the Tax Code
RAND Education and Labor
WR-A4129-1
August 2025
Prepared for the Diller-Von Furstenberg Family Foundation
This working paper has been approved for circulation by RAND Education and Labor. Unless otherwise indicated, working papers can be quoted and cited without permission of the author, provided the source is clearly referred to as a working paper. RAND’s publications do not necessarily reflect the opinions of its research clients and sponsors. RAND® is a registered trademark. Learn more at .
WR-A4129-1
For more information on this publication, visit /t/WRA4129-1.
About RAND
RAND is a research organization that develops solutions to public policy challenges to help make communities throughout the world safer and more secure, healthier and more prosperous. RAND is nonprofit, nonpartisan, and committed to the public interest. To learn more about RAND, visit .
Research Integrity
Our mission to help improve policy and decisionmaking through research and analysis is enabled through our core values of quality and objectivity and our unwavering commitment to the highest level of integrity and ethical behavior. To help ensure our research and analysis are rigorous, objective, and nonpartisan, we subject our research publications to a robust and exacting quality-assurance process; avoid both the appearance and reality of financial and other conflicts of interest through staff training, project screening, and a policy of mandatory disclosure; and pursue transparency in our research engagements through our commitment to the open publication of our research findings and recommendations, disclosure of the source of funding of published research, and policies to ensure intellectual independence. For more information, visit /about/research-integrity.
RAND’s publications do not necessarily reflect the opinions of its research clients and sponsors.
Published by the RAND Corporation, Santa Monica, Calif.
© 2025 RAND Corporation
RAND® is a registered trademark.
Limited Print and Electronic Distribution Rights
This publication and trademark(s) contained herein are protected by law. This representation of RAND intellectual property is provided for noncommercial use only. Unauthorized posting of this publication online is prohibited; linking directly to its webpage on is encouraged. Permission is required from RAND to reproduce, or reuse in another form, any of its research products for commercial purposes. For information on reprint and reuse permissions, visit /about/publishing/permissions.
About This Working Paper
In this working paper, we describe a tool that we created by applying natural language processing methods to produce a graph database of legal text. The tool is called the Tax Code Analysis Tool (CAT), and it consists of a comprehensive graph database mapping of Title 26 of the U.S. Code on internal revenue. CAT connects various entities (such as individuals, married couples, partnerships, or corporations) and tax concepts (such as wages, deductions, or credits) and links them to their specific sections in the U.S. federal tax code. This enables users to compare the treatment of different entities and concepts, track their interactions, and investigate the potential implications of proposed legislative changes. Additionally, CAT can be connected to other datasets, such as Internal Revenue Service data on tax returns or survey data on the U.S. population, to provide additional information about the relative impacts of different concepts and sections.
With this working paper, we seek to obtain preliminary feedback on CAT from experts in tax policy and related areas. In its 2025 form, CAT is intended for use by researchers with database query language experience and those who are interested in textual analysis of the U.S. tax code. We are further developing CAT to make it accessible to users without database query language experience. Future versions of the tool might be of interest to the broader community of tax accountants, lawyers, and policymakers interested in studying the ramifications of U.S. federal fiscal policy.
About RAND Education and Labor
This work was conducted within RAND Education and Labor, a division of RAND that conducts research on early childhood through postsecondary education programs, workforce development, and programs and policies affecting workers, entrepreneurship, and financial literacy, and decision making. For more information, visit /education-and-labor or email educationandlabor@.
Funding
This research was sponsored by the Diller-Von Furstenberg Family Foundation.
Acknowledgments
We thank the many staff and colleagues at RAND who helped usher this project forward. In particular, we would like to thank Philip Armour, Elvira Loredo, and Karlyn Stanley for their thoughtful and thorough reviews of this work.
Summary
In this working paper, we describe our methods to construct a graph database that we call the Tax Code Analysis Tool (CAT). The purpose of this working paper is to obtain feedback from experts in tax policy and related areas on this first iteration of CAT.
CAT is a large graph database that links the text of Title 26 of the U.S. Code to the entities and concepts it covers.¹ Title 26 specifies the income taxes for individuals and businesses and has more than 5,500 sections. This complexity makes it a natural target for modern text analysis methods, including large language models (LLMs).
Approach
CAT graphically describes the relationship between different parts of the law (e.g., a chapter, section, or subsection), the entities those parts relate to, and the relevant concepts included in the part. We built this tool by applying natural language processing to map the connections between tax code sections. Specifically, the two types of connections are references (when one part references another part) and hierarchical relationships (when one part contains another, such as a subsection within a section or a section within a chapter).
We used an LLM to examine each section and extract keywords, including the entities (e.g., individuals, partnerships, corporations) and the concepts (e.g., income, deduction, employee, or credit).² The entities and concepts are linked to the part of the law in which they were referenced. We then manually pruned the lists of entities and concepts to create a consistent and coherent list. We intend to release CAT as open-source software in the future.
Uses
We demonstrate the following four use cases for this iteration of CAT:
1. identifying the importance of sections of the tax code or tax concepts using the number and nature of connections³
2. mapping the connections of a concept, entity, or index to other concepts, entities, and indexes in the tax code to understand how they relate
3. comparing the treatment of different tax concepts or entities⁴
4. linking the tax code to the fiscal importance of different concepts.
These are only a few of the possible applications for the tool, and we expect that users will find other uses.
¹ U.S. Code, Title 26, Internal Revenue Code.
² Specifically, we used GPT-4o.
³ Importance can mean the number of citations from other sections or the number of sections that include a concept or entity. For example, this can be used to identify which sections of the tax code are most commonly referenced by other sections.
Future Work
The paths for future work include the following:
• Refine and organize the concepts and entities. There are relationships between the different types of concepts and entities that are not captured in the tool at the time of this writing. For example, C corporations and S corporations are both types of corporations. Similarly, corporations and partnerships can both be types of businesses. Thus, hierarchical relationships between entities and concepts can provide helpful information for tax code modeling.
• Map the tax code to Internal Revenue Service (IRS) forms. Because IRS forms are a key mechanism for collecting and reporting information, it would be helpful to map the tax code to the forms. This would improve the reliability of the data connections that are described in Chapter 4.
• Link the tax code to individual survey data. In addition to IRS data sources, such sources as the American Community Survey, the Current Population Survey, and the Survey of Consumer Finances capture representative information about the U.S. population. By linking the tax code to these datasets, we could describe the universe of all possible people who could participate in some programs or receive a certain tax credit. Additionally, we could use the connection to the tax data to refine the estimates in the survey data.
• Link the tax code to firm data. The tax code also applies to businesses, so it would be useful to link the tax code to such datasets as the Annual Business Survey and the Small Business Credit Survey to better understand how different parts of the tax code relate to firms.
• Develop a graphical user interface. The existing tool is not structured for a general audience and requires some familiarity with the Cypher query language. In future iterations, a larger audience could perform queries through a graphical user interface.
• Add other legal sources. The U.S. Code is not the sole source of tax policy, and such sources as the Federal Register, court orders, and policy directives could also be included in the code analysis tool. These would be added as additional classes of nodes and linked to the concepts, entities, and indexes in the same fashion as the existing nodes.
• Include other revenue sources. Although Title 26 covers many sources of federal revenue, it does not cover all sources. Revenue from excise taxes and tariffs could be added to this tool to provide a fuller picture of federal revenue.
• Compare with foreign tax codes. Because multinational corporations and other entities might seek to game the tax systems, a comparison of the U.S. tax code with foreign tax codes could help identify areas that could be targeted to reduce tax avoidance.
• Inform models of taxation. By mapping out all the connections in the tax code, this type of tool can capture nuances missed by simple models of taxation. CAT can also be used to inform models of proposed changes to the tax code by tracking the first-order changes and how those propagate to other parts of the tax code.
⁴ We will demonstrate this with a comparison of the tax treatment of 501(c)(3) nonprofit corporations to the treatment of 501(c)(4) nonprofit corporations.
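The idea in the last Future Work bullet, tracking a first-order change and how it propagates through the tax code, can be sketched as a breadth-first search outward from an amended index along REFERENCED_IN edges: indexes at distance 1 cite the changed provision directly, and larger distances are indirect effects. This is a minimal sketch under stated assumptions: the edges are assumed to have been exported into a plain Python dictionary, and the node labels and the dictionary layout are hypothetical, not CAT's actual schema or query interface.

```python
from collections import deque


def affected_indexes(referenced_in: dict[str, set[str]], changed: str) -> dict[str, int]:
    """Breadth-first search from `changed` along REFERENCED_IN edges.

    `referenced_in[x]` is the set of indexes whose text references index x
    (a hypothetical export format). Returns every index that directly or
    indirectly references `changed`, mapped to its hop distance
    (1 = first-order effect).
    """
    distance = {changed: 0}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for citing in referenced_in.get(node, set()):
            if citing not in distance:  # in BFS the first visit is the shortest path
                distance[citing] = distance[node] + 1
                queue.append(citing)
    del distance[changed]  # report only the other indexes
    return distance


# Hypothetical toy graph: "A" is referenced in "B" and "C", "B" in "D", "C" in "E".
toy_referenced_in = {
    "A": {"B", "C"},
    "B": {"D"},
    "C": {"E"},
    "E": {"A"},  # a cycle back to the changed index is handled by the visited check
}

for index, hops in sorted(affected_indexes(toy_referenced_in, "A").items()):
    print(index, hops)
```

In a Neo4j deployment, the same traversal would more naturally be written as a variable-length path query in Cypher; the Python version only illustrates the logic.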
Contents
About This Working Paper
Summary
Figures and Tables
Chapter 1. Introduction
Approach
Limitations
Organization of This Working Paper
Chapter 2. Data Source and Approach
Title 26 of the U.S. Code
Mapping the Relationships in the Tax Code
Keyword Identification and Sorting
Example
Chapter 3. Analysis Using the Tax Code Analysis Tool
Identifying the Key Sections of the Tax Code
Comparing Sections of the Tax Code
Chapter 4. Connection to IRS Data
Chapter 5. Conclusion
Future Work
Appendix A. Complexity
Complexity Measure 1: Node Centrality Measures
Complexity Measure 2: Node Degree Distribution
Complexity Measures 3, 4, and 5: Degree and Node Type Assortativity and Modularity
Implications
Appendix B. Mapping the Tax Code Analysis Tool to IRS Forms
Abbreviations
References
Figures and Tables
Figures
Figure 2.1. Co-Occurrence Matrix Heatmap for the Top 24 Concept Terms
Figure 2.2. Section 3126 with Concepts, Entities, and Indexes Highlighted
Figure 2.3. Graph of the Section 3126 Branch
Figure 3.1. Subgraphs for 501(c)(3) and 501(c)(4) from the Tax Code Analysis Tool
Figure 4.2. Breakout of Corporate Income Tax Items
Figure 4.3. Tax Code Analysis Tool Graph of Adoption
Figure A.1. Node Degree Distribution
Figure A.2. Node Degree Probability Distribution and Node Degree Complementary Cumulative Distribution
Figure A.3. Log-Scale Degree Distribution, by Node Type
Figure B.1. Frequency Radar of Sections Reported in IRS Forms Linked to S-Corporations
Figure B.2. Frequency and Amount Against Sections of Title 26
Tables
Table 2.1. Summary of Title 26, Internal Revenue Code
Table 2.2. Most-Frequently Cited Entities in Title 26
Table A.1. Top Five Nodes, by Degree Centrality
Table A.2. Top Five Nodes, by Closeness Centrality
Table B.1. Corporation Tax Returns: Estimated Number of Active Returns for Tax Years 2018–2021
Table B.2. Sample Table Showing Details of Lines, Sections, and Quantification Terms
Chapter 1. Introduction
Title 26 of the U.S. Code is the Internal Revenue Code, which contains more than 5,500 sections and is more than 7,000 pages long.⁵ This title details income taxes for individuals and businesses. In fiscal year 2023, the Internal Revenue Service (IRS) raised about $3 trillion from these taxes.
Given the complexity of the tax code, it is challenging to determine how changes in one section might affect other areas. For example, when the Revenue Act of 1978 included Section 401(k), the intent was to adjust how deferred compensation was taxed, not to create a new savings vehicle that now houses many trillions of dollars in retirement savings for U.S. workers.⁶ As the tax code has grown, the complexity and risk of unintended consequences have grown as well.
In this working paper, we describe a tool that we created to analyze the U.S. tax code as a graph structure to more naturally explore the connections. A graph is a mathematical structure in which nodes are connected by edges. In our case, the nodes can be portions of the law (indexes), entities (the people and organizations described in the U.S. Code), and concepts (other keywords). The edges represent relationships between the nodes, such as contains (for when an entity or concept is contained within an index) or references (for when one index contains a reference to another index). By applying a graph structure to the U.S. tax code, we can marshal the tools of network analysis to quantify the complexity, find connections, and create a structure to which we can connect additional information.
Approach
This graph database, called the Tax Code Analysis Tool (CAT), can be used to describe the relationships between different parts of the law (e.g., a chapter, section, or subsection), the entities those parts relate to, and the relevant concepts included in the part. We built this tool by first applying natural language processing to map the connections between tax code sections. Specifically, the two types of connections are references (when one part references another) and hierarchical relationships (when one part contains another, such as a subsection within a section or a section within a chapter).
Next, we used a large language model (LLM) to examine each section and extract keywords, including the entities (e.g., individuals, partnerships, corporations) and the concepts (e.g., income, deduction, employee, or credit).⁷ The entities and concepts are linked to the part of the law in which they were referenced. We then manually pruned the lists of entities and concepts to create a consistent and coherent list.
⁵ Office of the Law Revision Counsel (OLRC), “United States Code,” webpage, undated.
⁶ Kathleen Elkins, “A Brief History of the 401(k), Which Changed How Americans Retire,” CNBC, January 4, 2017.
Finally, we can link these entities and concepts to other datasets, including survey data describing the population of the entities, as well as IRS and other data sources about the concepts. We have linked CAT only to some IRS data in this iteration (discussed in Chapter 4) but intend to use additional datasets in the future. This results in a large graph that links the text of the U.S. Code to the entities and concepts it covers and to data about those entities and concepts.
Limitations
In this working paper, we describe our construction of a tool with a graph structure based on Title 26 of the U.S. Code. Naturally, there are limitations, both in the tool’s scope and in our approach to constructing this tool.
As of 2025, CAT does not include sections of other titles that reference Title 26, but it does include references within Title 26 to sections in other titles. Thus, the graph might not contain some relevant nodes and edges. A tool built on the full U.S. Code would address this concern and is one of the long-term goals of the broader RAND Budget Initiative.
Similarly, statutes are only a part of what is relevant in determining outcomes, such as revenue, tax incidence, or other policy-relevant details. Regulations and other policy documents determine the interpretation of the code. The state of the economy, including the distribution of income across individuals and firms, is relevant to determining revenue. Likewise, the degree of compliance, enforcement, and resources for IRS oversight are also relevant.
We have included some IRS data at this stage. In future versions of CAT, we hope to include other data sources and add the contents of the Federal Register.
We anticipate continuously developing CAT for some time, so we plan to address several of these limitations in future versions of the tool and of this working paper.
Organization of This Working Paper
In Chapter 2, we describe the methodology for constructing CAT in more detail. In Chapter 3, we present some use cases for the code analysis tool. In Chapter 4, we describe use cases when CAT is connected to IRS data. In Chapter 5, we describe our existing plans for future improvements to CAT.
This working paper also includes two technical appendixes that are intended for readers with some background in graph theory and network analysis. In Appendix A, we discuss the network analysis measures. In Appendix B, we demonstrate an approach for mapping the IRS forms for S corporations with the tool.
⁷ Specifically, we used GPT-4o.
Chapter 2. Data Source and Approach
In this chapter, we describe the text of Title 26, the process of building CAT, an overview of the analysis, and the additional data sources used in the analysis. The primary text for this analysis is Title 26 (Internal Revenue Code) of the U.S. Code, as sourced from the OLRC.⁸
Title 26 of the U.S. Code
Title 26 is composed of 11 subtitles, lettered A through K; chapters numbered from 1 to 100; and sections numbered from 1 to 9834. Table 2.1 summarizes Title 26 in numerical order of its chapters, subchapters, and sections. The numbering of the chapters and sections is not continuous because the structure of the U.S. tax code has been amended over time. Title 26 has been amended numerous times since its inception, and section and subsection numbers are sometimes left unused after the provisions they designated are repealed. Moreover, unused section numbers are occasionally left empty on purpose to accommodate future amendments or the addition of new sections or subsections. This does not affect the structure of our work, but it does mean that we must pay careful attention to the numbering when developing a mapping.
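Because section numbers are discontinuous and some carry letter suffixes (e.g., 5000D in Table 2.1), a plain string sort misorders them: "5000D" sorts before "501" and "61". A minimal sketch of a sort key, assuming labels consist of leading digits, an optional capital-letter suffix, and an optional hyphen (or en dash) plus digits. That label format, and comparing suffixes alphabetically, are assumptions for illustration rather than rules taken from the OLRC data.

```python
import re

# Assumed label shape: leading digits, optional capital-letter suffix,
# optional hyphen or en dash plus digits, e.g. "61", "45Q", "5000D", "1400Z-2".
_SECTION_RE = re.compile(r"^(\d+)([A-Z]*)(?:[-\u2013](\d+))?$")


def section_sort_key(label: str) -> tuple[int, str, int]:
    """Order section labels numerically, then by letter suffix, then by hyphenated part."""
    match = _SECTION_RE.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized section label: {label!r}")
    number, suffix, part = match.groups()
    # Suffixes compare alphabetically ("" sorts before "A"); this is an assumption.
    return (int(number), suffix, int(part) if part else 0)


labels = ["5000D", "61", "45Q", "1400Z-2", "45", "501", "5000A"]
print(sorted(labels))                        # ['1400Z-2', '45', '45Q', '5000A', '5000D', '501', '61']
print(sorted(labels, key=section_sort_key))  # ['45', '45Q', '61', '501', '1400Z-2', '5000A', '5000D']
```

Raising an error on unrecognized labels, rather than guessing, makes any label that breaks the assumed pattern visible during mapping.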
Table 2.1. Summary of Title 26, Internal Revenue Code

| Subtitle | Chapters | Subchapters | Sections Included |
|---|---|---|---|
| A—Income Taxes | 1–6 | 26 | 1–1564 |
| B—Estate and Gift Taxes | 11–15 | 13 | 2001–2801 |
| C—Employment Taxes | 21–25 | 9 | 3101–3512 |
| D—Miscellaneous Excise Taxes | 31–50A | 33 | 4001–5000D |
| E—Alcohol, Tobacco, and Certain Other Excise Taxes | 51–55 | 21 | 5001–5891 |
| F—Procedure and Administration | 61–80 | 45 | 6001–7874 |
| G—The Joint Committee on Taxation | 91–92 | 0 | 8001–8023 |
| H—Financing of Presidential Election Campaigns | 95–96 | 0 | 9001–9042 |
| I—Trust Fund Code | 98 | 2 | 9500–9602 |
| J—Coal Industry Health Benefits | 99 | 4 | 9702–9722 |
| K—Group Health Plan Requirements | 100 | 3 | 9801–9834 |

SOURCE: Features information from OLRC, undated; U.S. Code, Title 26. These were pulled March 10, 2025, and may no longer be accurate.
⁸ OLRC, undated.
Mapping the Relationships in the Tax Code
To generate CAT, we made a Python script that extracts text from the tax code and structures it into JavaScript Object Notation (JSON) to preserve the hierarchical structure (e.g., title, subtitle, chapter). We then modeled this tax code in a Neo4j graph database. There are three kinds of nodes in this graph database:
• entity: a subject who would typically be affected by the tax code (e.g., secretary, farming business)
• concept: an abstract term (e.g., tax exception, loan)
• index: a line or header in the tax code (e.g., chapter, section).
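The hierarchy preserved in the JSON can be turned into parent/child relationships between index nodes with a short recursive walk. The sketch below assumes a simplified, hypothetical JSON layout in which every index has an "id" and an optional "children" list; the field names and the fragment itself (subchapters and parts omitted) are illustrative and are not the format of the authors' script. The relationship names INCLUDES and PART_OF are the ones this chapter uses for hierarchy.

```python
import json
from typing import Iterator


def hierarchy_edges(node: dict) -> Iterator[tuple[str, str, str]]:
    """Yield (source, relationship, target) triples for every parent/child pair."""
    for child in node.get("children", []):
        yield node["id"], "INCLUDES", child["id"]  # parent is one level above child
        yield child["id"], "PART_OF", node["id"]   # reciprocal edge, child to parent
        yield from hierarchy_edges(child)          # descend to the next level


# Hypothetical, simplified fragment; field names "id", "level", "children" are assumptions.
doc = json.loads("""
{"id": "Title 26", "level": "title", "children": [
  {"id": "Subtitle A", "level": "subtitle", "children": [
    {"id": "Chapter 1", "level": "chapter", "children": [
      {"id": "Sec. 1", "level": "section"},
      {"id": "Sec. 2", "level": "section"}
    ]}
  ]}
]}
""")

for triple in hierarchy_edges(doc):
    print(triple)
```

Emitting both directions at once keeps each hierarchical link stored reciprocally, which is the convention this chapter describes for all relationships.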
We describe the specific approach for producing the set of concepts and entities in the next section of this chapter.
Using a script, we extracted entities and concepts in the tax code and mapped relationships between those entities or concepts and the tax code line (index) in which they were referenced. The relationships that concepts or entities can have with index nodes are DEFINED_IN (the entity or concept is defined in that index) and REFERENCED_IN (the entity or concept is referenced in that index). The relationships between indexes are INCLUDES (an index is directly one level above another index, such as a chapter including a subchapter), PART_OF (an index is directly one level below another index, such as a subsection being part of a section), REFERENCES (an index references another index in the text), and REFERENCED_IN (an index is referenced in the text of another index). To get the INCLUDES/PART_OF relationships, we used the JSON format of the tax code to determine the hierarchy and relationship between indexes. To get the REFERENCES/REFERENCED_IN relationships, we used an LLM to extract all instances in which an index is mentioned in a tax code line. Note that every relationship is reciprocal. For example, if there is a REFERENCED_IN relationship going from node A to node B, then there is a REFERENCES relationship going from node B to node A. In effect, this makes the graph undirected, which makes it easier to write graph queries that traverse the graph in either direction. More details about the approach, including the specific prompts, can be found in Appendix A.
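The reciprocal storage described above, together with the "number of citations from other sections" notion of importance from the Summary's first use case, can be illustrated in a few lines. This is a minimal sketch, assuming the reference-extraction step yields (citing index, cited index) pairs; the labels A through D are hypothetical, and in Neo4j the same count would more naturally be a Cypher aggregation.

```python
from collections import Counter

# Hypothetical output of the reference-extraction step: (citing index, cited index).
references = [("A", "C"), ("B", "C"), ("D", "C"), ("D", "B")]


def with_reciprocals(pairs: list[tuple[str, str]]) -> list[tuple[str, str, str]]:
    """Store each citation in both directions: the citing index REFERENCES the
    cited one, and the cited index is REFERENCED_IN the citing one."""
    edges = []
    for citing, cited in pairs:
        edges.append((citing, "REFERENCES", cited))
        edges.append((cited, "REFERENCED_IN", citing))
    return edges


edges = with_reciprocals(references)

# REFERENCED_IN edges leaving an index = number of other indexes that cite it.
times_cited = Counter(source for source, rel, _ in edges if rel == "REFERENCED_IN")
print(times_cited.most_common())  # [('C', 3), ('B', 1)]
```

Because both directions are stored, "who cites C" and "what does D cite" are each a single outgoing-edge lookup, which is the query convenience the text attributes to reciprocity.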
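The graph has 27,694 nodes and, when considered as an undirected graph, 297,341 edges. If every pair of nodes were connected, an undirected graph with 27,694 nodes would have 383,464,971 edges (766,929,942 if each ordered pair of nodes is counted separately). Only about 0.08 percent of the possible edges therefore exist in the graph, indicating that the existence of a connection between two nodes carries substantial information.⁹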
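The share of possible edges that are present (the graph's density) follows directly from the node and edge counts stated in the text. For an undirected graph with n nodes, the maximum number of edges is n(n − 1)/2; the product n(n − 1) counts ordered pairs, which is the right maximum only for a directed edge count. The arithmetic below uses only the figures given above.

```python
nodes = 27_694
undirected_edges = 297_341

ordered_pairs = nodes * (nodes - 1)   # every ordered pair (A, B) with A != B
unordered_pairs = ordered_pairs // 2  # maximum number of edges in an undirected graph

density = undirected_edges / unordered_pairs
print(f"{ordered_pairs:,} ordered pairs, {unordered_pairs:,} unordered pairs")
print(f"density = {density:.4%}")  # density = 0.0775%
```

Counting both directions of every reciprocal edge (2 × 297,341) against n(n − 1) gives the same ratio, so the density is about 0.08 percent whichever convention is used, provided the numerator and denominator use the same one.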
Keyword Identification and Sorting
From the text of each section, we extracted entities and concepts. By entity, we mean all categories of nouns classifying individuals, corporations, businesses, and government entities mentioned in a section. By concept, we mean key terms that relate to the purpose of the section; such terms include credit, deduction, and income.¹⁰ We used GPT-4o with chain-of-thought prompting to extract the entities and concepts from Title 26. These lists were cleaned by merging duplicates (e.g., “loss” and “losses” or “Rule” and “rule”).
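The duplicate merging described above can be sketched as grouping raw terms under a normalized key. The normalization rules below are an assumption for illustration: the text gives only the examples, not the merge procedure, and a production pipeline would more likely pair a lemmatizer with the manual review the authors describe.

```python
from collections import defaultdict


def normalize(term: str) -> str:
    """Crude canonical form: lowercase, trim whitespace, strip common plural endings.

    Deliberately simple; it mishandles words such as "status" (-> "statu").
    """
    t = term.strip().lower()
    if t.endswith("sses"):                # "losses" -> "loss", "businesses" -> "business"
        return t[:-2]
    if t.endswith("ies") and len(t) > 4:  # "liabilities" -> "liability"
        return t[:-3] + "y"
    if t.endswith("s") and not t.endswith("ss"):  # "credits" -> "credit"; "loss" unchanged
        return t[:-1]
    return t


def merge_duplicates(terms: list[str]) -> dict[str, set[str]]:
    """Group raw extracted terms under a shared canonical key."""
    groups: dict[str, set[str]] = defaultdict(set)
    for term in terms:
        groups[normalize(term)].add(term)
    return dict(groups)


raw_terms = ["loss", "losses", "Rule", "rule", "credit", "Credits", "liabilities"]
for canonical, variants in merge_duplicates(raw_terms).items():
    print(canonical, sorted(variants))
```

Keeping every raw variant in each group, rather than discarding it, leaves an audit trail for the manual pruning step.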
For entities, we manually reviewed the entity list produced by the LLM for all the sections, from the major sections of the hierarchy down to the clauses, and listed all the entities used there. We found more than 3,300 distinct entities, which together were mentioned more than 12,500 times. The most frequently cited entity was “Secretary,” referred to 1,308 times,¹¹ followed by “Taxpayer” and “Individual” (see Table 2.2).
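The number of distinct entities, the total number of mentions, and the ranking in Table 2.2 can all come from a single tally. A minimal sketch, using hypothetical per-section entity lists produced after the duplicate merging described earlier in this section. The text does not state whether repeated mentions within a single section were counted once or each time; this sketch counts every occurrence.

```python
from collections import Counter

# Hypothetical per-section entity mentions, one list entry per mention.
section_entities = {
    "A": ["Secretary", "Taxpayer", "Secretary"],
    "B": ["Individual", "Secretary"],
    "C": ["Taxpayer", "Corporation"],
}

mentions = Counter(entity for entities in section_entities.values() for entity in entities)

print("distinct entities:", len(mentions))        # distinct entities: 4
print("total mentions:", sum(mentions.values()))  # total mentions: 7
print(mentions.most_common(2))                    # [('Secretary', 3), ('Taxpayer', 2)]
```

Counting once per section instead would only require wrapping each list in `set(...)` before tallying, which is worth deciding explicitly because it changes the totals.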
Table 2.2. Most-Frequently Cited Entities in Title 26

| Entity | Number of Mentions |
|---|---|
| Secretary | 1,308 |
| Taxpayer | 706 |
| Individual | 265 |
| United States | 236 |
| Corporation | 234 |
| Partnership(s) | 219 |
| Person | 211 |
| Employer | 173 |
| Trust | 162 |
| Employee | 120 |
| Shareholder(s) | 114 |
| Foreign corporation(s) | 106 |
| Tax court | 104 |
| S corporation | 102 |
| Partner(s) | 91 |
| State | 85 |
| Estate | 83 |
| Spouse | 74 |
| Related person(s) | 69 |
| Controlled foreign corporation | 67 |
| Domestic corporation | 67 |
| Beneficiary | 63 |
| Participant(s) | 59 |

⁹ Here we use information in the sense of information theory. Because the vast majority of possible edges are not present, the presence of an edge conveys information.
¹⁰ The distinction between entities and concepts is the ability to make decisions. Entities, such as individuals and firms, make decisions that have tax implications. Entities will always be nouns, while concepts can be any part of speech.
¹¹ In future versions, we intend to disambiguate “Secretary” to specify which Secretary. So, for example, we could distinguish references to the Secretary of the Treasury from those for the Secretary of Commerce.
We extracted concepts using the same general approach that we applied to entities. We extracted 18,133 potential concepts, of which 14,787 appeared only once in the corpus. We then cleaned this list by merging duplicates created by capitalization, pluralization, or other linguistic variations. To quantify the meaningful relationships between the extracted concepts, we developed a co-occurrence matrix across multiple terms, in which each cell value is the frequency with which the corresponding terms appear together in the tax code. We filtered the dataset to identify the rows in which the most-frequently c