Section 5 Data Dictionary for Repository
This section details each file contained in the code repository and a summary of what it is used for in the project. For a more detailed explanation, please see Section 6, “Methods.”
5.1 MEIS_Methodology Folder
This is the main folder, containing all data in the repository.
- .gitignore: Lists files that will be ignored when using Git to synchronize updates to repository.
- DEPRECATED_run_analysis_master.R: This file contains the script used for the 2021 project. This process guide pertains mostly to the code running from this master script.
- LICENSE: Contains licensing information for the repository.
- MEIS_Methodology.Rproj: The R project package used to open the repository in an IDE such as RStudio.
- parameters.R: Contains all variables that may require user editing for code to run properly. Use of this file is documented in the README.md file.
- README.md: The first file you should read in any repository. Contains basic installation instructions, information on the repository contents and guidance on how to use the repository.
- .Rhistory: Contains the history of commands given by a user in an R Session.
- run_analysis_master.R: The newest version of the code to process the Federal data. Currently, this file is in development for the upcoming 2022 version of this project and is not yet ready for use.
5.2 Data Folder
This is a folder located in the main folder and is designed to hold data for this project.
5.2.1 Raw Folder
This folder is within the data folder and contains all raw data provided to the user. The purpose of the raw data is to simplify data processing.
- 2007_to_2017_NAICS.xlsx: This file contains the crosswalk used in the master script to update 2007 NAICS codes to 2017 NAICS codes. This file is a new development to automate error checking when linking IMPLAN codes to NAICS codes. It is NOT used in the Deprecated master script.
- 2012_2017_NAICS_to_IMPLAN.xlsx: This file contains the 2017 NAICS to IMPLAN crosswalk as well as several additions. 2012 NAICS to IMPLAN values were appended to this file in addition to several 2002 NAICS codes that required updating. This file is a new development to automate error checking when linking IMPLAN codes to NAICS codes. It is NOT used in the Deprecated master script.
- business_type_to_implan546_crosswalk.csv: This file was created to serve as a crosswalk between grant recipient business types and their corresponding IMPLAN codes. This file was made by manually looking up each business type and entering its corresponding IMPLAN code value.
- CALIFORNIA_emp.csv: This file was created to hold employment numbers for selected federal agencies for the state of California. It serves as an example of how to fill out the “TEMPLATE_emp.csv” file and to allow for a repeat of the 2021 study.
- contract_recipient_to_district_crosswalk.xlsx: This file was created to serve as an aid to error checking by assigning a district value to certain contract data that lacked them. Creation of this file involved looking up addresses of businesses with no given district in order to correctly assign a district to the data. It is NOT used in the Deprecated master script.
- dod_county_shares.xlsx: This file contains the percentage of each county by land area that is assigned to each California congressional district as of 2021.
- TEMPLATE_emp.csv: This file is a template for users to fill out employment numbers for their region of interest. Follow instructions in Section 4.2 to make use of this file.
5.2.1.1 Blank_Sheets_for_R Folder
This folder contains the template spreadsheets used in the final preparation of data for entry into IMPLAN. For more information on the difference between the event types that comprise IMPLAN’s activity sheet, refer to IMPLAN’s article on this topic.
- CommodityOutput2.xlsx: An Excel sheet that goes into the IMPLAN activity sheets as the second worksheet tab.
- LaborIncomeChange3.xlsx: An Excel sheet that goes into the IMPLAN activity sheets as the third worksheet tab.
- HouseholdSpendingChange4.xlsx: An Excel sheet that goes into the IMPLAN activity sheets as the fourth worksheet tab. This Excel worksheet tab is what will hold the Veteran Affairs direct payments data.
- IndustrySpendingPattern5.xlsx: An Excel sheet that goes into the IMPLAN activity sheets as the fifth worksheet tab.
- InstitutionSpendingPattern6.xlsx: An Excel sheet that goes into the IMPLAN activity sheets as the sixth worksheet tab.
5.2.1.2 Deprecated Folder
This folder contains raw data only used by the Deprecated master script.
- 2017_implan_online_naics_to_implan546.xlsx: This file contains the crosswalk for linking 2017 IMPLAN codes to NAICS code.
- 2021_employment_totals.xlsx: This file contains the necessary employment data broken out by each California county and congressional district.
5.2.2 Temp Folder
This folder is also located within the data folder and holds all files that are made while either master script is running. These files are deleted when the final output is finished being processed.
- placeholderfortemp.txt: This file prevents Github from deleting the temp folder, as it is empty by default.
5.3 Output Folder
This folder is located within the main folder and holds all the final output files needed by IMPLAN to complete this project. New output files will get created and placed into this folder as scripts are run.
- placeholder.txt: This file prevents Github from deleting the output folder, as it is empty by default.
5.4 SRC Folder
This folder within the main folder contains all script files needed by the master script when it is running. Scripts in this folder may be used by either of the master scripts contained in this repository. DO NOT MODIFY ANY OF THESE FILES.
- obtain_usaspending.R: An R script that calls to USAspending’s Application Programming Interface (API) in order to obtain the USAspending data desired based on the criteria defined in this R script.
- filter_usaspending.R: An R script that defines a function for further filtering the USAspending data that was obtained from the API call.
- error_check_and_weight_contracts.R: An R script that works through multiple steps and lines of code to fix errors in the USAspending contracts data based on the type of error.
- error_check_grants.R: An R script that works through multiple steps and lines of code to fix errors in the USAspending grants data.
- concatenate_usaspending.R: An R script that defines a function for concatenating a certain group of CSVs based on the file name pattern (in this case, the cleaned USAspending data) while ignoring the Veteran Affairs benefits CSV file.
- split_usaspending.R: An R script that defines a function for splitting out Department of Energy spending from the main USAspending data prior to aggregating the spending.
- natsec_doe.R: An R script that filters down the Department of Energy spending to national security-related offices and gives a national security adjustment proportion to use for Energy employment and SmartPay calculations.
- aggregate_usaspending.R: An R script that defines a function for aggregating a defined dataframe’s spending figure (in this study, federal_action_obligation from USAspending data) by IMPLAN code.
- generate_employment_dataframe.R: An R script that calculates employment data and generates two dataframes, one for the county and one for the district IMPLAN sheets.
5.4.1 Deprecated Folder
This folder within the src folder contains all scripts that are only used by the DEPRECATED_run_analysis_master.R script. DO NOT MODIFY ANY OF THESE FILES.
- DEPRECATED_error_check_contracts.R: An R script of hardcode fixes to errors in the contracts data in order to get an IMPLAN code for every entry.
- DEPRECATED_error_check_grants.R: An R script of hardcode fixes to errors in the grants data in order to get an IMPLAN code for every entry.
- DEPRECATED_create_implan_sheets.R: An R script that runs two for loop codes that help to generate the IMPLAN activity sheets for every California county and congressional district.