Section 4 Setting Up the Repository
The code repository is required to reproduce the California study. It can also serve as a framework for producing estimates for other regions. These instructions assume that GitHub is used to obtain and manage a copy of the repository.
4.1 Obtain Repository
- On GitHub, navigate to the repository and fork it to the user’s account.
- Clone the repository to a new RStudio Project.
- Open the ‘README.md’ file.
- Fill out the parameters file with the required specifications. The ‘README.md’ file contains general instructions, and Section 6.1 of the Methods section provides more detailed explanations.
4.2 Modify Data Files
This analysis requires certain custom user files to be created. In particular, the TEMPLATE_emp.csv file in the data/raw/ folder must be filled in so that employment data can be loaded and calculated correctly. Because the employment data for this report cannot easily be obtained or transmitted via code, our team created this template for users to fill in with their statewide employment numbers. The .csv file used in this study, CALIFORNIA_emp.csv, is also provided as an example of a properly completed template. The first two columns in this file, active_duty_military and reserve_military, contain military employment numbers from DMDC; the remaining seven columns contain civilian employment numbers from FedScope for the Departments of Defense, Homeland Security, Veterans Affairs, and Energy. For more information on how to obtain these data, refer to Section 3.1 on data requirements for employment data. After the template has been filled in, rename it according to the study area given in the “state” general variable in the parameters file and save it to the data/raw/ folder in the repository.
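The copy-and-rename step can be sketched in Python as follows. This is a minimal illustration and not part of the repository: the function names are hypothetical, the upper-case naming pattern is inferred from CALIFORNIA_emp.csv, and the check assumes the file has a single header row. Only the two military column names and the count of nine columns come from the description above.

```python
import csv
import shutil
from pathlib import Path

# Column names taken from the text. The seven civilian column names are not
# listed there, so only the total column count is checked.
MILITARY_COLUMNS = ["active_duty_military", "reserve_military"]
EXPECTED_COLUMN_COUNT = 9  # 2 military + 7 civilian


def state_emp_filename(state: str) -> str:
    """Build a name such as CALIFORNIA_emp.csv (the pattern is an assumption)."""
    return f"{state.strip().upper().replace(' ', '_')}_emp.csv"


def create_state_emp_file(raw_dir: Path, state: str) -> Path:
    """Copy TEMPLATE_emp.csv to a state-specific file in the same folder."""
    target = raw_dir / state_emp_filename(state)
    shutil.copyfile(raw_dir / "TEMPLATE_emp.csv", target)
    return target


def check_emp_header(path: Path) -> list:
    """Return a list of problems found in the header row (empty if none)."""
    with path.open(newline="", encoding="utf-8-sig") as handle:
        header = next(csv.reader(handle), [])
    problems = []
    if header[:2] != MILITARY_COLUMNS:
        problems.append(f"first two columns should be {MILITARY_COLUMNS}")
    if len(header) != EXPECTED_COLUMN_COUNT:
        problems.append(
            f"expected {EXPECTED_COLUMN_COUNT} columns, found {len(header)}"
        )
    return problems
```

For example, calling create_state_emp_file(Path("data/raw"), "Oregon") would create data/raw/OREGON_emp.csv, which can then be filled in and passed to check_emp_header before running the analysis. The state passed in should match the “state” general variable in the parameters file.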
The repository requires two additional files to be obtained and added to the data/temp/ folder:
- Federal Real Public Property Data
- American Community Survey Data
Refer to Section 3.3.2 for details on how to retrieve these files. After the files are downloaded, rename them and save them to the data/temp/ folder in the repository. A consistent naming convention is recommended because it makes the files easier to reference and use. Our convention combines a short description of the data in the file with the analysis year the file was obtained for. Hence the property data file becomes “fed_prop_2021.csv”, and the ACS data file was renamed “2019_acs.xlsx”.
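As a minimal sketch of this step, the Python function below copies downloaded files into data/temp/ under new names. The function name stage_files and the download paths in the example call are hypothetical; only the target folder and the two target file names come from the text.

```python
import shutil
from pathlib import Path


def stage_files(downloads: dict, temp_dir: Path) -> list:
    """Copy each downloaded file into temp_dir under its new name.

    `downloads` maps the path of a downloaded file to the name it should
    have in temp_dir. The extension must stay the same, because copying
    a file does not convert it to another format.
    """
    temp_dir.mkdir(parents=True, exist_ok=True)
    staged = []
    for source, new_name in downloads.items():
        source = Path(source)
        if source.suffix.lower() != Path(new_name).suffix.lower():
            raise ValueError(
                f"{source.name} and {new_name} have different extensions"
            )
        staged.append(Path(shutil.copy2(source, temp_dir / new_name)))
    return staged


# Example call (the source paths are placeholders, not real locations):
# stage_files(
#     {
#         "Downloads/property_download.csv": "fed_prop_2021.csv",
#         "Downloads/acs_download.xlsx": "2019_acs.xlsx",
#     },
#     Path("data/temp"),
# )
```

Copying rather than moving keeps the original downloads untouched, which can help if a file needs to be staged again after a mistake.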
The American Community Survey data file requires one further modification after it is saved. Within the file, delete the first row of data. Then rename the columns “Geographic Area Name” and “Estimate!!EMPLOYMENT STATUS!!Population 16 years and over!!In labor force!!Armed Forces” to “geography” and “armed_forces_employed”, respectively, to simplify use of the data in later steps.
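The same modification could also be scripted instead of done by hand. The Python sketch below works on rows that have already been read from the sheet, so it stays independent of any spreadsheet library. It assumes that the row to delete is the first row of the sheet and that the row after it holds the descriptive column labels; if the downloaded file is laid out differently, the indexing would need to change. The function name clean_acs_rows is hypothetical.

```python
# Old column labels and their replacements, as given in the text.
RENAMES = {
    "Geographic Area Name": "geography",
    "Estimate!!EMPLOYMENT STATUS!!Population 16 years and over"
    "!!In labor force!!Armed Forces": "armed_forces_employed",
}


def clean_acs_rows(rows):
    """Drop the first row, then rename the two label columns.

    `rows` is a list of rows, each a list of cell values, in sheet order.
    Assumption: once the first row is removed, the next row is the header
    that contains the descriptive labels.
    """
    if len(rows) < 2:
        raise ValueError("expected at least two rows")
    header = [RENAMES.get(str(cell).strip(), cell) for cell in rows[1]]
    missing = [new for new in RENAMES.values() if new not in header]
    if missing:
        raise ValueError(f"labels not found for columns: {missing}")
    # Return the renamed header followed by the untouched data rows.
    return [header] + [list(row) for row in rows[2:]]
```

Reading the rows from 2019_acs.xlsx and writing them back would require a spreadsheet library such as openpyxl, which is left out here to keep the sketch self-contained. Raising an error when a label is missing makes a changed download layout visible immediately rather than in later steps.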
The next section describes the data contained in the repository and its purpose.