Importing Surveys of Incarcerated Populations into R

5 years ago


The Bureau of Justice Statistics (BJS) has a ton of data on law enforcement and incarceration. For a class research project, I am examining data on the pre-incarceration incomes of incarcerated parents. I am working with three datasets: the Survey of Inmates in Federal Correctional Facilities (SIFCF), the Survey of Inmates in State Correctional Facilities (SISCF), and the Survey of Inmates in Local Jails (SILJ). I will be downloading the most recent release of each: 2004 data for the SIFCF and SISCF, and 2002 data for the SILJ.

This data, for the most part, is formatted to be accessible in closed-source programs that I would need to purchase a license to use. There are two problems with that: federally funded data should not require the public to purchase software to access it, and I don't want to use that software. Here's how to circumvent that and import the data from ASCII + SAS format into R:

#download datasets
#I am not allowed to re-distribute the original datasets
#use the "ASCII + SAS setup" links
#unzip and consolidate the downloads into two folders (jails: "ICPSR_04359", prisons: "ICPSR_04572")

#set the working directory to the directory that contains both data folders

#convert data to R, save in respective folders
#the SAScii package lets R follow the INPUT instructions in a .sas file to read the .txt ASCII file, as long as the user manually specifies the line of the .sas file at which "INPUT" starts
#use a text editor to open the .sas files and find the "INPUT" lines
#define input lines
input04359_DS0001 <- 3681
input04359_DS0002 <- 50
input04572_DS0001 <- 3904
input04572_DS0002 <- 3898
input04572_DS0003 <- 424
input04572_DS0004 <- 424

#define lrecl (logical record length), also found manually in the .sas files
lrcel04359_DS0001 <- 4679
lrcel04359_DS0002 <- 14326
lrcel04572_DS0001 <- 8845
lrcel04572_DS0002 <- 8845
lrcel04572_DS0003 <- 2580
lrcel04572_DS0004 <- 2580

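#load the SAScii package (run install.packages("SAScii") first if it is not installed)
library(SAScii)

#load data. takes a long time.
#the second argument must point to the .sas setup file itself, not its folder
#the "-Setup.sas" file names below are an assumption based on ICPSR's usual naming; adjust them if your download differs
data04359_DS0001 <- read.SAScii("ICPSR_04359/DS0001/04359-0001-Data.txt", "ICPSR_04359/DS0001/04359-0001-Setup.sas", beginline = input04359_DS0001, lrecl = lrcel04359_DS0001)
data04359_DS0002 <- read.SAScii("ICPSR_04359/DS0002/04359-0002-Data.txt", "ICPSR_04359/DS0002/04359-0002-Setup.sas", beginline = input04359_DS0002, lrecl = lrcel04359_DS0002)
data04572_DS0001 <- read.SAScii("ICPSR_04572/DS0001/04572-0001-Data.txt", "ICPSR_04572/DS0001/04572-0001-Setup.sas", beginline = input04572_DS0001, lrecl = lrcel04572_DS0001)
data04572_DS0002 <- read.SAScii("ICPSR_04572/DS0002/04572-0002-Data.txt", "ICPSR_04572/DS0002/04572-0002-Setup.sas", beginline = input04572_DS0002, lrecl = lrcel04572_DS0002)
data04572_DS0003 <- read.SAScii("ICPSR_04572/DS0003/04572-0003-Data.txt", "ICPSR_04572/DS0003/04572-0003-Setup.sas", beginline = input04572_DS0003, lrecl = lrcel04572_DS0003)
data04572_DS0004 <- read.SAScii("ICPSR_04572/DS0004/04572-0004-Data.txt", "ICPSR_04572/DS0004/04572-0004-Setup.sas", beginline = input04572_DS0004, lrecl = lrcel04572_DS0004)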

#save in "data" folder
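
The save step is only a comment in the script. Here is a minimal sketch of one way to do it with base R's saveRDS, assuming a folder called "data" under the working directory. The helper name save_survey and the .rds file naming are illustrative choices, not part of the original script, and the small data frame stands in for the real survey data.

```r
# Hypothetical helper: write one data frame to <dir>/<name>.rds
save_survey <- function(df, name, dir = "data") {
  # create the output folder the first time it is needed
  if (!dir.exists(dir)) dir.create(dir, recursive = TRUE)
  path <- file.path(dir, paste0(name, ".rds"))
  saveRDS(df, path)
  invisible(path)
}

# Stand-in data frame; with the real data the call would look like
# save_survey(data04359_DS0001, "data04359_DS0001")
example_df <- data.frame(id = 1:3, income = c(1200, 800, NA))
out_path <- save_survey(example_df, "example_df",
                        dir = file.path(tempdir(), "data"))

# Read the file back to confirm the round trip
reloaded <- readRDS(out_path)
```

Saving as .rds means the slow read.SAScii step only has to run once; later sessions can call readRDS on the saved files instead.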

And that's it. Analyze away. This script is on GitHub.
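
If opening each .sas file in a text editor gets tedious, the lookup of the "INPUT" line and the LRECL value could probably be scripted. The sketch below is an assumption-laden illustration: find_input_line and find_lrecl are hypothetical helpers, the toy sas_lines vector stands in for readLines() on a real setup file, and the patterns assume "INPUT" sits at the start of its own line and that the record length appears as LRECL=number. Check the results against the file by hand, since "INPUT" can also show up in comments.

```r
# Hypothetical helper: line number of the first line that starts with INPUT
find_input_line <- function(sas_lines) {
  hits <- grep("^\\s*INPUT\\b", sas_lines, ignore.case = TRUE, perl = TRUE)
  if (length(hits) == 0) stop("no INPUT statement found")
  hits[1]
}

# Hypothetical helper: the number that follows LRECL= in the setup file
find_lrecl <- function(sas_lines) {
  hit <- grep("LRECL\\s*=\\s*[0-9]+", sas_lines,
              ignore.case = TRUE, perl = TRUE, value = TRUE)
  if (length(hit) == 0) stop("no LRECL value found")
  as.integer(sub(".*LRECL\\s*=\\s*([0-9]+).*", "\\1", hit[1],
                 ignore.case = TRUE, perl = TRUE))
}

# Made-up miniature setup file; for a real one use readLines("<path to .sas>")
sas_lines <- c(
  "* toy SAS setup file;",
  "DATA;",
  "INFILE \"data-filename\" LRECL=20;",
  "INPUT",
  "   V1 1-4",
  "   V2 5-20;"
)

begin <- find_input_line(sas_lines)  # line where the INPUT block starts
lrecl <- find_lrecl(sas_lines)       # logical record length
```

The two values could then be passed to read.SAScii as beginline and lrecl in place of the hand-typed numbers.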


This post is CC BY


Nice. Is that original code or did you grab it from somewhere? It's like ripping YouTube videos to .mp4 files, but a more hardcore version of it, lol


Yep! Thanks for reminding me actually, I forgot to include a link to the GitHub repository I have for this.

Clueless here, but I appreciate your point!