ZTRAX

Introduction

In cooperation with the DLab and with support from The Center on the Economics and Demography of Aging, the Demography Lab houses the complete ZTRAX dataset. These data cover a huge subset of current and historical US real estate transactions. The data are also available at the Econometrics Lab, the Fisher Center, and (soon) via the Savio Cluster. As the user community grows, we hope to develop nifty and fun ways to support sharing and communication.

Through this cooperative arrangement, the ZTRAX data are available free of charge to any interested UC Berkeley-affiliated researcher. We ask that users help improve the locally developed tools and documentation by politely pointing out our errors and suggesting improvements. At present, the easiest and best-developed pathway to assembling useful datasets is via R and RStudio.

Administrative steps

  • Permission to use the ZTRAX data can be obtained directly from Patty Frontiera. Please send email to ztrax@berkeley.edu.
  • Once approved, researchers can access the ZTRAX data through the Demography Lab. Contact Carl Boe (cboe@berkeley.edu) for onboarding and support.

Overview

The ZTRAX dataset is approximately 1 TB in size and is split into 2500 distinct files. The files are grouped by state, and for each state there are three separate groups of files, which correspond to:

  • Transactions from the buyer/seller point of view
  • Transactions from the county assessor point of view
  • Assessor records

Thus there are approximately 150 (3 per state) distinct sets of files. Each set can logically be seen as a database, in the sense that the files in the set share a common index variable on which they can be joined, and that no successful joins on that index variable are possible across sets. This makes sense, since the index variables refer to parcels (of land) or individual real estate transactions, and the data sets are geographically defined.
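To make the join behavior concrete, here is a minimal sketch in R with data.table, using toy tables rather than real ZTRAX files. The column names (TransId, SalesPrice, BuyerName) and all values are hypothetical and chosen only for illustration; the actual index variables are described in the Zillow documentation.

```r
library(data.table)

# Toy stand-ins for two files from the same state's file set.
# Column names and values are hypothetical, not the real ZTRAX layout.
main <- data.table(TransId    = c(101L, 102L, 103L),
                   SalesPrice = c(250000, 410000, 385000))
buyers <- data.table(TransId   = c(101L, 102L, 104L),
                     BuyerName = c("A", "B", "C"))

# Inner join on the shared index variable: only ids found in both files survive.
joined <- merge(main, buyers, by = "TransId")
print(joined)

# A table from a different state's set carries unrelated ids,
# so the same join matches nothing.
other_state <- data.table(TransId   = c(900L, 901L),
                          BuyerName = c("X", "Y"))
cross <- merge(main, other_state, by = "TransId")
print(nrow(cross))
```

In this sketch the within-set join keeps the two transactions that appear in both tables, while the cross-set join returns an empty table.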

Using R with ZTRAX

The nature of these data makes R with the data.table package a particularly compelling tool for constructing (if not analyzing) datasets. A locally written R package allows users to read all or parts of each set of files and perform very efficient SQL-like operations. A small state's Transactions data can be read in about 1 minute, and a national dataset with a modest number of variables can be read and constructed in about 90 minutes.
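The locally written package is documented in the QuickStart files, and its functions are not reproduced here. As a rough illustration of the underlying idea, the sketch below uses only data.table itself: it reads a subset of columns from a small delimited file and then runs an SQL-like filter and aggregation. The file layout, the pipe delimiter, and the column names (TransId, County, SalesPrice, DocType) are assumptions made up for this example.

```r
library(data.table)

# Write a small pipe-delimited file to stand in for one data file.
# The layout and column names are hypothetical.
tmp <- tempfile(fileext = ".txt")
writeLines(c("TransId|County|SalesPrice|DocType",
             "1|Alameda|500000|DEED",
             "2|Alameda|700000|DEED",
             "3|Marin|900000|MORT",
             "4|Marin|1100000|DEED"), tmp)

# Read only the columns needed; skipping the rest saves time and memory.
dt <- fread(tmp, sep = "|", select = c("County", "SalesPrice", "DocType"))

# dt[i, j, by] roughly mirrors SQL's WHERE, SELECT, and GROUP BY.
summary_dt <- dt[DocType == "DEED",
                 .(n_sales = .N, mean_price = mean(SalesPrice)),
                 by = County]
print(summary_dt)

unlink(tmp)  # remove the temporary file
```

Selecting columns at read time matters for files of this size, since variables that are never loaded cost neither memory nor parsing time.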

Once you have access to the data via RStudio, open the QuickStart.Rmd and Quickstart.html files for details and examples of how it all works. You can click on these files in the Files pane in RStudio.

Available variables

Since real estate transactions are governed by county laws and officials, expect to find many anomalies and idiosyncrasies across the records. Nonetheless, starting with a well-organized database makes a huge difference.

Documents from Zillow (requires CalNet ID)