Data Sources and Limitations
The U.S. Census County Business Patterns (CBP) data gives the most complete picture of the industrial structure of U.S. regions. It includes annual employment levels, establishment counts, and payroll totals by North American Industry Classification System (NAICS) code for every state, economic area, metropolitan and micropolitan statistical area (MSA), and county in the U.S. from 1998 to 2012. The cluster data on the website is refreshed when new underlying industrial data becomes available, typically each year in June or July.
Although data from the U.S. Bureau of Labor Statistics (BLS) has a more frequent update cycle, we use CBP data for defining clusters because it is more complete. Importantly for our purposes, the CBP provides useful estimates for employment when disclosure standards force the suppression of actual region-industry data. For instance, suppressed values carry flags that indicate an employment range, which allows us to better estimate employment for rural areas. Unfortunately, the two sources are substantially incompatible and cannot readily be used together. Where either CBP or BLS data could have been used on the website, we have used CBP for consistency with the Clusters portion of the site.
The following types of data are excluded from the CBP:
- Migrant farmers
- Government employees
- Sole proprietorships
- Certain individuals who don’t pay Social Security taxes, such as some railroad workers and, in some cases, teachers (e.g., Chicago Public School teachers)
Although the CBP does not include agricultural employment or other agricultural measures, we think these are an important class of measures to add to the site. We will likely add agricultural output and agricultural payroll measures to the Region Dashboard: Performance & Drivers sections in the future.
The BEA Economic Areas comprehensively define the relevant regional markets surrounding metropolitan or micropolitan statistical areas in 179 regions of the U.S. The project also uses BEA's Benchmark Input-Output data files for the input-output data underlying the cluster definitions.
The project uses BLS's Occupational Employment Statistics for certain data underlying the cluster definitions.
STATS America is a service of the Indiana Business Research Center at Indiana University's Kelley School of Business that provides nearly all of the performance, business environment, and demographics & geography data available in the Regions section.
The Regional Innovation Acceleration Network (RIAN) provides the Venture Development Organizations data available in the Organizations section.
U.S. patents by location of inventor will be allocated to industries and clusters using a concordance that links technology classifications to industry codes and, through those codes, to clusters. This patent data is pending and will be updated in the summer of 2014.
The cluster mapping data provided by the U.S. Cluster Mapping website comes from publicly available sources so as to be accessible by as many users as possible. However, one limitation of using public data is that the analyses on the website are largely constrained by the data that is available. This leads to four kinds of limitations: fixed industry definitions, fixed regional definitions, estimated data, and lagged data.
First, economic data collected by the U.S. Census Bureau is grouped into North American Industry Classification System (NAICS) codes, which better reflect the industries of the past than predict the industries of the future. These codes are updated every five years, but are focused on past economic activity. By necessity, the clusters are limited to the industry codes that exist and so may be slow to reflect newly emergent industries. The industry codes also provide differing levels of granularity. For example, there are relatively few industry codes for services compared to manufacturing, despite the increasing differentiation of services and their growing importance to the U.S. economy.
Second, economic data is usually provided at the level of administratively defined regions such as states or economic areas. These arbitrary boundaries may not match the true geographic scope of specific clusters.
Third, there are confidentiality limitations to the U.S. Census Bureau's County Business Patterns (CBP) data, which serves as the main source of information on this website. To maintain the confidentiality of certain firms, data for a given NAICS code is in some cases given only as a range. To account for this, the U.S. Cluster Mapping analyses use the midpoint of the range where necessary. This generally does not affect the results in aggregate or for large NAICS industries or regions. However, it can affect the results for smaller NAICS industries and regions.
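The midpoint substitution described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the function name and the record shape are hypothetical, and the flag letters and ranges in the table are assumptions that should be checked against the CBP record layout for the year in question.

```python
from typing import Optional

# ASSUMPTION: employment-size flags and their inclusive ranges.
# Verify these against the CBP record layout before relying on them.
EMPLOYMENT_FLAG_RANGES = {
    "A": (0, 19),
    "B": (20, 99),
    "C": (100, 249),
    "E": (250, 499),
    "F": (500, 999),
    "G": (1000, 2499),
    "H": (2500, 4999),
    "I": (5000, 9999),
    "J": (10000, 24999),
    "K": (25000, 49999),
    "L": (50000, 99999),
}


def estimate_employment(reported: Optional[int], flag: Optional[str]) -> Optional[float]:
    """Return the reported value, or the range midpoint when a flag marks suppression.

    Returns None when the value is suppressed and the flag is not in the table
    (for example, an open-ended top range that has no midpoint).
    """
    if not flag:
        # No suppression flag: use the reported figure as-is.
        return float(reported) if reported is not None else None
    bounds = EMPLOYMENT_FLAG_RANGES.get(flag)
    if bounds is None:
        return None
    low, high = bounds
    return (low + high) / 2


# A suppressed cell flagged "B" (assumed range 20-99) is estimated at 59.5.
print(estimate_employment(None, "B"))
# An unsuppressed cell keeps its reported value.
print(estimate_employment(132, None))
```

The sketch also shows why small industries and regions are affected most: an estimate of 59.5 for a true value anywhere between 20 and 99 is a large relative error for a single cell, but such errors tend to wash out when many cells are summed.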
Finally, the reported data from the U.S. Census has a lag of about two years. This makes it impossible to incorporate the most current economic activity into the clusters. However, because the patterns and trends of economic geography change slowly over time, this lag does not significantly reduce the value of the data for regional economic development purposes.
How to Cite the Cluster Mapping Data
Bridges between Changes in NAICS
The NAICS codes have been updated every five years since their first appearance in 1997. NAICS 1997 covers data from 1998-2002, NAICS 2002 covers data from 2003-2007, and NAICS 2007 covers data from 2008-2013. The changes the U.S. Census made between NAICS 2002 and NAICS 2007 are relatively minor, with most industries either remaining the same or mapping cleanly into a new industry that replaces the discontinued one.
Because of these changes, it was necessary to create a backward mapping from NAICS 2007 to NAICS 1997 in order to report information for clusters consistently over time. The mapping is available in the appendix download below.
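As a rough illustration of how such a backward mapping can be applied, the Python sketch below restates employment reported under NAICS 2007 codes in terms of NAICS 1997 codes. Everything in it is hypothetical: the codes are placeholders rather than real NAICS codes, the function name is invented for the example, and the use of allocation shares for industries that split is an assumption, since the actual bridge in the appendix may handle splits differently.

```python
from collections import defaultdict

# HYPOTHETICAL crosswalk: NAICS 2007 code -> list of (NAICS 1997 code, share).
# Placeholder codes only; shares for each 2007 code sum to 1.0.
CROSSWALK_2007_TO_1997 = {
    "900001": [("900001", 1.0)],                   # unchanged industry
    "900002": [("900020", 1.0)],                   # clean one-to-one recode
    "900003": [("900031", 0.5), ("900032", 0.5)],  # split (assumed equal shares)
    "900004": [("900040", 1.0)],                   # two 2007 codes that map
    "900005": [("900040", 1.0)],                   # back to one 1997 code
}


def to_naics_1997(employment_2007: dict) -> dict:
    """Re-express employment keyed by 2007 codes as employment keyed by 1997 codes.

    Raises KeyError for a 2007 code missing from the crosswalk, so that
    unmapped industries are noticed rather than silently dropped.
    """
    result = defaultdict(float)
    for code_2007, employment in employment_2007.items():
        for code_1997, share in CROSSWALK_2007_TO_1997[code_2007]:
            result[code_1997] += employment * share
    return dict(result)


employment_2008 = {"900001": 100, "900002": 40, "900003": 10, "900004": 7, "900005": 3}
restated = to_naics_1997(employment_2008)
print(restated)
```

Because each 2007 code's shares sum to one, total employment is preserved by the mapping; only its distribution across industry codes changes.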