Cluster Mapping Methodology
Traded Clusters Appendix
Local Clusters Appendix
Categorization of Traded and Local Industries in the US Economy
US Cluster Definitions
BEA Economic Areas and Counties
Cluster mapping creates a dataset on the presence of clusters across geographies, based on a standardized set of benchmark cluster definitions that group individual industries uniquely into cluster categories. Researchers from Harvard Business School, MIT Sloan School of Management, and Temple University's Fox School of Business generated cluster definitions based on a novel algorithm that allows for the systematic generation and comparison of clusters across the United States. The paper that explains this methodology is “Defining Clusters of Related Industries” (Delgado, Porter, and Stern 2016), which revisits and extends “The Economic Performance of Regions” (Porter 2003).
Industries are first classified as "traded" or "local." Traded industries are industries that are concentrated in a subset of geographic areas and sell to other regions and nations. Local industries are industries present in most (if not all) geographic areas, and primarily sell locally. Within the two large groups, sets of traded industries are then organized into traded clusters based on an overall measure of relatedness between individual industries across a range of linkages, including input-output measures, use of labor occupations, and co-location patterns of employment and establishments. Local industries are grouped primarily based on similarities in activities reflected in aggregated U.S. industry categories.
The geographic scope of a cluster is determined by the distances over which linkages and externalities have a meaningful impact. These distances differ by cluster category and by the underlying type of economic activity. For practical purposes, the geographic scope used in cluster mapping is an administratively defined region such as a state or economic area, even if it does not necessarily match the true geographic scope of a specific cluster.
Regions will tend to have some level of economic activity in almost all cluster categories. A regional cluster exists when the level of this activity is overrepresented relative to the national average, measured as locational specialization above a certain set of cut-off points. This overrepresentation signals the presence of a critical mass at which cluster dynamics kick in.
The U.S. Cluster Mapping Project provides nationally consistent benchmark cluster definitions that can be used to assess the presence of clusters at any regional unit. The methodology groups 778 six-digit NAICS (North American Industry Classification System) industries into 51 traded cluster categories, and 310 NAICS industries into 16 local cluster categories (all mutually exclusive).
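As a minimal sketch of how mutually exclusive definitions can be applied to industry-level data, the following Python snippet inverts a set of cluster definitions into an industry-to-cluster lookup and sums employment by region and cluster. The cluster names, industry codes, and employment figures are hypothetical placeholders, not actual benchmark definitions or County Business Patterns values, and the function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical definitions: cluster name -> set of industry codes.
# Real benchmark definitions use six-digit NAICS codes.
cluster_definitions = {
    "Cluster X": {"IND001", "IND002"},
    "Cluster Y": {"IND003"},
}

def industry_to_cluster(definitions):
    """Invert the definitions; fail if an industry sits in two clusters."""
    mapping = {}
    for cluster, industries in definitions.items():
        for code in industries:
            if code in mapping:
                raise ValueError(
                    f"{code} assigned to both {mapping[code]} and {cluster}"
                )
            mapping[code] = cluster
    return mapping

def cluster_employment(records, mapping):
    """Sum employment by (region, cluster).

    records: iterable of (region, industry_code, employment) tuples.
    Industries outside the given definitions are skipped.
    """
    totals = defaultdict(int)
    for region, code, employment in records:
        cluster = mapping.get(code)
        if cluster is not None:
            totals[(region, cluster)] += employment
    return dict(totals)

# Made-up industry-level records for two regions.
records = [
    ("Region A", "IND001", 120),
    ("Region A", "IND002", 80),
    ("Region A", "IND003", 50),
    ("Region B", "IND001", 30),
    ("Region B", "IND999", 10),  # not in any cluster above; ignored
]

mapping = industry_to_cluster(cluster_definitions)
totals = cluster_employment(records, mapping)
```

Because each industry maps to exactly one cluster, no employment is double-counted when industry figures are rolled up to cluster totals.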
To generate the set of cluster definitions, the U.S. Cluster Mapping research team developed a novel clustering algorithm that assesses the quality of alternative sets of cluster definitions and captures multiple types of inter-industry linkages. The algorithm relies upon clustering analysis, numerical methods used to classify similar objects into groups, and a set of well-specified parameters including choice of underlying data and the number of initial groups from which to start the analysis. The process generates many different cluster configurations by applying clustering functions to data that provides different measures of the relatedness between any two industries and modifying the parameter choices. Each configuration is composed of mutually exclusive groups of related industries (i.e., clusters). The algorithm then provides scores that can be used to assess the quality of each configuration. Quality here refers to the configurations' ability to capture meaningful inter-industry linkages within clusters. This allows for identification of the configuration that best captures certain types of inter-industry links (although what is considered best may depend on the context of a given analysis). Because an algorithm cannot perfectly substitute for expert judgment, the methodology concludes with an expert assessment and adjustment of individual clusters in the best configuration to determine a final set of cluster definitions. This clustering algorithm was used to create a new set of U.S. Benchmark Cluster Definitions (BCD) that capture a broad range of inter-industry linkages.
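The generate-then-score idea described above can be illustrated with a toy example. The sketch below is not the project's algorithm: it swaps in a simple average-linkage agglomerative clustering, varies only one parameter (the number of groups), and uses a hypothetical quality score (mean within-group relatedness minus mean between-group relatedness). The relatedness matrix is made up, and all function names are assumptions.

```python
from itertools import combinations

def average_linkage(sim, a, b):
    """Mean pairwise relatedness between two groups of industry indices."""
    return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

def agglomerate(sim, k):
    """Merge the two most related groups until only k groups remain."""
    groups = [[i] for i in range(len(sim))]
    while len(groups) > k:
        # combinations yields x < y, so deleting groups[y] is safe.
        x, y = max(
            combinations(range(len(groups)), 2),
            key=lambda p: average_linkage(sim, groups[p[0]], groups[p[1]]),
        )
        groups[x] = groups[x] + groups[y]
        del groups[y]
    return groups

def score(sim, groups):
    """Hypothetical score: mean within-group minus mean between-group relatedness."""
    label = {i: g for g, members in enumerate(groups) for i in members}
    within, between = [], []
    for i, j in combinations(range(len(sim)), 2):
        (within if label[i] == label[j] else between).append(sim[i][j])

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    return mean(within) - mean(between)

# Made-up symmetric relatedness matrix for four placeholder industries.
sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]

# Each parameter choice yields one configuration of mutually exclusive groups.
configurations = {k: agglomerate(sim, k) for k in (2, 3)}
scores = {k: score(sim, groups) for k, groups in configurations.items()}
best_k = max(scores, key=scores.get)
```

In this toy case the two-group configuration scores higher because it keeps both strongly related pairs together. The expert review step that follows the scoring in the methodology has no counterpart in the code.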
Cluster mapping is designed to enable systematic comparison across regions. Because the cluster definitions are benchmarks, they are most useful for comparing regions, not for profiling a single region in isolation. The Biopharmaceuticals cluster in Boston, for example, includes a significant share of the region's legal and financial services that are specifically dedicated to serving the unique needs of Biopharmaceuticals companies. While it makes sense to include these services in a profile of the Boston Biopharmaceuticals cluster, a national comparison should cover only those industries that are systematically linked to Biopharmaceuticals across all regions.
Strong clusters are defined as those whose location quotient (the cluster's relative employment specialization) places the region in the leading 25% of regions across the U.S. for that cluster category.
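A minimal Python sketch of this calculation follows, using the standard location quotient formula: the cluster's share of regional employment divided by the cluster's share of national employment. The employment figures are made up, national totals are taken as the sum of the listed regions, and the rank-based cutoff (top quarter of regions, ties ignored) is an assumption. Any additional criteria the project may apply are not modeled.

```python
def location_quotient(cluster_emp, region_emp, national_cluster_emp, national_emp):
    """Cluster share of regional employment relative to its national share."""
    return (cluster_emp / region_emp) / (national_cluster_emp / national_emp)

# Made-up figures: region -> (employment in one cluster, total employment).
regions = {
    "Region A": (600, 10000),
    "Region B": (100, 10000),
    "Region C": (300, 10000),
    "Region D": (200, 10000),
    "Region E": (900, 10000),
    "Region F": (100, 10000),
    "Region G": (100, 10000),
    "Region H": (100, 10000),
}

# Assumption: the nation is the sum of the regions listed above.
national_cluster_emp = sum(c for c, _ in regions.values())
national_emp = sum(t for _, t in regions.values())

lq = {
    name: location_quotient(c, t, national_cluster_emp, national_emp)
    for name, (c, t) in regions.items()
}

# Rank regions by LQ and keep the leading 25% (at least one region).
ranked = sorted(lq, key=lq.get, reverse=True)
strong = ranked[: max(1, len(ranked) // 4)]
```

An LQ above 1 means the cluster accounts for a larger share of employment in the region than it does nationally. Here Region E and Region A fall in the leading quarter.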
Research reveals that some industries in a particular cluster are closely related to industries in another, creating connections between clusters. This could be due, in part, to multiple forms of externalities and to the fact that some industries are suppliers or customers of many other industries. An industry may be linked primarily to one cluster category yet still be related to another. For example, while petroleum refining occurs primarily in the context of Oil and Gas Production and Transportation clusters, it also shows a strong connection to Downstream Chemical Products clusters. Such connections can be the result of technology, skills, or other capabilities that have multiple uses. The linkages between cluster categories have been found to be important to understanding the process of cluster emergence and regional diversification: new clusters may emerge from existing clusters through such linkages. Data on existing cluster portfolios and on cluster linkages is thus a useful tool for assessing where specific new clusters are likely to emerge and prosper.
The main underlying data source for the generation of benchmark cluster definitions is the U.S. Census Bureau's County Business Patterns dataset on employment, establishments, and wages by six-digit NAICS code, collected at the regional level of states, economic areas, metropolitan and micropolitan statistical areas, and counties. Other specific data sources are also used for cluster mapping.
Traded Cluster Mapping Methodology
There is an ever-growing need for quality data and useful analytical tools to help develop and implement successful regional cluster strategies, especially as clusters are increasingly incorporated into regional economic development efforts. A primary goal of the U.S. Cluster Mapping Project is to respond to these needs by providing rigorous and relevant cluster definitions for policymakers, practitioners, and academics. With these definitions, practitioners will be able to evaluate and interpret the data in ways most relevant to their specific regions.
At the center of the project lies the concept of clusters. Porter (2003) defines a cluster as a “geographically proximate group of interconnected companies, suppliers, service providers and associated institutions in a particular field, linked by externalities of various types.” Clusters are important to their firms and associated organizations (such as universities and local governments) for a number of reasons. Within clusters, these entities can operate more efficiently and can share common technologies, infrastructure, pools of knowledge, and demand. The presence of these clusters can be an important driver of regional competitiveness and innovation. Porter (2003) also recognized the need to clearly identify the industry boundaries of each cluster and pioneered a set of cluster definitions that became the foundation for the cluster analysis on which the U.S. Cluster Mapping Project builds.
To compete more effectively, regions need to understand their cluster strengths relative to other areas. Making this comparison accurately requires a consistent, national set of cluster definitions that mark the industry boundaries of each cluster. A good set of cluster definitions should group closely related and supporting industries that capture as many linkages as possible (e.g., technology, skills, supply, and demand).
Building on Porter (2003), Professors Mercedes Delgado, Michael Porter, and Scott Stern reexamined the relationships between industries using relevant clustering methods to develop the U.S. Cluster Mapping Benchmark Cluster Definitions (2014), which better capture the structure of industry interdependencies today. The Benchmark Cluster Definitions incorporate new clustering analysis, current data, and industry linkages based on input-output measures, labor occupations, and the co-location patterns of employment and establishments.
New definitions are created from a novel clustering algorithm that generates sets of quantitatively derived cluster definitions based on clearly specified parameter choices. The algorithm provides scores that assess the quality of each set of cluster definitions, and identifies the “best” set. Because clustering analysis cannot perfectly substitute for expert judgment, the methodology concludes with a systematic correction of anomalies and characterization of the individual clusters in the best set, resulting in the Benchmark Cluster Definitions. These definitions, with descriptions of each cluster and the associated NAICS codes, can be found in the documents available for download below. Further detail is available in Delgado, Porter, and Stern, “Defining Clusters of Related Industries” (2014).