Title: | Computing Key Indicators of the Spatial Distribution of Economic Activities |
---|---|
Description: | Functions to compute a series of indices commonly used in the fields of economic geography, economic complexity, and evolutionary economics to describe the location, distribution, spatial organization, structure, and complexity of economic activities. Functions include basic spatial indicators such as the location quotient, the Krugman specialization index, the Herfindahl index and the Shannon entropy index, as well as more advanced functions to compute different forms of normalized relatedness between economic activities or network-based measures of economic complexity. Most of the functions use matrix algebra and are based on bipartite (incidence) matrices consisting of region - industry pairs. |
Authors: | Pierre-Alexandre Balland |
Maintainer: | Pierre-Alexandre Balland |
License: | GPL-2 \| GPL-3 [expanded from: GPL] |
Version: | 1.3 |
Built: | 2025-03-01 05:52:05 UTC |
Source: | https://github.com/paballand/econgeo |
This function computes the number of co-occurrences between industry pairs from an incidence (industry - event) matrix.
co.occurrence(mat, diagonal = FALSE, list = FALSE)
mat |
An incidence matrix with industries in rows and events in columns |
diagonal |
Logical; shall the values in the diagonal of the co-occurrence matrix be included in the output? Defaults to FALSE (values in the diagonal are set to 0), but can be set to TRUE (values in the diagonal reflect the number of events in which a single industry can be found) |
list |
Logical; is the input an edge list? Defaults to FALSE (input = incidence matrix), but can be set to TRUE if the input is an edge list |
Pierre-Alexandre Balland
Boschma, R., Balland, P.A. and Kogler, D. (2015) Relatedness and Technological Change in Cities: The rise and fall of technological knowledge in U.S. metropolitan areas from 1981 to 2010, Industrial and Corporate Change 24 (1): 223-250
See also: relatedness, relatedness.density
## generate an industry - event matrix
set.seed(31)
mat <- matrix(sample(0:1, 20, replace = TRUE), ncol = 5)
rownames(mat) <- c("I1", "I2", "I3", "I4")
colnames(mat) <- c("US1", "US2", "US3", "US4", "US5")
## run the function
co.occurrence(mat)
co.occurrence(mat, diagonal = TRUE)
## generate a regular data frame (list)
list <- get.list(mat)
## run the function
co.occurrence(list, list = TRUE)
co.occurrence(list, list = TRUE, diagonal = TRUE)
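For a binary incidence matrix, the co-occurrence count of two industries is the number of events (columns) in which both rows equal 1, which is the corresponding cell of `mat %*% t(mat)`. The base-R sketch below illustrates this under that assumption; `co_occurrence_sketch` is a hypothetical helper, not the package's implementation, and it does not handle the edge-list (`list = TRUE`) input.

```r
# Hypothetical helper (not part of EconGeo): industry co-occurrence counts,
# assuming a binary incidence matrix with industries in rows, events in columns
co_occurrence_sketch <- function(mat, diagonal = FALSE) {
  # cell (i, j) of mat %*% t(mat) = number of events containing both i and j;
  # the diagonal holds the number of events containing each industry
  co <- tcrossprod(mat)
  if (!diagonal) diag(co) <- 0
  co
}

mat <- matrix(c(1, 1, 0,
                1, 0, 1,
                0, 1, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("I1", "I2", "I3"), c("E1", "E2", "E3")))
co_occurrence_sketch(mat)
co_occurrence_sketch(mat, diagonal = TRUE)
```

Each pair of industries in this toy matrix shares exactly one event, and each industry appears in two events.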
This function computes a simple measure of the diversity of regions by counting the number of industries in which a region has a relative comparative advantage (location quotient > 1) from regions - industries (incidence) matrices.
diversity(mat, RCA = FALSE)
mat |
An incidence matrix with regions in rows and industries in columns |
RCA |
Logical; should the index of relative comparative advantage (RCA - also referred to as location quotient) first be computed? Defaults to FALSE (a binary matrix - 0/1 - is expected as an input), but can be set to TRUE if the index of relative comparative advantage first needs to be computed |
Pierre-Alexandre Balland
Balland, P.A. and Rigby, D. (2017) The Geography of Complex Knowledge, Economic Geography 93 (1): 1-23.
## generate a region - industry matrix with full count
set.seed(31)
mat <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")
## run the function
diversity(mat, RCA = TRUE)
## generate a region - industry matrix in which cells represent the presence/absence of an RCA
set.seed(31)
mat <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")
## run the function
diversity(mat)
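A base-R sketch of the counting logic described above, assuming the location quotient is the share of an industry in a region's total divided by that industry's share in the total of all regions, and that an RCA means a location quotient strictly above 1. `lq_sketch` and `diversity_sketch` are hypothetical helpers; the package's own RCA step may differ in details such as the threshold.

```r
# Hypothetical helpers (not part of EconGeo)
lq_sketch <- function(mat) {
  # share of each industry within each region (rows sum to 1)
  share_in_region <- mat / rowSums(mat)
  # share of each industry in the total of all regions
  share_in_total <- colSums(mat) / sum(mat)
  # location quotient: regional share divided by overall share, column by column
  sweep(share_in_region, 2, share_in_total, "/")
}

diversity_sketch <- function(mat, RCA = FALSE) {
  # optionally binarise: 1 if the location quotient exceeds 1, else 0
  if (RCA) mat <- (lq_sketch(mat) > 1) * 1
  # diversity = number of industries with an RCA in each region
  rowSums(mat)
}

mat <- matrix(c(8, 2, 4,
                2, 8, 6),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("R1", "R2"), c("I1", "I2", "I3")))
diversity_sketch(mat, RCA = TRUE)
```

Here R1 is specialized only in I1, while R2 is specialized in I2 and I3, so the sketch returns 1 and 2.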
This function computes the ease of recombination of each technological class from technological classes - patents (incidence) matrices.
ease.recombination(mat, sparse = FALSE, list = FALSE)
mat |
A bipartite adjacency matrix (can be a sparse matrix) |
sparse |
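Logical; is the input matrix a sparse matrix? Defaults to FALSE, but can be set to TRUE if the input matrix is a sparse matrix |
list |
Logical; is the input an edge list? Defaults to FALSE (input = incidence matrix), but can be set to TRUE if the input is an edge list |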
Pierre-Alexandre Balland
Fleming, L. and Sorenson, O. (2001) Technology as a complex adaptive system: evidence from patent data, Research Policy 30: 1019-1039
## generate a technology - patent matrix
set.seed(31)
mat <- matrix(sample(0:1, 30, replace = TRUE), ncol = 5)
rownames(mat) <- c("T1", "T2", "T3", "T4", "T5", "T6")
colnames(mat) <- c("US1", "US2", "US3", "US4", "US5")
## generate a technology - patent sparse matrix
library(Matrix)
smat <- Matrix(mat, sparse = TRUE)
## run the function
ease.recombination(mat)
ease.recombination(smat, sparse = TRUE)
## generate a regular data frame (list)
list <- get.list(mat)
## run the function
ease.recombination(list, list = TRUE)
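Fleming and Sorenson (2001) define the ease of recombination of a technology class roughly as the number of distinct classes it has previously been combined with, divided by the number of previous patents in that class. The sketch below applies that ratio to a single cross-section of a binary technology - patent matrix, ignoring the time window over prior patents, so it is an illustrative assumption rather than the package's exact computation; `ease_recombination_sketch` is hypothetical.

```r
# Hypothetical helper (not part of EconGeo), assuming a binary matrix with
# technology classes in rows and patents in columns
ease_recombination_sketch <- function(mat) {
  co <- tcrossprod(mat)          # co-occurrence counts between classes
  diag(co) <- 0                  # a class is not its own recombination partner
  partners <- rowSums(co > 0)    # number of distinct classes combined with each class
  patents <- rowSums(mat)        # number of patents in each class
  partners / patents             # classes with no patents give NaN
}

mat <- matrix(c(1, 1, 0, 0,
                1, 0, 1, 0,
                0, 0, 1, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("T1", "T2", "T3"), paste0("P", 1:4)))
ease_recombination_sketch(mat)
```

T2 is combined with both other classes across two patents, so it scores 1, while T1 and T3 each have one partner over two patents (0.5).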
This function computes the Shannon entropy index of regions from (incidence) regions - industries matrices.
entropy(mat)
mat |
An incidence matrix with regions in rows and industries in columns |
Pierre-Alexandre Balland
Shannon, C.E. and Weaver, W. (1949) The Mathematical Theory of Communication, University of Illinois Press.
Frenken, K., Van Oort, F. and Verburg, T. (2007) Related variety, unrelated variety and regional economic growth, Regional Studies 41 (5): 685-697.
## generate a region - industry matrix
set.seed(31)
mat <- matrix(sample(0:100, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")
## run the function
entropy(mat)
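The Shannon entropy of a region r is H_r = -sum_i p_ri log(p_ri), where p_ri is the share of industry i in the region's total and 0 log 0 is taken as 0. The sketch below computes this row by row. The base-2 logarithm is an assumption (a common choice in the related variety literature), and `entropy_sketch` is hypothetical, so its values may differ from the package's by a constant factor if another base is used.

```r
# Hypothetical helper (not part of EconGeo); base-2 logarithm assumed
entropy_sketch <- function(mat, base = 2) {
  shares <- mat / rowSums(mat)   # industry shares within each region
  # p * log(p), with the convention 0 * log(0) = 0
  terms <- ifelse(shares > 0, shares * log(shares, base = base), 0)
  -rowSums(terms)
}

mat <- matrix(c( 5,  5, 5, 5,    # evenly spread over 4 industries
                20,  0, 0, 0,    # fully concentrated in one industry
                10, 10, 0, 0),   # split over 2 industries
              nrow = 3, byrow = TRUE,
              dimnames = list(c("R1", "R2", "R3"), paste0("I", 1:4)))
entropy_sketch(mat)
```

With base 2, an even split over k industries gives log2(k), so the three regions score 2, 0 and 1.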
This function generates a data frame of entry events from multiple regions - industries matrices (different matrix compositions are allowed). In this function, the maximum number of periods is limited to 20.
entry.list(mat1, mat2, mat3, mat4, mat5, mat6, mat7, mat8, mat9, mat10, mat11, mat12, mat13, mat14, mat15, mat16, mat17, mat18, mat19, mat20)
mat1 |
An incidence matrix with regions in rows and industries in columns (period 1 - mandatory) |
mat2 |
An incidence matrix with regions in rows and industries in columns (period 2 - mandatory) |
mat... |
An incidence matrix with regions in rows and industries in columns (period ... - optional) |
Pierre-Alexandre Balland
Wolf-Hendrik Uhlbach
Boschma, R., Balland, P.A. and Kogler, D. (2015) Relatedness and Technological Change in Cities: The rise and fall of technological knowledge in U.S. metropolitan areas from 1981 to 2010, Industrial and Corporate Change 24 (1): 223-250
Boschma, R., Heimeriks, G. and Balland, P.A. (2014) Scientific Knowledge Dynamics and Relatedness in Bio-Tech Cities, Research Policy 43 (1): 107-114
See also: entry, exit, exit.list
## generate a first region - industry matrix in which cells represent the presence/absence
## of an RCA (period 1)
set.seed(31)
mat1 <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
rownames(mat1) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat1) <- c("I1", "I2", "I3", "I4")
## generate a second region - industry matrix in which cells represent the presence/absence
## of an RCA (period 2)
mat2 <- mat1
mat2[3, 1] <- 1
## run the function
entry.list(mat1, mat2)
## generate a third region - industry matrix in which cells represent the presence/absence
## of an RCA (period 3)
mat3 <- mat2
mat3[5, 2] <- 1
## run the function
entry.list(mat1, mat2, mat3)
## generate a fourth region - industry matrix in which cells represent the presence/absence
## of an RCA (period 4)
mat4 <- mat3
mat4[5, 4] <- 1
## run the function
entry.list(mat1, mat2, mat3, mat4)
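To illustrate the long format, the sketch below builds an entry data frame from two or more binary region - industry matrices that share the same row and column names (the package also accepts different compositions; this sketch does not). It assumes, as a modelling convention, that only cells without an RCA in period t are at risk of entry, and records entry = 1 if the cell has an RCA in period t + 1. `entry_list_sketch` and its column names are hypothetical.

```r
# Hypothetical helper (not part of EconGeo); matrices must share dimnames
entry_list_sketch <- function(...) {
  mats <- list(...)
  stopifnot(length(mats) >= 2)
  rows <- lapply(seq_len(length(mats) - 1), function(t) {
    before <- mats[[t]]
    after <- mats[[t + 1]]
    # cells without an RCA in period t are the ones that can enter
    at_risk <- which(before == 0, arr.ind = TRUE)
    data.frame(region = rownames(before)[at_risk[, "row"]],
               industry = colnames(before)[at_risk[, "col"]],
               period = rep(t, nrow(at_risk)),
               entry = as.integer(after[at_risk] > 0))
  })
  do.call(rbind, rows)
}

dn <- list(c("R1", "R2"), c("I1", "I2"))
mat1 <- matrix(c(1, 0, 0, 0), nrow = 2, dimnames = dn)
mat2 <- mat1
mat2["R1", "I2"] <- 1   # R1 enters I2 between periods 1 and 2
mat3 <- mat2
mat3["R2", "I1"] <- 1   # R2 enters I1 between periods 2 and 3
entry_list_sketch(mat1, mat2, mat3)
```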
This function generates a matrix of entry events from two regions - industries matrices (different matrix compositions are allowed).
entry.mat(mat1, mat2)
mat1 |
An incidence matrix with regions in rows and industries in columns (period 1) |
mat2 |
An incidence matrix with regions in rows and industries in columns (period 2) |
Pierre-Alexandre Balland
Wolf-Hendrik Uhlbach
Boschma, R., Balland, P.A. and Kogler, D. (2015) Relatedness and Technological Change in Cities: The rise and fall of technological knowledge in U.S. metropolitan areas from 1981 to 2010, Industrial and Corporate Change 24 (1): 223-250
Boschma, R., Heimeriks, G. and Balland, P.A. (2014) Scientific Knowledge Dynamics and Relatedness in Bio-Tech Cities, Research Policy 43 (1): 107-114
See also: exit, entry.list, exit.list
## generate a first region - industry matrix in which cells represent the presence/absence
## of an RCA (period 1)
set.seed(31)
mat1 <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
rownames(mat1) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat1) <- c("I1", "I2", "I3", "I4")
## generate a second region - industry matrix in which cells represent the presence/absence
## of an RCA (period 2)
mat2 <- mat1
mat2[3, 1] <- 1
## run the function
entry.mat(mat1, mat2)
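For two binary matrices with the same rows and columns, entry can be read cell by cell. The sketch below is an illustration under stated assumptions rather than the package's code: it returns 1 where a region gains an RCA in an industry, 0 where it still has none, and NA where it already had one in period 1 (such cells cannot enter). `entry_mat_sketch` is hypothetical, and unlike the package it requires identical matrix compositions.

```r
# Hypothetical helper (not part of EconGeo); assumes identical dimnames
entry_mat_sketch <- function(mat1, mat2) {
  stopifnot(identical(dimnames(mat1), dimnames(mat2)))
  # at risk (no RCA in period 1): 1 if an RCA appears in period 2, else 0;
  # not at risk: NA. ifelse() keeps the dimensions and dimnames of mat1 == 0
  ifelse(mat1 == 0, (mat2 > 0) * 1, NA)
}

mat1 <- matrix(c(1, 0, 0, 1), nrow = 2,
               dimnames = list(c("R1", "R2"), c("I1", "I2")))
mat2 <- mat1
mat2["R2", "I1"] <- 1   # R2 gains an RCA in I1
entry_mat_sketch(mat1, mat2)
```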
This function generates a data frame of exit events from multiple regions - industries matrices (different matrix compositions are allowed). In this function, the maximum number of periods is limited to 20.
exit.list(mat1, mat2, mat3, mat4, mat5, mat6, mat7, mat8, mat9, mat10, mat11, mat12, mat13, mat14, mat15, mat16, mat17, mat18, mat19, mat20)
mat1 |
An incidence matrix with regions in rows and industries in columns (period 1 - mandatory) |
mat2 |
An incidence matrix with regions in rows and industries in columns (period 2 - mandatory) |
mat... |
An incidence matrix with regions in rows and industries in columns (period ... - optional) |
Pierre-Alexandre Balland
Wolf-Hendrik Uhlbach
Boschma, R., Balland, P.A. and Kogler, D. (2015) Relatedness and Technological Change in Cities: The rise and fall of technological knowledge in U.S. metropolitan areas from 1981 to 2010, Industrial and Corporate Change 24 (1): 223-250
Boschma, R., Heimeriks, G. and Balland, P.A. (2014) Scientific Knowledge Dynamics and Relatedness in Bio-Tech Cities, Research Policy 43 (1): 107-114
See also: entry, exit, entry.list
## generate a first region - industry matrix in which cells represent the presence/absence
## of an RCA (period 1)
set.seed(31)
mat1 <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
rownames(mat1) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat1) <- c("I1", "I2", "I3", "I4")
## generate a second region - industry matrix in which cells represent the presence/absence
## of an RCA (period 2)
mat2 <- mat1
mat2[2, 1] <- 0
## run the function
exit.list(mat1, mat2)
## generate a third region - industry matrix in which cells represent the presence/absence
## of an RCA (period 3)
mat3 <- mat2
mat3[5, 1] <- 0
## run the function
exit.list(mat1, mat2, mat3)
## generate a fourth region - industry matrix in which cells represent the presence/absence
## of an RCA (period 4)
mat4 <- mat3
mat4[5, 3] <- 0
## run the function
exit.list(mat1, mat2, mat3, mat4)
This function generates a matrix of exit events from two regions - industries matrices (different matrix compositions are allowed).
exit.mat(mat1, mat2)
mat1 |
An incidence matrix with regions in rows and industries in columns (period 1) |
mat2 |
An incidence matrix with regions in rows and industries in columns (period 2) |
Pierre-Alexandre Balland
Wolf-Hendrik Uhlbach
Boschma, R., Balland, P.A. and Kogler, D. (2015) Relatedness and Technological Change in Cities: The rise and fall of technological knowledge in U.S. metropolitan areas from 1981 to 2010, Industrial and Corporate Change 24 (1): 223-250
Boschma, R., Heimeriks, G. and Balland, P.A. (2014) Scientific Knowledge Dynamics and Relatedness in Bio-Tech Cities, Research Policy 43 (1): 107-114
See also: entry, exit.list, entry.list
## generate a first region - industry matrix in which cells represent the presence/absence
## of an RCA (period 1)
set.seed(31)
mat1 <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
rownames(mat1) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat1) <- c("I1", "I2", "I3", "I4")
## generate a second region - industry matrix in which cells represent the presence/absence
## of an RCA (period 2)
mat2 <- mat1
mat2[2, 1] <- 0
## run the function
exit.mat(mat1, mat2)
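Exit mirrors entry: in the sketch below, cells with an RCA in period 1 are at risk of exit and get 1 if the RCA is lost in period 2 and 0 if it is kept, while cells without an RCA in period 1 are set to NA. This treatment of cells that are not at risk is an assumption, both matrices must share the same rows and columns (unlike the package, which allows different compositions), and `exit_mat_sketch` is hypothetical.

```r
# Hypothetical helper (not part of EconGeo); assumes identical dimnames
exit_mat_sketch <- function(mat1, mat2) {
  stopifnot(identical(dimnames(mat1), dimnames(mat2)))
  # at risk (RCA in period 1): 1 if the RCA is lost in period 2, else 0;
  # not at risk: NA
  ifelse(mat1 > 0, (mat2 == 0) * 1, NA)
}

mat1 <- matrix(c(1, 0, 0, 1), nrow = 2,
               dimnames = list(c("R1", "R2"), c("I1", "I2")))
mat2 <- mat1
mat2["R1", "I1"] <- 0   # R1 loses its RCA in I1
exit_mat_sketch(mat1, mat2)
```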
This function computes the expy index of regions from (incidence) regions - industries matrices, as proposed by Hausmann, Hwang & Rodrik (2007). The index is a measure of the productivity level associated with a region's specialization pattern.
expy(mat, vec)
mat |
An incidence matrix with regions in rows and industries in columns |
vec |
A vector that gives GDP, R&D, education or any other relevant regional attribute that will be used to compute the weighted average for each industry |
Pierre-Alexandre Balland
Balassa, B. (1965) Trade Liberalization and Revealed Comparative Advantage, The Manchester School 33: 99-123
Hausmann, R., Hwang, J. and Rodrik, D. (2007) What you export matters, Journal of Economic Growth 12: 1-25.
## generate a region - industry matrix
set.seed(31)
mat <- matrix(sample(0:100, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")
## a vector of GDP of regions
vec <- c(5, 10, 15, 25, 50)
## run the function
expy(mat, vec)
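Following Hausmann, Hwang and Rodrik (2007), each industry first receives a PRODY value, the average of `vec` over regions weighted by RCA-style weights (a region's industry share divided by the sum of those shares across regions), and EXPY is then the average of PRODY weighted by each region's industry shares. The base-R sketch below implements these two steps under that reading; `expy_sketch` is hypothetical, and the package's output may differ in details such as the handling of empty rows.

```r
# Hypothetical helper (not part of EconGeo)
expy_sketch <- function(mat, vec) {
  stopifnot(length(vec) == nrow(mat))
  # share of each industry in each region's total (x_rk / X_r)
  shares <- mat / rowSums(mat)
  # RCA-style weights: normalise each industry column so it sums to 1
  weights <- sweep(shares, 2, colSums(shares), "/")
  # PRODY: weighted average of the regional attribute for each industry
  prody <- colSums(weights * vec)
  # EXPY: average PRODY weighted by each region's industry shares
  drop(shares %*% prody)
}

mat <- matrix(c(10,  0,
                 0, 10,
                 5,  5),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("R1", "R2", "R3"), c("I1", "I2")))
vec <- c(1, 3, 2)   # an illustrative regional attribute, e.g. GDP per capita
expy_sketch(mat, vec)
```

The specialized regions R1 and R2 inherit the PRODY of their single industry (4/3 and 8/3), and the diversified R3 gets the average of the two (2).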
This function creates regular data frames with three columns (regions, industries, count) from (incidence) matrices (wide to long format) using the reshape2 package.
get.list (data)
mat |
An incidence matrix with regions in rows and industries in columns (or the other way around) |
sparse |
Logical; is the input a sparse matrix? Defaults to FALSE |
Pierre-Alexandre Balland
## generate a region - industry matrix
set.seed(31)
mat <- matrix(sample(0:100, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")
## run the function
get.list(mat)
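The package performs this wide-to-long conversion with reshape2. For readers without that dependency, a base-R equivalent can use `as.table()` followed by `as.data.frame()`; this is an assumption, not the package's code. `get_list_sketch` is hypothetical, and it keeps zero cells, which the package may or may not do.

```r
# Hypothetical helper (not part of EconGeo): wide matrix -> long data frame
get_list_sketch <- function(mat) {
  # as.table() keeps the dimnames; as.data.frame() turns it into long format
  long <- as.data.frame(as.table(mat), stringsAsFactors = FALSE)
  names(long) <- c("region", "industry", "count")
  long
}

mat <- matrix(c(3, 0, 1, 7), nrow = 2,
              dimnames = list(c("R1", "R2"), c("I1", "I2")))
get_list_sketch(mat)
```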
This function creates regions - industries (incidence) matrices from regular data frames (long to wide format) using the reshape2 package or the Matrix package.
get.matrix (data)
data |
A data frame with three columns (regions, industries, count) |
sparse |
Logical; shall the returned output be a sparse matrix? Defaults to FALSE, but can be set to TRUE if the dataset is very large |
Pierre-Alexandre Balland
## generate a region - industry data frame
set.seed(31)
region <- c("R1", "R1", "R1", "R1", "R2", "R2", "R3", "R4", "R5", "R5")
industry <- c("I1", "I2", "I3", "I4", "I1", "I2", "I1", "I1", "I3", "I3")
data <- data.frame(region, industry)
data$count <- 1
## run the function
get.matrix(data)
get.matrix(data, sparse = TRUE)
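A base-R sketch of the reverse, long-to-wide conversion, using `tapply()` instead of reshape2 or Matrix. In this sketch, counts for the same region - industry pair are summed and missing pairs are filled with 0; both choices are assumptions about the desired output, and `get_matrix_sketch` is hypothetical. For very large data, a sparse representation such as `Matrix::sparseMatrix()` would be the more economical target.

```r
# Hypothetical helper (not part of EconGeo): long data frame -> wide matrix,
# assuming columns 1, 2 and 3 hold regions, industries and counts
get_matrix_sketch <- function(data) {
  # sum the counts per (region, industry) cell; absent pairs become 0
  tapply(data[[3]], list(data[[1]], data[[2]]), sum, default = 0)
}

region <- c("R1", "R1", "R1", "R1", "R2", "R2", "R3", "R4", "R5", "R5")
industry <- c("I1", "I2", "I3", "I4", "I1", "I2", "I1", "I1", "I3", "I3")
data <- data.frame(region, industry, count = 1)
get_matrix_sketch(data)
```

Because the pair R5 - I3 appears twice in this data frame, its cell holds 2 in the sketch's output.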
This function computes the Gini coefficient. The Gini index measures spatial inequality. It ranges from 0 (perfect equality) to 1 (perfect inequality) and is derived from the Lorenz curve. The Gini coefficient is defined as a ratio of two areas derived from the Lorenz curve. The numerator is the area between the Lorenz curve of the distribution and the uniform distribution line (45-degree line). The denominator is the area under the uniform distribution line (the lower triangle). This index indicates how unequally an industry is distributed across n regions. Maximum inequality in the sample occurs when n-1 regions have a score of zero and one region has a positive score. The maximum value of the Gini coefficient is then (n-1)/n, which approaches 1 (the theoretical maximum) as the number of observations (regions) increases.
Gini(mat)
ind |
A vector of regional counts for an industry |
Pierre-Alexandre Balland
Gini, C. (1921) Measurement of Inequality of Incomes, The Economic Journal 31: 124-126
See also: Hoover.Gini, locational.Gini, locational.Gini.curve, Lorenz.curve, Hoover.curve
## generate a vector of industrial counts
ind <- c(0, 10, 10, 30, 50)
## run the function
Gini(ind)
## generate a region - industry matrix
mat <- matrix(c(0, 1, 0, 0,
                0, 1, 0, 0,
                0, 1, 0, 0,
                0, 1, 0, 1,
                0, 1, 1, 1), ncol = 4, byrow = TRUE)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")
## run the function
Gini(mat)
## run the function by aggregating all industries
Gini(rowSums(mat))
## run the function for industry #1 only (perfect equality)
Gini(mat[, 1])
## run the function for industry #2 only (perfect equality)
Gini(mat[, 2])
## run the function for industry #3 only (perfect inequality: max Gini = (5-1)/5)
Gini(mat[, 3])
## run the function for industry #4 only (top 40% produces 100% of the output)
Gini(mat[, 4])
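One standard closed form consistent with the Lorenz-curve definition above, including its maximum of (n-1)/n, sorts the n values so that x_(1) <= ... <= x_(n) and computes G = 2 * sum(i * x_(i)) / (n * sum(x)) - (n + 1) / n. The sketch below implements this formula; `gini_sketch` is hypothetical and is not claimed to reproduce the package's code. It returns NaN when all values are zero, because the Lorenz curve is then undefined.

```r
# Hypothetical helper (not part of EconGeo): Gini coefficient of a vector
gini_sketch <- function(x) {
  x <- sort(x)                 # increasing order, as for the Lorenz curve
  n <- length(x)
  2 * sum(seq_len(n) * x) / (n * sum(x)) - (n + 1) / n
}

gini_sketch(c(0, 0, 0, 0, 1))  # one region holds everything: (5 - 1) / 5
gini_sketch(rep(1, 5))         # perfect equality
# one coefficient per industry (column) of a region - industry matrix
mat <- cbind(I1 = rep(1, 5), I2 = c(0, 0, 0, 1, 1))
apply(mat, 2, gini_sketch)
```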
This function generates a matrix of industrial growth by industries from two regions - industries matrices (same matrix composition from two different periods).
growth.ind(mat1, mat2)
mat1 |
An incidence matrix with regions in rows and industries in columns (period 1) |
mat2 |
An incidence matrix with regions in rows and industries in columns (period 2) |
Pierre-Alexandre Balland
Boschma, R., Balland, P.A. and Kogler, D. (2015) Relatedness and Technological Change in Cities: The rise and fall of technological knowledge in U.S. metropolitan areas from 1981 to 2010, Industrial and Corporate Change 24 (1): 223-250
Boschma, R., Heimeriks, G. and Balland, P.A. (2014) Scientific Knowledge Dynamics and Relatedness in Bio-Tech Cities, Research Policy 43 (1): 107-114
See also: exit, entry.list, exit.list
## generate a first region - industry matrix with full count (period 1)
set.seed(31)
mat1 <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
rownames(mat1) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat1) <- c("I1", "I2", "I3", "I4")
## generate a second region - industry matrix with full count (period 2)
mat2 <- mat1
mat2[3, 1] <- 8
## run the function
growth.ind(mat1, mat2)
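The description above does not state the growth formula. Assuming growth is the relative change of each industry's total across all regions, (total in period 2 - total in period 1) / total in period 1, a base-R sketch follows. `growth_ind_sketch` is hypothetical, and the package may use a different definition (for example, log differences).

```r
# Hypothetical helper (not part of EconGeo); relative growth of industry totals
growth_ind_sketch <- function(mat1, mat2) {
  stopifnot(identical(dimnames(mat1), dimnames(mat2)))
  before <- colSums(mat1)      # industry totals, period 1
  after <- colSums(mat2)       # industry totals, period 2
  (after - before) / before    # Inf or NaN if a period-1 total is 0
}

dn <- list(c("R1", "R2"), c("I1", "I2"))
mat1 <- matrix(c(2, 3, 5, 5), nrow = 2, dimnames = dn)  # totals: I1 = 5, I2 = 10
mat2 <- matrix(c(4, 6, 5, 0), nrow = 2, dimnames = dn)  # totals: I1 = 10, I2 = 5
growth_ind_sketch(mat1, mat2)
```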
This function generates a data frame of industrial growth in regions from multiple regions - industries matrices (same matrix composition for the different periods). In this function, the maximum number of periods is limited to 20.
growth.list(mat1, mat2, mat3, mat4, mat5, mat6, mat7, mat8, mat9, mat10, mat11, mat12, mat13, mat14, mat15, mat16, mat17, mat18, mat19, mat20)
mat1 |
An incidence matrix with regions in rows and industries in columns (period 1 - mandatory) |
mat2 |
An incidence matrix with regions in rows and industries in columns (period 2 - mandatory) |
mat... |
An incidence matrix with regions in rows and industries in columns (period ... - optional) |
Pierre-Alexandre Balland
Boschma, R., Balland, P.A. and Kogler, D. (2015) Relatedness and Technological Change in Cities: The rise and fall of technological knowledge in U.S. metropolitan areas from 1981 to 2010, Industrial and Corporate Change 24 (1): 223-250
Boschma, R., Heimeriks, G. and Balland, P.A. (2014) Scientific Knowledge Dynamics and Relatedness in Bio-Tech Cities, Research Policy 43 (1): 107-114
See also: growth, exit, exit.list
## generate a first region - industry matrix with full count (period 1)
set.seed(31)
mat1 <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
rownames(mat1) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat1) <- c("I1", "I2", "I3", "I4")
## generate a second region - industry matrix with full count (period 2)
mat2 <- mat1
mat2[3, 1] <- 8
## run the function
growth.list(mat1, mat2)
## generate a third region - industry matrix with full count (period 3)
mat3 <- mat2
mat3[5, 2] <- 1
## run the function
growth.list(mat1, mat2, mat3)
## generate a fourth region - industry matrix with full count (period 4)
mat4 <- mat3
mat4[5, 4] <- 1
## run the function
growth.list(mat1, mat2, mat3, mat4)
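A long-format sketch across several periods, assuming (since the formula is not given here) that growth is the relative change (x_{t+1} - x_t) / x_t of each region - industry cell between consecutive periods, set to NA when x_t is 0. `growth_list_sketch` and its column names are hypothetical; the matrices must share the same row and column names.

```r
# Hypothetical helper (not part of EconGeo); matrices must share dimnames
growth_list_sketch <- function(...) {
  mats <- list(...)
  stopifnot(length(mats) >= 2)
  rows <- lapply(seq_len(length(mats) - 1), function(t) {
    g <- (mats[[t + 1]] - mats[[t]]) / mats[[t]]
    g[mats[[t]] == 0] <- NA    # growth is undefined from a zero base
    long <- as.data.frame(as.table(g), stringsAsFactors = FALSE)
    names(long) <- c("region", "industry", "growth")
    long$period <- t           # growth from period t to period t + 1
    long
  })
  do.call(rbind, rows)
}

dn <- list(c("R1", "R2"), c("I1", "I2"))
mat1 <- matrix(c(2, 4, 1, 5), nrow = 2, dimnames = dn)
mat2 <- mat1 * 2               # every cell doubles
mat3 <- mat2
mat3["R1", "I1"] <- 2          # R1 - I1 falls from 4 to 2
growth_list_sketch(mat1, mat2, mat3)
```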
This function generates a data frame of industrial growth in regions from multiple regions - industries matrices (same matrix composition for the different periods). In this function, the maximum number of periods is limited to 20.
growth.list.ind(mat1, mat2, mat3, mat4, mat5, mat6, mat7, mat8, mat9, mat10, mat11, mat12, mat13, mat14, mat15, mat16, mat17, mat18, mat19, mat20)
mat1 |
An incidence matrix with regions in rows and industries in columns (period 1 - mandatory) |
mat2 |
An incidence matrix with regions in rows and industries in columns (period 2 - mandatory) |
mat... |
An incidence matrix with regions in rows and industries in columns (period ... - optional) |
Pierre-Alexandre Balland
Boschma, R., Balland, P.A. and Kogler, D. (2015) Relatedness and Technological Change in Cities: The rise and fall of technological knowledge in U.S. metropolitan areas from 1981 to 2010, Industrial and Corporate Change 24 (1): 223-250
Boschma, R., Heimeriks, G. and Balland, P.A. (2014) Scientific Knowledge Dynamics and Relatedness in Bio-Tech Cities, Research Policy 43 (1): 107-114
See also: growth, exit, exit.list
## generate a first region - industry matrix with full count (period 1)
set.seed(31)
mat1 <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
rownames(mat1) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat1) <- c("I1", "I2", "I3", "I4")
## generate a second region - industry matrix with full count (period 2)
mat2 <- mat1
mat2[3, 1] <- 8
## run the function
growth.list.ind(mat1, mat2)
## generate a third region - industry matrix with full count (period 3)
mat3 <- mat2
mat3[5, 2] <- 1
## run the function
growth.list.ind(mat1, mat2, mat3)
## generate a fourth region - industry matrix with full count (period 4)
mat4 <- mat3
mat4[5, 4] <- 1
## run the function
growth.list.ind(mat1, mat2, mat3, mat4)
This function generates a data frame of industrial growth in regions from multiple regions - industries matrices (same matrix composition for the different periods). In this function, the maximum number of periods is limited to 20.
growth.list.reg(mat1, mat2, mat3, mat4, mat5, mat6, mat7, mat8, mat9, mat10, mat11, mat12, mat13, mat14, mat15, mat16, mat17, mat18, mat19, mat20)
mat1 |
An incidence matrix with regions in rows and industries in columns (period 1 - mandatory) |
mat2 |
An incidence matrix with regions in rows and industries in columns (period 2 - mandatory) |
mat... |
An incidence matrix with regions in rows and industries in columns (period ... - optional) |
Pierre-Alexandre Balland
Boschma, R., Balland, P.A. and Kogler, D. (2015) Relatedness and Technological Change in Cities: The rise and fall of technological knowledge in U.S. metropolitan areas from 1981 to 2010, Industrial and Corporate Change 24 (1): 223-250
Boschma, R., Heimeriks, G. and Balland, P.A. (2014) Scientific Knowledge Dynamics and Relatedness in Bio-Tech Cities, Research Policy 43 (1): 107-114
See also: growth, exit, exit.list
## generate a first region - industry matrix with full count (period 1)
set.seed(31)
mat1 <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
rownames(mat1) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat1) <- c("I1", "I2", "I3", "I4")
## generate a second region - industry matrix with full count (period 2)
mat2 <- mat1
mat2[3, 1] <- 8
## run the function
growth.list.reg(mat1, mat2)
## generate a third region - industry matrix with full count (period 3)
mat3 <- mat2
mat3[5, 2] <- 1
## run the function
growth.list.reg(mat1, mat2, mat3)
## generate a fourth region - industry matrix with full count (period 4)
mat4 <- mat3
mat4[5, 4] <- 1
## run the function
growth.list.reg(mat1, mat2, mat3, mat4)
This function generates a matrix of industrial growth in regions from two regions - industries matrices with the same composition, observed in two different periods.
growth.mat(mat1, mat2)
| Argument | Description |
|---|---|
| mat1 | An incidence matrix with regions in rows and industries in columns (period 1) |
| mat2 | An incidence matrix with regions in rows and industries in columns (period 2) |
Pierre-Alexandre Balland [email protected]
Boschma, R., Balland, P.A. and Kogler, D. (2015) Relatedness and Technological Change in Cities: The rise and fall of technological knowledge in U.S. metropolitan areas from 1981 to 2010, Industrial and Corporate Change 24 (1): 223-250
Boschma, R., Heimeriks, G. and Balland, P.A. (2014) Scientific Knowledge Dynamics and Relatedness in Bio-Tech Cities, Research Policy 43 (1): 107-114
exit, entry.list, exit.list
```r
## generate a first region - industry matrix with full count (period 1)
set.seed(31)
mat1 <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
rownames(mat1) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat1) <- c("I1", "I2", "I3", "I4")
## generate a second region - industry matrix with full count (period 2)
mat2 <- mat1
mat2[3, 1] <- 8
## run the function
growth.mat(mat1, mat2)
```
This function generates a matrix of industrial growth by region from two regions - industries matrices with the same composition, observed in two different periods.
growth.reg(mat1, mat2)
| Argument | Description |
|---|---|
| mat1 | An incidence matrix with regions in rows and industries in columns (period 1) |
| mat2 | An incidence matrix with regions in rows and industries in columns (period 2) |
Pierre-Alexandre Balland [email protected]
Boschma, R., Balland, P.A. and Kogler, D. (2015) Relatedness and Technological Change in Cities: The rise and fall of technological knowledge in U.S. metropolitan areas from 1981 to 2010, Industrial and Corporate Change 24 (1): 223-250
Boschma, R., Heimeriks, G. and Balland, P.A. (2014) Scientific Knowledge Dynamics and Relatedness in Bio-Tech Cities, Research Policy 43 (1): 107-114
exit, entry.list, exit.list
```r
## generate a first region - industry matrix with full count (period 1)
set.seed(31)
mat1 <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
rownames(mat1) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat1) <- c("I1", "I2", "I3", "I4")
## generate a second region - industry matrix with full count (period 2)
mat2 <- mat1
mat2[3, 1] <- 8
## run the function
growth.reg(mat1, mat2)
```
This function computes the Hachman index from regions - industries matrices. The Hachman index indicates how closely the industrial distribution of a region resembles that of a larger reference economy (nation, world). The index varies between 0 (extreme dissimilarity between the region and the reference economy) and 1 (perfect similarity between the region and the reference economy).
Hachman(mat)
| Argument | Description |
|---|---|
| mat | An incidence matrix with regions in rows and industries in columns |
Pierre-Alexandre Balland [email protected]
average.location.quotient
```r
## generate a region - industry matrix
set.seed(31)
mat <- matrix(sample(0:100, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")
## run the function
Hachman(mat)
```
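To make the Hachman index concrete, here is a minimal base-R sketch. It assumes the textbook definition HI_r = 1 / sum_i (s_ri * LQ_ri), where s_ri is the share of industry i in region r and LQ_ri the corresponding location quotient, with the whole matrix taken as the reference economy. The helper `hachman_sketch` is hypothetical and is not the package's `Hachman()` code.

```r
# Hypothetical helper: Hachman index per region, assuming the textbook formula
# HI_r = 1 / sum_i (s_ri * LQ_ri), with the whole matrix as reference economy.
hachman_sketch <- function(mat) {
  s_region <- mat / rowSums(mat)            # industry shares within each region
  s_nation <- colSums(mat) / sum(mat)       # industry shares in the reference economy
  lq <- sweep(s_region, 2, s_nation, "/")   # location quotients
  1 / rowSums(s_region * lq)                # one value per region
}

mat <- matrix(c(10, 20, 30,
                20, 40, 60,
                50,  0, 10), nrow = 3, byrow = TRUE,
              dimnames = list(c("R1", "R2", "R3"), c("I1", "I2", "I3")))
hachman_sketch(mat)   # every value lies in (0, 1]
```

Since sum_i s_ri^2 / s_i is at least 1, the index never exceeds 1 and reaches 1 only when a region's industry mix matches the reference economy. Industries absent from the whole matrix would produce a division by zero in this sketch.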
This function computes the Herfindahl index, also known as the Herfindahl-Hirschman index, from (incidence) regions - industries matrices.
Herfindahl(mat)
| Argument | Description |
|---|---|
| mat | An incidence matrix with regions in rows and industries in columns |
Pierre-Alexandre Balland [email protected]
Herfindahl, O.C. (1959) Copper Costs and Prices: 1870-1957. Baltimore: The Johns Hopkins Press.
Hirschman, A.O. (1945) National Power and the Structure of Foreign Trade, Berkeley and Los Angeles: University of California Press.
```r
## generate a region - industry matrix
set.seed(31)
mat <- matrix(sample(0:100, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")
## run the function
Herfindahl(mat)
```
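The Herfindahl index is the sum of squared shares. The sketch below assumes it is computed for each industry (column) over that industry's distribution across regions; this orientation is an assumption, and `herfindahl_sketch` is a hypothetical helper rather than the package's `Herfindahl()` code.

```r
# Hypothetical helper: Herfindahl-Hirschman index as the sum of squared shares.
# Assumption: computed per industry (column) over its distribution across regions.
herfindahl_sketch <- function(mat) {
  shares <- sweep(mat, 2, colSums(mat), "/")  # share of each region in each industry
  colSums(shares^2)                           # between 1 / nrow(mat) and 1
}

mat <- cbind(I1 = c(5, 0, 0, 0),   # fully concentrated in one region
             I2 = c(2, 2, 2, 2))   # evenly spread over four regions
herfindahl_sketch(mat)              # I1 = 1, I2 = 0.25
```

Calling `herfindahl_sketch(t(mat))` instead would measure how concentrated each region's activity is across industries.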
This function plots a Hoover curve from regions - industries matrices.
Hoover.curve(mat, pop, plot = TRUE, pdf = FALSE)
| Argument | Description |
|---|---|
| mat | An incidence matrix with regions in rows and industries in columns. The input can also be a vector of regional industrial counts (a matrix with n regions in rows and a single column). |
| pop | A vector of regional population counts |
| plot | Logical; shall the curve be automatically plotted? Defaults to TRUE. If set to FALSE, the function will return x y coordinates that you can later use to plot and customize the curve. |
| pdf | Logical; shall a pdf be saved to your current working directory? Defaults to FALSE. If set to TRUE, a pdf with all Hoover curves will be compiled and saved to your current working directory. |
Pierre-Alexandre Balland [email protected]
Hoover, E.M. (1936) The Measurement of Industrial Localization, The Review of Economics and Statistics 18 (1): 162-171
Hoover.Gini, locational.Gini, locational.Gini.curve, Lorenz.curve, Gini
```r
## generate vectors of industrial and population count
ind <- c(0, 10, 10, 30, 50)
pop <- c(10, 15, 20, 25, 30)
## run the function (30% of the population produces 50% of the industrial output)
Hoover.curve(ind, pop)
Hoover.curve(ind, pop, pdf = TRUE)
Hoover.curve(ind, pop, plot = FALSE)
## generate a region - industry matrix
mat <- matrix(c(0, 10, 0, 0,
                0, 15, 0, 0,
                0, 20, 0, 0,
                0, 25, 0, 1,
                0, 30, 1, 1), ncol = 4, byrow = TRUE)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")
## run the function
Hoover.curve(mat, pop)
Hoover.curve(mat, pop, pdf = TRUE)
Hoover.curve(mat, pop, plot = FALSE)
## run the function by aggregating all industries
Hoover.curve(rowSums(mat), pop)
Hoover.curve(rowSums(mat), pop, pdf = TRUE)
Hoover.curve(rowSums(mat), pop, plot = FALSE)
## run the function for industry #1 only
Hoover.curve(mat[, 1], pop)
Hoover.curve(mat[, 1], pop, pdf = TRUE)
Hoover.curve(mat[, 1], pop, plot = FALSE)
## run the function for industry #2 only (perfectly proportional to population)
Hoover.curve(mat[, 2], pop)
Hoover.curve(mat[, 2], pop, pdf = TRUE)
Hoover.curve(mat[, 2], pop, plot = FALSE)
## run the function for industry #3 only (30% of the pop. produces 100% of the output)
Hoover.curve(mat[, 3], pop)
Hoover.curve(mat[, 3], pop, pdf = TRUE)
Hoover.curve(mat[, 3], pop, plot = FALSE)
## run the function for industry #4 only (55% of the pop. produces 100% of the output)
Hoover.curve(mat[, 4], pop)
Hoover.curve(mat[, 4], pop, pdf = TRUE)
Hoover.curve(mat[, 4], pop, plot = FALSE)
## compare the distribution of the four industries
par(mfrow = c(2, 2))
Hoover.curve(mat[, 1], pop)
Hoover.curve(mat[, 2], pop)
Hoover.curve(mat[, 3], pop)
Hoover.curve(mat[, 4], pop)
```
This function computes the Hoover Gini, named after Edgar Hoover. The Hoover Gini is a measure of spatial inequality. It ranges from 0 (perfect equality) to 1 (perfect inequality) and is calculated from the Hoover curve associated with a given distribution of population, industries or technologies and a reference category. In this sense, it is closely related to both the Gini coefficient and the Hoover index. The numerator is the area between the Hoover curve of the distribution and the uniform distribution line (45-degree line); the denominator is the area under the uniform distribution line (the lower triangle).
Hoover.Gini(mat, pop)
| Argument | Description |
|---|---|
| mat | An incidence matrix with regions in rows and industries in columns. The input can also be a vector of regional industrial counts (a matrix with n regions in rows and a single column). |
| pop | A vector of regional population counts |
Pierre-Alexandre Balland [email protected]
Hoover, E.M. (1936) The Measurement of Industrial Localization, The Review of Economics and Statistics 18 (1): 162-171
Hoover.curve, locational.Gini, locational.Gini.curve, Lorenz.curve, Gini
```r
## generate vectors of industrial and population count
ind <- c(0, 10, 10, 30, 50)
pop <- c(10, 15, 20, 25, 30)
## run the function (30% of the population produces 50% of the industrial output)
Hoover.Gini(ind, pop)
## generate a region - industry matrix
mat <- matrix(c(0, 10, 0, 0,
                0, 15, 0, 0,
                0, 20, 0, 0,
                0, 25, 0, 1,
                0, 30, 1, 1), ncol = 4, byrow = TRUE)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")
## run the function
Hoover.Gini(mat, pop)
## run the function by aggregating all industries
Hoover.Gini(rowSums(mat), pop)
## run the function for industry #1 only
Hoover.Gini(mat[, 1], pop)
## run the function for industry #2 only (perfectly proportional to population)
Hoover.Gini(mat[, 2], pop)
## run the function for industry #3 only (30% of the pop. produces 100% of the output)
Hoover.Gini(mat[, 3], pop)
## run the function for industry #4 only (55% of the pop. produces 100% of the output)
Hoover.Gini(mat[, 4], pop)
```
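The ratio of areas described above can be sketched with the trapezoid rule. This base-R sketch assumes positive population counts and orders regions by industry per capita from lowest to highest, so the curve lies below the 45-degree line as a Lorenz curve does; that ordering is an assumption, and `hoover_gini_sketch` is a hypothetical helper rather than the package's `Hoover.Gini()` code.

```r
# Hypothetical helper: Gini-type index from a Hoover curve (x = cumulative population
# share, y = cumulative industry share). Assumes positive population counts and
# regions ordered by industry per capita, from lowest to highest.
hoover_gini_sketch <- function(ind, pop) {
  o <- order(ind / pop)                                          # lowest to highest ratio
  x <- c(0, cumsum(pop[o]) / sum(pop))                           # cumulative population share
  y <- c(0, cumsum(ind[o]) / sum(ind))                           # cumulative industry share
  area_under <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)   # trapezoid rule
  (0.5 - area_under) / 0.5   # area between diagonal and curve / area of lower triangle
}

pop <- c(10, 15, 20, 25, 30)
hoover_gini_sketch(c(0, 10, 10, 30, 50), pop)   # 0.31
hoover_gini_sketch(c(10, 15, 20, 25, 30), pop)  # 0: proportional to population
hoover_gini_sketch(c(0, 0, 0, 0, 1), pop)       # 0.7: 30% of the population holds all output
```

For a region - industry matrix, `apply(mat, 2, hoover_gini_sketch, pop = pop)` would return one value per industry.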
This function computes the Hoover index, named after Edgar Hoover. The Hoover index is a measure of spatial inequality. It ranges from 0 (perfect equality) to 100 (perfect inequality) and is calculated from the Lorenz curve associated with a given distribution of population, industries or technologies. In this sense, it is closely related to the Gini coefficient. The Hoover index is the maximum vertical distance between the Lorenz curve and the 45-degree line of perfect spatial equality. It indicates the proportion of industries, jobs, or population that would need to be transferred from the top to the bottom of the distribution to achieve perfect spatial equality. The Hoover index is also known as the Robin Hood index in studies of income inequality.
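Computation of the Hoover index:

$$H = \frac{1}{2}\sum_{r=1}^{N}\left|\frac{x_r}{\sum_{k=1}^{N} x_k} - \frac{p_r}{\sum_{k=1}^{N} p_k}\right|$$

where $x_r$ is the industrial count and $p_r$ the population of region $r$ among $N$ regions. This expression lies between 0 and 1; multiplying it by 100 gives the 0 to 100 scale.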
Hoover.index(mat, pop)
| Argument | Description |
|---|---|
| mat | An incidence matrix with regions in rows and industries in columns. The input can also be a vector of regional industrial counts (a matrix with n regions in rows and a single column). |
| pop | A vector of regional population counts; if this argument is missing, an equal distribution of the reference group will be assumed. |
| pdf | Logical; shall a pdf be saved to your current working directory? Defaults to FALSE. If set to TRUE, a pdf with all Hoover indices will be compiled and saved to your current working directory. |
Pierre-Alexandre Balland [email protected]
Hoover, E.M. (1936) The Measurement of Industrial Localization, The Review of Economics and Statistics 18 (1): 162-171
Hoover.curve, Hoover.Gini, locational.Gini, locational.Gini.curve, Lorenz.curve, Gini
```r
## generate vectors of industrial and population count
ind <- c(0, 10, 10, 30, 50)
pop <- c(10, 15, 20, 25, 30)
## run the function (30% of the population produces 50% of the industrial output)
Hoover.index(ind, pop)
## generate a region - industry matrix
mat <- matrix(c(0, 10, 0, 0,
                0, 15, 0, 0,
                0, 20, 0, 0,
                0, 25, 0, 1,
                0, 30, 1, 1), ncol = 4, byrow = TRUE)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")
## run the function
Hoover.index(mat, pop)
## run the function by aggregating all industries
Hoover.index(rowSums(mat), pop)
## run the function for industry #1 only
Hoover.index(mat[, 1], pop)
## run the function for industry #2 only (perfectly proportional to population)
Hoover.index(mat[, 2], pop)
## run the function for industry #3 only (30% of the pop. produces 100% of the output)
Hoover.index(mat[, 3], pop)
## run the function for industry #4 only (55% of the pop. produces 100% of the output)
Hoover.index(mat[, 4], pop)
```
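The Hoover index can be computed as half the sum of absolute differences between each region's share of the industry and its share of the reference group, which equals the maximum vertical gap between the curve and the diagonal. This base-R sketch returns the 0 to 1 version and defaults to an equal reference distribution when `pop` is omitted, as described above; `hoover_index_sketch` is a hypothetical helper, not the package's `Hoover.index()` code.

```r
# Hypothetical helper: Hoover index as half the sum of absolute share differences,
# on a 0-1 scale (multiply by 100 for the 0-100 scale).
hoover_index_sketch <- function(ind, pop = rep(1, length(ind))) {
  # default pop: an equal reference distribution across regions
  0.5 * sum(abs(ind / sum(ind) - pop / sum(pop)))
}

pop <- c(10, 15, 20, 25, 30)
hoover_index_sketch(c(0, 10, 10, 30, 50), pop)  # 0.25
hoover_index_sketch(c(0, 0, 0, 0, 1), pop)      # 0.7: 30% of the population holds everything
```

For a region - industry matrix, `apply(mat, 2, hoover_index_sketch, pop = pop)` gives one value per industry.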
This function computes a measure of complexity from the inverse of the normalized ubiquity of industries: the logarithm of the total count (employment, number of firms, number of patents, ...) in an industry is divided by its ubiquity. Ubiquity is the number of regions in which the industry can be found (location quotient > 1) in a regions - industries (incidence) matrix.
inv.norm.ubiquity(mat)
| Argument | Description |
|---|---|
| mat | An incidence matrix with regions in rows and industries in columns |
Pierre-Alexandre Balland [email protected]
Balland, P.A. and Rigby, D. (2017) The Geography of Complex Knowledge, Economic Geography 93 (1): 1-23.
diversity, location.quotient, ubiquity, TCI, MORt
```r
## generate a region - industry matrix with full count
set.seed(31)
mat <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")
## run the function
inv.norm.ubiquity(mat)
```
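Following the description above, a minimal base-R sketch divides the log of each industry's total count by the number of regions where its location quotient exceeds 1. The strict "> 1" threshold follows the text; `inv_norm_ubiquity_sketch` is a hypothetical helper and may differ from the package's `inv.norm.ubiquity()` in edge cases.

```r
# Hypothetical helper: inverse normalized ubiquity as log(total count) / ubiquity,
# where ubiquity counts the regions with a location quotient above 1.
inv_norm_ubiquity_sketch <- function(mat) {
  share_region <- mat / rowSums(mat)                # industry shares within regions
  share_nation <- colSums(mat) / sum(mat)           # industry shares overall
  lq <- sweep(share_region, 2, share_nation, "/")   # location quotients
  ubiquity <- colSums(lq > 1)                       # regions specialized in each industry
  log(colSums(mat)) / ubiquity                      # non-finite if no region has LQ > 1
}

mat <- matrix(c(10, 0, 0, 10, 6, 4), nrow = 2,
              dimnames = list(c("R1", "R2"), c("I1", "I2", "I3")))
inv_norm_ubiquity_sketch(mat)   # each industry: log(10) / 1
```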
This function computes an index of knowledge complexity of regions using the eigenvector method from regions - industries (incidence) matrices. Technically, the function returns the eigenvector associated with the second largest eigenvalue of the projected region - region matrix.
KCI(mat, RCA = FALSE)
| Argument | Description |
|---|---|
| mat | An incidence matrix with regions in rows and industries in columns |
| RCA | Logical; should the index of revealed comparative advantage (RCA, also referred to as the location quotient) be computed first? Defaults to FALSE (a binary 0/1 matrix is expected as input), but can be set to TRUE if the index of revealed comparative advantage first needs to be computed. |
Pierre-Alexandre Balland [email protected]
Hidalgo, C. and Hausmann, R. (2009) The building blocks of economic complexity, Proceedings of the National Academy of Sciences 106: 10570 - 10575.
Balland, P.A. and Rigby, D. (2017) The Geography of Complex Knowledge, Economic Geography 93 (1): 1-23.
location.quotient, ubiquity, diversity, MORc, TCI, MORt
```r
## generate a region - industry matrix with full count
set.seed(31)
mat <- matrix(sample(0:10, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")
## run the function
KCI(mat, RCA = TRUE)
## generate a region - industry matrix in which cells represent the presence/absence of a RCA
set.seed(31)
mat <- matrix(sample(0:1, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")
## run the function
KCI(mat)
## generate the simple network of Hidalgo and Hausmann (2009) presented p.11 (Fig. S4)
countries <- c("C1", "C1", "C1", "C1", "C2", "C3", "C3", "C4")
products <- c("P1", "P2", "P3", "P4", "P2", "P3", "P4", "P4")
data <- data.frame(countries, products)
data$freq <- 1
mat <- get.matrix(data)
## run the function
KCI(mat)
```
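The eigenvector method can be sketched in base R following the Hidalgo and Hausmann (2009) formulation: project the binary matrix onto a region - region matrix with entries sum_p M[c, p] M[c', p] / (k_c k_p), then take the eigenvector of its second largest eigenvalue. The sign convention (aligning the index with diversity), the standardization to mean 0 and standard deviation 1, and the requirement that no row or column is all zeros are assumptions of this sketch; `kci_sketch` is hypothetical and not the package's `KCI()` code.

```r
# Hypothetical helper: eigenvector-based complexity index for a binary
# region-industry matrix M (1 = the region has an RCA in the industry).
# Assumes M has no all-zero rows or columns.
kci_sketch <- function(M) {
  kc <- rowSums(M)                              # diversity of each region
  kp <- colSums(M)                              # ubiquity of each industry
  # projected region-region matrix: Mcc[c, c'] = sum_p M[c, p] M[c', p] / (kc[c] kp[p])
  Mcc <- (M / kc) %*% t(sweep(M, 2, kp, "/"))
  v <- Re(eigen(Mcc)$vectors[, 2])              # eigenvector of the 2nd largest eigenvalue
  if (isTRUE(cor(v, kc) < 0)) v <- -v           # sign convention: positive with diversity
  kci <- (v - mean(v)) / sd(v)                  # standardize to mean 0, sd 1
  setNames(kci, rownames(M))
}

# the same toy network as in the example above (regions C1-C4, products P1-P4)
M <- matrix(c(1, 1, 1, 1,
              0, 1, 0, 0,
              0, 0, 1, 1,
              0, 0, 0, 1), nrow = 4, byrow = TRUE,
            dimnames = list(paste0("C", 1:4), paste0("P", 1:4)))
kci_sketch(M)
```

The leading eigenvector of this row-stochastic matrix is constant, which is why the second one carries the ranking information.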
This function computes the Krugman index from regions - industries matrices. This index, often referred to as the Krugman specialization index, measures the distance between the distribution of industry shares in a region and at a more aggregated level (a country, for instance). The higher the coefficient, the greater the regional specialization.
Krugman.index(mat)
| Argument | Description |
|---|---|
| mat | An incidence matrix with regions in rows and industries in columns |
Pierre-Alexandre Balland [email protected]
Krugman, P. (1991) Geography and Trade, MIT Press, Cambridge
average.location.quotient
```r
## generate a region - industry matrix
set.seed(31)
mat <- matrix(sample(0:100, 20, replace = TRUE), ncol = 4)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")
## run the function
Krugman.index(mat)
```
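A minimal base-R sketch of the distance described above sums, for each region, the absolute differences between its industry shares and those of the whole matrix. Using the whole matrix as the reference is an assumption (some formulations compare a region with the rest of the economy instead), and `krugman_sketch` is a hypothetical helper rather than the package's `Krugman.index()` code.

```r
# Hypothetical helper: Krugman specialization index per region, comparing each
# region's industry shares with those of the whole matrix (the reference economy).
krugman_sketch <- function(mat) {
  share_region <- mat / rowSums(mat)          # industry shares within each region
  share_nation <- colSums(mat) / sum(mat)     # industry shares in the reference economy
  rowSums(abs(sweep(share_region, 2, share_nation, "-")))
}

mat <- rbind(R1 = c(10, 0),    # fully specialized in the first industry
             R2 = c(0, 10),    # fully specialized in the second industry
             R3 = c(5, 5))     # same mix as the reference economy
krugman_sketch(mat)             # R1 = 1, R2 = 1, R3 = 0
```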
This function computes location quotients from (incidence) regions - industries matrices. The numerator is the share of a given industry in a given region. The denominator is the share of this industry in a larger economy (the overall country, for instance). This index is also referred to as the index of Revealed Comparative Advantage (RCA) following Balassa (1965), or the Hoover-Balassa index.
location.quotient(mat, binary = FALSE)
| Argument | Description |
|---|---|
| mat | An incidence matrix with regions in rows and industries in columns |
| binary | Logical; shall the returned output be a dichotomized version (0/1) of the location quotient? Defaults to FALSE (the full values of the location quotient will be returned), but can be set to TRUE (location quotient values above 1 will be set to 1 and location quotient values below 1 will be set to 0) |
Pierre-Alexandre Balland [email protected]
Balassa, B. (1965) Trade Liberalization and Revealed Comparative Advantage, The Manchester School 33: 99-123.
```r
## generate a region - industry matrix
mat <- matrix(c(100,  0,  0,  0,  0,
                  0, 15,  5, 70, 10,
                  0, 20, 10, 20, 50,
                  0, 25, 30,  5, 40,
                  0, 40, 55,  5,  0), ncol = 5, byrow = TRUE)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4", "I5")
## run the function
location.quotient(mat)
location.quotient(mat, binary = TRUE)
```
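The ratio described above can be written in a few lines of base R. In this sketch the whole matrix plays the role of the larger economy, and an LQ of exactly 1 is mapped to 0 when `binary = TRUE` (the text above does not specify that boundary case, so this is an assumption); `lq_sketch` is a hypothetical helper, not the package's `location.quotient()` code.

```r
# Hypothetical helper: location quotient = regional share / overall share of each industry.
lq_sketch <- function(mat, binary = FALSE) {
  share_region <- mat / rowSums(mat)          # share of each industry within its region
  share_nation <- colSums(mat) / sum(mat)     # share of each industry in the whole economy
  lq <- sweep(share_region, 2, share_nation, "/")
  if (binary) lq <- (lq > 1) * 1              # assumption: an LQ of exactly 1 becomes 0
  lq
}

mat <- matrix(c(100,  0,  0,  0,  0,
                  0, 15,  5, 70, 10,
                  0, 20, 10, 20, 50,
                  0, 25, 30,  5, 40,
                  0, 40, 55,  5,  0), ncol = 5, byrow = TRUE,
              dimnames = list(paste0("R", 1:5), paste0("I", 1:5)))
lq <- lq_sketch(mat)
lq["R1", "I1"]   # 5: R1 is entirely I1, which is 20% of the whole economy
lq_sketch(mat, binary = TRUE)
```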
This function computes the average location quotients of regions from (incidence) regions - industries matrices. This index is also referred to as the coefficient of specialization (Hoover and Giarratani, 1985).
location.quotient.avg(mat)
| Argument | Description |
|---|---|
| mat | An incidence matrix with regions in rows and industries in columns |
Pierre-Alexandre Balland [email protected]
Hoover, E.M. and Giarratani, F. (1985) An Introduction to Regional Economics. 3rd edition. New York: Alfred A. Knopf
Boschma, R., Balland, P.A. and Kogler, D. (2015) Relatedness and Technological Change in Cities: The rise and fall of technological knowledge in U.S. metropolitan areas from 1981 to 2010, Industrial and Corporate Change 24 (1): 223-250
```r
## generate a region - industry matrix
mat <- matrix(c(100,  0,  0,  0,  0,
                  0, 15,  5, 70, 10,
                  0, 20, 10, 20, 50,
                  0, 25, 30,  5, 40,
                  0, 40, 55,  5,  0), ncol = 5, byrow = TRUE)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4", "I5")
## run the function
location.quotient.avg(mat)
```
This function computes the locational Gini coefficient as proposed by Krugman from regions - industries matrices. The higher the coefficient (theoretical limit = 0.5), the greater the industrial concentration. The locational Gini of an industry that is not localized at all (perfectly spread out) in proportion to overall employment would be 0.
locational.Gini(mat)
| Argument | Description |
|---|---|
| mat | An incidence matrix with regions in rows and industries in columns |
Pierre-Alexandre Balland [email protected]
Krugman, P. (1991) Geography and Trade, MIT Press, Cambridge (chapter 2 - p.56)
Hoover.Gini, locational.Gini.curve, Hoover.curve, Lorenz.curve, Gini
```r
## generate a region - industry matrix
mat <- matrix(c(100,  0,  0,  0,  0,
                  0, 15,  5, 70, 10,
                  0, 20, 10, 20, 50,
                  0, 25, 30,  5, 40,
                  0, 40, 55,  5,  0), ncol = 5, byrow = TRUE)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4", "I5")
## run the function
locational.Gini(mat)
```
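One way to obtain a locational Gini bounded by 0.5 is sketched below in base R. It assumes that, for each industry, regions are ranked by location quotient in decreasing order, that x is the cumulative share of total activity and y the cumulative share of the industry, and that the index is the area between this curve and the diagonal (which is why it cannot exceed 0.5). This construction is an assumption, and `locational_gini_sketch` is a hypothetical helper rather than the package's `locational.Gini()` code.

```r
# Hypothetical helper: locational Gini per industry as the area between the
# curve (regions ranked by decreasing location quotient) and the diagonal.
locational_gini_sketch <- function(mat) {
  total_share <- rowSums(mat) / sum(mat)        # each region's share of all activity
  apply(mat, 2, function(ind) {
    ind_share <- ind / sum(ind)                 # each region's share of this industry
    o <- order(ind_share / total_share, decreasing = TRUE)   # rank by location quotient
    x <- c(0, cumsum(total_share[o]))
    y <- c(0, cumsum(ind_share[o]))
    area <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)   # area under the curve
    area - 0.5                                  # area between curve and diagonal
  })
}

mat <- cbind(A = c(10, 0, 0, 0),    # concentrated in one of four equal-sized regions
             B = c(0, 10, 10, 10))  # spread over three of them
locational_gini_sketch(mat)         # A = 0.375, B = 0.125
```

An industry distributed exactly in proportion to total activity gives a curve on the diagonal and therefore a value of 0.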
This function plots a locational Gini curve following Krugman from regions - industries matrices.
locational.Gini.curve(mat, pdf = FALSE)
| Argument | Description |
|---|---|
| mat | An incidence matrix with regions in rows and industries in columns. The input can also be a vector of regional industrial counts (a matrix with n regions in rows and a single column). |
| pdf | Logical; shall a pdf be saved to your current working directory? Defaults to FALSE. If set to TRUE, a pdf with all locational Gini curves will be compiled and saved to your current working directory. |
| pop | A vector of regional population counts |
Pierre-Alexandre Balland [email protected]
Krugman, P. (1991) Geography and Trade, MIT Press, Cambridge (chapter 2 - p.56)
Hoover.Gini, locational.Gini, Hoover.curve, Lorenz.curve, Gini
```r
## generate a region - industry matrix
mat <- matrix(c(100,  0,  0,  0,  0,
                  0, 15,  5, 70, 10,
                  0, 20, 10, 20, 50,
                  0, 25, 30,  5, 40,
                  0, 40, 55,  5,  0), ncol = 5, byrow = TRUE)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4", "I5")
## run the function (shows industry #5)
locational.Gini.curve(mat)
locational.Gini.curve(mat, pdf = TRUE)
```
This function plots a Lorenz curve from regional industrial counts. The curve indicates how unequally an industry is distributed across regions.
Lorenz.curve(mat, pdf = FALSE, plot = TRUE)
| Argument | Description |
|---|---|
| mat | An incidence matrix with regions in rows and industries in columns. The input can also be a vector of regional industrial counts (a matrix with n regions in rows and a single column). |
| pdf | Logical; shall a pdf be saved to your current working directory? Defaults to FALSE. If set to TRUE, a pdf with all Lorenz curves will be compiled and saved to your current working directory. |
| plot | Logical; shall the curve be automatically plotted? Defaults to TRUE. If set to FALSE, the function will return x y coordinates that you can later use to plot and customize the curve. |
Pierre-Alexandre Balland [email protected]
Lorenz, M. O. (1905) Methods of measuring the concentration of wealth, Publications of the American Statistical Association 9: 209–219
Hoover.Gini, locational.Gini, locational.Gini.curve, Hoover.curve, Gini
```r
## generate vectors of industrial count
ind <- c(0, 10, 10, 30, 50)
## run the function
Lorenz.curve(ind)
Lorenz.curve(ind, pdf = TRUE)
Lorenz.curve(ind, plot = FALSE)
## generate a region - industry matrix
mat <- matrix(c(0, 1, 0, 0,
                0, 1, 0, 0,
                0, 1, 0, 0,
                0, 1, 0, 1,
                0, 1, 1, 1), ncol = 4, byrow = TRUE)
rownames(mat) <- c("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c("I1", "I2", "I3", "I4")
## run the function
Lorenz.curve(mat)
Lorenz.curve(mat, pdf = TRUE)
Lorenz.curve(mat, plot = FALSE)
## run the function by aggregating all industries
Lorenz.curve(rowSums(mat))
Lorenz.curve(rowSums(mat), pdf = TRUE)
Lorenz.curve(rowSums(mat), plot = FALSE)
## run the function for industry #1 only (perfect equality)
Lorenz.curve(mat[, 1])
Lorenz.curve(mat[, 1], pdf = TRUE)
Lorenz.curve(mat[, 1], plot = FALSE)
## run the function for industry #2 only (perfect equality)
Lorenz.curve(mat[, 2])
Lorenz.curve(mat[, 2], pdf = TRUE)
Lorenz.curve(mat[, 2], plot = FALSE)
## run the function for industry #3 only (perfect inequality)
Lorenz.curve(mat[, 3])
Lorenz.curve(mat[, 3], pdf = TRUE)
Lorenz.curve(mat[, 3], plot = FALSE)
## run the function for industry #4 only (top 40% produces 100% of the output)
Lorenz.curve(mat[, 4])
Lorenz.curve(mat[, 4], pdf = TRUE)
Lorenz.curve(mat[, 4], plot = FALSE)
## compare the distribution of the four industries
par(mfrow = c(2, 2))
Lorenz.curve(mat[, 1])
Lorenz.curve(mat[, 2])
Lorenz.curve(mat[, 3])
Lorenz.curve(mat[, 4])
```
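The coordinates behind a Lorenz curve can be sketched in base R by sorting regions from smallest to largest count and accumulating shares. This assumes each region carries equal weight on the horizontal axis; `lorenz_coords_sketch` is a hypothetical helper and does not reproduce the exact output format of `Lorenz.curve(..., plot = FALSE)`.

```r
# Hypothetical helper: Lorenz curve coordinates, with every region weighted equally.
lorenz_coords_sketch <- function(ind) {
  sorted <- sort(ind)                          # smallest to largest regional count
  n <- length(sorted)
  data.frame(x = (0:n) / n,                    # cumulative share of regions
             y = c(0, cumsum(sorted)) / sum(sorted))   # cumulative share of output
}

xy <- lorenz_coords_sketch(c(0, 10, 10, 30, 50))
xy   # y: 0, 0, 0.1, 0.2, 0.5, 1
```

`plot(xy$x, xy$y, type = "l")` followed by `abline(0, 1, lty = 2)` draws the curve against the 45-degree line of perfect equality.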
This function re-arranges the dimensions of a matrix based on the dimensions of another matrix.
match.mat(fill = mat1, dim = mat2, missing = T)
| Argument | Description |
|---|---|
| fill | A matrix that will be used to populate the output matrix |
| dim | A matrix that will be used to determine the dimensions of the output matrix |
| missing | Logical; shall the cells of non-matching rows/columns be set to NA? Defaults to TRUE, but can be set to FALSE to set the cells of non-matching rows/columns to 0 instead. |
Pierre-Alexandre Balland [email protected]
```r
## generate a first region - industry matrix
set.seed(31)
mat1 <- matrix(sample(0:1,20,replace=T), ncol = 4)
rownames(mat1) <- c ("R1", "R2", "R3", "R4", "R5")
colnames(mat1) <- c ("I1", "I2", "I3", "I4")
## generate a second region - industry matrix
set.seed(31)
mat2 <- matrix(sample(0:1,16,replace=T), ncol = 4)
rownames(mat2) <- c ("R1", "R2", "R3", "R5")
colnames(mat2) <- c ("I1", "I2", "I3", "I4")
## run the function
match.mat (fill = mat1, dim = mat2)
match.mat (fill = mat2, dim = mat1)
match.mat (fill = mat2, dim = mat1, missing = F)
```
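The alignment that match.mat performs can be sketched in base R as follows. This is a minimal illustration, not the package's implementation. It assumes that rows and columns are matched by their names. The helper align_to() and its template argument are hypothetical names.

```r
# Hypothetical helper (not part of EconGeo): reshape `fill` so that it has
# the same row and column names, in the same order, as `template`.
align_to <- function(fill, template, missing = TRUE) {
  # Start from a matrix of NA (or 0) with the target dimensions
  out <- matrix(if (missing) NA_real_ else 0,
                nrow = nrow(template), ncol = ncol(template),
                dimnames = dimnames(template))
  # Copy only the cells whose row and column names exist in both matrices
  rows <- intersect(rownames(template), rownames(fill))
  cols <- intersect(colnames(template), colnames(fill))
  out[rows, cols] <- fill[rows, cols]
  out
}

fill <- matrix(1:4, nrow = 2,
               dimnames = list(c("R1", "R2"), c("I1", "I2")))
template <- matrix(0, nrow = 3, ncol = 2,
                   dimnames = list(c("R1", "R2", "R3"), c("I1", "I2")))
aligned_na <- align_to(fill, template)                     # row R3 is NA
aligned_zero <- align_to(fill, template, missing = FALSE)  # row R3 is 0
```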
This function computes a measure of modular complexity of patent documents from technological classes - patents (incidence) matrices
modular.complexity(mat, sparse = FALSE, list = FALSE)
| Argument | Description |
|---|---|
| mat | A bipartite adjacency matrix (can be a sparse matrix) |
| sparse | Logical; is the input matrix a sparse matrix? Defaults to FALSE, but can be set to TRUE if the input matrix is a sparse matrix |
| list | Logical; is the input a list? Defaults to FALSE (input = adjacency matrix), but can be set to TRUE if the input is an edge list |
Pierre-Alexandre Balland
Fleming, L. and Sorenson, O. (2001) Technology as a complex adaptive system: evidence from patent data, Research Policy 30: 1019-1039
```r
## generate a technology - patent matrix
set.seed(31)
mat <- matrix(sample(0:1,30,replace=T), ncol = 5)
rownames(mat) <- c ("T1", "T2", "T3", "T4", "T5", "T6")
colnames(mat) <- c ("US1", "US2", "US3", "US4", "US5")
## run the function
modular.complexity (mat)
## generate a technology - patent sparse matrix
library (Matrix)
## run the function
smat <- Matrix(mat,sparse=TRUE)
modular.complexity (smat, sparse = TRUE)
## generate a regular data frame (list)
list <- get.list (mat)
## run the function
modular.complexity (list, list = TRUE)
```
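For intuition, the base-R sketch below computes Fleming and Sorenson's (2001) interdependence measure K, a patent-level score in the spirit of the reference cited above. It is an assumption that modular.complexity() follows exactly this formula. The sketch also counts every patent in the matrix as "previous", which simplifies the original time-ordered definition to a cross-section. The function name fs_complexity() is hypothetical.

```r
# Hypothetical sketch (not EconGeo's code) of Fleming & Sorenson's
# interdependence K for each patent, from a technology x patent 0/1 matrix.
fs_complexity <- function(mat) {
  n_uses <- rowSums(mat)           # patents in which each class appears
  co <- mat %*% t(mat)             # class x class co-occurrence counts
  diag(co) <- 0
  n_partners <- rowSums(co > 0)    # distinct classes combined with each class
  # Ease of recombination E_i; unused classes get 0 so they drop out of sums
  ease <- ifelse(n_uses > 0, n_partners / n_uses, 0)
  # K_l = (number of classes on patent l) / (sum of E_i over those classes).
  # A single-class patent whose class is never combined yields Inf.
  colSums(mat) / colSums(mat * ease)
}

mat <- matrix(c(1, 1, 0,
                1, 0, 1,
                0, 1, 1,
                0, 0, 1), nrow = 4, byrow = TRUE,
              dimnames = list(paste0("T", 1:4), paste0("US", 1:3)))
k <- fs_complexity(mat)   # higher K = classes that are harder to recombine
```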
This function computes a measure of average modular complexity of technologies (average complexity of patent documents in a given technological class) from technological classes - patents (incidence) matrices
modular.complexity.avg(mat, sparse = FALSE, list = FALSE)
| Argument | Description |
|---|---|
| mat | A bipartite adjacency matrix (can be a sparse matrix) |
| sparse | Logical; is the input matrix a sparse matrix? Defaults to FALSE, but can be set to TRUE if the input matrix is a sparse matrix |
| list | Logical; is the input a list? Defaults to FALSE (input = adjacency matrix), but can be set to TRUE if the input is an edge list |
Pierre-Alexandre Balland
Fleming, L. and Sorenson, O. (2001) Technology as a complex adaptive system: evidence from patent data, Research Policy 30: 1019-1039
```r
## generate a technology - patent matrix
set.seed(31)
mat <- matrix(sample(0:1,30,replace=T), ncol = 5)
rownames(mat) <- c ("T1", "T2", "T3", "T4", "T5", "T6")
colnames(mat) <- c ("US1", "US2", "US3", "US4", "US5")
## run the function
modular.complexity.avg (mat)
## generate a technology - patent sparse matrix
library (Matrix)
## run the function
smat <- Matrix(mat,sparse=TRUE)
modular.complexity.avg (smat, sparse = TRUE)
## generate a regular data frame (list)
list <- get.list (mat)
## run the function
modular.complexity.avg (list, list = TRUE)
```
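The description above says the average is taken over the patents in each technological class. Under that reading, the final averaging step can be sketched as follows. The patent-level scores are made-up illustration values standing in for the output of modular.complexity(), and class_avg is a hypothetical name.

```r
# Technology x patent 0/1 matrix (toy data)
mat <- matrix(c(1, 1, 0,
                1, 0, 1,
                0, 1, 1,
                0, 0, 1), nrow = 4, byrow = TRUE,
              dimnames = list(paste0("T", 1:4), paste0("US", 1:3)))
# Illustrative patent-level complexity scores (hypothetical values)
patent_score <- c(US1 = 0.8, US2 = 0.8, US3 = 0.6)
# Average the scores of the patents in which each class appears
class_avg <- drop(mat %*% patent_score[colnames(mat)]) / rowSums(mat)
```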
This function computes an index of knowledge complexity of regions using the method of reflections from regions - industries (incidence) matrices. The index was developed by Hidalgo and Hausmann (2009) for country - product matrices and adapted by Balland and Rigby (2017) to city - technology matrices.
MORc(mat, RCA = FALSE, steps = 20)
| Argument | Description |
|---|---|
| mat | An incidence matrix with regions in rows and industries in columns |
| RCA | Logical; should the index of revealed comparative advantage (RCA, also referred to as the location quotient) be computed first? Defaults to FALSE (a binary 0/1 matrix is expected as input), but can be set to TRUE if the RCA index first needs to be computed |
| steps | Number of iteration steps. Defaults to 20, but can be set to 0 to give diversity (the number of industries in which a region has an RCA), to 1 to give the average ubiquity of the industries in which a region has an RCA, to 2 to give the average diversity of regions that have similar industrial structures, or to any other number of steps less than or equal to 22. Note that for steps above 2 the index is rescaled from 0 (minimum relative complexity) to 100 (maximum relative complexity). |
Pierre-Alexandre Balland
Hidalgo, C. and Hausmann, R. (2009) The building blocks of economic complexity, Proceedings of the National Academy of Sciences 106: 10570-10575.
Balland, P.A. and Rigby, D. (2017) The Geography of Complex Knowledge, Economic Geography 93 (1): 1-23.
See also: location.quotient, ubiquity, diversity, KCI, TCI, MORt
```r
## generate a region - industry matrix with full count
set.seed(31)
mat <- matrix(sample(0:10,20,replace=T), ncol = 4)
rownames(mat) <- c ("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c ("I1", "I2", "I3", "I4")
## run the function
MORc (mat, RCA = TRUE)
MORc (mat, RCA = TRUE, steps = 0)
MORc (mat, RCA = TRUE, steps = 1)
MORc (mat, RCA = TRUE, steps = 2)
## generate a region - industry matrix in which cells represent the presence/absence of a RCA
set.seed(32)
mat <- matrix(sample(0:1,20,replace=T), ncol = 4)
rownames(mat) <- c ("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c ("I1", "I2", "I3", "I4")
## run the function
MORc (mat)
MORc (mat, steps = 0)
MORc (mat, steps = 1)
MORc (mat, steps = 2)
## generate the simple network of Hidalgo and Hausmann (2009) presented p.11 (Fig. S4)
countries <- c("C1", "C1", "C1", "C1", "C2", "C3", "C3", "C4")
products <- c("P1","P2", "P3", "P4", "P2", "P3", "P4", "P4")
data <- data.frame(countries, products)
data$freq <- 1
mat <- get.matrix (data)
## run the function
MORc (mat)
MORc (mat, steps = 0)
MORc (mat, steps = 1)
MORc (mat, steps = 2)
```
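The method of reflections alternates between the two sides of the matrix. At each step, a region's value is the average of the previous industry values over the industries in which it has an RCA, and vice versa. The base-R sketch below reproduces steps 0 to 2 on the Hidalgo and Hausmann toy network used in the examples. The name reflect_regions() is hypothetical, and the sketch leaves out the 0-100 rescaling that the package applies above steps = 2.

```r
# Hypothetical sketch of the method of reflections (regions side).
# M: binary region x industry matrix with no all-zero rows or columns.
reflect_regions <- function(M, steps) {
  kc0 <- rowSums(M)   # diversity (step 0)
  kp0 <- colSums(M)   # ubiquity
  kc <- kc0
  kp <- kp0
  for (n in seq_len(steps)) {
    kc_new <- drop(M %*% kp) / kc0       # average over a region's industries
    kp <- drop(crossprod(M, kc)) / kp0   # average over an industry's regions
    kc <- kc_new
  }
  kc
}

M <- rbind(C1 = c(1, 1, 1, 1),
           C2 = c(0, 1, 0, 0),
           C3 = c(0, 0, 1, 1),
           C4 = c(0, 0, 0, 1))
colnames(M) <- paste0("P", 1:4)
step0 <- reflect_regions(M, 0)   # diversity
step1 <- reflect_regions(M, 1)   # average ubiquity of a region's industries
step2 <- reflect_regions(M, 2)
```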
This function computes an index of knowledge complexity of industries using the method of reflections from regions - industries (incidence) matrices. The index was developed by Hidalgo and Hausmann (2009) for country - product matrices and adapted by Balland and Rigby (2017) to city - technology matrices.
MORt(mat, RCA = FALSE, steps = 19)
| Argument | Description |
|---|---|
| mat | An incidence matrix with regions in rows and industries in columns |
| RCA | Logical; should the index of revealed comparative advantage (RCA, also referred to as the location quotient) be computed first? Defaults to FALSE (a binary 0/1 matrix is expected as input), but can be set to TRUE if the RCA index first needs to be computed |
| steps | Number of iteration steps. Defaults to 19, but can be set to 0 to give ubiquity (the number of regions that have an RCA in an industry), to 1 to give the average diversity of the regions that have an RCA in this industry, to 2 to give the average ubiquity of the technologies developed in the same regions, or to any other number of steps less than or equal to 21. Note that for steps above 2 the index is rescaled from 0 (minimum relative complexity) to 100 (maximum relative complexity). |
Pierre-Alexandre Balland
Hidalgo, C. and Hausmann, R. (2009) The building blocks of economic complexity, Proceedings of the National Academy of Sciences 106: 10570-10575.
Balland, P.A. and Rigby, D. (2017) The Geography of Complex Knowledge, Economic Geography 93 (1): 1-23.
See also: location.quotient, ubiquity, diversity, KCI, TCI, MORc
```r
## generate a region - industry matrix with full count
set.seed(31)
mat <- matrix(sample(0:10,20,replace=T), ncol = 4)
rownames(mat) <- c ("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c ("I1", "I2", "I3", "I4")
## run the function
MORt (mat, RCA = TRUE)
MORt (mat, RCA = TRUE, steps = 0)
MORt (mat, RCA = TRUE, steps = 1)
MORt (mat, RCA = TRUE, steps = 2)
## generate a region - industry matrix in which cells represent the presence/absence of a RCA
set.seed(32)
mat <- matrix(sample(0:1,20,replace=T), ncol = 4)
rownames(mat) <- c ("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c ("I1", "I2", "I3", "I4")
## run the function
MORt (mat)
MORt (mat, steps = 0)
MORt (mat, steps = 1)
MORt (mat, steps = 2)
## generate the simple network of Hidalgo and Hausmann (2009) presented p.11 (Fig. S4)
countries <- c("C1", "C1", "C1", "C1", "C2", "C3", "C3", "C4")
products <- c("P1","P2", "P3", "P4", "P2", "P3", "P4", "P4")
data <- data.frame(countries, products)
data$freq <- 1
mat <- get.matrix (data)
## run the function
MORt (mat)
MORt (mat, steps = 0)
MORt (mat, steps = 1)
MORt (mat, steps = 2)
```
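On the industry side, the reflections start from ubiquity instead of diversity. The sketch below applies them to the Hidalgo and Hausmann toy network from the examples and adds a rescaling to 0-100. Both names, reflect_industries() and rescale_0_100(), are hypothetical. The min-max form of the rescaling is an assumption about what "rescaled from 0 to 100" means, not the package's documented formula.

```r
# Hypothetical sketch of the method of reflections (industries side).
# M: binary region x industry matrix with no all-zero rows or columns.
reflect_industries <- function(M, steps) {
  kc0 <- rowSums(M)   # diversity
  kp0 <- colSums(M)   # ubiquity (step 0)
  kc <- kc0
  kp <- kp0
  for (n in seq_len(steps)) {
    kp_new <- drop(crossprod(M, kc)) / kp0   # average over an industry's regions
    kc <- drop(M %*% kp) / kc0               # average over a region's industries
    kp <- kp_new
  }
  kp
}

# Min-max rescaling to 0-100 (an assumed form of the package's rescaling)
rescale_0_100 <- function(x) 100 * (x - min(x)) / (max(x) - min(x))

M <- rbind(C1 = c(1, 1, 1, 1),
           C2 = c(0, 1, 0, 0),
           C3 = c(0, 0, 1, 1),
           C4 = c(0, 0, 0, 1))
colnames(M) <- paste0("P", 1:4)
rescaled <- rescale_0_100(reflect_industries(M, 2))
```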
This function computes a measure of complexity by normalizing the ubiquity of industries. It divides the share of the total count (employment, number of firms, number of patents, ...) in an industry by its share of ubiquity. Ubiquity is the number of regions in which an industry can be found (location quotient > 1), computed from a regions - industries (incidence) matrix.
norm.ubiquity(mat)
| Argument | Description |
|---|---|
| mat | An incidence matrix with regions in rows and industries in columns |
Pierre-Alexandre Balland
Balland, P.A. and Rigby, D. (2017) The Geography of Complex Knowledge, Economic Geography 93 (1): 1-23.
See also: diversity, location.quotient, ubiquity, TCI, MORt
```r
## generate a region - industry matrix with full count
set.seed(31)
mat <- matrix(sample(0:10,20,replace=T), ncol = 4)
rownames(mat) <- c ("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c ("I1", "I2", "I3", "I4")
## run the function
norm.ubiquity (mat)
```
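A minimal base-R sketch of the normalization described above follows. It rests on two assumptions: ubiquity counts the regions with a location quotient strictly above 1, and "share of ubiquity" means an industry's share of the ubiquity summed across all industries. The object names are hypothetical.

```r
mat <- matrix(c(10, 0,  5,
                 2, 8,  5,
                 0, 2, 10), nrow = 3, byrow = TRUE,
              dimnames = list(c("R1", "R2", "R3"), c("I1", "I2", "I3")))
# Location quotient: regional share of an industry / economy-wide share
lq <- sweep(mat / rowSums(mat), 2, colSums(mat) / sum(mat), "/")
# Ubiquity: number of regions in which each industry has LQ > 1 (assumed threshold)
ubiq <- colSums(lq > 1)
# Share of the total count divided by share of total ubiquity
# (an industry with zero ubiquity would give Inf)
norm_ubiq <- (colSums(mat) / sum(mat)) / (ubiq / sum(ubiq))
```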
This function computes the prody index of industries from (incidence) regions - industries matrices, as proposed by Hausmann, Hwang & Rodrik (2007). The index gives an associated income level for each industry. It represents a weighted average of per-capita GDPs (but GDP can be replaced by R&D, education...), where the weights correspond to the revealed comparative advantage of each region in a given industry (or sector, technology, ...).
prody(mat, vec)
| Argument | Description |
|---|---|
| mat | An incidence matrix with regions in rows and industries in columns |
| vec | A vector that gives GDP, R&D, education or any other relevant regional attribute that will be used to compute the weighted average for each industry |
Pierre-Alexandre Balland
Balassa, B. (1965) Trade Liberalization and Revealed Comparative Advantage, The Manchester School 33: 99-123.
Hausmann, R., Hwang, J. and Rodrik, D. (2007) What you export matters, Journal of Economic Growth 12: 1-25.
```r
## generate a region - industry matrix
set.seed(31)
mat <- matrix(sample(0:100,20,replace=T), ncol = 4)
rownames(mat) <- c ("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c ("I1", "I2", "I3", "I4")
## a vector of GDP of regions
vec <- c (5, 10, 15, 25, 50)
## run the function
prody (mat, vec)
```
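In the Hausmann, Hwang and Rodrik formulation, region c's attribute enters the average for industry p with weight x_cp / X_c, normalized so that the weights sum to 1 over regions. Since x_cp / X_c differs from RCA_cp only by a factor that is constant within an industry, these weights equal RCA_cp / sum_c RCA_cp, which matches the description above. A base-R sketch under that reading follows. The object names are hypothetical, and it is not stated here whether prody() treats regions with zero totals the same way.

```r
mat <- matrix(c(10, 0,
                 5, 5), nrow = 2, byrow = TRUE,
              dimnames = list(c("R1", "R2"), c("I1", "I2")))
vec <- c(100, 200)   # e.g. GDP per capita of R1 and R2
# Share of each industry in each region's total (x_cp / X_c)
share <- mat / rowSums(mat)
# Normalize the shares over regions within each industry (columns sum to 1)
w <- sweep(share, 2, colSums(share), "/")
# PRODY: weighted average of the regional attribute for each industry
prody_sketch <- drop(crossprod(w, vec))
```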
This function computes an index of revealed comparative advantage (RCA) from (incidence) regions - industries matrices. The numerator is the share of a given industry in a given region. The denominator is the share of this industry in a larger economy (the overall country, for instance). This index is also referred to as a location quotient, or the Hoover-Balassa index.
RCA(mat, binary = FALSE)
| Argument | Description |
|---|---|
| mat | An incidence matrix with regions in rows and industries in columns |
| binary | Logical; should the returned output be a dichotomized (0/1) version of the RCA? Defaults to FALSE (the full RCA values are returned), but can be set to TRUE (RCA values above 1 are set to 1 and RCA values below 1 are set to 0) |
Pierre-Alexandre Balland
Balassa, B. (1965) Trade Liberalization and Revealed Comparative Advantage, The Manchester School 33: 99-123.
```r
## generate a region - industry matrix
set.seed(31)
mat <- matrix(sample(0:100,20,replace=T), ncol = 4)
rownames(mat) <- c ("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c ("I1", "I2", "I3", "I4")
## run the function
RCA (mat)
RCA (mat, binary = TRUE)
```
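The RCA definition above translates directly into matrix operations: RCA_ri = (x_ri / sum_i x_ri) / (sum_r x_ri / sum_ri x_ri). A base-R sketch follows. For the binary version it assumes that only values strictly above 1 become 1, because this text does not say how an RCA of exactly 1 is coded. The names rca and rca_bin are hypothetical.

```r
mat <- matrix(c(10, 0,  5,
                 2, 8,  5,
                 0, 2, 10), nrow = 3, byrow = TRUE,
              dimnames = list(c("R1", "R2", "R3"), c("I1", "I2", "I3")))
# Numerator: share of each industry in the region's total
region_share <- mat / rowSums(mat)
# Denominator: share of each industry in the whole economy
economy_share <- colSums(mat) / sum(mat)
# Divide each column of region_share by the matching economy-wide share
rca <- sweep(region_share, 2, economy_share, "/")
# Dichotomized version: 1 if RCA > 1 (assumed strict), else 0
rca_bin <- (rca > 1) * 1
```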
This function computes the Hoover coefficient of specialization from regions - industries matrices. The higher the coefficient, the greater the regional specialization. This index is closely related to the Krugman specialization index.
spec.coeff(mat)
| Argument | Description |
|---|---|
| mat | An incidence matrix with regions in rows and industries in columns |
Pierre-Alexandre Balland
Hoover, E.M. and Giarratani, F. (1985) An Introduction to Regional Economics. 3rd edition. New York: Alfred A. Knopf (see table 9-4 in particular)
```r
## generate a region - industry matrix
set.seed(31)
mat <- matrix(sample(0:100,20,replace=T), ncol = 4)
rownames(mat) <- c ("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c ("I1", "I2", "I3", "I4")
## run the function
spec.coeff (mat)
```
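The coefficient of specialization is commonly written as half the sum, over industries, of the absolute differences between a region's industry shares and the economy-wide shares. It is 0 when a region has the same structure as the whole economy and grows toward 1 as the region specializes. The base-R sketch below assumes spec.coeff() follows this textbook definition with shares rather than percentages. The object names are hypothetical.

```r
mat <- matrix(c(10, 0,  5,
                 2, 8,  5,
                 0, 2, 10), nrow = 3, byrow = TRUE,
              dimnames = list(c("R1", "R2", "R3"), c("I1", "I2", "I3")))
region_share <- mat / rowSums(mat)          # x_ri / X_r
economy_share <- colSums(mat) / sum(mat)    # X_i / X
# 0.5 * sum_i | x_ri / X_r - X_i / X | for each region r
spec <- 0.5 * rowSums(abs(sweep(region_share, 2, economy_share, "-")))
```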
This function computes an index of knowledge complexity of industries using the eigenvector method from regions - industries (incidence) matrices. Technically, the function returns the eigenvector associated with the second largest eigenvalue of the projected industry - industry matrix.
TCI(mat, RCA = FALSE)
| Argument | Description |
|---|---|
| mat | An incidence matrix with regions in rows and industries in columns |
| RCA | Logical; should the index of revealed comparative advantage (RCA, also referred to as the location quotient) be computed first? Defaults to FALSE (a binary 0/1 matrix is expected as input), but can be set to TRUE if the RCA index first needs to be computed |
Pierre-Alexandre Balland
Hidalgo, C. and Hausmann, R. (2009) The building blocks of economic complexity, Proceedings of the National Academy of Sciences 106: 10570-10575.
Balland, P.A. and Rigby, D. (2017) The Geography of Complex Knowledge, Economic Geography 93 (1): 1-23.
See also: location.quotient, ubiquity, diversity, MORc, KCI, MORt
```r
## generate a region - industry matrix with full count
set.seed(31)
mat <- matrix(sample(0:10,20,replace=T), ncol = 4)
rownames(mat) <- c ("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c ("I1", "I2", "I3", "I4")
## run the function
TCI (mat, RCA = TRUE)
## generate a region - industry matrix in which cells represent the presence/absence of a RCA
set.seed(31)
mat <- matrix(sample(0:1,20,replace=T), ncol = 4)
rownames(mat) <- c ("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c ("I1", "I2", "I3", "I4")
## run the function
TCI (mat)
## generate the simple network of Hidalgo and Hausmann (2009) presented p.11 (Fig. S4)
countries <- c("C1", "C1", "C1", "C1", "C2", "C3", "C3", "C4")
products <- c("P1","P2", "P3", "P4", "P2", "P3", "P4", "P4")
data <- data.frame(countries, products)
data$freq <- 1
mat <- get.matrix (data)
## run the function
TCI (mat)
```
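The eigenvector method can be sketched as follows. The sketch assumes the usual projection W[p, p'] = (1 / k_p) sum_c M_cp M_cp' / k_c, where k_c is diversity and k_p is ubiquity. Each row of W sums to 1, so its largest eigenvalue is 1, and the index is read from the eigenvector of the second largest eigenvalue. The z-scoring and the sign convention (more complex industries should be less ubiquitous) are choices made for this sketch, not documented behaviour of TCI(). The name tci_sketch() is hypothetical.

```r
# Hypothetical sketch of an eigenvector-based industry complexity index.
# M: binary region x industry matrix with no all-zero rows or columns.
tci_sketch <- function(M) {
  kc <- rowSums(M)                 # diversity of regions
  kp <- colSums(M)                 # ubiquity of industries
  # Industry x industry projection; each row sums to 1
  W <- (t(M / kc) %*% M) / kp
  ev <- eigen(W)
  # Eigenvector associated with the second largest eigenvalue
  idx <- order(Re(ev$values), decreasing = TRUE)[2]
  v <- Re(ev$vectors[, idx])
  # Standardize, then orient so the index correlates negatively with ubiquity
  v <- (v - mean(v)) / sd(v)
  if (cor(v, kp) > 0) v <- -v
  setNames(v, colnames(M))
}

M <- rbind(C1 = c(1, 1, 1, 1),
           C2 = c(0, 1, 0, 0),
           C3 = c(0, 0, 1, 1),
           C4 = c(0, 0, 0, 1))
colnames(M) <- paste0("P", 1:4)
tci <- tci_sketch(M)
```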
This function computes a simple measure of ubiquity of industries by counting the number of regions in which an industry can be found (location quotient > 1) from regions - industries (incidence) matrices
ubiquity(mat, RCA = FALSE)
| Argument | Description |
|---|---|
| mat | An incidence matrix with regions in rows and industries in columns |
| RCA | Logical; should the index of revealed comparative advantage (RCA, also referred to as the location quotient) be computed first? Defaults to FALSE (a binary 0/1 matrix is expected as input), but can be set to TRUE if the RCA index first needs to be computed |
Pierre-Alexandre Balland
Balland, P.A. and Rigby, D. (2017) The Geography of Complex Knowledge, Economic Geography 93 (1): 1-23.
```r
## generate a region - industry matrix with full count
set.seed(31)
mat <- matrix(sample(0:10,20,replace=T), ncol = 4)
rownames(mat) <- c ("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c ("I1", "I2", "I3", "I4")
## run the function
ubiquity (mat, RCA = TRUE)
## generate a region - industry matrix in which cells represent the presence/absence of a RCA
set.seed(31)
mat <- matrix(sample(0:1,20,replace=T), ncol = 4)
rownames(mat) <- c ("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c ("I1", "I2", "I3", "I4")
## run the function
ubiquity (mat)
```
This function computes a weighted average of regions or industries from (incidence) regions - industries matrices.
weighted.avg(mat, vec, reg = T)
| Argument | Description |
|---|---|
| mat | An incidence matrix with regions in rows and industries in columns |
| vec | A vector that will be used to compute the weighted average for each industry/region |
| reg | Logical; should the weighted average for regions be returned? Defaults to TRUE (requires a vector of industry values), but can be set to FALSE (requires a vector of region values) if the weighted average for industries should be returned |
Pierre-Alexandre Balland
```r
## generate a region - industry matrix
set.seed(31)
mat <- matrix(sample(0:100,20,replace=T), ncol = 4)
rownames(mat) <- c ("R1", "R2", "R3", "R4", "R5")
colnames(mat) <- c ("I1", "I2", "I3", "I4")
## a vector for regions will be used to compute the weighted average of industries
vec <- c (5, 10, 15, 25, 50)
## run the function
weighted.avg (mat, vec, reg = F)
## a vector for industries will be used to compute the weighted average of regions
vec <- c (5, 10, 15, 25)
## run the function
weighted.avg (mat, vec, reg = T)
```
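This text does not state the weighting explicitly. If the weights are the raw counts in the matrix, the two directions of weighted.avg() reduce to a matrix-vector product divided by row or column totals. A base-R sketch with hypothetical object names:

```r
mat <- matrix(c(10, 0,
                 5, 5), nrow = 2, byrow = TRUE,
              dimnames = list(c("R1", "R2"), c("I1", "I2")))
# reg = TRUE analogue: count-weighted average of an industry-level value per region
industry_value <- c(1, 3)
region_avg <- drop(mat %*% industry_value) / rowSums(mat)
# reg = FALSE analogue: count-weighted average of a region-level value per industry
region_value <- c(100, 200)
industry_avg <- drop(crossprod(mat, region_value)) / colSums(mat)
```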
This function computes the z-score between pairs of technologies from a patent-technology incidence matrix. The z-score measures the co-occurrence of technologies in patent documents (i.e. knowledge combination). It compares the observed number of co-occurrences to the number expected under the hypothesis that combination is random. A positive z-score indicates a typical co-occurrence, one that has occurred multiple times before. In contrast, a negative z-score indicates an atypical co-occurrence. The z-score has been used to estimate the degree of novelty of patents (Kim et al. 2016) and scientific publications (Uzzi et al. 2013), and the relatedness between industries (Teece et al. 1994).
zScore(mat)
| Argument | Description |
|---|---|
| mat | A patent-technology incidence matrix with patents in rows and technologies in columns |
Lars Mewes
Kim, D., Cerigo, D. B., Jeong, H., and Youn, H. (2016). Technological novelty profile and invention's future impact. EPJ Data Science, 5 (1):1–15
Teece, D. J., Rumelt, R., Dosi, G., and Winter, S. (1994). Understanding corporate coherence. Theory and evidence. Journal of Economic Behavior and Organization, 23 (1):1–30
Uzzi, B., Mukherjee, S., Stringer, M., and Jones, B. (2013). Atypical Combinations and Scientific Impact. Science, 342 (6157):468–472
See also: relatedness.density, co.occurrence
```r
## Generate a toy incidence matrix
set.seed(2210)
techs <- paste0("T", seq(1, 5))
techs <- sample(techs, 50, replace = TRUE)
patents <- paste0("P", seq(1, 20))
patents <- sort(sample(patents, 50, replace = TRUE))
dat <- data.frame(patents, techs)
dat <- unique(dat)
mat <- as.matrix(table(dat$patents, dat$techs))
## run the function
zScore(mat)
```
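A common form of this z-score, following Teece et al. (1994), compares the observed number of patents that combine technologies i and j with the mean and variance of a hypergeometric distribution. The mean is mu_ij = n_i n_j / N and the variance is sigma2_ij = mu_ij (1 - n_i / N) (N - n_j) / (N - 1), where N is the number of patents and n_i the number of patents that use technology i. The sketch below assumes zScore() follows this formula. The function z_sketch() and the toy matrix are hypothetical.

```r
# Patent x technology 0/1 matrix (toy data)
mat <- matrix(c(1, 1, 0,
                1, 1, 0,
                1, 0, 1,
                0, 0, 1), nrow = 4, byrow = TRUE,
              dimnames = list(paste0("P", 1:4), paste0("T", 1:3)))

z_sketch <- function(mat) {
  N <- nrow(mat)                # number of patents
  n <- colSums(mat)             # patents per technology
  k <- length(n)
  obs <- crossprod(mat)         # observed co-occurrences (technology x technology)
  mu <- outer(n, n) / N         # expected co-occurrences under random combination
  # Hypergeometric variance: mu * (1 - n_i / N) * (N - n_j) / (N - 1);
  # a technology present in every patent gives zero variance (NaN or Inf)
  sigma2 <- mu * matrix(1 - n / N, k, k) *
    matrix((N - n) / (N - 1), k, k, byrow = TRUE)
  z <- (obs - mu) / sqrt(sigma2)
  diag(z) <- NA                 # self-pairs are not meaningful
  z
}
z <- z_sketch(mat)
```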