The canonical correlation complexity method

Önder Nomaler & Bart Verspagen


A relatively recent yet rapidly growing strand of literature in the so-called econophysics domain, known as 'economic complexity', introduces a toolkit to analyse the relationship between specialization, diversification, and economic development. Several methods have been proposed to reduce the high dimensionality of data on the empirical patterns of co-location of specializations across nations or regions. In machine-learning terms, the existing algorithms follow the framework of 'unsupervised learning'. The competing alternatives (e.g., Hidalgo and Hausmann, 2009 vs. Tacchella et al., 2012) are based on very different assessments of which products depend on more complex capabilities, and accordingly yield very different estimates of complexity at the product level. The approach that we developed avoids this algorithmic 'confusion' by drawing on a toolkit of more transparent and long-established methods that follow the 'supervised learning' principle, in which the data on trade/specialization and the data on development are processed together from the very beginning in order to identify the patterns of mutual association. The first pillar of the toolkit, Principal Component Analysis (PCA), reduces the dimensionality of the co-location information. The second pillar, Canonical Correlation Analysis (CCA), identifies the mutual association between the various patterns of (co-)specialization and more than one dimension of economic development. In this way, we are able to identify the products or technologies that can be associated with the level or the growth rate of per capita GDP and CO2 emissions.

Keywords: Economic complexity, economic development, supervised learning, canonical correlation analysis, principal component analysis

JEL Classification: F14, F63, O11
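
The two-step procedure described in the abstract (PCA on the specialization data, then CCA between the retained components and the development indicators) can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the number of retained components is arbitrary, the names `pca`, `cca`, `M` and `D` are hypothetical, and the final step that maps canonical weights back to products is only one possible choice. The CCA here uses the standard QR-plus-SVD construction.

```python
import numpy as np

def pca(X, n_components):
    """Return PCA scores and principal axes of the column-centered matrix X."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    axes = Vt[:n_components]
    return Xc @ axes.T, axes

def cca(X, Y):
    """Canonical correlations and weight matrices via QR followed by SVD."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, Rx = np.linalg.qr(Xc)
    Qy, Ry = np.linalg.qr(Yc)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    A = np.linalg.solve(Rx, U)     # weights on the columns of X
    B = np.linalg.solve(Ry, Vt.T)  # weights on the columns of Y
    return np.clip(s, 0.0, 1.0), A, B

# Synthetic data (assumption): 60 countries, 200 products, 2 development indicators.
rng = np.random.default_rng(0)
n_countries, n_products = 60, 200
latent = rng.normal(size=(n_countries, 2))
noise = rng.normal(size=(n_countries, n_products))
# M[c, p] = 1 if country c is "specialized" in product p (binary, hypothetical).
M = (latent @ rng.normal(size=(2, n_products)) + noise > 0).astype(float)
# D holds two indicators, standing in for e.g. per capita GDP and CO2 emissions.
D = latent @ rng.normal(size=(2, 2)) + 0.3 * rng.normal(size=(n_countries, 2))

# Step 1: reduce the specialization data to a few components (5 is arbitrary).
scores, axes = pca(M, n_components=5)
# Step 2: relate those components to the development indicators.
rho, A, B = cca(scores, D)

# One possible product-level index: the first canonical weights in product space.
product_weights = axes.T @ A[:, 0]
print("canonical correlations:", rho)
```

In this sketch `rho` holds one canonical correlation per development indicator, and `product_weights` assigns each product a weight on the first canonical variate, which is the sense in which products can be associated with development outcomes.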
