Despite the unequal representation of ancestry groups in genomic research, some studies in underrepresented populations have been very successful. In this section, we discuss flourishing genomic studies in underrepresented populations, mostly from LMICs in Africa, Asia and Latin America. However, the problem of genomic underrepresentation is not restricted to LMICs; therefore, we also highlight a case study from Australia. For each of these exemplary studies, we reflect on factors contributing to their successes.
Large-scale genomics research in Africa has so far been driven mainly by international funding, with very few examples of government-funded national-level initiatives such as the Southern African Human Genome Programme39. MalariaGen40 was among the first studies to be based on a cohort that spanned multiple African countries. The focus of this study on the genetics of both the parasite and the host enabled it to capture snapshots of human genetic diversity, especially in some of the malaria-endemic geographic regions of Africa. However, the H3Africa consortium was the first major pan-African study to have a comprehensive spread across the continent and across a wide variety of diseases and traits25. As well as investigating communicable and noncommunicable diseases, the consortium has contributed to developments in several major aspects of genetics research such as ethics and community engagement, data sharing and governance, and disease awareness, as well as technical developments including dissemination of bioinformatics skills, and design of genotyping array and analysis tools41. Next, we focus on two cohorts, the Uganda Genome Resource (UGR) study and the AWI-Gen study (a collaborative center of the H3Africa consortium), that are cross-sectional in terms of their populations and have been generating key insights into disease genetics.
Strategic collaboration and capacity building: the Uganda Genome Resource
The UGR represents the largest published genomic study of continental Africans to date42. This study leveraged an existing strategic collaboration between the Uganda Virus Research Institute in Uganda and the University of Cambridge and Sanger Institute in the United Kingdom. In 1989, the Uganda General Population Cohort was established by the Uganda Virus Research Institute and partners to examine trends in prevalence and incidence of HIV infection and their determinants43. A genomic study of communicable and noncommunicable diseases was then launched in 2011 with this same cohort. The successful implementation of genomic research here can be attributed to existing local infrastructure in Uganda, long-standing collaborations with genomic centers of excellence in the United Kingdom, and strategic funding that included a research capacity-building component. For example, the author Segun Fatumo is a former H3Africa Bioinformatics Network (H3ABioNet) fellow who was funded to do postdoctoral research training in statistical genetics and bioinformatics at the Sanger Institute and University of Cambridge. During this training, he was strategically positioned to take a lead role in analyses of the UGR. Since this training and research, Segun Fatumo has continued to maintain the genomic resources locally, in addition to leading other genomic studies10,11,12,44. Furthermore, this resource has enabled significant new insights for population genetics and genetic epidemiology. For example, a genetic variant known to cause the inherited blood disorder alpha thalassemia was significantly associated with glycated hemoglobin, a biomarker commonly used in the diagnosis of diabetes42. This variant is thought to have become more frequent among African populations because it can prevent severe malaria42.
Building on existing resources—Africa Wits-INDEPTH Partnership for Genomic Research
Africa Wits-INDEPTH Partnership for Genomic Research (AWI-Gen) is an NIH-funded cross-sectional population cohort of about 12,000 adults (predominantly 40–60 years) from six centers spanning four African countries—Ghana, Burkina Faso, Kenya and South Africa. It was set up by a strategic regional partnership between the University of the Witwatersrand, Johannesburg and the International Network for the Demographic Evaluation of Populations and Their Health (INDEPTH) study. The existing Health and Demographic Surveillance System centers and the Developmental Pathways for Health Research Unit have longitudinal cohorts that provided the research infrastructure, including long-standing community engagement, trained fieldworkers and detailed longitudinal demographic and phenotype data. This mutually beneficial partnership enabled the project to span Africa with a wide representation of social and genetic variability, resulting in more than 40 publications across disciplines including epidemiology, disease awareness, population genetics, candidate gene studies and gene–environment interaction45,46,47,48,49. Several major GWAS are close to publication and have led to partnerships with large, global consortia such as the Global Lipid Genetics Consortium and Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium. Sustained funding has enabled the transformation of AWI-Gen into a longitudinal cohort. The achievements of the AWI-Gen study are in part attributable to the strategy of building on existing resources and forming long-term partnerships based on benefit sharing among institutions within LMIC settings.
Beyond the research itself, a major achievement of these studies lies in the sharing of bioinformatics and genomics skills across the continent. For example, the annual Introduction to Bioinformatics course run by the H3ABioNet has trained over 3,000 students in the last 8 years50. In addition, the network has hosted more than 30 workshops for basic and advanced training in areas such as GWAS, next-generation sequencing, microbiome analysis and data management41,50. Similarly, the setup and development of several biobanks across the continent in association with these projects could have a catalytic effect for research and development initiatives in future. Finally, as these studies reach completion, we anticipate that some of the outcomes will benefit the communities who participated and will also contribute to the bio-economic landscape of the respective LMICs.
The importance of funding: Pakistan Alliance on genetic RisK factors for Health
South Asians make up about one-quarter of the world population, with 1.38 billion people living in India alone. Pakistan and many other countries in the region have a high rate of consanguineous marriages and have been the focus of gene-mapping studies for recessive disorders for the last few decades. There is a long list of disorders for which mutations have been discovered in families from these regions, including hearing impairment51, intellectual disability52, microcephaly53 and visual conditions54. These studies have contributed to the global efforts to study the genetic causes of recessive disorders and their underlying biology. In the process, genotyping and sequencing data have been created that can now be leveraged to address questions about population structure, population-specific allele frequencies and ancestry55. This will require collaborative networks, data storage and access mechanisms that follow ELSI guidelines. The Greater Middle East Variome Project is one such successful example (http://igm.ucsd.edu/gme/).
However, South Asians are particularly underrepresented in genomic research of complex diseases. With a target recruitment of 30,000 patients admitted to a psychiatry clinic and 15,000 control participants, PARKH (Pakistan Alliance on genetic RisK factors for Health) is one of the largest international case–control studies utilizing genetic data. Over a period of 20 years, the team have built extensive links with other institutions across Pakistan through small family-based studies52,56,57, which eventually enabled a sizable pilot sample collection. Local connections, cultural understanding, knowledge of the administrative and regulatory processes, resilience and the flexibility to navigate an ever-changing research landscape have been the key factors in the success of these projects. The collaboration between Pakistani, US- and UK-based researchers was a decisive factor in opening up access to funding resources. For example, one of the three PARKH sister studies, DIVERGE, is funded by a starting grant worth €2.5 million from the European Research Council, for which only researchers in the European Union and a select group of partner countries are eligible. The two other sister studies, the GENetics of SChizophRenia In Pakistan (GEN-SCRIP) and GENetics of BipoLar Disorder In Pakistan (GEN-BLIP), have been funded by the US National Institute of Mental Health (award numbers R01MH112904-01 and R01MH12377, respectively). PARKH demonstrates that building and maintaining infrastructure and a network for data collection as well as international collaborations can be the foundation for repeated funding success and may serve as motivation for ambitious large-scale strategies. In the case of PARKH, none of the funders provided a dedicated capacity-building component. Rather, the investigators implemented their own strategies, which included hiring local researchers for diverse roles.
Study design can also play an important role in enabling sustained research activity. For the DIVERGE study, a dedicated cross-disciplinary working group designed a protocol that captures diverse outcomes and putative risk factors for depression to enable multidisciplinary research on depression genetics, pharmacogenetics, interactions between genes and traumatic life events and epidemiological analyses of socioeconomic factors. Importantly, local investigators took key roles in the study design to ensure that factors relevant to the studied populations were captured in the data collection.
Consortium building for aggregation of large-scale genomic data—The Latin American Genomics Consortium
The term ‘Latin American’ refers to a pan-ethnicity used for the large, diverse group of people who come from Latin American countries. Additionally, people in other countries who identify with Latin American origins are often identified as Hispanic or Latinx American. Latinx populations have complex ancestry including recent admixture. Commonly used analytical approaches may not sufficiently address population stratification in these groups; for example, the use of principal components as covariates (whereby a large set of variables is condensed into a smaller, simpler set) accounts only for global ancestry, not for local ancestry at a given genomic region. In addition to the lack of dedicated genomic studies in these groups, individuals with admixed ancestry are systematically excluded from existing studies because of these concerns around population stratification. The recently established Latin American Genomics Consortium aims to address these issues within the field of psychiatric genetics (https://latinamericangenomicsconsortium.org/). This consortium includes over 100 scientists from eight Latin American countries, Puerto Rico and the United States. The group harmonizes data from existing cohorts and has a total of 100,000 samples, mostly from the United States, but there are plans to recruit new participants and establish a biobank.
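The distinction between global and local ancestry can be made concrete with a toy calculation. The sketch below assumes hypothetical local-ancestry calls (one label per haplotype per site, here 'AFR' or 'EUR'); the function names and data are illustrative only and are not taken from any study discussed here.

```python
# Toy illustration: global ancestry is a genome-wide average, so two
# individuals with identical global ancestry can still differ at a given locus.
# All labels and data below are hypothetical.

def global_ancestry(local_calls, ancestry):
    """Fraction of all haplotype calls assigned to `ancestry`.

    local_calls: list of (hap1_label, hap2_label) tuples, one per site.
    """
    labels = [label for site in local_calls for label in site]
    return labels.count(ancestry) / len(labels)


def local_copies(local_calls, site_index, ancestry):
    """Number of haplotypes (0, 1 or 2) carrying `ancestry` at one site."""
    return local_calls[site_index].count(ancestry)


person_a = [("AFR", "AFR"), ("EUR", "EUR"), ("AFR", "EUR"), ("AFR", "EUR")]
person_b = [("EUR", "EUR"), ("AFR", "AFR"), ("AFR", "EUR"), ("EUR", "AFR")]

for name, calls in (("A", person_a), ("B", person_b)):
    # Print global AFR proportion and the AFR haplotype count at the first site.
    print(name, global_ancestry(calls, "AFR"), local_copies(calls, 0, "AFR"))
```

Both toy individuals have the same global proportion of the first ancestry, yet they carry two and zero copies of it at the first site. A covariate that summarizes genome-wide ancestry cannot distinguish them at that locus.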
The development of analytical methods for samples with admixed ancestry is an active field of research. One promising albeit computationally intensive approach is a software framework known as Tractor, which identifies haplotype segments and assigns them to ancestral origins, followed by an ancestry-specific association analysis58.
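To illustrate the general idea of an ancestry-specific analysis, the following minimal sketch splits one individual's phased genotype at a single variant into per-ancestry haplotype counts and alternate-allele dosages. It is not Tractor's code or interface; the function name, the ancestry labels and the data are assumptions made for illustration.

```python
# Sketch of ancestry-specific dosage extraction at a single variant.
# This is NOT Tractor's code or interface; names and data are hypothetical.

def ancestry_specific_counts(alleles, ancestries, ancestry):
    """Return (haplotype_copies, alt_dosage) for one ancestry at one variant.

    alleles:    (a1, a2), phased alleles coded 0 = reference, 1 = alternate.
    ancestries: (l1, l2), local-ancestry label of each haplotype at the variant.
    """
    copies = sum(1 for label in ancestries if label == ancestry)
    dosage = sum(a for a, label in zip(alleles, ancestries) if label == ancestry)
    return copies, dosage


# One heterozygous individual whose alternate allele sits on the AFR haplotype.
alleles = (1, 0)
ancestries = ("AFR", "EUR")

afr_copies, afr_dosage = ancestry_specific_counts(alleles, ancestries, "AFR")
eur_copies, eur_dosage = ancestry_specific_counts(alleles, ancestries, "EUR")
print(afr_copies, afr_dosage, eur_copies, eur_dosage)
```

In an association model, the per-ancestry dosages could then be entered as separate predictors, so that each ancestral background receives its own effect estimate rather than a single pooled one.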
The importance of the community in setting research priorities—the Tiwi Island Aboriginal population
Aboriginal and Torres Strait Islander people in Australia are one of the largest indigenous populations in the world, comprising hundreds of groups, each with their own distinct language, history and cultural traditions. The Tiwi Land Council signed a historic research agreement to formalize Tiwi control of research priorities, research information, and samples including biobanking in genomic studies59. The Tiwi people have proactively participated and engaged with research into kidney disease and other chronic conditions in their community for more than 30 years, with stakeholders providing ethical guidance for researchers and support for communities themselves60. At one point, the Tiwi community raised local financial support and external funds, specifically the Stanley Tipiloura Fund, to support research into kidney disease61.
Crucially, members of the Tiwi community have worked as staff in all research projects conducted within their community61, and have contributed to the application of genetics research to study its origins, migrations, customs, relationships and health issues61. The Tiwi Island Aboriginal population is therefore an example of best practice for indigenous-led initiatives with a substantial proportion of indigenous researchers and leaders. This is further illustrated by the recently launched National Centre for Indigenous Genomics, which not only demonstrates genuine partnerships with the community but is also governed by an indigenous-majority board.
Collective lessons or key learnings
The success of the cohorts and studies described above illustrates that, with sufficient funding, it is possible for indigenous groups and those at LMIC institutions to scale up in resources and skills to enable high-quality genomics research in less than a decade. These examples should motivate funders to support both ongoing and new ventures that are led by LMIC researchers. Moreover, publications in top-tier journals and presentations at major conferences have given these researchers the opportunity to participate in large-scale, global studies. We hope that in future they will be able not only to extend their research to larger cohorts but also to move closer to leading some of these large-scale global studies. As an example of this, two key contributors to the AWI-Gen study (including one of the authors of the current paper, T.C.) were recently provided the opportunity to co-lead one of the CHARGE consortium phase 2 studies.