In-Silico Characterization of Pathogenic Missense nsSNPs in AAMDC Human Gene Involved in Onset of Breast Cancer

Research Article

Austin J Comput Biol Bioinform. 2023; 4(1): 1017.

Muhammad Ali Raza*

Department of Biological Sciences, University of Sialkot, Pakistan

*Corresponding author: Muhammad Ali Raza, Department of Biological Sciences, University of Sialkot, Pakistan. Email: [email protected]

Received: October 27, 2023 Accepted: November 28, 2023 Published: December 05, 2023

Abstract

Breast cancer is a type of cancer that occurs when cells in the breast tissue grow uncontrollably. It may be caused by activation of growth receptors and/or mutations in oncogenes. Single Nucleotide Polymorphisms (SNPs) are variations in the DNA sequence that occur when a single nucleotide in the genome is altered. Non-synonymous SNPs (nsSNPs) can alter the amino acid sequence of a protein, potentially affecting its structure, function, and interaction with other proteins. In the context of breast cancer research, in-silico nsSNP analysis can help identify specific genetic mutations that may be associated with increased breast cancer risk or treatment response. AAMDC is a newly discovered oncogene in which mutations can lead to breast cancer. The purpose of this study is to identify the possible vulnerable mutational sites in its sequence. The study focused primarily on in-silico structural and functional analysis of nsSNPs associated with the AAMDC gene. The 175 most damaging nsSNPs of the AAMDC gene were analyzed using different bioinformatics tools. After sequence analysis and protein stability prediction using SIFT, PolyPhen-2, CADD, and I-Mutant2.0, 10 nsSNPs were shortlisted using a consensus-based approach. All nsSNPs were analyzed for disease association prediction using I-Mutant2.0, PhD-SNP and PANTHER; 3 nsSNPs were shortlisted based on a consensus approach. The structural and functional variation caused by these damaging SNPs was then analyzed using MutPred2 and SNAP2. The goal of the current study is to assess the damaging effects of nsSNPs on the structure and function of the AAMDC gene. This study can help in understanding breast cancer genetics and in its prevention and treatment.

Keywords: Breast Cancer; nsSNPs; AAMDC; in-silico

Introduction

Cancer is characterized by uncontrolled and abnormal cell division leading to the proliferation of cells. Cancers are classified by the organ or tissue of origin and the molecular characteristics of the cancer cells [1]. After lung cancer, breast cancer is the most prevalent type of cancer worldwide [2]. Breast cancer can affect individuals of any age, both males and females, but it occurs more frequently in females over the age of 40. Each year, approximately 1 million cases are recorded globally, with 60% of these cases coming from low- and middle-income nations. In Pakistan, breast cancer is the leading cause of cancer death in women [3]. Several risk factors have been linked to breast cancer in women, including the presence of estrogen, postmenopause, late menopause, obesity, and high levels of endogenous estradiol [2,4-6]. Breast cancer does not have a single cause at the molecular level; instead, it can be brought on by one or more factors. These molecular characteristics include activation of HER2 (human epidermal growth factor receptor 2), activation of the estrogen and progesterone receptors, and/or BRCA mutations [7].

The oncogene Adipogenesis associated Mth938 domain containing (AAMDC) is believed to play a crucial role in the regulation of fat cell differentiation. According to NCBI, it is localized at 11q14.1 and functions in the cytoplasm [8]. In addition, AAMDC is known to work in conjunction with RNA polymerase II, positively regulating transcription and negatively regulating the apoptotic process. In situations of metabolic stress, such as estrogen deprivation, AAMDC is known to constitutively activate the PI3K-AKT-mTOR pathway, leading to the survival of ER+ breast cancers [9]. The PI3K/AKT/mTOR pathway is an important intracellular signaling mechanism that regulates the cell cycle, impacting cellular dormancy, proliferation, and cancer [10]. The AAMDC protein shields cancer cells, making them harder to treat with anti-cancer hormone therapy. It can alter the metabolic processes of breast cancer cells, triggering growth pathways and facilitating their growth and division [11].

A SNP is a variation in a single nucleotide at a specific position in the DNA sequence. SNPs play a crucial role in understanding the relationship between genetics and diseases [12]. In the human genome, SNPs account for over 90% of sequence variations and are used to identify genetic variations and biomarkers [13]. Owing to their widespread frequency, simplicity of analysis, affordability of genotyping, and the availability of statistical and bioinformatics tools, SNPs are considered the most useful biomarkers for the diagnosis or prognosis of illnesses [14]. The goal of SNP research in disease genetics is to identify SNPs that alter cellular biological processes and result in diseased states [15].

Materials and Methods

No SNP study has specifically investigated the role of the AAMDC gene in the development of breast cancer. This study aims to fill this gap by conducting a computational analysis of the AAMDC gene to identify potentially damaging Single Nucleotide Polymorphisms (SNPs). The analysis is divided into two domains: sequence analysis of the gene for SNPs, and structural and functional analysis of the gene. Both domains involve in-silico methods.

Data Retrieval and Pre-Processing

To gather information about the AAMDC gene, two well-known genomic databases, NCBI and ENSEMBL, were consulted. The data obtained from these sources included the gene's description, chromosomal location, transcripts, genomic segments, and resulting products. Additionally, non-synonymous single nucleotide polymorphisms (nsSNPs) obtained from these databases were analyzed to determine which nsSNPs were the most damaging. To achieve this, several bioinformatics tools were employed, including SIFT, PolyPhen-2, and CADD.

NCBI: NCBI houses numerous databases containing biological information, such as GenBank for DNA sequences and PubMed for biomedical literature [16].

ENSEMBL: Ensembl allows users to retrieve protein sequences, predict missense amino acids, perform multiple sequence alignments, annotate genes, and predict regulatory functions of proteins [17]. The SNPs of all the transcripts of the AAMDC gene, including the missense SNPs, were extracted, processed manually in Excel, and documented.
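The manual spreadsheet step described above could also be scripted. The following is a minimal sketch, not the procedure used in this study: it assumes the variant table has been exported as CSV, and the column names (variant_id, consequence, aa_change) and the variant identifiers are hypothetical placeholders. It keeps rows annotated with the consequence term missense_variant and drops duplicates that arise when the same variant is listed under several transcripts.

```python
import csv
import io

# Hypothetical export. The column names and variant identifiers are
# placeholders, not the layout or content of an actual Ensembl download.
SAMPLE_EXPORT = """variant_id,consequence,aa_change
rs0000001,missense_variant,A/T
rs0000002,synonymous_variant,L/L
rs0000003,"missense_variant,splice_region_variant",R/H
rs0000001,missense_variant,A/T
"""


def select_missense(handle):
    """Return unique rows whose consequence list includes 'missense_variant'."""
    seen = set()
    selected = []
    for row in csv.DictReader(handle):
        # A variant may carry several comma-separated consequence terms.
        terms = {term.strip() for term in row["consequence"].split(",")}
        key = (row["variant_id"], row["aa_change"])
        if "missense_variant" in terms and key not in seen:
            seen.add(key)
            selected.append(row)
    return selected


missense_rows = select_missense(io.StringIO(SAMPLE_EXPORT))
for row in missense_rows:
    print(row["variant_id"], row["aa_change"])
```

For a real export, the in-memory sample would be replaced by an opened file, and the column names adjusted to match the downloaded table.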

Sequence Analysis

Sequence analysis was performed to check the damaging effect of the SNPs in our gene of interest, AAMDC, using the following tools.

Prediction of damaging SNPs: SNPs that have a negative impact on protein stability, mRNA, or protein structure and function are considered "damaging SNPs" and have been associated with various diseases. To determine the effect of SNPs on protein sequences, a comprehensive analysis was performed on a filtered set of the most damaging non-synonymous SNPs using various bioinformatics tools.

SIFT: SIFT (http://sift.bii.a-star.edu.sg) is a computational tool that predicts the functional impact of substitutions by taking into account the sequence homology and physical properties of amino acids; it can be applied to naturally occurring non-synonymous polymorphisms and laboratory-induced missense mutations [18]. SIFT was used to analyze the filtered set of the 175 most damaging nsSNPs from Ensembl data, with the aim of identifying nsSNPs that have a damaging effect on protein sequences.

POLYPHEN-2: PolyPhen-2 (Polymorphism Phenotyping v2) is a computational tool that assesses the impact of amino acid changes on the structure and function of human proteins. It uses basic physical and evolutionary principles to predict the functional effects of nsSNPs on a given protein [19]. The filtered set of the 175 most damaging nsSNPs was submitted as input to PolyPhen-2 to evaluate their potential for damaging effects on the protein sequence and to identify nsSNPs that have a detrimental impact on protein function.

CADD (Combined Annotation Dependent Depletion): The CADD system calculates a unified score that takes into account multiple genomic annotations by comparing variants that have survived natural selection with simulated mutations [20]. The 175 most damaging shortlisted nsSNPs were submitted as input to CADD to determine the potential deleterious effect of amino acid changes on the protein sequence.
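The consensus idea behind combining these three predictors can be sketched as a simple vote. This is an illustration under stated assumptions, not the study's actual filter: the article does not report which cutoffs were applied, so the thresholds below are commonly quoted conventions used here as placeholders, and the variant names and scores are invented.

```python
from dataclasses import dataclass

# Illustrative cutoffs only; the article does not state which were used.
SIFT_MAX = 0.05        # SIFT scores at or below this are conventionally "deleterious"
POLYPHEN_MIN = 0.85    # assumed cutoff for a damaging PolyPhen-2 call
CADD_PHRED_MIN = 20.0  # assumed cutoff on the scaled (PHRED-like) CADD score


@dataclass
class VariantScores:
    variant: str
    sift: float
    polyphen: float
    cadd_phred: float


def damaging_votes(v: VariantScores) -> int:
    """Count how many of the three tools flag the variant as damaging."""
    return sum([
        v.sift <= SIFT_MAX,
        v.polyphen >= POLYPHEN_MIN,
        v.cadd_phred >= CADD_PHRED_MIN,
    ])


def consensus_shortlist(variants, min_votes=3):
    """Keep variants flagged by at least `min_votes` tools."""
    return [v.variant for v in variants if damaging_votes(v) >= min_votes]


# Invented scores for three placeholder variants.
examples = [
    VariantScores("variant_1", 0.00, 0.99, 28.1),
    VariantScores("variant_2", 0.20, 0.95, 24.0),
    VariantScores("variant_3", 0.03, 0.10, 12.5),
]
shortlist = consensus_shortlist(examples)
print(shortlist)
```

Requiring all three tools to agree (min_votes=3) is the strictest form of consensus; lowering min_votes to 2 gives a majority vote, which trades specificity for sensitivity.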

Protein-Stability Prediction

Protein stability is the overall balance of forces that determines whether a protein will be in its native, folded structure or in a denatured (unfolded or extended) state. Protein stability was predicted using the following tool.

I-Mutant 2: The I-Mutant2.0 program uses Support Vector Machine (SVM) algorithms to predict the effect of single point mutations on protein stability, quantified as ΔΔG values, and classifies the direction of the stability change resulting from the mutation [21]. To evaluate the impact of amino acid changes on protein structure, the 175 most damaging nsSNPs, with both the new and wild-type sequences, were submitted as input to the I-Mutant2.0 program.
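How a DeltaDeltaG (ΔΔG) value maps to a direction of stability change can be sketched as follows. The sketch assumes the sign convention in which ΔΔG is the free energy of the mutant minus that of the wild type, so that a negative value indicates reduced stability. This convention, the function names, and the example values are assumptions for illustration and should be checked against the tool's documentation before interpreting real output.

```python
def ddg(dg_mutant: float, dg_wild_type: float) -> float:
    """Free energy change difference (kcal/mol), assumed as mutant minus wild type."""
    return dg_mutant - dg_wild_type


def stability_direction(ddg_value: float) -> str:
    """Classify the direction of the stability change from the sign of ddG."""
    if ddg_value < 0:
        return "decrease"
    if ddg_value > 0:
        return "increase"
    return "no change"


# Invented free energy values for a single hypothetical substitution.
example_ddg = ddg(dg_mutant=1.2, dg_wild_type=2.0)
print(round(example_ddg, 2), stability_direction(example_ddg))
```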

Disease Association Prediction

The process of Disease Association Prediction aims to assess the detrimental effects of nsSNPs on the specified gene using bioinformatics tools.

PhD SNP: PhD SNP (https://snps.biofold.org/phd-snp/phd-snp.html) takes protein sequences in FASTA format and was used to assess the damaging effects of amino acid changes on protein structure and their association with diseases linked to the AAMDC gene [22]. To accomplish this, the 175 most damaging nsSNPs, with both the new and wild-type residues, were submitted as input to PhD SNP for analysis.

PANTHER: The PANTHER (Protein Analysis Through Evolutionary Relationships) tool is designed to evaluate the potential impact of nsSNPs on protein function. The score for a given nsSNP is determined through a Hidden Markov Model (HMM) alignment of evolutionarily related proteins [23].
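Tools that take a wild-type sequence together with a substituted residue require each nsSNP to be expressed consistently against that sequence. The helper below is a hypothetical sketch of that preparation step: it parses a substitution written as wild-type residue, 1-based position, new residue (for example A4V), checks that the stated wild-type residue matches the sequence, and builds the mutant sequence. The peptide used is a toy example, not the AAMDC protein sequence.

```python
import re

# Substitution written as <wild-type residue><1-based position><new residue>.
SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitution(wild_type: str, substitution: str) -> str:
    """Return the mutant sequence after validating the substitution."""
    match = SUBSTITUTION.match(substitution)
    if match is None:
        raise ValueError(f"Unrecognized substitution format: {substitution!r}")
    ref, position_text, alt = match.groups()
    position = int(position_text)
    if not 1 <= position <= len(wild_type):
        raise ValueError(f"Position {position} is outside the sequence")
    if wild_type[position - 1] != ref:
        raise ValueError(
            f"Expected {ref} at position {position}, "
            f"found {wild_type[position - 1]}"
        )
    # Replace the single residue at the given position.
    return wild_type[: position - 1] + alt + wild_type[position:]


toy_wild_type = "MKTAYIAK"  # toy peptide, not the AAMDC sequence
toy_mutant = apply_substitution(toy_wild_type, "A4V")
print(toy_mutant)
```

Validating the wild-type residue before submission catches position mismatches that occur when a substitution is numbered against a different transcript than the sequence supplied to the tool.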

Structural and Functional Prediction

The structural and functional changes in the protein due to nsSNPs were assessed using the bioinformatics tools given below.

MUTPRED2: MutPred2 is a tool that categorizes nsSNPs as either harmful or benign, and predicts their effect on over 50 different protein characteristics [24]. MutPred2 was used to evaluate the structural and functional effects of the 175 most damaging nsSNPs.

SNAP2: SNAP2 is a bioinformatics tool that utilizes neural networks to predict the impact of single amino acid variations on a protein's function [25].

Results and Discussion

Data Retrieval and Pre-Processing

The National Center for Biotechnology Information (NCBI) is a highly regarded repository for biotechnology and biomedicine-related resources and databases. The gene known as AAMDC has been assigned alternate designations, including CK067, PTD015, and C11orf67. This gene is located on the 11q14.1 chromosomal region and consists of 12 exonic regions [26].

The Ensembl genome browser was used to retrieve the transcripts and corresponding sequences of the AAMDC gene. The gene has 11 transcript variants and 207 orthologues, as shown in Table 1. The SNPs of all transcripts were documented and processed manually in Excel, and the nsSNPs were then shortlisted.