The GPCR 3DM database

Note: Before doing this practical it is best to first do the practical on NR ligand binding, which is the introduction to 3DM.

A: Introduction

Log in at 3dm.bio-prodict.nl with your 3DM account. If you don’t have a 3DM account you can request one via the “get 3DM” tab. To do this course you need at least a course login. After you have requested an account you can request a course login by sending an email to Joosten@bio-prodict.nl

Open the GPCR database at 3dm.bio-prodict.nl. By default the GPCR 3DM database shows data for the GPCR class A sequences. There are other classes of GPCRs too; you will investigate these later. Unless specifically requested otherwise, always use the GPCR class A protein family to answer the questions.

B: CorNet

Open 3DM’s correlated mutation analysis tool CorNet (use the icon). You can see two big networks and a couple of small ones. You wonder whether the two bigger networks really are two separate networks or whether they are simply disconnected because the default cut-off for the number of positions in the network is 25. Now answer the following questions:

Correlated mutations often divide the superfamily into several groups based on sequence motifs. Sometimes this is just an evolutionary separation (e.g. bacterial versus eukaryotic sequences), but because 3DM uses superfamily alignments containing proteins with different functions, the correlations are often the result of the changes between these different functional groups.
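To make the idea of correlated mutations concrete, here is a minimal Python sketch that scores pairs of alignment columns by mutual information. It is an illustration only: this practical does not describe how CorNet scores correlations, so the scoring method, the cut-off, the function names and the toy alignment are all assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        # p_ab / (p_a * p_b), written with counts to limit rounding errors
        mi += p_ab * math.log2(p_ab * n * n / (count_a[a] * count_b[b]))
    return mi


def correlated_pairs(alignment, cutoff=0.8):
    """Return (column_i, column_j, score) for column pairs scoring >= cutoff."""
    columns = list(zip(*alignment))
    pairs = []
    for i, j in combinations(range(len(columns)), 2):
        score = mutual_information(columns[i], columns[j])
        if score >= cutoff:
            pairs.append((i, j, round(score, 3)))
    return pairs


# Hypothetical toy alignment: columns 0 and 2 always change together
# (A with D, G with E), while column 1 varies independently.
alignment = [
    "AKD",
    "ARD",
    "AKD",
    "GRE",
    "GKE",
    "GRE",
]

pairs = correlated_pairs(alignment)
print(pairs)
```

In this toy case the two co-varying columns split the sequences into an "A/D" group and a "G/E" group, which mirrors how a correlated network can separate functional groups in a superfamily.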

Now select the other big network and color the nodes blue with the “node coloring” option on the right. Select 247 and 248 and make them yellow. Select 42 and 32 and make them magenta. Select 60 and 171 (the only two positions of the sub-network that are not light blue yet) and make them red. Click on “visualize all nodes” and open the resulting Yasara scene with Yasara.

In Yasara use 3DM -> Load from 3DM and load the 1F88A drug.

C: Families and numbering schemes

There are different families in the GPCR database. For each GPCR class an alignment has been generated. These alignments are structure-based and each has its own structurally conserved core. Because these cores differ in length, the different alignments have their own numbering schemes. Via the “families” scroll-down menu you can switch between the different families.

Right click on the system management icon to open this in a new tab.

Now select the superfamily 3DM and look at where the conserved residues are (which 3D numbers) using the “data statistics” icon. Now do the same for the group A sequences (this is easiest in the other tab you still have open, so that you can switch between the tabs).

Select from the “tools” scroll-down menu at the top of 3DM the “select numbers” option and switch to gpcr-A numbering in both alignments.

Realize that 3D numbers cannot be compared between different 3D alignments (do you understand why you can’t select gpcr-A numbers if you have the GPCR class B alignment selected?). Because the size of the structurally conserved core differs between the families, different numbering schemes have been developed for the different subclasses. Each of these numbering schemes indicates the subclass to which it applies. Although it would have been even better to have a common numbering for the transmembrane parts that is shared by all classes, such numbering schemes are a luxury that is only available in the GPCR protein family and has not been developed for any other protein family (as far as we know).
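The following Python sketch illustrates why numbers from two alignments cannot be compared. It assumes a simple text convention that is not taken from 3DM: upper case marks core positions, lower case marks residues outside the core, and '-' marks a core position where this sequence has no residue. The function name and the example sequence are hypothetical.

```python
def core_numbering(aligned_seq, first_residue=1):
    """Map the residue numbers of one sequence to alignment (3D) numbers.

    Only residues at core positions (upper case) receive a 3D number.
    """
    mapping = {}
    residue_number = first_residue - 1
    core_number = 0
    for char in aligned_seq:
        if char == "-":
            core_number += 1  # core position without a residue in this sequence
            continue
        residue_number += 1
        if char.isupper():
            core_number += 1
            mapping[residue_number] = core_number
    return mapping


# The same hypothetical 8-residue sequence placed in two alignments
# whose conserved cores start at different residues and differ in length.
in_alignment_x = "mkTLLVAG"   # core starts at residue 3
in_alignment_y = "mktlLV-AG"  # core starts at residue 5 and has a gap

numbers_x = core_numbering(in_alignment_x)
numbers_y = core_numbering(in_alignment_y)

print(numbers_x[5], numbers_y[5])  # residue 5 gets a different 3D number in each
print(3 in numbers_y)              # residue 3 has no 3D number in alignment y
```

The same residue receives a different 3D number in each alignment, and some residues have a number in one alignment but not in the other, so a 3D number is only meaningful together with the alignment it belongs to.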

D: Making a structural model with 3DM

3DM contains an automated homology-modeling module. It uses the 3DM alignment between a sequence and a structure to generate models. On the protein detail page of each sequence in the alignment you can find a model tab. Here you can select a suitable template to generate a homology model of the protein. 3DM already makes a pre-selection of suitable templates based on sequence similarity. You can choose any of the pre-selected ones. Which of these templates is best is for you to decide. This decision should be based on what you want to do with your model. For instance, enzymes can be in the open or closed conformation. If you want your model in the closed conformation, then start with a template that is in the closed conformation. Other factors can be: with or without ligand, with an activating or inhibiting ligand bound in the ligand binding pocket, etc. So, investigate the different starting templates to see which one best fits your needs.

Hirschsprung disease is a disease that is caused by mutations in a GPCR.

Select a template and make a model. Use Yasara to open it. Making the model may take a few minutes. Note that you can choose either “download as PDB” or “load in Yasara”. The first option makes a normal PDB file. The second option makes a Yasara scene file that can only be opened by Yasara.

If 3DM wasn’t able to model parts of the sequence, you will see purple dots in Yasara. Usually this is because the alignment between the sequence and the template cannot be made reliably (often in the parts outside the core) due to very low sequence similarity. Realize that those parts cannot be modeled using this template, because if the sequence similarity is that low the two proteins will likely fold differently in those parts. Sometimes it helps to make a model with a different template (if available), but usually this means that those parts simply cannot be modeled reliably.

E: Visualize data in structures

To visualize data in a structure you can use the “visualize data in structure” option. Here you can select multiple structures. All structures are superimposed. From the first column you can select one or more of the templates used to generate the alignment. From the second column you can select any of the other structures that could be superimposed on the templates. The third column, called “non-aligned pdb files”, contains structures that were detected by BLAST but that 3DM could not superimpose on the templates. These structures usually do not belong to the superfamily or are structurally too distantly related. If you have generated a model with 3DM, a column called “models” will be available where you can select this model. The name of the model indicates which protein is modeled and, between brackets, the template that was used to make the model. Do you see the one you made? From the “organic ligand” column you can select any ligand. Because the ligands are also superimposed in the system, you can select any combination of protein structure and ligand and so insert any ligand into any structure.

Click on the “data statistics” icon in 3DM. Make sure you have the GPCR class A family selected (see the top left of 3DM) and scroll down to the “Human variation/Position” histogram. You can compare this histogram with other data types using the “compare with” option. Choose the “transmembrane region” option here.

Note that the button can be found at many 3DM plots for direct visualization of data in Yasara.

F: Hotspot Baskets

There are other export buttons too, such as the “export to hotspot basket” button. Let’s see how this works. Open the hotspot basket tool by clicking on the “HOTSPOTS” tab at the right corner of 3DM. An icon will now appear at the SNP histogram. This icon is for inserting the selected positions into a hotspot basket. Save the SNP positions in a new hotspot basket, give the basket a name, and save the basket.

A hotspot basket is nothing more than a selection of alignment positions. You can generate hotspots for different protein features (e.g. correlated mutations, specificity hotspots, thermostability hotspots, etc.) and those can be found using 3DM. At a later stage you can open the basket in different 3DM tools. Let’s see how this works.

A good trick for increasing the thermostability of enzymes is to make mutations in flexible regions. Go to the “data statistics” page. Here you will find two measures for flexible positions: the RMSD (a measure of how tightly the structures superimpose) and the average B-factor. The B-factor is a measure of how sharply the atoms of a residue are defined in the X-ray data. A low B-factor indicates that the amino acid is tightly positioned in the structure; a high B-factor indicates that it is more mobile or disordered. The average B-factor is the average over all amino acids from all structures at an alignment position. Often these two plots show a similar pattern. If both plots are high at a position, it is usually a good hotspot for changing thermostability.
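The selection rule described above (both RMSD and average B-factor high at a position) can be sketched in Python as follows. This is only an illustration with made-up numbers: the cut-off values, the data layout and the function names are assumptions and are not taken from 3DM.

```python
from statistics import mean


def average_b_factor(b_factors_per_structure):
    """Average B-factor per alignment position over all structures.

    Each structure is a dict {alignment_position: B-factor}; a structure
    may lack a residue at some positions.
    """
    values_per_position = {}
    for structure in b_factors_per_structure:
        for position, b_factor in structure.items():
            values_per_position.setdefault(position, []).append(b_factor)
    return {pos: mean(values) for pos, values in values_per_position.items()}


def flexible_positions(rmsd, avg_b, rmsd_cutoff, b_cutoff):
    """Positions where both the RMSD and the average B-factor exceed a cut-off."""
    return sorted(
        pos
        for pos in rmsd
        if pos in avg_b and rmsd[pos] > rmsd_cutoff and avg_b[pos] > b_cutoff
    )


# Hypothetical data for three alignment positions in two structures.
structures = [
    {10: 20.0, 11: 55.0, 12: 30.0},
    {10: 24.0, 11: 65.0, 12: 70.0},
]
rmsd = {10: 0.4, 11: 1.8, 12: 0.5}

avg_b = average_b_factor(structures)
hotspots = flexible_positions(rmsd, avg_b, rmsd_cutoff=1.0, b_cutoff=40.0)
print(avg_b)
print(hotspots)
```

Position 12 has a high average B-factor but a low RMSD, so it is not selected; only a position that is high in both measures is kept as a candidate hotspot.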

G: Panel design

The Panel design tool allows you to group sequences of the superfamily. The idea of this tool is to be able to select sequences from the alignment such that they are maximally distributed over the superfamily. There are two ways of making groups. One is to divide the alignment purely based on phylogenetic information. The other method divides the alignment based on sequence motifs at specific positions. Both methods can be combined.

Select the “panel design” option from “tools” at the top left of 3DM. First we will divide the superfamily based on sequence motifs. Because we want to maximize the specificity range in the panel, we use the “specificity hotspot” basket you generated in question 31. The idea is that, if sequences have exactly the same sequence motif at specificity hotspots, then they are likely to have the same specificity. Use the “add hotspots” button to select the hotspot basket you made in Q31.

Once they are selected click on “show sequence groups”.
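The motif-based grouping can be pictured with the short Python sketch below: sequences that share exactly the same residues at the selected hotspot positions end up in the same group. The sequences, the hotspot positions and the function name are hypothetical, and the sketch does not claim to reproduce how 3DM implements this step.

```python
from collections import defaultdict


def group_by_motif(aligned_sequences, hotspot_positions):
    """Group sequences by the residues found at the hotspot positions.

    aligned_sequences: {name: aligned sequence}; positions are 1-based
    alignment positions.
    """
    groups = defaultdict(list)
    for name, aligned in aligned_sequences.items():
        motif = "".join(aligned[pos - 1] for pos in hotspot_positions)
        groups[motif].append(name)
    return dict(groups)


# Hypothetical aligned sequences and a hypothetical basket with positions 1 and 4.
aligned_sequences = {
    "seq1": "ACDEFG",
    "seq2": "ACDQFG",
    "seq3": "TCDEFW",
    "seq4": "ACNEFG",
}
hotspot_basket = [1, 4]

motif_groups = group_by_motif(aligned_sequences, hotspot_basket)
print(motif_groups)
```

Note that seq1 and seq4 differ at position 3 but still fall in the same group, because only the hotspot positions define the motif.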

You can also divide the alignment phylogenetically. This separation can be combined with the motif grouping. In the box under “phylogenetic groups” you can enter a number. The superfamily will then be divided into this number of groups. Type 10 in the box and click again on “show sequence groups”.

After defining the groups you have to select from each group the sequences you want in the panel. There are several selection options that can be used to pick sequences from each group. First you select the number of sequences per group with the “proteins per group” option. Then several options can be used to determine which sequences are selected. These options are there to maximize the chance that the selected sequences can be expressed. If there is literature available for a sequence, for instance, or if there is a structure available, then the chance that this protein can be expressed is higher, since someone else has done it before. The different selection options can be combined using a “must have” or a “prefer” option. “Must have” will result in a smaller number of groups, because not all groups will have a structure available, for instance. Usually it is a good idea to start with a larger number of groups than you want to have in your panel and to delete some groups with these different options. Now play with the different options and see if you understand the result. To see the effect of the different options you have to click on “select proteins”. Any surprises? Can you make a panel of approximately 96 sequences for which you require that Swiss-Prot, literature and structure data are available?
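The difference between “must have” and “prefer” can be sketched in Python as below. The sketch assumes that “must have” removes sequences lacking a property (and therefore whole groups when no sequence qualifies), while “prefer” only ranks the remaining sequences. That reading, the data and the function name are assumptions made for illustration, not a description of the 3DM implementation.

```python
def select_panel(groups, proteins_per_group, must_have=(), prefer=()):
    """Pick sequences per group using 'must have' and 'prefer' properties.

    groups: {group name: list of dicts with a 'name' key and boolean properties}.
    """
    panel = {}
    for group_name, proteins in groups.items():
        # 'must have': discard every sequence that lacks a required property
        candidates = [p for p in proteins if all(p.get(f) for f in must_have)]
        if not candidates:
            continue  # no sequence qualifies, so the whole group is dropped
        # 'prefer': rank by the number of preferred properties (stable sort,
        # so ties keep their original order)
        candidates.sort(
            key=lambda p: sum(bool(p.get(f)) for f in prefer), reverse=True
        )
        panel[group_name] = [p["name"] for p in candidates[:proteins_per_group]]
    return panel


# Hypothetical groups with hypothetical annotations.
groups = {
    "group1": [
        {"name": "seq1", "structure": False, "literature": True},
        {"name": "seq4", "structure": True, "literature": True},
    ],
    "group2": [{"name": "seq2", "structure": False, "literature": False}],
    "group3": [{"name": "seq3", "structure": False, "literature": True}],
}

strict_panel = select_panel(groups, 1, must_have=["structure"])
soft_panel = select_panel(groups, 1, prefer=["structure"])
print(strict_panel)
print(soft_panel)
```

With “must have” only one group survives, whereas with “prefer” all three groups stay in the panel and the sequence with a structure is simply picked first where one exists.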

Note that you can make the total panel smaller by clicking the “panel size” option. This will remove sequences by ensuring maximum diversity in the remaining panel sequences.

H: Literature on thermostability

Go to the “data statistics” page of 3DM and scroll down to the “keyword mutation” histogram. Use the keyword “thermostable mutant”. At position 30 there is one thermostable mutation found in the literature. You can click on the bar at position 30 in the histogram. This will link to a page showing the corresponding literature. From here you can download the paper if you have access. If you don’t have access you can also find the paper here.

The R124Y mutation is in the human Endothelin type B receptor (EDNRB_HUMAN).