ChainParseError: 2 antibody domains in sequence #7

Open
deweihu96 opened this issue Mar 10, 2022 · 4 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@deweihu96

anarci supports two domains in one sequence, while abnumber does not:

abnumber.exceptions.ChainParseError: Found 2 antibody domains in sequence: "DIQLTQSPSFLSASVGDRVTITCSARSSISFMYWYQQKPGKAPKLLIYDTSNLASGVPSRFSGSGSGTEFTLTISSLEAEDAATYYCQQWSSYPLTFGQGTKLEIKGGGSGGGGEVQLVESGGGLVQPGGSLRLSCAASGFTFSTYAMNWVRQAPGKGLEWVGRIRSKYNNYATYYADSVKDRFTISRDDSKNSLYLQMNSLKTEDTAVYYCVRHGNFGNSYVSWFAYWGQGTLVTVSSGGCGGGEVAALEKEVAALEKEVAALEKEVAALEKGGGDKTHTCPPCPAPEAAGGPSVFLFPPKPKDTLMISRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQYNSTYRVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKISKAKGQPREPQVYTLPPSREEMTKNQVSLWCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWQQGNVFSCSVMHEALHNHYTQKSLSLSPGK"

@prihoda
Owner

prihoda commented Mar 13, 2022

Hi @deweihu96, thanks for reporting this, I would like to support this in the future. A pull request would be welcome.

The current AbNumber Chain object can only hold a single variable domain, with a single CDR3, etc. So this probably cannot be supported through chain = Chain(seq, 'imgt'), but it could be through a separate call like chains = Chain.parse_domains(seq, 'imgt').

So if you have a sequence like Var1Const1Var2Const2, you would get two Chain objects, where each chain.tail corresponds to whatever sequence immediately follows that variable domain (chain1.tail = "Const1").
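To make the intended tail semantics concrete, here is a minimal sketch in plain Python. It is not part of AbNumber: `split_domains_with_tails` is a hypothetical helper, and it assumes the variable-domain spans are already known as sorted, half-open `(start, end)` index pairs (for example from an ANARCI run).

```python
def split_domains_with_tails(seq, spans):
    """Split seq into variable domains plus the tail that follows each one.

    Hypothetical helper (not an AbNumber function).
    spans: sorted list of half-open (start, end) index pairs, one per
    variable domain. Each tail runs from the end of its domain up to the
    start of the next domain, or to the end of the sequence for the last one.
    """
    parts = []
    for i, (start, end) in enumerate(spans):
        next_start = spans[i + 1][0] if i + 1 < len(spans) else len(seq)
        parts.append({"variable": seq[start:end], "tail": seq[end:next_start]})
    return parts


# Toy input mirroring the Var1Const1Var2Const2 layout described above.
toy_seq = "Var1Const1Var2Const2"
toy_spans = [(0, 4), (10, 14)]  # assumed positions of Var1 and Var2
parts = split_domains_with_tails(toy_seq, toy_spans)
```

With this toy input, the first entry pairs "Var1" with the tail "Const1" and the second pairs "Var2" with "Const2", which is the behaviour the proposed Chain.parse_domains call would need to reproduce.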

@prihoda prihoda added enhancement New feature or request help wanted Extra attention is needed labels Mar 13, 2022
@deweihu96
Author

Hi @prihoda ~ Thanks for your reply. The simplest way that I came up with is:

  1. Use anarci to find the two domains, and slice the sequence into those two domains;
  2. Use abnumber to do numbering on two sequences.

@prihoda
Owner

prihoda commented Apr 14, 2022

@deweihu96 sounds good. Can you share the part of the code where you parse the anarci output?

@deweihu96
Author

deweihu96 commented Apr 19, 2022

@prihoda

>>> import anarci
>>> seq = 'QIQLVQSGSELKKPGASVKVSCKASGYTFTHYAMNWVRQAPGQGLEWMGWINTNTGEPTYAQGFTGRFVFSLDTSVSTAYLQISSLKAEDTAVYYCAREREPGMDEWGQGTLVTVSSGGGGSSSSSSDVVMTQSPLSLPVTLGQPASISCRSSQSLVHANTNTYLEWYQQRPGQSPRLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDVGVYYCFQGTHVPNTFGQGTKLEIK'
>>> sequences, numbered, alignment_details, hit_tables = anarci.run_anarci(seq, 'kabat', allowed_species='human')
>>> alignment_details
#[[
#{'id': 'human_H', 'description': '', 'evalue': 1.4e-55, 'bitscore': 178.0, 'bias': 1.0, 'query_start': 0, 'query_end': 117, 'species': 'human', 'chain_type': 'H', 'scheme': 'imgt', 'query_name': 'Input sequence'}, 
#{'id': 'human_K', 'description': '', 'evalue': 1.9e-56, 'bitscore': 180.6, 'bias': 0.1, 'query_start': 127, 'query_end': 239, 'species': 'human', 'chain_type': 'K', 'scheme': 'imgt', 'query_name': 'Input sequence'}]]

Once you have the start and end positions, slice the sequence and parse each slice with abnumber :)
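A minimal sketch of that slicing step, using only the fields visible in the alignment_details output above. The helper name `slice_domains` is hypothetical, and the code treats query_end as exclusive, which is an assumption that matches the posted output (a 117-residue heavy domain at 0-117, then the linker, then the light domain starting at 127). The hits are hard-coded here so the example runs without ANARCI installed.

```python
seq = 'QIQLVQSGSELKKPGASVKVSCKASGYTFTHYAMNWVRQAPGQGLEWMGWINTNTGEPTYAQGFTGRFVFSLDTSVSTAYLQISSLKAEDTAVYYCAREREPGMDEWGQGTLVTVSSGGGGSSSSSSDVVMTQSPLSLPVTLGQPASISCRSSQSLVHANTNTYLEWYQQRPGQSPRLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDVGVYYCFQGTHVPNTFGQGTKLEIK'

# Relevant fields copied from the alignment_details output shown above;
# in real use this would be alignment_details[0] from anarci.run_anarci.
hits = [
    {'chain_type': 'H', 'query_start': 0, 'query_end': 117},
    {'chain_type': 'K', 'query_start': 127, 'query_end': 239},
]


def slice_domains(seq, hits):
    """Cut one sub-sequence per domain hit (hypothetical helper).

    Assumes query_start is 0-based and query_end is exclusive.
    """
    domains = []
    for hit in hits:
        start, end = hit['query_start'], hit['query_end']
        domains.append({
            'chain_type': hit['chain_type'],
            'start': start,
            'end': end,
            'sequence': seq[start:end],
        })
    return domains


domains = slice_domains(seq, hits)
for domain in domains:
    print(domain['chain_type'], domain['start'], domain['end'], domain['sequence'])
    # Each slice could then be numbered separately, e.g.
    # abnumber.Chain(domain['sequence'], scheme='imgt')
```

Slicing this way drops the linker between the two domains. If the residues after each domain matter, the slice would need to be extended up to the next query_start instead.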

I noticed that you're also one of the authors of biophi. I want to say that's a really great job!
