
Pip-installability #19

Open
arogozhnikov opened this issue Mar 14, 2024 · 3 comments

Comments

@arogozhnikov

Hi David,
I've challenged myself to move ANARCI from conda to pip. I've seen that you previously worked on the same task in your fork.

You can find the result here. It currently mimics version 2021.02.04 and still requires the hmmer binary to be installed:
https://github.com/arogozhnikov/microANARCI
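Because this build still relies on a separately installed HMMER, a quick pre-flight check can turn a confusing numbering failure into a clear message. A minimal sketch, assuming the executable ANARCI invokes is `hmmscan` and that it is expected on `PATH`; `find_hmmscan` is a hypothetical helper, not part of microANARCI:

```python
import shutil


def find_hmmscan():
    """Return the full path of the `hmmscan` executable, or None if it is not on PATH."""
    # Assumption: ANARCI shells out to HMMER's `hmmscan`, so it must be discoverable on PATH.
    return shutil.which("hmmscan")


if __name__ == "__main__":
    path = find_hmmscan()
    if path is None:
        print("hmmscan not found on PATH; install HMMER before numbering sequences")
    else:
        print(f"using hmmscan at {path}")
```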

It fails the AbNumber tests (just as the conda version does).
Question: why do you think the older version is more correct? Everyone seems to use the latest version from conda.

@prihoda
Owner

prihoda commented Jun 10, 2024

Hi @arogozhnikov,

I also made some effort to migrate to pyhmmer, which enables a pure pip install without the hmmer dependency. I even created a PR to the ANARCI repo, but there was no response: oxpig/ANARCI#38

Based on my tests it works, but it would definitely require more testing from their side.

As for using version 2020.04.23, at the time it was because of this bug, and I'm not sure whether it has been fixed yet: oxpig/ANARCI#17

@arogozhnikov
Author

Thanks, that looks like a gross issue for a numbering tool.

@arogozhnikov
Author

Hey David, I've checked the current ANARCI (main branch), and it does not have the exact issue you posted (oxpig/ANARCI#17).

I've updated my version to match what is currently in the main branch. It is installable with:

git+https://github.com/arogozhnikov/microANARCI@ada238704b41eb6a7f51fa12ff32167e94244c8f

However, it still fails the same test in AbNumber, and it now additionally fails the germline assignment test:

_________________________________________________ test_light_chain_IMGT_position_21 __________________________________________________

    def test_light_chain_IMGT_position_21():
        # Check bug from ANARCI 2021.02.04
        # When numbering full Kappa chains, position IMGT 21 contains a gap
        # When numbering V gene only, position IMGT 21 contains an amino acid as expected
        # Test against this by making sure that same numbering is assigned when numbering V gene and VJ genes concatenated
        # https://github.com/oxpig/ANARCI/issues/17
        for germline in HUMAN_IMGT_IG_V['K']['aligned_sequences']:
            v_seq = HUMAN_IMGT_IG_V['K']['aligned_sequences'][germline].replace('-', '')
            first_j_gene = list(HUMAN_IMGT_IG_J['K']['aligned_sequences'].keys())[0]
            j_seq = HUMAN_IMGT_IG_J['K']['aligned_sequences'][first_j_gene].replace('-', '')
            vj_seq = v_seq + j_seq
            try:
                v_chain = Chain(v_seq, 'imgt')
                vj_chain = Chain(vj_seq, 'imgt')
            except Exception as e:
                print(e)
                continue
            v_positions = [str(p) for p in v_chain.positions]
            vj_positions = [str(p) for p in vj_chain.positions]
    
            len_limit = len(v_seq) - 20
>           assert ','.join(v_positions[:len_limit]) == ','.join(vj_positions[:len_limit])
E           AssertionError: assert 'L1,L2,L3,L4,...8,L89,L90,L91' == 'L1,L2,L3,L4,...9,L90,L91,L92'
E             
E             Skipping 63 identical leading characters in diff, use -v to show
E             - L19,L20,L22,L23,L24,L25,L26,L27,L28,L29,L36,L37,L38,L39,L40,L41,L42,L43,L44,L45,L46,L47,L48,L49,L50,L51,L52,L53,L54,L55,L56,L57,L65,L66,L67,L68,L69,L70,L71,L72,L73,L75,L76,L77,L78,L79,L80,L81,L84,L85,L86,L87,L88,L89,L90,L91,L92
E             + L19,L20,L21,L22,L23,L24,L25,L26,L27,L28,L29,L36,L37,L38,L39,L40,L41,L42,L43,L44,L45,L46,L47,L48,L49,L50,L51,L52,L53,L54,L55,L56,L57,L65,L66,L67,L68,L69,L70,L71,L72,L74,L75,L76,L77,L78,L79,L80,L83,L84,L85,L86,L87,L88,L89,L90,L91

test_bugs.py:66: AssertionError
______________________________________________________ test_germline_assignment ______________________________________________________

    def test_germline_assignment():
        light_seq = 'ELVMTQSPSSLSASVGDRVNIACRASQGISSALAWYQQKPGKAPRLLIYDASNLESGVPSRFSGSGSGTDFTLTISSLQPEDFAIYYCQQFNSYPLTFGGGTKVEIK'
        light_chain = Chain(light_seq, scheme='imgt', assign_germline=True)
        assert light_chain.v_gene == 'IGKV1-13*02'
        assert light_chain.j_gene == 'IGKJ4*01'
        light_chain = Chain(light_seq, scheme='imgt', assign_germline=True, allowed_species='mouse')
>       assert light_chain.v_gene == 'IGKV11-125*01'
E       AssertionError: assert 'IGKV11-106*02' == 'IGKV11-125*01'
E         
E         - IGKV11-125*01
E         ?         ^^  ^
E         + IGKV11-106*02
E         ?         ^^  ^

test_chain.py:228: AssertionError
========================================================== warnings summary ==========================================================
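To see exactly where the two numberings in `test_light_chain_IMGT_position_21` disagree, the two lines of the pytest diff can be compared directly. A small sketch; the helper names are hypothetical and not part of AbNumber, only the tail of each list is available because pytest skipped the first 63 identical characters, and the label format (chain letter, number, optional insertion letter) is an assumption based on the labels shown in the log:

```python
import re

# Tails of the two position lists from the failing assertion above.
# The "+" line ends in L91 like the left operand (v_positions, V gene only);
# the "-" line ends in L92 like the right operand (vj_positions, V+J).
V_TAIL = (
    "L19,L20,L21,L22,L23,L24,L25,L26,L27,L28,L29,L36,L37,L38,L39,L40,L41,L42,L43,"
    "L44,L45,L46,L47,L48,L49,L50,L51,L52,L53,L54,L55,L56,L57,L65,L66,L67,L68,L69,"
    "L70,L71,L72,L74,L75,L76,L77,L78,L79,L80,L83,L84,L85,L86,L87,L88,L89,L90,L91"
).split(",")
VJ_TAIL = (
    "L19,L20,L22,L23,L24,L25,L26,L27,L28,L29,L36,L37,L38,L39,L40,L41,L42,L43,L44,"
    "L45,L46,L47,L48,L49,L50,L51,L52,L53,L54,L55,L56,L57,L65,L66,L67,L68,L69,L70,"
    "L71,L72,L73,L75,L76,L77,L78,L79,L80,L81,L84,L85,L86,L87,L88,L89,L90,L91,L92"
).split(",")


def position_key(label):
    """Sort key for labels like 'L21' or 'L111A' (assumed format)."""
    match = re.fullmatch(r"[A-Z](\d+)([A-Z]*)", label)
    if match is None:
        raise ValueError(f"unexpected position label: {label!r}")
    return int(match.group(1)), match.group(2)


def first_divergence(a, b):
    """Return (index, a_item, b_item) at the first mismatch, or None if there is none."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i, x, y
    return None


def position_differences(a, b):
    """Return (labels only in a, labels only in b), each sorted by position."""
    set_a, set_b = set(a), set(b)
    return (
        sorted(set_a - set_b, key=position_key),
        sorted(set_b - set_a, key=position_key),
    )


if __name__ == "__main__":
    print(first_divergence(V_TAIL, VJ_TAIL))
    print(position_differences(V_TAIL, VJ_TAIL))
```

On these lists it prints `(2, 'L21', 'L22')`, matching the missing IMGT 21 described in the test comment, followed by `(['L21', 'L74', 'L83'], ['L73', 'L81', 'L92'])`, the labels that appear in only one of the two numberings.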

I'm a bit lost here: ANARCI is widely used, but it is not actively developed, does not produce consistent results across versions, and has no tests that would catch regressions, so I don't expect any future version to pass your tests either.
