pwalign #3361

Closed
10 tasks done
hpages opened this issue Mar 21, 2024 · 17 comments
Labels: 3a. accepted (will be ingested into Bioconductor daily builder for distribution), WARNINGS

Comments

@hpages
Contributor

hpages commented Mar 21, 2024

Update the following URL to point to the GitHub repository of
the package you wish to submit to Bioconductor

Confirm the following by editing each check box to '[x]'

  • I understand that by submitting my package to Bioconductor,
    the package source and all review commentary are visible to the
    general public.

  • I have read the Bioconductor Package Submission
    instructions. My package is consistent with the Bioconductor
    Package Guidelines.

  • I understand Bioconductor Package Naming Policy and acknowledge
    Bioconductor may retain use of package name.

  • I understand that a minimum requirement for package acceptance
    is to pass R CMD check and R CMD BiocCheck with no ERROR or WARNINGS.
    Passing these checks does not result in automatic acceptance. The
    package will then undergo a formal review and recommendations for
    acceptance regarding other Bioconductor standards will be addressed.

  • My package addresses statistical or bioinformatic issues related
    to the analysis and comprehension of high throughput genomic data.

  • I am committed to the long-term maintenance of my package. This
    includes monitoring the support site for issues that users may
    have, subscribing to the bioc-devel mailing list to stay aware
    of developments in the Bioconductor community, responding promptly
    to requests for updates from the Core team in response to changes in
    R or underlying software.

  • I am familiar with the Bioconductor code of conduct and
    agree to abide by it.

I am familiar with the essential aspects of Bioconductor software
management, including:

  • The 'devel' branch for new packages and features.
  • The stable 'release' branch, made available every six
    months, for bug fixes.
  • Bioconductor version control using Git
    (optionally via GitHub).

For questions/help about the submission process, including questions about
the output of the automatic reports generated by the SPB (Single Package
Builder), please use the #package-submission channel of our Community Slack.
Follow the link on the home page of the Bioconductor website to sign up.

@bioc-issue-bot
Collaborator

Hi @hpages

Thanks for submitting your package. We are taking a quick
look at it and you will hear back from us soon.

The DESCRIPTION file for this package is:

Package: pwalign
Title: Perform pairwise sequence alignments
Description: The two main functions in the package are pairwiseAlignment() and
	stringDist(). The former solves (Needleman-Wunsch) global alignment,
	(Smith-Waterman) local alignment, and (ends-free) overlap alignment
	problems. The latter computes the Levenshtein edit distance or pairwise
	alignment score matrix for a set of strings.
biocViews: Alignment, SequenceMatching, Sequencing, Genetics
URL: https://bioconductor.org/packages/pwalign
BugReports: https://github.com/Bioconductor/pwalign/issues
Version: 0.99.0
License: Artistic-2.0
Encoding: UTF-8
Authors@R: c(
	person("Patrick", "Aboyoun", role="aut"),
	person("Robert", "Gentleman", role="aut"),
	person("Hervé", "Pagès", role="cre",
	       email="hpages.on.github@gmail.com"))
Depends: BiocGenerics, S4Vectors, IRanges, Biostrings (>= 2.71.5)
Imports: methods, utils
LinkingTo: S4Vectors, IRanges, XVector, Biostrings
Enhances: Rmpi
Suggests: RUnit
LazyLoad: yes
Collate: 00datacache.R
	utils.R
	InDel-class.R
	AlignedXStringSet-class.R
	PairwiseAlignments-class.R
	PairwiseAlignmentsSingleSubject-class.R
	PairwiseAlignments-io.R
	align-utils.R
	pid.R
	substitution_matrices.R
	pairwiseAlignment.R
	stringDist.R
	zzz.R
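
For readers unfamiliar with the edit distance mentioned in the Description, here is a minimal sketch. It uses base R's utils::adist() as a stand-in (not pwalign's own code) to show what a Levenshtein distance matrix looks like, and only calls pwalign if it happens to be installed. The example strings are made up, and the guarded pwalign calls assume the interface was carried over unchanged from Biostrings.

```r
# Made-up example strings.
strings <- c("kitten", "sitting", "mitten")

# Base R stand-in: generalized Levenshtein distance between all pairs.
d <- utils::adist(strings)
dimnames(d) <- list(strings, strings)
print(d)

# Only runs when pwalign is available; assumes the Biostrings-era interface.
if (requireNamespace("pwalign", quietly = TRUE)) {
    aln <- pwalign::pairwiseAlignment("ACGTTGCA", "ACGTGCA", type = "global")
    print(aln)
    print(pwalign::stringDist(strings, method = "levenshtein"))
}
```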

@bioc-issue-bot added the "1. awaiting moderation" label (submitted and waiting clearance to access resources) Mar 21, 2024
@hpages
Contributor Author

hpages commented Mar 21, 2024

pwalign contains the pairwiseAlignment-related stuff taken from Biostrings. The plan is to deprecate this stuff in Biostrings (in BioC 3.19), and to redirect the user to the stuff that is now in pwalign. Then to defunct it in Biostrings (in BioC 3.20), and to finally remove it from Biostrings (in BioC 3.21).
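
As a rough illustration of the deprecate-then-defunct cycle described above, here is a minimal sketch using base R's .Deprecated() and .Defunct(). The function names new_align, old_align, and gone_align are hypothetical placeholders, not the actual Biostrings or pwalign code, and the real redirection may well be implemented differently.

```r
# Hypothetical replacement living in the new package.
new_align <- function(x) toupper(x)

# Stage 1 (deprecated): still works, but warns and forwards to the new home.
old_align <- function(x) {
    .Deprecated("new_align")
    new_align(x)
}

# Stage 2 (defunct): always fails with an error naming the replacement.
gone_align <- function(x) {
    .Defunct("new_align")
}

# The deprecated function still returns a result (warning muffled here).
res <- suppressWarnings(old_align("acgt"))

# The defunct function signals an error instead of returning.
err <- tryCatch(gone_align("acgt"), error = function(e) e)
```

The final stage, removal, simply deletes the function, so callers get the usual "could not find function" error.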

The motivations for this split are:

  • Biostrings is too big and hard to maintain. In particular, the pairwiseAlignments-related stuff in it adds a lot of complexity to the package (via additional specialized classes, generics, and methods, and a lot of complex C code to support them). This split will make Biostrings about 20% smaller. This in turn will make its maintenance easier and will also make R CMD check slightly faster.
  • About 500 Bioconductor packages depend on Biostrings. However, very few of them need the pairwiseAlignment functionality. So this split won't affect most of Biostrings revdeps. However they will now depend on a lighter Biostrings that will be slightly faster to install (faster to download, compile, and load).
  • Separation of responsibilities: The plan is for Aidan Lakshman (@ahl27) to take over maintenance of Biostrings in the near future while I'll remain the maintainer of pwalign. This split will remove the burden of maintaining the pairwiseAlignment-related stuff from Aidan's plate.

H.

@lshep
Contributor

lshep commented Mar 22, 2024

I can pass this into building the reports however it will fail until the latest version of Biostrings is available.

@lshep added the "pre-check passed" label (pre-review performed and ready to be added to git) Mar 22, 2024
@hpages
Contributor Author

hpages commented Mar 22, 2024

Biostrings 2.71.5 (latest version) is already on nebbiolo1 so we should be good to go.

@lshep
Contributor

lshep commented Mar 22, 2024

@hpages
Contributor Author

hpages commented Mar 22, 2024

It doesn't need to. It's on the machine.

@bioc-issue-bot
Collaborator

Your package has been added to git.bioconductor.org to continue the
pre-review process. A build report will be posted shortly. Please
fix any ERROR and WARNING in the build report before a reviewer is
assigned or provide a justification on why you feel the ERROR or
WARNING should be granted an exception.

IMPORTANT: Please read this documentation for setting
up remotes to push to git.bioconductor.org. All changes should be
pushed to git.bioconductor.org moving forward. It is required to push a
version bump to git.bioconductor.org to trigger a new build report.

Bioconductor used your GitHub ssh-keys for git.bioconductor.org
access. To manage keys and future access you may want to activate your
Bioconductor Git Credentials Account

@bioc-issue-bot added the "pre-review" label (on bioconductor git and access to on demand build but not assigned reviewer until build report clean) and removed the "1. awaiting moderation" (submitted and waiting clearance to access resources) and "pre-check passed" (pre-review performed and ready to be added to git) labels Mar 22, 2024
@bioc-issue-bot
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "WARNINGS".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder:
Linux (Ubuntu 22.04.3 LTS): pwalign_0.99.0.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/pwalign to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@hpages
Contributor Author

hpages commented Mar 22, 2024

The 2 WARNINGs were expected.

One is about using RMarkdown instead of Sweave for the vignette. Note that the vignette was just taken from Biostrings and put in pwalign. You can see it here. It was written a long time ago by Patrick Aboyoun, the original author of the pairwiseAlignment stuff. Since it contains a lot of mathematical formulae that would be tricky to translate to markdown, I don't intend to make the conversion, at least not for now.

The other WARNING is about "Empty or missing \value sections found in man pages.". This is a false positive that I reported here yesterday.

Let me know if you have questions.

H.

@lshep added the "2. review in progress" label (assign a reviewer and a more thorough review of package code and documentation taking place) and removed the "pre-review" label (on bioconductor git and access to on demand build but not assigned reviewer until build report clean) Mar 25, 2024
@bioc-issue-bot
Collaborator

A reviewer has been assigned to your package for an in-depth review.
Please respond accordingly to any further comments from the reviewer.

@LiNk-NY

LiNk-NY commented Apr 3, 2024

Hi Hervé, @hpages

Thank you for your submission.
Please see the review below.

Best regards,
Marcel


pwalign

  • Overall, the package is great for pairwise alignment and calculating
    string distance matrices. The code seems to be robust but some unit tests
    can still increase the coverage. It seems that the useMpi functionality is
    in question. I mostly provided minor notes given that this code is
    established and ported over from Biostrings.

DESCRIPTION

  • Note that LazyLoad field is now ignored. The replacement LazyData should
    be set to false or not included. Users should use data(...) to load a
    dataset rather than have them in the .GlobalEnv.

NAMESPACE

  • Looks good.

vignettes

  • Consider converting the Rnw file to Rmd.
  • Looks good.

R

  • RTobjs does not seem to be used anywhere, consider its removal.
  • Minor: Consider reducing cyclomatic complexity by simply coercing type
    rather than checking and coercing, e.g. in
    mismatchSummary,AlignedXStringSet0-method (and others),
    weight <- as.integer(weight)

    ## instead of

    if (!is.integer(weight))
        weight <- as.integer(weight)
  • Minor: To avoid repetition, perhaps use a default
    compareStrings,ANY,ANY-method to coerce both pattern and subject inputs
    to character and dispatch to the compareStrings,character,character-method
    setMethod("compareStrings",
              signature = c(pattern = "ANY", subject = "ANY"),
              function(pattern, subject) {
                  compareStrings(as.character(pattern), as.character(subject))
              })
  • It looks like useMpi is disabled. Will it work again or should it be
    removed?
  • The arguments should list all possible options e.g., in stringDist the
    default method argument should be the vector of possibilities i.e.,
    c("levenshtein", "hamming", "quality", "substitutionMatrix") and match.arg
    will ensure that one of them is selected
  • Consider promoting functions from Biostrings to exported functions rather
    than using ::: (in R/utils.R).

tests

  • Consider increasing the coverage of some files like
    AlignedXStringSet-class.R:
> covr::package_coverage(type = "all")
pwalign Coverage: 76.46%
R/AlignedXStringSet-class.R: 0.00%
R/align-utils.R: 38.89%
R/PairwiseAlignmentsSingleSubject-class.R: 39.22%
R/pairwiseAlignment.R: 49.49%
R/zzz.R: 50.00%
R/00datacache.R: 66.67%
R/stringDist.R: 71.76%
R/PairwiseAlignments-class.R: 72.22%
R/PairwiseAlignments-io.R: 86.01%
R/substitution_matrices.R: 87.76%
src/align_pairwiseAlignment.c: 89.47%
src/align_utils.c: 99.48%
R/InDel-class.R: 100.00%
src/R_init_pairwiseAlignment.c: 100.00%

@bioc-issue-bot
Collaborator

Received a valid push on git.bioconductor.org; starting a build for commit id: be1b36bf4fe64419cbcc64b9316c823fd576bb51

@bioc-issue-bot
Collaborator

Dear Package contributor,

This is the automated single package builder at bioconductor.org.

Your package has been built on the Bioconductor Single Package Builder.

On one or more platforms, the build results were: "WARNINGS".
This may mean there is a problem with the package that you need to fix.
Or it may mean that there is a problem with the build system itself.

Please see the build report for more details.

The following are build products from R CMD build on the Single Package Builder:
Linux (Ubuntu 22.04.3 LTS): pwalign_0.99.1.tar.gz

Links above active for 21 days.

Remember: if you submitted your package after July 7th, 2020,
when making changes to your repository push to
git@git.bioconductor.org:packages/pwalign to trigger a new build.
A quick tutorial for setting up remotes and pushing to upstream can be found here.

@hpages
Contributor Author

hpages commented Apr 4, 2024

Thanks Marcel for the feedback.

> Note that LazyLoad field is now ignored.

Removed.

> Consider converting the Rnw file to Rmd.

See my previous comment from 2 weeks ago above.

> RTobjs does not seem to be used anywhere, consider its removal.

Removed.

if (x is not the thing we want)
     x <- turn_x_into_the_thing_we_want(x)

I prefer that idiom over an unconditional x <- turn_x_into_the_thing_we_want(x). That's because even if x is already the thing we want, sadly turn_x_into_the_thing_we_want() is not guaranteed to be a no-op. For example as.character(x) will drop the names of character vector x, and as(x, "A") might transform x even if is(x, "A") is TRUE. It might matter (e.g. when the object is big and turn_x_into_the_thing_we_want(x) triggers a copy) or not (like here).

FWIW this has hit me a few times in the past so I got into the habit of systematically using the if (x is not the thing we want) x <- turn_x_into_the_thing_we_want(x) idiom without even thinking about it.
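
To make the point about coercion not being a no-op concrete, here is a small sketch of the as.character() case mentioned above. The vector and its names are made up for illustration.

```r
# Already a character vector, but it carries names.
x <- c(seq1 = "ACGT", seq2 = "TTGA")

# Unconditional coercion: the names are silently dropped.
always <- as.character(x)

# Conditional coercion: x is already character, so it is left untouched.
guarded <- x
if (!is.character(guarded))
    guarded <- as.character(guarded)

names(always)   # NULL
names(guarded)  # "seq1" "seq2"
```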

> Minor: To avoid repetition, perhaps use a default compareStrings,ANY,ANY-method etc...

I simplified the compareStrings() methods a bit. A minor disadvantage of a compareStrings,ANY,ANY-method that blindly coerces anything you throw at it to character is that it might do some weird/unexpected things for some exotic stuff. And the error that will result in that case will probably not be of great help to the end user.
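
A toy illustration of the kind of surprise a catch-all method can produce. The generic toyCompare below is hypothetical and much simpler than the real compareStrings(); it only shows that an ANY,ANY fallback will happily accept a list and compare its deparsed form.

```r
library(methods)

# Hypothetical generic standing in for compareStrings(); not pwalign code.
setGeneric("toyCompare",
           function(pattern, subject) standardGeneric("toyCompare"))

# The "real" method, defined for character inputs only.
setMethod("toyCompare",
          signature = c(pattern = "character", subject = "character"),
          function(pattern, subject) pattern == subject)

# Catch-all fallback that blindly coerces whatever it receives.
setMethod("toyCompare",
          signature = c(pattern = "ANY", subject = "ANY"),
          function(pattern, subject) {
              toyCompare(as.character(pattern), as.character(subject))
          })

# A list is not a sequence, yet no error is raised: as.character() deparses
# the list element to the string "1:3", which then "matches" the subject.
odd <- toyCompare(list(1:3), "1:3")
odd  # TRUE
```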

> It looks like useMpi is disabled. Will it work again or should it be removed?

I disabled this. Rmpi has been in Enhances (as opposed to Suggests) for the last 15 years or so, and the way things are implemented in pairwiseAlignment() is that it will be used only if the user explicitly loads it before calling the function. This means that the useMpi mode has not been tested on the daily builds for the last 15 years. Furthermore, since this is an undocumented feature, I suppose that nobody has ever used it, except Patrick. Last but not least: it's not covered by the unit tests either.
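
For context, one common way to use an optional package "only if the user explicitly loads it" is to check the search path. The sketch below is an assumption about the general pattern, not a description of how pairwiseAlignment() actually detects Rmpi; is_attached and choose_backend are hypothetical helpers.

```r
# Hypothetical helper: TRUE only if the package is attached to the search path,
# i.e. the user called library() on it earlier in the session.
is_attached <- function(pkg) {
    paste0("package:", pkg) %in% search()
}

# Hypothetical dispatcher: pick the optional backend only when it is attached.
choose_backend <- function() {
    if (is_attached("Rmpi")) "mpi" else "serial"
}

choose_backend()
```

Because the optional code path is taken only when the user attaches the package, it is never exercised on a build machine that does not load it, which is the testing gap described above.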

I might re-enable it at some point in the not too distant future but some serious testing will be required first. Also, this predates BiocParallel so the Rmpi approach might be completely obsolete, I don't know. Will need to revisit, test, assess, and decide what to do with it.

Disclaimer: I've never used Rmpi myself (Patrick Aboyoun implemented this).

> The arguments should list all possible options e.g., in stringDist()

Usually yes. I think maybe the reason Patrick didn't do it in this case is that the list of all possible values for the method argument is a little bit long (c("levenshtein", "hamming", "quality", "substitutionMatrix")) so it could be ugly to see such a long list in the definition of the S4 generic and all its methods, especially in the \usage section of the man page. Also, some stringDist() methods might not support all these options at the moment, and future methods might want to support different options.
As long as the man page for stringDist() lists all the supported values of method, I can live with that.
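
For reference, the pattern the reviewer suggested looks roughly like this. toy_dist is a hypothetical plain function, not the stringDist() S4 generic, and it only returns the selected option to show how match.arg() behaves.

```r
# Hypothetical function; the default lists every allowed value of 'method'.
toy_dist <- function(x, method = c("levenshtein", "hamming",
                                   "quality", "substitutionMatrix")) {
    # Picks the first value when 'method' is missing, allows partial
    # matching, and raises an error for anything not in the list.
    method <- match.arg(method)
    method
}

toy_dist("ACGT")                   # "levenshtein"
toy_dist("ACGT", method = "ham")   # "hamming"
```

The trade-off discussed above is visible here: the full vector of choices becomes part of the function signature, and therefore of the \usage section of the man page.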

> Consider promoting functions from Biostrings to exported functions rather than using :::

My understanding is that this is acceptable when the upstream and client packages have the same maintainer, which is why R CMD check doesn't say anything in that case.

H.

@LiNk-NY

LiNk-NY commented Apr 5, 2024

Hi Hervé, @hpages
Thanks for making those changes.
The package has been accepted.
Best regards,
Marcel

@LiNk-NY added the "3a. accepted" label (will be ingested into Bioconductor daily builder for distribution) and removed the "2. review in progress" label (assign a reviewer and a more thorough review of package code and documentation taking place) Apr 5, 2024
@bioc-issue-bot
Collaborator

Your package has been accepted. It will be added to the
Bioconductor nightly builds.

Thank you for contributing to Bioconductor!

Reviewers for Bioconductor packages are volunteers from the Bioconductor
community. If you are interested in becoming a Bioconductor package
reviewer, please see Reviewers Expectations.

@lshep
Contributor

lshep commented Apr 16, 2024

The default branch of your GitHub repository has been added to Bioconductor's
git repository as branch devel.

To use the git.bioconductor.org repository, we need an 'ssh' key to associate with your github user name. If your GitHub account already has ssh public keys (https://github.com/hpages.keys is not empty), then no further steps are required. Otherwise, do the following:

  1. Add an SSH key to your github account
  2. Submit your SSH key to Bioconductor

See further instructions at

https://bioconductor.org/developers/how-to/git/

for working with this repository. See especially

https://bioconductor.org/developers/how-to/git/new-package-workflow/
https://bioconductor.org/developers/how-to/git/sync-existing-repositories/

to keep your GitHub and Bioconductor repositories in sync.

Your package will be included in the next nightly 'devel' build (check-out from git at about 6 pm Eastern; build completion around 2 pm Eastern the next day) at

https://bioconductor.org/checkResults/

(Builds sometimes fail, so ensure that the date stamps on the main landing page are consistent with the addition of your package). Once the package builds successfully, your package will be available for download in the 'Devel' version of Bioconductor using BiocManager::install("pwalign"). The package 'landing page' will be created at

https://bioconductor.org/packages/pwalign

If you have any questions, please contact the bioc-devel mailing list (https://stat.ethz.ch/mailman/listinfo/bioc-devel); this issue will not be monitored further.
