Commit ce6be93

committed May 8, 2016
Seq pages and others, small corrections, updated links etc.
1 parent 89bd5e0 commit ce6be93

7 files changed, +210 -344 lines changed


‎wiki/Remove_PDB_disordered_atoms.md

+1 -1
@@ -20,7 +20,7 @@ simulations.
 Solution
 --------
 
-[`Bio.PDB`](http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc118)
+[`Bio.PDB`](http://biopython.org/DIST/docs/tutorial/Tutorial.html)
 is proficient in dealing with disordered atoms. Each disordered atom has
 a property indicating its alternative positions: `atom.altloc`. Usually
 there are only two alternative positions labelled *'A'* and *'B'*. The key

‎wiki/Scriptcentral.md

+8 -52
Large diffs are not rendered by default.

‎wiki/SearchIO.md

+49 -141
@@ -1,28 +1,28 @@
 ---
-title: SearchIO
+title: Introduction to SearchIO
 permalink: wiki/SearchIO
 layout: wiki
 tags:
 - Wiki Documentation
 ---
 
-Matching the names in BioPerl, Biopython has a [SeqIO](SeqIO "wikilink")
-module for sequence file input/output, and [AlignIO](AlignIO "wikilink")
+Matching the names in BioPerl, Biopython has a [`SeqIO`](SeqIO "wikilink")
+module for sequence file input/output, and [`AlignIO`](AlignIO "wikilink")
 for multiple sequence alignment input/output. The third member of the
 BioPerl trio is SearchIO, and a Biopython equivalent was written during
 summer 2012 by [Google Summer of Code](Google_Summer_of_Code "wikilink")
 student Wibowo Arindrarto ([blog](http://bow.web.id/blog/2012/08/back-on-the-main-branch/)).
 
 This covers pairwise sequence search file input/output, for example from
 BLAST, HMMER, BLAT, or Bill Pearson's FASTA suite. See the [BioPerl
-SearchIO HOWTO](http://www.bioperl.org/wiki/HOWTO:SearchIO) for
+SearchIO HOWTO](http://bioperl.org/howtos/SearchIO_HOWTO.html) for
 background.
 
 It is included in Biopython 1.61 onwards as an *experimental* module if
-you want to test it. An chapter in the
+you want to test it. A chapter in the
 [Tutorial](http://biopython.org/DIST/docs/tutorial/Tutorial.html)
 ([PDF](http://biopython.org/DIST/docs/tutorial/Tutorial.pdf)) on
-Bio.SearchIO is also published alongside the 1.61 release.
+`Bio.SearchIO` is also published alongside the 1.61 release.
 
 This wiki describes the important bits with some small examples. For a
 full reference, consult the [API
@@ -31,156 +31,64 @@ documentation](http://biopython.org/DIST/docs/api/Bio.SearchIO-module.html).
 Supported File Formats
 ----------------------
 
-The table below lists all formats supported by Bio.SearchIO. Note that
+The table below lists all formats supported by `Bio.SearchIO`. Note that
 for writing support, the writer assumes that all the necessary
 attributes of the objects being written are present. It is not possible,
 for example, to write BLAST XML data to a HMMER 3.0 plain text output
 straight away.
 
-<table style="width:100%;">
-<caption>Table 1: Bio.SearchIO supported file formats</caption>
-<colgroup>
-<col width="12%" />
-<col width="22%" />
-<col width="22%" />
-<col width="22%" />
-<col width="22%" />
-</colgroup>
-<thead>
-<tr class="header">
-<th><p>Format name</p></th>
-<th><p>Read</p></th>
-<th><p>Write</p></th>
-<th><p>Index</p></th>
-<th><p>Notes</p></th>
-</tr>
-</thead>
-<tbody>
-<tr class="odd">
-<td><p>blast-tab</p></td>
-<td><p>1.61</p></td>
-<td><p>1.61</p></td>
-<td><p>1.61</p></td>
-<td><p>BLAST+ tabular output (both <code>-m</code> <code>6</code> and <code>-m</code> <code>7</code> flags are supported).</p></td>
-</tr>
-<tr class="even">
-<td><p>blast-text</p></td>
-<td><p>1.61</p></td>
-<td><p>n/a</p></td>
-<td><p>n/a</p></td>
-<td><p>BLAST+ plain text output (up to version 2.2.26+). Newer versions may not always work.</p></td>
-</tr>
-<tr class="odd">
-<td><p>blast-xml</p></td>
-<td><p>1.61</p></td>
-<td><p>1.61</p></td>
-<td><p>1.61</p></td>
-<td><p>BLAST+ XML output.</p></td>
-</tr>
-<tr class="even">
-<td><p>blat-psl</p></td>
-<td><p>1.61</p></td>
-<td><p>1.61</p></td>
-<td><p>1.61</p></td>
-<td><p>BLAT default output (PSL format). Variants with or without header are both supported. PSLX (PSL + sequences) is also supported.</p></td>
-</tr>
-<tr class="odd">
-<td><p>exonerate-text</p></td>
-<td><p>1.61</p></td>
-<td><p>n/a</p></td>
-<td><p>1.61</p></td>
-<td><p>Exonerate plain text output. Due to the way Biopython stores its sequences, at the moment support is limited to text outputs without split codons (for protein queries). If you are parsing a text output of protein queries containing split codon alignments (for example, from the <code>protein2genome</code> alignment mode), the parser will fail.</p></td>
-</tr>
-<tr class="even">
-<td><p>exonerate-cigar</p></td>
-<td><p>1.61</p></td>
-<td><p>n/a</p></td>
-<td><p>1.61</p></td>
-<td><p>Exonerate cigar string.</p></td>
-</tr>
-<tr class="odd">
-<td><p>exonerate-vulgar</p></td>
-<td><p>1.61</p></td>
-<td><p>n/a</p></td>
-<td><p>1.61</p></td>
-<td><p>Exonerate vulgar string.</p></td>
-</tr>
-<tr class="even">
-<td><p>fasta-m10</p></td>
-<td><p>1.61</p></td>
-<td><p>n/a</p></td>
-<td><p>1.61</p></td>
-<td><p>Bill Pearson's FASTA <code>-m</code> <code>10</code> output.</p></td>
-</tr>
-<tr class="odd">
-<td><p>hmmer3-domtab</p></td>
-<td><p>1.61</p></td>
-<td><p>1.61</p></td>
-<td><p>1.61</p></td>
-<td><p>HMMER3.0 domain table output format. The name <code>hmmer3-domtab</code> per se is in fact not used, since the program name has to be specified. For example, when parsing hmmscan output, the format name would be <code>hmmscan-domtab</code>.</p></td>
-</tr>
-<tr class="even">
-<td><p>hmmer3-tab</p></td>
-<td><p>1.61</p></td>
-<td><p>1.61</p></td>
-<td><p>1.61</p></td>
-<td><p>HMMER 3.0 table output format.</p></td>
-</tr>
-<tr class="odd">
-<td><p>hmmer3-text</p></td>
-<td><p>1.61</p></td>
-<td><p>n/a</p></td>
-<td><p>1.61</p></td>
-<td><p>HMMER 3.0 plain text output format.</p></td>
-</tr>
-<tr class="even">
-<td><p>hmmer2-text</p></td>
-<td><p>1.61</p></td>
-<td><p>n/a</p></td>
-<td><p>1.61</p></td>
-<td><p>HMMER 2.x plain text output format.</p></td>
-</tr>
-<tr class="odd">
-</tr>
-</tbody>
-</table>
+|Format name |Read |Write|Index| Notes |
+|---------------|-----|-----|-----|------------------------------------|
+|blast-tab |1.61 |1.61 |1.61 |BLAST+ tabular output (both `-m 6` and `-m 7` flags are supported).|
+|blast-text |1.61 |n/a |n/a |BLAST+ plain text output (up to version 2.2.26+). Newer versions may not always work.|
+|blast-xml |1.61 |1.61 |1.61 |BLAST+ XML output. |
+|blat-psl |1.61 |1.61 |1.61 |BLAT default output (PSL format). Variants with or without header are both supported. PSLX (PSL + sequences) is also supported.|
+|exonerate-text |1.61 |n/a |1.61 |Exonerate plain text output. Due to the way Biopython stores its sequences, at the moment support is limited to text outputs without split codons (for protein queries). If you are parsing a text output of protein queries containing split codon alignments (for example, from the `protein2genome` alignment mode), the parser will fail.|
+exonerate-cigar |1.61 |n/a |1.61 |Exonerate cigar string. |
+exonerate-vulgar|1.61 |n/a |1.61 |Exonerate vulgar string. |
+fasta-m10 |1.61 |n/a |1.61 |Bill Pearson's FASTA `-m 10` output.|
+hmmer3-domtab |1.61 |1.61 |1.61 |HMMER3.0 domain table output format. The name `hmmer3-domtab` per se is in fact not used, since the program name has to be specified. For example, when parsing hmmscan output, the format name would be `hmmscan-domtab`.|
+hmmer3-tab |1.61 |1.61 |1.61 |HMMER 3.0 table output format. |
+hmmer3-text |1.61 |n/a |1.61 |HMMER 3.0 plain text output format. |
+hmmer2-text |1.61 |n/a |1.61 |HMMER 2.x plain text output format. |
+
 
 Format-specific Arguments
 -------------------------
 
-Although mostly similar to Biopython's SeqIO and AlignIO modules, there
-is a small difference in the main Bio.SearchIO functions. Depending on
+Although mostly similar to Biopython's `SeqIO` and `AlignIO` modules, there
+is a small difference in the main `Bio.SearchIO` functions. Depending on
 the file format being used, you may pass additional keyword arguments
 that determines how the parser / indexer / writer behaves. Shown below
 are some formats which accepts extra keyword arguments.
 
 | Format name | Argument name | Default value | Applicable for | Explanation |
 |-------------|------------------------------------------|----------------------------|---------------------------------------------------------------------------------------------|-------------------------------------------------------------------------|
-| blast-tab | comments | False | reading, writing, indexing | Boolean, whether the input/output file is the commented variant or not. |
-| fields | Default BLAST tabular output field names | reading, writing, indexing | Space-separated string, list of fields / columns in the input/output file. |
-| blast-xml | encoding | "utf-8" | writing | XML encoding name. |
-| indent | " " (empty space) | writing | Character(s) to use for indenting the XML. |
-| increment | 2 | writing | How many times the character defined in `indent` are printed when printing a child element. |
-| blat-psl | pslx | False | reading, writing, indexing | Boolean, whether the input/output file contains sequences or not. |
+| blast-tab | comments | False | Reading, writing, indexing | Boolean, whether the input/output file is the commented variant or not. |
+| fields | Default BLAST tabular output field names | reading, writing, indexing | Space-separated string, list of fields / columns in the input/output file. | |
+| blast-xml | encoding | "utf-8" | Writing | XML encoding name. |
+| indent | " " (empty space) | writing | Character(s) to use for indenting the XML. | |
+| increment | 2 | writing | How many times the character defined in `indent` are printed when printing a child element. | |
+| blat-psl | pslx | False | Reading, writing, indexing | Boolean, whether the input/output file contains sequences or not. |
 | header | False | writing | Boolean, whether to write PSL header or not. |
-||
+
 
 Conventions
 -----------
 
-The main goal of creating Bio.SearchIO is to have a common, easy to use
+The main goal of creating `Bio.SearchIO` is to have a common, easy to use
 interface across different search output files. As such, we have also
-created some conventions / standards for Bio.SearchIO that extend beyond
+created some conventions / standards for `Bio.SearchIO` that extend beyond
 the common object model. These conventions apply to all files parsed by
-Bio.SearchIO, regardless of their individual formats.
+`Bio.SearchIO`, regardless of their individual formats.
 
 ### Python-style sequence coordinates
 
-When storing sequence coordinates (start and end values), Bio.SearchIO
+When storing sequence coordinates (start and end values), `Bio.SearchIO`
 uses the Python-style slice convention: zero-based and half-open
 intervals. For example, if in a BLAST XML output file the start and end
 coordinates of an HSP are 10 and 28, they would become 9 and 28 in
-Bio.SearchIO. The start coordinate becomes 9 because Python indices
+`Bio.SearchIO`. The start coordinate becomes 9 because Python indices
 start from zero, while the end coordinate remains 28 as Python slices
 omit the last item in an interval.
 
@@ -191,7 +99,7 @@ use the coordinates to extract part of the query sequence that results
 in the database hit.
 
 When these objects are written to an output file using
-Bio.SearchIO.write, the coordinate values are restored to their
+`Bio.SearchIO.write`, the coordinate values are restored to their
 respective format's convention. Using the example above, if the HSP
 would be written to an XML file, the start and end coordinates would
 become 10 and 28 again.
@@ -203,7 +111,7 @@ sequences according to the sequence's strand. For example, in BLAST
 plain text format if the matching strand lies in the minus orientation,
 then the start coordinate will always be bigger than the end coordinate.
 
-In Bio.SearchIO, start coordinates are always smaller than the end
+In `Bio.SearchIO`, start coordinates are always smaller than the end
 coordinates, regardless of their originating strand. This ensures
 consistency when using the coordinates to slice full sequences.
 
@@ -215,29 +123,29 @@ file uses.
 
 Similar to the coordinate style convention, the start and end
 coordinates' order are restored to their respective formats when the
-objects are written using Bio.SearchIO.write.
+objects are written using `Bio.SearchIO.write`.
 
 ### Frames and strand values
 
-Bio.SearchIO only allows -1, 0, 1 and None as strand values. For frames,
-the only allowed values are integers from -3 to 3 (inclusive) and None.
+`Bio.SearchIO` only allows *-1*, *0*, *1* and `None` as strand values. For frames,
+the only allowed values are integers from -3 to 3 (inclusive) and `None`.
 Both of these are standard Biopython conventions.
 
 FAQ
 ---
 
-- *How does Bio.SearchIO differ from Bio.Blast.NCBIXML*?
+- *How does `Bio.SearchIO` differ from `Bio.Blast.NCBIXML`*?
+
 Both modules are based on completely different object models and are
 not compatible with each other. Not only that, the underlying
 parsers and writers are also different (indexing is not possible
-with Bio.Blast.NCBIXML). Finally, Bio.SearchIO is planned to be the
-replacement of Bio.Blast.NCBIXML.
+with `Bio.Blast.NCBIXML`). Finally, `Bio.SearchIO` is planned to be the
+replacement of `Bio.Blast.NCBIXML`.
 
 <!-- -->
 
-- *How does Bio.SearchIO differ from Bio.Blast.NCBIStandalone*?
-Again, they provide different object models. However, Bio.SearchIO
-currently uses the parser from Bio.Blast.NCBIStandalone internally,
-but that old module will be deprecated.
-
+- *How does `Bio.SearchIO` differ from `Bio.Blast.NCBIStandalone`*?
 
+Again, they provide different object models. However, `Bio.SearchIO`
+currently uses the parser from `Bio.Blast.NCBIStandalone` internally,
+but that old module will be deprecated.
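
Reviewer note on the coordinate conventions in the SearchIO.md hunks above (zero-based, half-open intervals, with start always smaller than end): the following is a minimal sketch in plain Python of what that conversion amounts to. It is not taken from Biopython and does not claim to mirror its internals; the helper names `to_python_coords` and `to_one_based` are hypothetical and exist only for illustration.

```python
# Hypothetical helpers, written for illustration only; they are not part of
# Bio.SearchIO and do not reproduce its internal implementation.

def to_python_coords(start, end):
    """Convert 1-based, inclusive coordinates to 0-based, half-open ones.

    The pair is sorted first, so a minus-strand hit reported as (28, 10)
    yields the same result as (10, 28).
    """
    low, high = sorted((start, end))
    return low - 1, high


def to_one_based(start, end, minus_strand=False):
    """Restore 1-based, inclusive coordinates, e.g. before writing a file.

    If minus_strand is true, the larger coordinate is placed first, which is
    how the wiki text describes BLAST plain text output.
    """
    low, high = start + 1, end
    return (high, low) if minus_strand else (low, high)


query = "ACGT" * 10  # toy 40-letter sequence
start, end = to_python_coords(10, 28)
fragment = query[start:end]  # covers 1-based positions 10 through 28
print(start, end, len(fragment))
```

With the values from the wiki text, (10, 28) becomes (9, 28), and the slice holds 28 - 10 + 1 = 19 letters, so the stored coordinates can be used directly as Python slice bounds.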

‎wiki/Seq.md

+13 -13
@@ -1,35 +1,35 @@
 ---
-title: Seq
+title: The Seq Object
 permalink: wiki/Seq
 layout: wiki
 tags:
 - Wiki Documentation
 ---
 
-In Biopython, sequences are usually held as **Seq** objects, which hold
+In Biopython, sequences are usually held as ` Seq` objects, which hold
 the sequence string and an associated alphabet.
 
-This page describes the Biopython **Seq** object, defined in the Bio.Seq
-module (together with related objects like the **MutableSeq**, plus some
+This page describes the Biopython `Seq` object, defined in the `Bio.Seq`
+module (together with related objects like the `MutableSeq`, plus some
 general purpose sequence functions). In addition to this wiki page,
 there is a whole chapter in the
 [Tutorial](http://biopython.org/DIST/docs/tutorial/Tutorial.html)
 ([PDF](http://biopython.org/DIST/docs/tutorial/Tutorial.pdf)) on the
-**Seq** object - plus its [API
+`Seq` object - plus its [API
 documentation](http://biopython.org/DIST/docs/api/Bio.Seq.Seq-class.html)
 (which you can read online, or from within Python with the help
 command).
 
 If you need to store additional information like a sequence identifier
 or name, or even more details like a description or annotation, then we
-use a [SeqRecord](SeqRecord "wikilink") object instead. These are the
-sequence records used by the [SeqIO](SeqIO "wikilink") module for
+use a [`SeqRecord`](SeqRecord "wikilink") object instead. These are the
+sequence records used by the [`SeqIO`](SeqIO "wikilink") module for
 reading and writing sequence files.
 
 The Seq Object
 ==============
 
-The Seq object essentially combines a Python string with an (optional)
+The `Seq` object essentially combines a Python string with an (optional)
 biological alphabet. For example:
 
 ``` python
@@ -44,7 +44,7 @@ Alphabet()
 In the above example, we haven't specified an alphabet so we end up with
 a default generic alphabet. Biopython doesn't know if this is a
 nucleotide sequence or a protein rich in alanines, glycines, cysteines
-and threonines. If you know, you should supply this information:
+and threonines. If *you* know, you should supply this information:
 
 ``` python
 >>> from Bio.Seq import Seq
@@ -62,7 +62,7 @@ Seq('AGTACACTGGT', ProteinAlphabet())
 
 Why is this important? Well it can catch some errors for you - you
 wouldn't want to accidentally try and combine a DNA sequence with a
-protein would you:
+protein, would you?
 
 ``` python
 >>> my_protein + my_dna
@@ -77,7 +77,7 @@ methods like translation (see below) on a protein sequence.
 General methods
 ---------------
 
-The Seq object has a number of methods which act just like those of a
+The `Seq` object has a number of methods which act just like those of a
 Python string, for example the find method:
 
 ``` python
@@ -124,7 +124,7 @@ available in Biopython 1.49 onwards.
 
 ### Complement and reverse complement
 
-These are very simple - the methods return a new Seq object with the
+These are very simple - the methods return a new `Seq` object with the
 appropriate sequence and the same alphabet:
 
 ``` python
@@ -248,7 +248,7 @@ Traceback (most recent call last):
 ValueError: Proteins do not have complements!
 ```
 
-You can use them on Seq objects with a generic alphabet:
+You can use them on `Seq` objects with a generic alphabet:
 
 ``` python
 >>> my_seq.complement()
