Posts tagged 'gsoc'

Summer's Over

August 2012 | biopython, gsoc, searchio

Google Summer of Code 2012 has finally drawn to a close. It's been a great learning experience, one I would not hesitate to recommend to anyone. I've learned a lot in the past few months, not just about writing open source software but also about several bioinformatics applications (that I hope to continue to use in the foreseeable future) and even about myself. I'm deeply thankful first and ...

Back on the Main Branch

August 2012 | biopython, gsoc, oop, python, searchio

It's been a while since I posted my GSoC updates. The main reason was a considerable change to the main SearchIO object model. It turns out that the trio of QueryResult, Hit, and HSP objects I had been using was not sufficient to consistently model outputs from all the search programs I have encountered. So with Peter's guidance, I've spent most of my time writing and rewriting several different models, ...

Exonerate in SearchIO

July 2012 | biopython, exonerate, gsoc, python, searchio

One of the things I have enjoyed while developing SearchIO over the past few weeks is that I get to play with many different programs and see how they behave. Even with programs that I thought I was familiar with, I sometimes still see unanticipated behaviors (hint: sequence coordinates). It's like the old days of Windows 95, when you would try to delete a file and see how that affected your computer ...

Parsing BLAST Plain Text Files in SearchIO

July 2012 | biopython, blast, gsoc, python, searchio

BLAST plain text output is a tricky beast. It's the easiest output format for us humans to read, but it's arguably harder for computers to read than its XML or tabular counterparts. One reason is that NCBI themselves give no guarantee that the output stays the same between different BLAST versions. This means that for each different BLAST version, there is a chance that a given parser ...

Initial Blat Support

July 2012 | biopython, blat, gsoc, python, searchio

For the past week, I have been working on two similar formats: PSL and PSLX (spec). The PSL format is the default output of BLAT, but it has found many uses across different programs. The formats themselves are simple, with 21 (PSL) or 23 (PSLX) tab-separated columns and an optional header. PSLX itself is basically PSL plus two extra columns that contain the hit and query sequences. In this post, ...
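
As a rough illustration of the column counts mentioned in the excerpt, here is a minimal sketch that tells a PSL row from a PSLX row by counting tab-separated fields. The function name `classify_psl_line` is hypothetical and is not part of Biopython or SearchIO; the sketch also assumes the optional header has already been skipped.

```python
def classify_psl_line(line):
    """Guess whether a single data row is PSL or PSLX.

    Hypothetical helper for illustration only; it relies solely on the
    column counts stated above (21 for PSL, 23 for PSLX).
    """
    # Drop the trailing newline, then split on tabs.
    columns = line.rstrip("\n").split("\t")
    if len(columns) == 21:
        return "psl"
    if len(columns) == 23:
        return "pslx"
    raise ValueError(f"expected 21 or 23 columns, got {len(columns)}")
```

A real parser would also need to handle the header block and validate individual fields; this only shows how the two formats differ in width.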