In Linux we trust!

BLAST: practical part (PHP)

Recently I described the theoretical aspects of the BLAST algorithm. In this article I'm going to provide some practical examples of BLAST in PHP.

Before we start, let's prepare the workspace:

$ mkdir test-blast
$ cd test-blast
$ curl -sS 'https://getcomposer.org/installer' | php
$ chmod a+x *.phar
$ ./composer.phar require ejz/functions

This package includes a single-threaded implementation of BLAST. Every test script below should include the vendor/autoload.php file to load all necessary dependencies.

Simple string search:

<?php
require __DIR__ . '/vendor/autoload.php';

$s1 = 'Lazy fox and quick dog.';
$s2 = 'The quick brown fox jumps over the lazy dog.';
$results = quick_blast([$s1, $s2], 5);
print_r($results);

Output:
Array
(
    [0] => Array
        (
            [0] => 7
            [1] => 12
            [2] => 3
        )
    [1] => Array
        (
            [0] => 5
            [1] => 4
            [2] => 15
        )
    [2] => Array
        (
            [0] => 5
            [1] => 18
            [2] => 39
        )
)
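Each entry of $results is [match length, offset in $s1, offset in $s2]: for example, substr($s1, 12, 7) and substr($s2, 3, 7) are both " quick ". The second argument of quick_blast (5) appears to be the minimum match length, since the four-character fragment "azy " shared by both strings is not reported. The seed-and-extend idea behind such matches can be sketched in plain PHP. The function seed_and_extend() below is hypothetical and is not the package's code: it indexes every k-character word of the first string, looks up exact hits in the second, and extends each hit only while characters are identical, whereas BLAST proper extends hits with a scoring scheme.

```php
<?php
// Hypothetical sketch of BLAST-style seed-and-extend; NOT the ejz/functions code.
// Returns [length, offset in $a, offset in $b] triples, like quick_blast above.
function seed_and_extend(string $a, string $b, int $k): array
{
    // 1. Index every k-character word of $a by the offsets where it starts.
    $index = [];
    for ($i = 0; $i + $k <= strlen($a); $i++) {
        $index[substr($a, $i, $k)][] = $i;
    }
    $found = [];
    // 2. Slide over $b and look up exact word hits (seeds).
    for ($j = 0; $j + $k <= strlen($b); $j++) {
        foreach ($index[substr($b, $j, $k)] ?? [] as $i) {
            // 3. Extend the seed to the left while characters agree...
            $left = 0;
            while ($i - $left > 0 && $j - $left > 0
                && $a[$i - $left - 1] === $b[$j - $left - 1]) {
                $left++;
            }
            // ...and to the right.
            $right = $k;
            while ($i + $right < strlen($a) && $j + $right < strlen($b)
                && $a[$i + $right] === $b[$j + $right]) {
                $right++;
            }
            // Seeds inside the same match extend to the same start; keep one copy.
            $key = ($i - $left) . ':' . ($j - $left);
            $found[$key] = [$left + $right, $i - $left, $j - $left];
        }
    }
    return array_values($found);
}

$s1 = 'Lazy fox and quick dog.';
$s2 = 'The quick brown fox jumps over the lazy dog.';
foreach (seed_and_extend($s1, $s2, 5) as [$len, $p1, $p2]) {
    // substr() turns each triple back into the matched text.
    printf("%d chars: '%s' at %d in s1, %d in s2\n", $len, substr($s1, $p1, $len), $p1, $p2);
}
```

On the two sample strings this sketch returns the same three triples, in the same order, as quick_blast above; with k = 4 it would also report [4, 1, 36], the "azy " fragment.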

Multiple strings:

$s1 = 'Lazy fox and quick dog.';
$s2 = 'The quick brown fox jumps over the lazy dog.';
$s3 = 'quick';
$results = quick_blast([$s1, $s2, $s3], 5);
print_r($results);

Output:
Array
(
    [0] => Array
        (
            [0] => 5
            [1] => 13
            [2] => 4
            [3] => 0
        )
)
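With three strings, each result gains one offset per input string: [length, offset in $s1, offset in $s2, offset in $s3]. Since $s3 is just 'quick', any fragment common to all three strings must lie inside it, which leaves a single 5-character match at offsets 13, 4 and 0. The seeding step for more than two strings can be sketched as below. The function common_words() is hypothetical and is not part of the package; it only keeps k-character words present in every string (first occurrence per string) and does not extend them into longer matches.

```php
<?php
// Hypothetical sketch (common_words is NOT part of ejz/functions): the seeding
// step only. Finds every k-character word present in ALL strings and returns
// word => [offset in string 0, offset in string 1, ...] (first occurrences only).
function common_words(array $strings, int $k): array
{
    $hits = null;
    foreach ($strings as $s) {
        // Offset of the first occurrence of each k-character word in $s.
        $first = [];
        for ($i = 0; $i + $k <= strlen($s); $i++) {
            $first[substr($s, $i, $k)] ??= $i;
        }
        if ($hits === null) {
            // First string: every word is a candidate.
            $hits = array_map(fn ($pos) => [$pos], $first);
            continue;
        }
        // Later strings: drop words missing here, append the offset otherwise.
        foreach (array_keys($hits) as $word) {
            if (isset($first[$word])) {
                $hits[$word][] = $first[$word];
            } else {
                unset($hits[$word]);
            }
        }
    }
    return $hits ?? [];
}

$s1 = 'Lazy fox and quick dog.';
$s2 = 'The quick brown fox jumps over the lazy dog.';
$s3 = 'quick';
print_r(common_words([$s1, $s2, $s3], 5));
```

It reports only "quick", at offsets 13, 4 and 0, the same offsets as in the single quick_blast result above.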

Highlight BLAST results:

If you wish to highlight BLAST results in a human-readable way, use the following snippet:

$s1 = 'Lazy fox and quick dog.';
$s2 = 'The quick brown fox jumps over the lazy dog.';
$results = quick_blast([$s1, $s2], 5);
$tags = [['<u>', '</u>'], ['<s>', '</s>']];
$html = highlight_quick_blast_results($s1, 1, $results, 3, $tags);
echo $html, "\n";
$html = highlight_quick_blast_results($s2, 2, $results, 3, $tags);
echo $html, "\n";

Output:
Lazy<u> fox </u>and<u> quick<s> </s></u><s>dog.</s>
The<u> quick </u>brown<u> fox </u>ju..y<u> dog.</u>

[Figure: BLAST results]

The first argument is the string you want to highlight. The second is its index inside each entry of $results (1 for $s1, 2 for $s2; index 0 holds the match length). The third is $results itself (i.e. the output of quick_blast). The fourth is the length of the context kept around each BLAST match. The fifth is an array of opening/closing pairs (in most cases HTML tags) used for highlighting.
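To make the role of the second argument concrete, the hypothetical wrap_matches() below (not the package's function) picks the offset column for one string out of each [length, offset, offset] entry and wraps the matched ranges in a single pair of tags. It assumes the matches do not overlap and has no context parameter, so nothing is elided.

```php
<?php
// Hypothetical sketch, NOT highlight_quick_blast_results(): wrap each matched
// range of $s in one pair of tags. Assumes non-overlapping matches and keeps
// the whole string (no context trimming).
function wrap_matches(string $s, int $index, array $results, array $tag): string
{
    // Collect [offset, length] for this string; $index selects the offset column.
    $ranges = [];
    foreach ($results as $match) {
        $ranges[] = [$match[$index], $match[0]];
    }
    // Insert tags from the rightmost match backwards so earlier offsets stay valid.
    usort($ranges, fn ($x, $y) => $y[0] <=> $x[0]);
    foreach ($ranges as [$offset, $length]) {
        $s = substr_replace($s, $tag[1], $offset + $length, 0);
        $s = substr_replace($s, $tag[0], $offset, 0);
    }
    return $s;
}

$s2 = 'The quick brown fox jumps over the lazy dog.';
$results = [[7, 12, 3], [5, 4, 15], [5, 18, 39]]; // quick_blast output from above
echo wrap_matches($s2, 2, $results, ['<u>', '</u>']), "\n";
```

On $s2 it prints The<u> quick </u>brown<u> fox </u>jumps over the lazy<u> dog.</u>, the same markup as the library output above except that the text between matches is kept in full.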

Tokenize the strings:

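The easiest way to speed up searching in large strings is to tokenize them with a regular expression. The expression below matches words while skipping HTML tag names, i.e. a word directly after < or </, or directly before >: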

$s1 = file_get_contents('http://github.com');
$s2 = file_get_contents('http://wikipedia.org');
$tokenizer = '~(?<!<|</)\b\w+\b(?!>)~';
$results = quick_blast([$s1, $s2], 5, compact('tokenizer'));
$tags = [["\x00\x00", "\x00\x01"]];
$tr = ["\x00\x00" => '<u>', "\x00\x01" => '</u>'];
$html = highlight_quick_blast_results($s1, 1, $results, 15, $tags);
echo strtr(esc($html), $tr), "\n";
$html = highlight_quick_blast_results($s2, 2, $results, 15, $tags);
echo strtr(esc($html), $tr), "\n";

[Figure: BLAST HTML results]
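In this example the tags are NUL-byte placeholders rather than real HTML. A plausible reason: the highlighted text is itself HTML source, so escaping it with esc() for display would also escape real <u> tags, whereas the placeholders pass through escaping untouched and strtr() then swaps them for real tags. The standalone sketch below shows that order of operations, with htmlspecialchars() standing in for esc() (an assumption about what esc() does).

```php
<?php
// Placeholders that do not occur in ordinary HTML text (NUL bytes).
$open = "\x00\x00";
$close = "\x00\x01";
// Pretend this came out of the highlighter: HTML source with one marked fragment.
$highlighted = '<b>' . $open . 'quick' . $close . '</b>';
// Escaping turns the markup into visible text but leaves the NUL placeholders alone...
$escaped = htmlspecialchars($highlighted, ENT_QUOTES, 'UTF-8');
// ...so the placeholders can then be replaced by real highlighting tags.
$html = strtr($escaped, [$open => '<u>', $close => '</u>']);
echo $html, "\n"; // &lt;b&gt;<u>quick</u>&lt;/b&gt;
```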

Of course, you can provide your own tokenizer instead of a regular expression. A custom tokenizer is a callable that returns a zero-based array of tokens, each element holding the token and its position in the string. An example element:

Array
(
    [token] => html
    [pos] => 16
)
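A minimal sketch of such a callable, under assumptions: the name word_tokenizer and its single string parameter are hypothetical, and it reuses the regular expression from the previous example via preg_match_all() with PREG_OFFSET_CAPTURE, so pos is a byte offset. Lowercasing each token is one thing a custom tokenizer can add over a bare regex, e.g. so that "Lazy" and "lazy" match, assuming the package compares the token values.

```php
<?php
// Hypothetical custom tokenizer (name and single-string signature are assumptions):
// returns a zero-based list of ['token' => ..., 'pos' => ...] elements.
function word_tokenizer(string $s): array
{
    // Same regex as above: words, skipping HTML tag names.
    preg_match_all('~(?<!<|</)\b\w+\b(?!>)~', $s, $m, PREG_OFFSET_CAPTURE);
    $tokens = [];
    foreach ($m[0] as [$token, $pos]) {
        // Lowercase so that case differences do not break matches; $pos is a byte offset.
        $tokens[] = ['token' => strtolower($token), 'pos' => $pos];
    }
    return $tokens;
}

print_r(word_tokenizer('<html><body>Hello world</body></html>'));
```

It returns two elements, hello at 12 and world at 18, while the tag names are skipped. Presumably it is passed the same way as the regex, e.g. quick_blast([$s1, $s2], 5, ['tokenizer' => 'word_tokenizer']), but that is an assumption.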

That's all! Happy BLAST'ing!