
Here is what the reads look like before Filtlong. Each dot is a read and the marginal histograms show the length distribution (top) and identity distribution (right).
The length N50 is 24,077 bp (i.e. half the bases are in reads 24,077 bp long or longer). The identity N50 is 85.60% (i.e. half the bases are in reads with 85.60% identity or higher).
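
To make the two N50 definitions above concrete, here is a minimal Python sketch. It is not taken from Filtlong's code: the function names `weighted_n50` and `length_n50` are illustrative, and per-read identities are assumed to be already known (for example, from alignment to a reference).

```python
def weighted_n50(values, lengths):
    """Return the value v such that reads with value >= v contain
    at least half of all bases.

    values  -- one number per read (its length, or its percent identity)
    lengths -- the length of each read in bp, used as the weight
    """
    total_bases = sum(lengths)
    if total_bases == 0:
        raise ValueError("no bases in the read set")
    running = 0
    # Walk from the best value downwards, accumulating bases.
    for value, length in sorted(zip(values, lengths), reverse=True):
        running += length
        if running * 2 >= total_bases:
            return value


def length_n50(lengths):
    """Length N50: each read's value is its own length."""
    return weighted_n50(lengths, lengths)
```

For four reads of 10, 20, 30 and 40 bp, `length_n50` returns 30, because the 40 bp and 30 bp reads together hold 70 of the 100 bases. Passing identities as `values` gives the identity N50 in the same way.
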
Filtlong has cut the original 1.3 Gbp of reads down to a much better 500 Mbp subset. Short reads and low identity reads have been mostly removed.
Length N50 = 36,827 bp, Identity N50 = 88.53%
With an external reference, Filtlong is better able to judge read quality, and now most remaining reads are 85% identity or better. The length distribution has suffered a bit, however, because when outputting a fixed amount of sequence (500 Mbp in this case), there is a trade-off between length and quality.
Length N50 = 28,713 bp, Identity N50 = 88.94%
Trimming and splitting have further improved the read identity. This is especially apparent at the short end of the length distribution, where many more reads now exceed 92% identity. Some of these high-identity shorter reads will be parts of longer reads which were split.
Length N50 = 28,407 bp, Identity N50 = 89.37%
These settings greatly improve the length distribution, but the length-quality trade-off results in more low-identity reads.
Length N50 = 43,877 bp, Identity N50 = 87.89%
These settings produce the best identity distribution, with most reads now 87% identity or better. Length now has a lower relative weight in the score function, so many shorter reads are kept.
Length N50 = 14,127 bp, Identity N50 = 89.83%
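
The length-quality trade-off described above can be illustrated with a toy selection routine. This is only a sketch under stated assumptions, not Filtlong's actual score function: the weighted-average score, the function name `select_reads`, and the parameters `target_bases` and `length_weight` are all hypothetical here, and each read's identity is assumed to be given.

```python
def select_reads(reads, target_bases, length_weight=1.0):
    """Keep the best-scoring reads until target_bases is reached.

    reads -- list of (name, length_bp, percent_identity) tuples
    The score below is a made-up weighted average of a 0-100 length
    score and the percent identity; it only illustrates the trade-off.
    """
    longest = max(length for _, length, _ in reads)

    def score(read):
        _, length, identity = read
        length_score = 100.0 * length / longest
        return (length_weight * length_score + identity) / (length_weight + 1.0)

    kept, total = [], 0
    for read in sorted(reads, key=score, reverse=True):
        if total >= target_bases:
            break  # the fixed output budget is used up
        kept.append(read)
        total += read[1]
    return kept


reads = [
    ("long_poor", 40000, 82.0),
    ("mid_good", 20000, 92.0),
    ("short_best", 5000, 96.0),
]
# Equal weights favour the long, lower-identity read.
print([r[0] for r in select_reads(reads, 25000, length_weight=1.0)])
# A low length weight favours the shorter, higher-identity reads.
print([r[0] for r in select_reads(reads, 25000, length_weight=0.1)])
```

Because the output budget is fixed, every base spent on a long, lower-identity read is a base not spent on shorter, higher-identity reads, which is why raising one N50 tends to lower the other.
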





