Based on the file generated with prepare_data()
, which contains the total depth and sequencing coverage for each
nucleotide (A, C, G, and T), this function remove all but single nucelotide polymorphisms.
When supplied, this function will filter on coverage or allele frequency. To filter by total coverage,
a user must supply the min.depth
and max.depth.quantile.prob
. If an error
is provided,
sites will be retained where allele coverage is greater than the sequencing error rate times
the total coverage, but less than one minus the sequencing error rate times the total coverage.
Lastly, based on trunc
, allele frequencies will be filtered based on a provided lower
and upper bound. Finally, the function samples a single allele frequency per site to avoid data duplication.
Usage
process_data(
file,
min.depth = 2,
max.depth.quantile.prob = 0.9,
error = 0.01,
trunc = c(0, 0)
)
Arguments
- file
Output txt file created with
prepare_data()
.- min.depth
Minimum sequencing depth, default as 2.
- max.depth.quantile.prob
Maximum sequencing depth quantile cut off, default = 0.9.
- error
Sequencing error rate. If an
error
is provided, sites will be retained where allele coverage is greater than the sequencing error rate times the total coverage, but less than one minus the sequencing error rate times the total coverage.- trunc
List of two values representing the lower and upper bounds, \(c_{L}\) and \(c_{U}\) which are used to filter allele frequencies.