NAME
    polyester_polyA.pl - A Perl application for enhancing the polyester RNA
    sequencing simulation tool.

VERSION
    version 0.02

SYNOPSIS
    polyester_polyA.pl [options]

DESCRIPTION
    The purpose of the application is to enhance the polyester RNA
    sequencing simulation tool by including polyA tails in the reference RNA
    being used to generate the simulated sequencing data. The application is
    a wrapper around the R package polyester, which only accounts for the
    processes of fragmentation, reverse complementation and sequencing when
    generating data. Note that the Perl application does not (at this
    moment) include the possibility of passing logspline R objects as
    parameters to the R script and the the polyester "simulate_experiment"
    function. The command line options are the same as the ones in the
    polyester R package, with the exception of: * The addition of the
    --taildist option, which is mandatory and specifies the tail
    distribution to be used. * The addition of the --distparams option,
    which is mandatory and specifies the parameters of the distribution. *
    The addition of the --maxseqs option, which is optional and specifies
    whether to break the single fasta file generated by the application into
    multiple files of a specified maximum number of sequences. * The
    addition of the --modformat option, which is optional and specifies the
    format for storing modifications (one of JSON, YAML, or MessagePack). *
    BONUS: provide a R script that can be used to control the polyester
    simulation process from the command line (polyester.R) All other
    parameters have the same interpretation and semantics as in the
    polyester R package.

OPTIONS
    --bias, -b [STRING]
        Fragment selection bias (optional).

    --distparams, -P [FLOAT1 FLOAT2 ...]
        Distribution parameters (mandatory, list of numeric values).

    --errormodel, -e [STRING]
        Error model (optional).

    --errorrate, -E [FLOAT]
        Error probability (optional).

    --fastafile, -f [PATH]
        Fasta file path (mandatory).

    --fcfile, -c [PATH]
        Fold change file path (optional).

    --fraglen, -F [INTEGER]
        Fragment length (average) (optional).

    --fragsd, -S [INTEGER]
        Fragment length (standard deviation) (optional).

    --gcbias, -g [INTEGER]
        GC bias (optional).

    --modformat, -m [INTEGER]
        Case insensitive format for storing modifications (one of JSON,
        YAML, or MessagePack) (optional).

    --maxseqs, -m [INTEGER]
        Maximum sequences per file (optional).

    --numreps, -n [INTEGER1 INTEGER2 ...]
        Number of replicates in each group (optional, list).

    --outdir, -o [PATH]
        Path to output directory (optional).

    --paired, -p [TRUE|FALSE]
        Paired reads (optional).

    --readlen, -R [INTEGER]
        Read length (optional).

    --readsfile, -r [PATH]
        Reads per transcript file path (optional).

    --seed, -d [INTEGER]
        Random seed (optional).

    --strandspec, -s [TRUE|FALSE]
        Strand specificity (optional).

    --taildist, -t [STRING]
        Tail distribution (mandatory).

    --writeinfo, -w [INTEGER]
        Save simulation info (optional).

EXAMPLES
      polyester_polyA.pl --fastafile myseq.fasta --taildist gamma \
      --distparams 125.0 1.0 0.0 250.0 --fraglen 100 --fragsd 10 \
      --numreps 1 --strandspec TRUE --readlen 75 --paired F \
      --maxseqs 1000 --modformat YAML --outdir /path/to/output

TODO
    *   Add the possibility of passing logspline R objects as parameters to
        the R script and the polyester "simulate_experiment" function.

    *   Add the possibility of adding UMI tags to sequences.

    *   Add the possibility of adding sequencing adapters to sequences.

SEE ALSO
    *   polyester <https://github.com/alyssafrazee/polyester>

        Polyester is an R package designed to simulate RNA sequencing
        experiments with differential transcript expression.Given a set of
        annotated transcripts, Polyester will simulate the steps of an
        RNA-seq experiment (fragmentation, reverse-complementing, and
        sequencing) and produce files containing simulated RNA-seq reads.
        Simulated reads can be analyzed using your choice of downstream
        analysis tools. Polyester has a built-in wrapper function to
        simulate a case/control experiment with differential transcript
        expression and biological replicates. Users are able to set the
        levels of differential expression at transcripts of their choosing.
        This means they know which transcripts are differentially expressed
        in the simulated dataset, so accuracy of statistical methods for
        differential expression detection can be analyzed.

        Polyester offers several unique features:

        * Built-in functionality to simulate differential expression at the
        transcript level * Ability to explicitly set differential expression
        signal strength * Simulation of small datasets, since large RNA-seq
        datasets can require lots of time and computing resources to analyze
        * Generation of raw RNA-seq reads, as opposed to alignments or
        transcript-level abundance estimates * Transparency/open-source code

AUTHOR
    Christos Argyropoulos <chrisarg@cpan.org>

COPYRIGHT AND LICENSE
    This software is copyright (c) 2024 by Christos Argyropoulos.

    This is free software; you can redistribute it and/or modify it under
    the same terms as the Perl 5 programming language system itself.