package Statistics::Data;
use strict;
use warnings FATAL => 'all';
use Carp qw(croak);
use List::AllUtils qw(all);
use Number::Misc qw(is_even);
use Scalar::Util qw(looks_like_number);
use String::Util qw(hascontent nocontent);
our $VERSION = '0.09';

=head1 NAME

Statistics::Data - Load, access, update, check and save one or more sequences of data for statistical analysis

=head1 VERSION

This is documentation for Version 0.09 of Statistics/Data.pm, released February 2015.

=head1 SYNOPSIS

 use Statistics::Data 0.09;
 my $dat = Statistics::Data->new();
 
 # managing labelled sequences:
 $dat->load({'aname' => \@data1, 'anothername' => \@data2}); # labels are arbitrary
 $aref = $dat->access(label => 'aname'); # gets back a copy of @data1
 $dat->add(aname => [2, 3]); # pushes new values onto loaded copy of @data1
 $dat->dump_list(); # print to check if both arrays are loaded and their number of elements
 $dat->unload(label => 'anothername'); # only 'aname' data remains loaded
 $aref = $dat->access(label => 'aname'); # $aref is a reference to a copy of @data1
 $dat->dump_vals(label => 'aname', delim => ','); # proof in print that it is still loaded

 # managing multiple anonymous sequences:
 $dat->load(\@data1, \@data2); # any number of anonymous arrays
 $dat->add([2], [6]); # pushes a single value apiece onto copies of @data1 and @data2
 $aref = $dat->access(index => 1); # returns reference to copy of @data2, with its new values
 $dat->unload(index => 0); # only @data2 remains loaded, and its index is now 0

 # managing single anonymous sequence, over time:
 $dat->load(1, 2, 2);
 $dat->add(1); # loaded sequence is now 1, 2, 2, 1
 $dat->dump_vals(); # same as: print join(q{ }, @{$dat->access()}), "\n";
 $dat->save_to_file(path => 'variables1.dat'); # write to disk
 $dat->unload(); # all gone - go wild, knowing that you can ...
 $dat->load_from_file(path => 'variables1.dat'); # back again - go to work
 $dat->dump_vals(); # proof: same printed output as before
 # do more work

=head1 DESCRIPTION

Handles data for some other statistics modules, as in loading, updating and retrieving data for analysis. Performs no actual statistical analysis itself.

The rationale is to avoid writing the same or similar load, add, etc. methods for every statistics module; it is not to provide an omnibus API for Perl stats modules. It does, however, encompass much of the variety in how Perl stats modules handle their data at a basic level. Used for L<Statistics::Sequences|Statistics::Sequences> (and its sub-tests).

=head1 SUBROUTINES/METHODS

Manages caches of one or more lists of data for use by some other statistics modules. The lists are ordered arrays comprised of literal scalars (numbers, strings). They can be loaded, added to (updated), accessed or unloaded by referring to the index (order) in which they have been loaded (or previously added to), or by a particular label. The lists are cached within the class object's '_DATA' aref as an aref itself, optionally associated with a 'label'. The particular structures supported here to load, update, retrieve, unload data are specified under L<load|Statistics::Data/load>. Any module that uses this one as its base can still use its own rules to select the appropriate sequence, or provide the appropriate sequence within the call to itself.

=head2 Constructors

=head3 new

 $dat = Statistics::Data->new();

Returns a new Statistics::Data object.

=cut

sub new {
    my $class = shift;
    my $self = bless {}, ref($class) ? ref($class) : $class;
    $self->{_DATA} = [];
    return $self;
}

=head3 clone

 $new_self = $dat->clone();

I<Alias>: B<copy>

Returns a copy of the class object with its data loaded (if any). Note this is not a copy of any particular data but of the whole blessed hash. Alternatively, use L<share|Statistics::Data/share> to get all the data added to a new object, or use L<access|Statistics::Data/access> to load/add particular arrays of data into another object. Nothing modified in this new object affects the original.

=cut

sub clone {
    my $self = shift;
    require Clone;
    return Clone::clone($self);
}
*copy = \*clone;

=head2 Setting data

Methods to cache and uncache data into the data-object.

=head3 load

 $dat->load(@data);             # CASE 1 - can be updated/retrieved anonymously, or as index => i (load order)
 $dat->load(\@data);            # CASE 2 - same, as aref
 $dat->load(data => \@data);    # CASE 3 - updated/retrieved as label => 'data' (arbitrary name, not just 'data'); or by index (order)
 $dat->load({ data => \@data }); # CASE 4 - same as CASE 3, as hashref
 $dat->load(blues => \@blue_data, reds => \@red_data);      # CASE 5 - same as CASE 3 but with multiple named loads
 $dat->load({ blues => \@blue_data, reds => \@red_data });  # CASE 6 - same as CASE 5 but as hashref
 $dat->load(\@blue_data, \@red_data);  # CASE 7 - same as CASE 2 but with multiple aref loads

 # Not supported:
 #$dat->load(data => @data); # not OK - use CASE 3 instead
 #$dat->load([\@blue_data, \@red_data]); # not OK - use CASE 7 instead
 #$dat->load([ [blues => \@blue_data], [reds => \@red_data] ]); # not OK - use CASE 5 or CASE 6 instead
 #$dat->load(blues => \@blue_data, reds => [\@red_data1, \@red_data2]); # not OK - too mixed to make sense

I<Alias>: B<load_data>

Cache a list of data as an array-reference. Each call removes previous loads, as does sending nothing. If data need to be cached without unloading previous loads, try L<add|Statistics::Data/add>. Arguments with the following structures are acceptable as data, and will be L<access|Statistics::Data/access>ible by either index or label as expected:

=over 4

=item load ARRAY

Load an anonymous array that has no named values. For example:

 $dat->load(1, 4, 7);
 $dat->load(@ari);

This is loaded as a single sequence, with an undefined label, and indexed as 0. Note that a label followed by an unreferenced array (e.g., C<< data => @data >>) is treated like this case: the label is "folded" into the sequence itself, as its first value.

=item load AREF

Load a reference to a single anonymous array that has no named values, e.g.: 

 $dat->load([1, 4, 7]);
 $dat->load(\@ari);

This is loaded as a single sequence, with an undefined label, and indexed as 0.

=item load ARRAY of AREF(s)

Same as above, but note that more than one unlabelled array-reference can also be loaded at once, e.g.:

 $dat->load([1, 4, 7], [2, 5, 9]);
 $dat->load(\@ari1, \@ari2);

Each sequence can be accessed, using L<access|Statistics::Data/access>, by specifying B<index> => index, the latter value representing the order in which these arrays were loaded.

=item load HASH of AREF(s)

Load one or more labelled references to arrays, e.g.:

 $dat->load('dist1' => [1, 4, 7]);
 $dat->load('dist1' => [1, 4, 7], 'dist2' => [2, 5, 9]);

This loads the sequence(s) with a label attribute, so that when calling L<access|Statistics::Data/access>, they can be retrieved by name, e.g., passing B<label> => 'dist1'. The load method involves a check that there is an even number of arguments, and that, if this really is a hash, all the keys are defined and not empty, and all the values are in fact array-references.

=item load HASHREF of AREF(s)

As above, but where the hash is referenced, e.g.:

 $dat->load({'dist1' => [1, 4, 7], 'dist2' => [2, 5, 9]});

=back

This means that using the following forms will produce unexpected results, if they do not actually croak, and so should not be used:

 $dat->load(data => @data); # no croak but wrong - puts "data" in @data - use \@data
 $dat->load([\@blue_data, \@red_data]); # use unreferenced ARRAY of AREFs instead
 $dat->load([ [blues => \@blue_data], [reds => \@red_data] ]); # treated as single AREF; use HASH of AREFs instead
 $dat->load(blues => \@blue_data, reds => [\@red_data1, \@red_data2]); # mixed structures not supported
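
The first of these forms goes wrong silently because Perl flattens an unreferenced array into the argument list. A minimal, standalone sketch of what the method receives in each case (it does not call this module, and the variable names are only illustrative):

 use strict;
 use warnings;

 my @data = ( 1, 4, 7 );

 # What load(data => @data) passes: one flat list of four plain scalars.
 my @flat = ( data => @data );
 my $n_arefs_flat = grep { ref $_ eq 'ARRAY' } @flat;

 # What load(data => \@data) passes: a label followed by one array-reference.
 my @pair = ( data => \@data );
 my $n_arefs_pair = grep { ref $_ eq 'ARRAY' } @pair;

 printf "flat: %d elements, %d arefs\n", scalar @flat, $n_arefs_flat;
 printf "pair: %d elements, %d arefs\n", scalar @pair, $n_arefs_pair;

With no array-reference among its arguments, the flat list is taken as a single anonymous sequence (the ARRAY case above), label included.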

=cut

sub load
{ # clears any cached data, then caches copies of the given sequence(s) via add()
    my ( $self, @args ) = @_;
    $self->unload();
    $self->add(@args);
    return 1;
}
*load_data = \&load;

=head3 add

I<Alias>: B<add_data>, B<append_data>, B<update>

Same usage as above for L<load|Statistics::Data/load>. Pushes any value(s) onto an already loaded sequence, or loads an entirely new labelled sequence, without clobbering what's already in there (as L<load|Statistics::Data/load> would). If data have not been loaded with a label, then appending data to them happens according to the order of array-refs set here; see L<EXAMPLES|EXAMPLES>. One could even skip adding something to one previously loaded sequence by, e.g., going C<< $dat->add([], \@new_data) >>: this adds nothing to the first loaded sequence, and either initialises a second sequence, if there is none already, or appends these data to it.

=cut

sub add {
    my ( $self, @args ) = @_;
    my $tmp = _init_data( $self, @args )
      ; # hashref of data sequence(s) keyed by index to use for loading or adding
    while ( my ( $i, $val ) = each %{$tmp} ) {
        if ( defined $val->{'lab'} ) {    # newly labelled data
            $self->{_DATA}->[$i] =
              { seq => $val->{'seq'}, lab => $val->{'lab'} };
        }
        else
        { # data to be added to existing cache, or an anonymous load, indexed only
            push @{ $self->{_DATA}->[$i]->{'seq'} }, @{ $val->{'seq'} };
        }
    }
    return;
}
*add_data    = \&add;
*append_data = \&add;
*update      = \&add;

=head3 unload

 $dat->unload(); # deletes all cached data, named or not
 $dat->unload(index => integer); # deletes the sequence loaded at this index
 $dat->unload(label => 'a name'); # deletes the sequence loaded with this label

Empty, clear, clobber what's in there. Croaks if given index or label does not refer to any loaded data. This should be used whenever any already loaded or added data are no longer required ahead of another L<add|Statistics::Data/add>, including via L<copy|Statistics::Data/copy> or L<share|Statistics::Data/share>.

=cut

sub unload {
    my ( $self, @args ) = @_;
    if ( !$args[0] ) {
        $self->{_DATA} = [];
    }
    else {
        splice @{ $self->{_DATA} }, _index_by_args( $self, @args ), 1;
    }
    return;
}

=head3 share

 $dat_new->share($dat_old);

Adds all the data from one Statistics::Data object to another. Changes in the new copies do not affect the originals.

=cut

sub share {
    my ( $self, $other ) = @_;
    _add_from_object_aref( $self, $other->{_DATA} );
    return 1;
}

=head3 load_from_file

 $dat->load_from_file(path => 'medata.csv', format => 'xml|csv');
 $dat->load_from_file(path => 'mysequences.csv', serializer => 'XML::Simple', compress => 1, secret => '123'); # serialization options

Loads data from a file, assuming there are data in the given path that have been saved in the format used in L<save_to_file|Statistics::Data/save_to_file>. Basically a wrapper to the C<retrieve> method in L<Data::Serializer|Data::Serializer/retrieve> (cf. for options), and then to L<load|Statistics::Data/load>. If the data retrieved are actually to be added to any data already cached via a previous load or add, define the optional parameter B<keep> => 1.

=cut

sub load_from_file {
    my ( $self, @args ) = @_;
    my $args = ref $args[0] ? $args[0] : {@args};
    croak 'There is no path for loading data'
      if nocontent( $args->{'path'} )
      || !-e $args->{'path'};    # is filepath valid?
    require Data::Serializer;
    my $serializer = Data::Serializer->new( %{$args} );
    my $href       = $serializer->retrieve( $args->{'path'} );
    $self->unload() unless $args->{'keep'};
    _add_from_object_aref( $self, $href->{DATA} );
    return 1;
}

=head2 Getting data

To retrieve what has been previously loaded, simply call L<access|Statistics::Data/access>, specifying the list by the B<label> (as loaded hash-wise) or the B<index> (as loaded array-wise) that was used to load/add the data.

For retrieving more than one previously loaded dataset, use one of the "get" methods, choosing between getting back a hash-ref or an array-ref of the lists, or a single list (as by L<access|Statistics::Data/access>). These "get" methods only support selecting data that were loaded with labels for now; use L<access|Statistics::Data/access> to get back index-specific loads. They might be folded within L<access|Statistics::Data/access> later on.

=head3 access

 $aref = $dat->access(); #returns the first and/or only sequence loaded, if any
 $aref = $dat->access(index => integer); #returns the ith sequence loaded
 $aref = $dat->access(label => 'a_name'); # returns a particular named cache of data

I<Alias>: B<get_data>

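Returns a reference to one array of data, as previously loaded/added, selected by the given B<index> (in a flat-list load) or B<label> (in a hash-wise load); with no arguments, the first-loaded sequence is returned. Croaks if no loaded data match the request. Compare L<get_aref_by_lab|Statistics::Data/get_aref_by_lab>, which selects by B<lab> and defaults to the last-loaded sequence.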

=cut

sub access {
    my ( $self, @args ) = @_;
    return $self->{_DATA}->[ _index_by_args( $self, @args ) ]->{'seq'};
}
*read = \&access;    # legacy only

=head3 get_hoa, get_hoa_by_lab

  $href = $data->get_hoa(lab => [qw/fez boa/]); # retrieve 1 or more named data
  $href = $data->get_hoa(); # retrieve all named data 

Returns a hashref of arefs, where the keys are the names of the data, as previously given in a load, and the values are arefs of the list of data that has been loaded for that name. 

The optional argument B<lab> should be a reference to a list of one or more names that have been given as keys in a hash-wise L<load|Statistics::Data/load>. Any elements in this list that have not been used as names in a load are ignored. If none of the names has been used, the returned hash is empty. If there is no B<lab> argument, then all of the labelled data are returned as a hashref of arefs; if there were no named data, this is a reference to an empty hash.

This is useful in a module like L<Statistics::ANOVA::JT> that needs to continuously cross-refer to multiple variables to make a single calculation while also being able to distinguish them by some meaningful key other than simply an index number.

For working with numerical data in particular, see the following two methods.

=cut

sub get_hoa_by_lab {
    my ( $self, %args ) = @_;
    $args{'lab'} = [ $args{'lab'} ]
      if hascontent( $args{'lab'} )
      and not ref $args{'lab'};
    my %data = ();
    if ( !ref $args{'lab'} ) {    # get all data
        for my $i ( 0 .. $self->ndata() - 1 ) {
            if ( hascontent( $self->{_DATA}->[$i]->{'lab'} ) ) {
                $data{ $self->{_DATA}->[$i]->{'lab'} } =
                  $self->{_DATA}->[$i]->{'seq'};
            }
        }
    }
    else {                        # get named data
        for my $i ( 0 .. scalar @{ $args{'lab'} } - 1 )
        {                         # assume ref eq 'ARRAY'
            my $j = _seq_index_by_label( $self, $args{'lab'}->[$i] )
              ;                   # is name loaded with data?
            if ( defined $j ) {
                $data{ $args{'lab'}->[$i] } = $self->{_DATA}->[$j]->{'seq'};
            }                     # else ignore the given name
        }
    }
    return wantarray ? %data : \%data;
}
*get_hoa = \&get_hoa_by_lab;

=head3 get_hoa_by_lab_numonly_indep

 $hoa = $dat->get_hoa_by_lab_numonly_indep(); # same as get_hoa but each list culled of NaNs

Returns the variables given in the argument B<lab> (an aref of strings) culled of any empty or non-numeric values. This is done by treating each variable independently, with culls on one list not creating a cull on any other. This is the type of data useful for an independent ANOVA.

=cut

sub get_hoa_by_lab_numonly_indep {
    my ( $self, %args ) = @_;
    return _cull_hoa_indep( scalar $self->get_hoa_by_lab(%args),
        \$self->{'purged'} );
}

=head3 get_hoa_by_lab_numonly_across

 $hoa = $dat->get_hoa_by_lab_numonly_across(); # same as get_hoa but each list culled of NaNs at same i across lists

Returns a hashref of previously loaded variable data (as arefs) culled of any empty or non-numerical values, whereby even a valid value in one list is culled if it is at an index that is invalid in another list. This is the type of data useful for a dependent ANOVA.
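
The across-list rule can be sketched in plain Perl as follows. This is only an illustration under assumed names (C<%hoa>, C<%bad>, C<%clean>), not the module's internal code, and it assumes lists of equal length:

 use strict;
 use warnings;
 use Scalar::Util qw(looks_like_number);

 my %hoa = (
     pre  => [ 3, 'x', 5,  7 ],
     post => [ 4, 6,   '', 8 ],
 );

 # Collect every index that is invalid in at least one list.
 my %bad;
 for my $list ( values %hoa ) {
     for my $i ( 0 .. $#{$list} ) {
         $bad{$i} = 1 if !looks_like_number( $list->[$i] );
     }
 }

 # Keep, in each list, only the values at indices that are valid in all lists.
 my %clean;
 for my $name ( keys %hoa ) {
     $clean{$name} = [ map { $hoa{$name}->[$_] }
           grep { !$bad{$_} } 0 .. $#{ $hoa{$name} } ];
 }

 print "$_: @{ $clean{$_} }\n" for sort keys %clean;

Here indices 1 and 2 are dropped from both lists, so the lists stay aligned, case by case, for a repeated-measures comparison.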

=cut

sub get_hoa_by_lab_numonly_across {
    my ( $self, %args ) = @_;
    return _cull_hoa_across( scalar $self->get_hoa_by_lab(%args),
        \$self->{'purged'} );
}

=head3 get_aoa, get_aoa_by_lab

 $aref_of_arefs = $dat->get_aoa_by_lab();

Returns a reference to an array where each value is itself an array of data, as separately loaded under a different name or anonymously, in the order that they were loaded. If no B<lab> value is defined, all the loaded data are returned as a list of arefs.

=cut

sub get_aoa_by_lab {
    my ( $self, %args ) = @_;
    $args{'lab'} = [ $args{'lab'} ]
      if hascontent( $args{'lab'} )
      and not ref $args{'lab'};
    my @data = ();
    if ( !ref $args{'lab'} ) {    # get all data
        for my $i ( 0 .. $self->ndata() - 1 ) {
            $data[$i] = $self->{_DATA}->[$i]->{'seq'};
        }
    }
    else {                        # get named data
        for my $i ( 0 .. scalar @{ $args{'lab'} } - 1 )
        {                         # assume ref eq 'ARRAY'
            my $j = _seq_index_by_label( $self, $args{'lab'}->[$i] )
              ;                   # is name loaded with data?
            if ( defined $j ) {
                $data[$i] = $self->{_DATA}->[$j]->{'seq'};
            }                     # else ignore the given name
        }
    }
    return wantarray ? @data : \@data;  # plain list in list context, for legacy callers
}
*get_aoa = \&get_aoa_by_lab;

=head3 get_aref_by_lab

 $aref = $dat->get_aref_by_lab();

Returns a reference to a single, previously loaded array of data, as specified in the named argument B<lab>. The array is empty if no data have been loaded, or if there is none with the given B<lab>. If B<lab> is not defined, then the last-loaded data, if any, are returned (as an aref).

=cut

sub get_aref_by_lab {
    my ( $self, %args ) = @_;
    my $aref = [];
    if (   nocontent( $args{'lab'} )
        && @{ $self->{_DATA} }    # avoid a fatal [-1] lookup on an empty cache
        && ref $self->{_DATA}->[-1]->{'seq'} )
    {
        $aref = $self->{_DATA}->[-1]->{'seq'};
    }
    else {
        my $j = _seq_index_by_label( $self, $args{'lab'} )
          ;    # is name loaded with data?
        if ( defined $j ) {
            $aref = $self->{_DATA}->[$j]->{'seq'};
        }
    }
    return $aref;
}

=head3 ndata

 $n = $dat->ndata();

Returns the number of loaded variables.

=cut

sub ndata {
    my $self = shift;
    return scalar( @{ $self->{'_DATA'} } );
}

=head3 labels

 $aref = $dat->labels();

Returns a reference to an array of all the datanames (labels), if any.

=cut

sub labels {
    my $self  = shift;
    my @names = ();
    for ( 0 .. scalar @{ $self->{'_DATA'} } - 1 ) {
        push @names, $self->{'_DATA'}->[$_]->{'lab'}
          if hascontent( $self->{'_DATA'}->[$_]->{'lab'} );
    }
    return \@names;
}

=head2 Checking data

=head3 all_full

 $bool = $dat->all_full(\@data); # test data are valid before loading them
 $bool = $dat->all_full(label => 'mydata'); # checking after loading/adding the data (or key in 'index')

Checks not only that the data sequence, as named or indexed, exists, but also that it has no empty elements, each element being checked with L<hascontent|String::Util/hascontent>.

=cut

sub all_full {
    my ( $self, @args ) = @_;
    my $data = ref $args[0] ? shift @args : $self->access(@args);
    my ( $bool, @vals ) = ();
    foreach ( @{$data} ) {
        $bool = nocontent($_) ? 0 : 1;
        if (wantarray) {
            push @vals, $_ if $bool;
        }
        else {
            last if $bool == 0;
        }
    }
    return wantarray ? ( \@vals, $bool ) : $bool;
}

=head3 all_numeric

 $bool = $dat->all_numeric(); # test data first-loaded, if any
 $bool = $dat->all_numeric(\@data); # test these data are valid before loading them
 $bool = $dat->all_numeric(label => 'mydata'); # check specific data after loading/adding them by a 'label' or by their 'index' order
 ($aref, $bool) = $dat->all_numeric([3, '', 4.7, undef, 'b']); # returns ([3, 4.7], 0); - same for any loaded data

Given an aref of data, or a reference to data previously loaded (see L<access|Statistics::Data/access>), tests the numeracy of each element. If called in scalar context, returns a boolean scalar indicating if all data in this aref are defined and not empty (using C<nocontent> in L<String::Util|String::Util/nocontent>) and, if they have content, if these are all numerical, using C<looks_like_number> in L<Scalar::Util|Scalar::Util/looks_like_number>. Alternatively, if called in array context, returns the data (as an aref) less any values that failed this test, followed by the boolean.

=cut

sub all_numeric {
    my ( $self, @args ) = @_;
    my ( $data, $bool ) = ();
    if ( ref $args[0] eq 'ARRAY' ) {
        $data = shift @args;
    }
    else {
        $data = $self->{_DATA}->[ _index_by_args( $self, @args ) ]->{'seq'};
    }
    my @vals = ();
    foreach ( @{$data} ) {
        $bool = ( nocontent($_) or not looks_like_number($_) ) ? 0 : 1;
        if (wantarray) {
            push @vals, $_ if $bool;
        }
        else {
            last if $bool == 0;
        }
    }
    return wantarray ? ( \@vals, $bool ) : $bool;
}
*all_numerical = \&all_numeric;

=head3 all_proportions

 $bool = $dat->all_proportions(\@data); # test data are valid before loading them
 $bool = $dat->all_proportions(label => 'mydata'); # checking after loading/adding the data  (or key in 'index')

Checks that the data are all proportions: numbers ranging from 0 to 1 inclusive, as some modules require. Called in array context, returns an aref of the data culled of any values that fail this test, followed by the boolean.

=cut

sub all_proportions {
    my ( $self, @args ) = @_;
    my $data = ref $args[0] ? shift @args : $self->access(@args);
    my ( $bool, @vals ) = ();
    foreach ( @{$data} ) {
        if ( nocontent($_) ) {
            $bool = 0;
        }
        elsif ( looks_like_number($_) ) {
            $bool = ( $_ < 0 || $_ > 1 ) ? 0 : 1;
        }
        else {    # has content but is not a number
            $bool = 0;
        }
        if (wantarray) {
            push @vals, $_ if $bool;
        }
        else {
            last if $bool == 0;
        }
    }
    return wantarray ? ( \@vals, $bool ) : $bool;
}

=head3 all_counts

 $bool = $dat->all_counts(\@data); # test data are valid before loading them
 $bool = $dat->all_counts(label => 'mydata'); # checking after loading/adding the data  (or key in 'index')
 ($aref, $bool) = $dat->all_counts(\@data);

Returns true if all values in given data are real positive integers or zero, as well as satisfying "hascontent" and "looks_like_number" methods; false otherwise. Called in array context, returns aref of data culled of any values that are false on this basis, and then the boolean. For example, [2.2, 3, 4] and [-1, 3, 4] both fail, but [1, 3, 4] is true. Integer test is simply if $v == int($v).
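
The test applied to each value can be sketched as below. C<is_count> is a hypothetical helper for illustration only, not a function of this module, and it approximates the "hascontent" check with a simple pattern match:

 use strict;
 use warnings;
 use Scalar::Util qw(looks_like_number);

 # Hypothetical helper: true if $v is defined, non-blank, numeric, >= 0 and whole.
 sub is_count {
     my $v = shift;
     return 0 if !defined $v || $v !~ /\S/;
     return 0 if !looks_like_number($v);
     return ( $v >= 0 && $v == int($v) ) ? 1 : 0;
 }

 for my $aref ( [ 2.2, 3, 4 ], [ -1, 3, 4 ], [ 1, 3, 4 ] ) {
     my $n_failed = grep { !is_count($_) } @{$aref};
     printf "[%s] %s\n", join( ', ', @{$aref} ),
       $n_failed ? 'fails' : 'passes';
 }

Only the last of the three lists passes, in line with the examples given above.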

=cut

sub all_counts {
    my ( $self, @args ) = @_;
    my $data = ref $args[0] ? shift @args : $self->access(@args);
    my ( $bool, @vals ) = ();
    foreach ( @{$data} ) {
        if ( nocontent($_) ) {
            $bool = 0;
        }
        elsif ( looks_like_number($_) ) {
            $bool = $_ >= 0 && $_ == int($_) ? 1 : 0;
        }
        else {
            $bool = 0;
        }
        if (wantarray) {
            push @vals, $_ if $bool;
        }
        else {
            last if $bool == 0;
        }
    }
    return wantarray ? ( \@vals, $bool ) : $bool;
}

=head3 all_pos

 $bool = $dat->all_pos(\@data); # test data are valid before loading them
 $bool = $dat->all_pos(label => 'mydata'); # checking after loading/adding the data  (or key in 'index')
 ($aref, $bool) = $dat->all_pos(\@data);

Returns true if all values in given data are greater than zero, as well as "hascontent" and "looks_like_number"; false otherwise. Called in array context, returns aref of data culled of any values that are false on this basis, and then the boolean. 

=cut

sub all_pos {
    my ( $self, @args ) = @_;
    my $data = ref $args[0] ? shift @args : $self->access(@args);
    my ( $bool, @vals ) = ();
    foreach ( @{$data} ) {
        if ( nocontent($_) ) {
            $bool = 0;
        }
        elsif ( looks_like_number($_) ) {
            $bool = $_ > 0 ? 1 : 0;
        }
        else {    # has content but is not a number
            $bool = 0;
        }
        if (wantarray) {
            push @vals, $_ if $bool;
        }
        else {
            last if $bool == 0;
        }
    }
    return wantarray ? ( \@vals, $bool ) : $bool;
}

=head3 equal_n

If the given data or aref of variable names all have the same number of elements, then that number is returned; otherwise 0.

=cut

sub equal_n {
    my ( $self, %args ) = ( shift, @_ );
    my $data =
      $args{'data'} ? delete $args{'data'} : $self->get_hoa_by_lab(%args);
    my @data = ref $data eq 'HASH' ? values %{$data} : ();
    return 0 if !@data;    # nothing to compare
    my $n = scalar @{ $data[0] };
    for ( 1 .. scalar @data - 1 ) {
        my $tmp_count = scalar @{ $data[$_] };
        if ( $tmp_count != $n ) {
            $n = 0;
            last;
        }
        else {
            $n = $tmp_count;
        }
    }
    return $n;
}

=head3 idx_anumeric

Given an aref, returns a reference to a hash whose keys are the indices within that list where the values are either undefined, empty or non-numerical.

=cut

sub idx_anumeric
{    # hashref keyed by the indices in the given list where invalid values lie
    my ( $self, $dat ) = @_;
    my %purge = ();
    for my $i ( 0 .. scalar @{$dat} - 1 ) {
        $purge{$i}++ if !looks_like_number( $dat->[$i] );
    }
    return \%purge;
}

=head2 Dumping data

=head3 dump_vals

 $dat->dump_vals(delim => ", "); # assumes the first (only?) loaded sequence should be dumped
 $dat->dump_vals(index => integer, delim => ", "); # dump the i'th loaded sequence
 $dat->dump_vals(label => 'mysequence', delim => ", "); # dump the sequence loaded/added with the given "label"

Prints to STDOUT a single line (ending with "\n") of the elements of a loaded/added sequence. Optionally, give a value for B<delim> to specify how the elements should be separated; the default is a single space.

=cut

sub dump_vals {
    my ( $self, @args ) = @_;
    my $args = ref $args[0] ? $args[0] : {@args};
    my $delim = $args->{'delim'} || q{ };
    print {*STDOUT} join( $delim, @{ $self->access($args) } ), "\n"
      or croak 'Could not print line to STDOUT';
    return 1;
}

=head3 dump_list

Dumps a list (using L<Text::SimpleTable|Text::SimpleTable>) of the data currently loaded, without showing their actual elements. List is firstly by index, then by label (if any), then gives the number of elements in the associated sequence.

=cut

sub dump_list {
    my ( $self, $i, $lim, $lab, $N, $len_lab, $len_n, $tbl, @rows, @maxlens ) =
      (shift);
    $lim = $self->ndata();
    @maxlens = ( ( $lim > 5 ? $lim : 5 ), 5, 1 );
    for my $i ( 0 .. $lim - 1 ) {
        $lab =
          defined $self->{_DATA}->[$i]->{lab}
          ? $self->{_DATA}->[$i]->{lab}
          : q{-};
        $N       = scalar @{ $self->{_DATA}->[$i]->{seq} };
        $len_lab = length $lab;
        $len_n   = length $N;
        $maxlens[1] = $len_lab if $len_lab > $maxlens[1];
        $maxlens[2] = $len_n   if $len_n > $maxlens[2];
        $rows[$i] = [ $i, $lab, $N ];
    }
    require Text::SimpleTable;
    $tbl = Text::SimpleTable->new(
        [ $maxlens[0], 'index' ],
        [ $maxlens[1], 'label' ],
        [ $maxlens[2], 'N' ]
    );
    $tbl->row( @{$_} ) foreach @rows;
    print {*STDOUT} $tbl->draw or croak 'Could not print list of loaded data';
    return 1;
}

=head3 save_to_file

  $dat->save_to_file(path => 'mysequences.csv');
  $dat->save_to_file(path => 'mysequences.csv', serializer => 'XML::Simple', compress => 1, secret => '123'); # serialization options

Saves the data presently loaded in the Statistics::Data object to a file, with the given B<path>. This can be retrieved, with all the data added to the Statistics::Data object, via L<load_from_file|Statistics::Data/load_from_file>. Basically a wrapper to C<store> method in L<Data::Serializer|Data::Serializer/store>; cf. for options.

=cut

sub save_to_file {
    my ( $self, @args ) = @_;
    my $args = ref $args[0] ? $args[0] : {@args};
    croak 'There is no path for saving data' if nocontent( $args->{'path'} );
    require Data::Serializer;
    my $serializer = Data::Serializer->new( %{$args} );
    $serializer->store( { DATA => $self->{_DATA} }, $args->{'path'} );
    return 1;
}
*save = \&save_to_file;

# PRIVATE METHODS:

sub _cull_hoa_indep {
    my $hoa         = shift;
    my $purged_n    = shift;
    my $purged      = 0;
    my %purged_data = ();
    foreach my $name ( keys %{$hoa} ) {
        my @clean = ();
        for my $i ( 0 .. scalar( @{ $hoa->{$name} } ) - 1 ) {
            if ( nocontent( $hoa->{$name}->[$i] )
                or not looks_like_number( $hoa->{$name}->[$i] ) )
            {
                $purged++;
            }
            else {
                push @clean, $hoa->{$name}->[$i];
            }
        }
        croak
"Empty data for ANOVA following purge of invalid value(s) in list < $name >"
          if !scalar @clean;
        $purged_data{$name} = [@clean];
    }
    ${$purged_n} = $purged if ref $purged_n;
    return \%purged_data;
}

sub _cull_hoa_across {
    my $hoa      = shift;
    my $purged_n = shift;
    my ( $purged, %inv_1, %invalid_ids, %clean, %purged_data ) = ();

    foreach my $name ( keys %{$hoa} ) {
        for my $i ( 0 .. scalar( @{ $hoa->{$name} } ) - 1 ) {
            $inv_1{$name}->{$i} = 1
              if !looks_like_number( $hoa->{$name}->[$i] );
        }
    }

# List of all indices in all lists with invalid values; and copy of each group of data
# (copied so that the splice below does not alter the cached sequences):
    foreach my $name ( keys %{$hoa} ) {
        $clean{$name} = [ @{ $hoa->{$name} } ];
        while ( my ( $key, $val ) = each %{ $inv_1{$name} } ) {
            $invalid_ids{$key} += $val;
        }
    }
    $purged = scalar( keys(%invalid_ids) ) || 0;

    # Purge by index (from highest to lowest):
    my @invalid_ids = reverse( sort { $a <=> $b } keys(%invalid_ids) );
    foreach my $cull (@invalid_ids) {
        foreach my $name ( keys %clean ) {
            splice( @{ $clean{$name} }, $cull, 1 );
        }
    }

    foreach my $c ( keys %clean ) {
        $purged_data{$c} = $clean{$c};
    }
    ${$purged_n} = $purged if ref $purged_n;
    return \%purged_data;
}

sub _init_data {
    my ( $self, @args ) = @_;
    my $tmp = {};
    if ( _isa_hashref_of_arefs( $args[0] ) ) {    # cases 4 & 6
        $tmp = _init_labelled_data( $self, $args[0] );
    }
    elsif ( _isa_hash_of_arefs(@args) ) {         # cases 3 & 5
        $tmp = _init_labelled_data( $self, {@args} );
    }
    elsif ( _isa_array_of_arefs(@args) ) {        # case 2 & 7
        $tmp = _init_unlabelled_data(@args);
    }
    elsif ( ref $args[0] ) {
        croak 'Don\'t know how to load/add data';
    }
    else {    # assume @args is just a list of nude strings - case 1
        $tmp->{0} = { seq => [@args], lab => undef };
    }
    return $tmp;
}

sub _isa_hashref_of_arefs {
    my $arg = shift;
    if ( not ref $arg or ref $arg ne 'HASH' ) {
        return 0;
    }
    else {
        return _isa_hash_of_arefs( %{$arg} );
    }
}

sub _isa_hash_of_arefs {

    # determines that:
    # - scalar @args passes Number::Misc is_even, then that:
    # - every key (even-indexed argument, counting from 0) 'hascontent' via String::Util
    # - every value (odd-indexed argument) is an aref
    my @args = @_;
    my $bool = 0;
    if ( is_even( scalar @args ) )
    {    # Number::Misc method - not odd number in assignment
        my %args = @args;    # so assume is hash
      HASHCHECK:
        while ( my ( $lab, $val ) = each %args ) {
            if ( hascontent($lab) && ref $val eq 'ARRAY' ) {
                $bool = 1;
            }
            else {
                $bool = 0;
            }
            last HASHCHECK if $bool == 0;
        }
    }
    else {
        $bool = 0;
    }
    return $bool;
}

sub _isa_array_of_arefs {
    my @args = @_;
    if ( all { ref($_) eq 'ARRAY' } @args ) {
        return 1;
    }
    else {
        return 0;
    }
}

sub _init_labelled_data {
    my ( $self, $href ) = @_;
    my ( $i,    %tmp )  = ( scalar @{ $self->{_DATA} } );
    while ( my ( $lab, $seq ) = each %{$href} ) {
        my $j = _seq_index_by_label( $self, $lab );
        if ( defined $j )
        { # there is already a label for these data, so don't need to define it for this init
            $tmp{$j} = { seq => [ @{$seq} ], lab => undef };
        }
        else {    # no aref labelled $lab yet: define for seq and label
            $tmp{ $i++ } = { seq => [ @{$seq} ], lab => $lab };
        }
    }
    return \%tmp;
}

sub _init_unlabelled_data {
    my @args = @_;
    my %tmp  = ();
    for my $i ( 0 .. scalar @args - 1 ) {
        $tmp{$i} = { seq => [ @{ $args[$i] } ], lab => undef };
    }
    return \%tmp;
}

sub _index_by_args {
    my ( $self, @args ) = @_;
    my $i;
    if ( !$args[0] ) {
        $i = 0;
    }
    else {
        my $args = ref $args[0] ? $args[0] : {@args};
        if ( hascontent( $args->{'index'} ) ) {
            $i = $args->{'index'};
        }
        elsif ( hascontent( $args->{'label'} ) ) {
            $i = _seq_index_by_label( $self, $args->{'label'} );
        }
        else {
            $i = 0;
        }
    }
    return ( defined $i and ref $self->{_DATA}->[$i]->{'seq'} )
      ? $i
      : croak __PACKAGE__, ' Data for accessing need to be loaded';
}

sub _seq_index_by_label {
    my ( $self, $label ) = @_;

    # returns index of the first loaded sequence labelled $label, else undef;
    # the stored label is tested for definedness, not truth, so that a
    # label of '0' can be found
    for my $i ( 0 .. scalar( @{ $self->{_DATA} } ) - 1 ) {
        my $lab = $self->{_DATA}->[$i]->{lab};
        return $i if defined $lab and $lab eq $label;
    }
    return undef;
}

sub _add_from_object_aref {
    my ( $self, $aref ) = @_;
    foreach my $dat ( @{$aref} ) {
        if ( hascontent( $dat->{'lab'} ) ) {
            $self->add( $dat->{'lab'} => $dat->{'seq'} );
        }
        else {
            $self->add( $dat->{'seq'} );
        }
    }
    return 1;
}

=head1 EXAMPLES

B<1. Multivariate data (a tale of horny frogs)>

In a study of how doing mental arithmetic affects arousal in self and others (i.e., how mind, body and world interact), three male frogs were maths-trained and then, as they did their calculations, were measured for pupillary dilation and perceived attractiveness. After four runs, average measures per frog can be loaded: 

 $frogs->load(Names => [qw/Freddo Kermit Larry/], Pupil => [59.2, 77.7, 56.1], Attract => [3.11, 8.79, 6.99]);

But one more frog still had to graduate from training, and its data are now ready for loading:

 $frogs->add(Names => ['Sleepy'], Pupil => [83.4], Attract => [5.30]);
 $frogs->dump_vals(label => 'Pupil'); # prints "59.2 77.7 56.1 83.4" : all 4 frogs' pupil data for analysis by some module

Say we're finished testing for now, so:

 $frogs->save_to_file(path => 'frogs.csv');
 $frogs->unload();

But another frog has been trained, measures taken:

 $frogs->load_from_file(path => 'frogs.csv');
 $frogs->add(Pupil => [93], Attract => [6.47], Names => ['Jack']); # add yet another frog's data
 $frogs->dump_vals(label => 'Pupil'); # prints "59.2 77.7 56.1 83.4 93": all 5 frogs' pupil data

Now we run another experiment, taking measures of heart-rate, and can add them to the current load of data for analysis:

 $frogs->add(Heartrate => [.70, .50, .44, .67, .66]); # add entire new sequence for all frogs
 print "heartrate data are bung" if ! $frogs->all_proportions(label => 'Heartrate'); # validity check (could do before add)
 $frogs->dump_list(); # see all four data-sequences now loaded, each with 5 observations (1 per frog), i.e.:
 .-------+-----------+----.
 | index | label     | N  |
 +-------+-----------+----+
 | 0     | Names     | 5  |
 | 1     | Attract   | 5  |
 | 2     | Pupil     | 5  |
 | 3     | Heartrate | 5  |
 '-------+-----------+----'

B<2. Using as a base module>

As L<Statistics::Sequences|Statistics::Sequences>, and so its sub-modules, use this module as their base, they don't have to do much data-managing themselves:

 use feature qw(say); # needed for say()
 use Statistics::Sequences;
 my $seq = Statistics::Sequences->new();
 $seq->load(qw/f b f b b/); # using Statistics::Data method
 say $seq->p_value(stat => 'runs', exact => 1); # using Statistics::Sequences::Runs method

Or if these data were loaded directly within Statistics::Data, the data can be shared around modules that use it as a base:

 use feature qw(say); # needed for say()
 use Statistics::Data;
 use Statistics::Sequences::Runs;
 my $dat = Statistics::Data->new();
 my $runs = Statistics::Sequences::Runs->new();
 $dat->load(qw/f b f b b/);
 $runs->pass($dat);
 say $runs->p_value(exact => 1);

=head1 DIAGNOSTICS

=over 4

=item Don't know how to load/add data

Croaked when attempting to load or add data with an unsupported data structure where the first argument is a reference. See the examples under L<load|Statistics::Data/load> for valid (and invalid) ways of sending data to these methods.

=item Data for accessing need to be loaded

Croaked when calling L<access|Statistics::Data/access>, or any method that uses it internally (viz., L<dump_vals|Statistics::Data/dump_vals> and validity checks such as L<all_numeric|Statistics::Data/all_numeric>), with an index or a label for data that have not been loaded, or that did not load successfully.

=item Data for unloading need to be loaded

Croaked when calling L<unload|Statistics::Data/unload> with an index or a label attribute and the data these refer to have not been loaded, or did not load successfully.

=item There is no path for saving (or loading) data

Croaked when calling L<save_to_file|Statistics::Data/save_to_file> or L<load_from_file|Statistics::Data/load_from_file> without a value for the required B<path> argument, or, when loading, if no file exists at the given path.

=back
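
As a rough guide to the first diagnostic above, the following standalone sketch names how an argument list would be understood. It is not the module's own code: C<classify_args> and the strings it returns are hypothetical, and it assumes that labels are plain (non-reference) strings with some non-whitespace content.

 use strict;
 use warnings;

 # Hypothetical helper, not part of Statistics::Data: names the way an
 # argument list for load() or add() would be understood, or dies with
 # the diagnostic text if the structure is unsupported.
 sub classify_args {
     my @args = @_;

     # True for a flat list of label => aref pairs (assumption: a label
     # is a non-reference string with some non-whitespace content).
     my $is_pairs = sub {
         my @list = @_;
         return 0 if !@list or @list % 2;
         for ( my $i = 0 ; $i < @list ; $i += 2 ) {
             my ( $lab, $val ) = @list[ $i, $i + 1 ];
             return 0 if !defined $lab or ref($lab) or $lab !~ /\S/;
             return 0 if ref($val) ne 'ARRAY';
         }
         return 1;
     };

     return 'labelled'
       if ref( $args[0] ) eq 'HASH' and $is_pairs->( %{ $args[0] } );
     return 'labelled'   if $is_pairs->(@args);
     return 'unlabelled' if @args and !grep { ref($_) ne 'ARRAY' } @args;
     die "Don't know how to load/add data\n" if ref $args[0];
     return 'single sequence';
 }

 my @cases = (
     [ { a => [ 1, 2 ] } ],          # hashref of arefs
     [ a => [ 1, 2 ], b => [3] ],    # hash of arefs
     [ [ 1, 2 ], [ 3, 4 ] ],         # list of arefs
     [ 1, 2, 2 ],                    # plain list of values
     [ \'text' ],                    # unsupported reference
 );
 for my $case (@cases) {
     my $kind = eval { classify_args( @{$case} ) };
     print defined $kind ? "$kind\n" : "error: $@";
 }

The final case, a lone scalar reference, is the kind of structure that leads to the diagnostic.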

=head1 DEPENDENCIES

L<List::AllUtils|List::AllUtils> - used for its C<all> method when testing loads

L<Number::Misc|Number::Misc> - used for its C<is_even> method when testing loads

L<String::Util|String::Util> - used for its C<hascontent> and C<nocontent> methods

L<Data::Serializer|Data::Serializer> - required for L<save_to_file|Statistics::Data/save_to_file> and L<load_from_file|Statistics::Data/load_from_file>

L<Scalar::Util|Scalar::Util> - required for L<all_numeric|Statistics::Data/all_numeric>

L<Text::SimpleTable|Text::SimpleTable> - required for L<dump_list|Statistics::Data/dump_list>

=head1 BUGS AND LIMITATIONS

Some methods rely on accessing previously loaded data, but should also permit operating on data submitted directly to them, just as, e.g., C<< $dat->all_numeric(\@data) >> is ok. This is handled internally for now, but should be handled in the same way by modules that use this one as their base; at present, their data-manipulating methods have to check for an aref argument themselves before accessing any data loaded by this module.
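
A minimal sketch of the check that a derived module might make for itself follows. All of the names in it are hypothetical: C<Demo::Base> merely stands in for a class that offers an C<access> method, and C<_data> and C<n> are illustrative, not part of this module.

 use strict;
 use warnings;

 package Demo::Base;    # hypothetical stand-in for a class offering access()
 sub new    { return bless { seq => [ 1, 2, 3 ] }, shift }
 sub access { return [ @{ $_[0]->{seq} } ] }

 package Demo::Derived;    # hypothetical module built on that base
 our @ISA = ('Demo::Base');

 # Use data handed in directly if the first argument is an aref;
 # otherwise fall back on whatever has been loaded.
 sub _data {
     my ( $self, @args ) = @_;
     return ref( $args[0] ) eq 'ARRAY' ? $args[0] : $self->access(@args);
 }

 # Number of observations, in direct or in loaded data.
 sub n {
     my ( $self, @args ) = @_;
     return scalar @{ $self->_data(@args) };
 }

 package main;
 my $obj      = Demo::Derived->new();
 my $n_loaded = $obj->n();              # counts the stand-in loaded data
 my $n_direct = $obj->n( [ 4, 5 ] );    # counts the data passed directly
 print "$n_loaded $n_direct\n";

Keeping the check in one private method means that each public method need only call it, rather than repeat the test for an aref.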

Please report any bugs or feature requests to C<bug-statistics-data-0.01 at rt.cpan.org>, or through the web interface at L<http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Statistics-Data-0.01>. This will notify the author, and then you'll automatically be notified of progress on your bug as any changes are made.

=head1 SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Statistics::Data

You can also look for information at:

=over 4

=item * RT: CPAN's request tracker

L<http://rt.cpan.org/NoAuth/Bugs.html?Dist=Statistics-Data-0.09>

=item * AnnoCPAN: Annotated CPAN documentation

L<http://annocpan.org/dist/Statistics-Data-0.09>

=item * CPAN Ratings

L<http://cpanratings.perl.org/d/Statistics-Data-0.09>

=item * Search CPAN

L<http://search.cpan.org/dist/Statistics-Data-0.09/>

=back

=head1 AUTHOR

Roderick Garton, C<< <rgarton at cpan.org> >>

=head1 LICENSE AND COPYRIGHT

Copyright 2009-2015 Roderick Garton

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published
by the Free Software Foundation; or the Artistic License. See L<perl.org|http://dev.perl.org/licenses/> for more information.

=cut

1;    # End of Statistics::Data
