An assessment of the complexity of 3′ UTRs relative to that of protein-coding sequences: models selected using two procedures
The dataset comes from a study which assessed the complexity of 3′ UTRs (three prime untranslated regions) relative to that of protein-coding sequences, by comparing the extent to which segmental substructures can be detected within these two genomic fractions based on sequence composition and conservation.
For this dataset, two procedures were applied to select the number of classes for each alignment: examining Deviance Information Criterion V (DICV) values (Procedure 1) and examining the stability of the classes (Procedure 2). The numbers of classes selected for each sequence by each procedure are summarised.
The data indicate that twelve to fourteen segment classes with distinct character frequencies can be distinguished in each of the three coding sequence alignments under either Procedure 1 or Procedure 2.