RegExPlus

Natural Sort

The Pattern class provides a static method, naturalCompareTo, that can be used to sort numeric-based data such as versions and dates.

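The method works by comparing embedded numbers numerically instead of lexicographically. For example, 1.2.9.1 is less than 1.2.10.5, which is what one would expect if the two strings are versions of a program. A plain string comparison would order them the other way, because the character 9 sorts after the character 1.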
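The idea of comparing digit runs by value can be sketched in plain Java. This is only an illustration, not the RegExPlus implementation: the class NaturalCompareSketch and its helper methods are hypothetical names, only ASCII digits are treated as numbers, and the leading-zero tie-break is left out (so 01 and 1 compare as equal here).

```java
class NaturalCompareSketch {

    // Hypothetical helper (not part of RegExPlus): compares runs of digits by value
    // and every other character by its char code.
    static int naturalCompare(String a, String b) {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            char ca = a.charAt(i), cb = b.charAt(j);
            if (isDigit(ca) && isDigit(cb)) {
                int endA = runEnd(a, i), endB = runEnd(b, j);
                // ignore leading zeros so that only significant digits are compared
                int startA = skipZeros(a, i, endA), startB = skipZeros(b, j, endB);
                int lenA = endA - startA, lenB = endB - startB;
                // with leading zeros removed, the longer run is the larger number
                if (lenA != lenB) return lenA < lenB ? -1 : 1;
                for (int k = 0; k < lenA; k++) {
                    int d = a.charAt(startA + k) - b.charAt(startB + k);
                    if (d != 0) return d < 0 ? -1 : 1;
                }
                i = endA;
                j = endB;
            } else if (ca != cb) {
                return ca < cb ? -1 : 1;
            } else {
                i++;
                j++;
            }
        }
        // the string with characters left over sorts last
        return Integer.compare(a.length() - i, b.length() - j);
    }

    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    // index just past the digit run that starts at 'from'
    static int runEnd(String s, int from) {
        int k = from;
        while (k < s.length() && isDigit(s.charAt(k))) k++;
        return k;
    }

    // index of the first non-zero digit in [from, end), or 'end' if all are zeros
    static int skipZeros(String s, int from, int end) {
        int k = from;
        while (k < end && s.charAt(k) == '0') k++;
        return k;
    }

    public static void main(String[] args) {
        System.out.println(naturalCompare("1.2.9.1", "1.2.10.5") < 0); // true
        System.out.println("1.2.9.1".compareTo("1.2.10.5") < 0);       // false
    }
}
```

The second line printed shows why the ordinary String.compareTo is unsuitable for versions: it stops at the first differing character and never sees that 10 is larger than 9.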

Basics

Ignores leading zeros

Leading zeros are ignored during comparison, unless the two inputs are otherwise equivalent.

In that case, 0 is returned only if the two inputs are identical. Otherwise, the left-most number at which the count of leading zeros differs determines the ordering: the input with more leading zeros comes first.

For example, the following list is sorted in increasing order:

  • 2009-1-2
  • 2009-01-05
  • 2009-01-5
  • 2009-1-05
  • 2009-1-5
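The tie-break rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: LeadingZeroTieBreakSketch and its methods are hypothetical names, the input is split into runs of ASCII digits and non-digits with java.util.regex, and numeric runs are compared with BigInteger.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeadingZeroTieBreakSketch {

    // Splits text into alternating runs of digits and non-digits.
    private static final Pattern TOKEN = Pattern.compile("\\d+|\\D+");

    static List<String> tokens(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) out.add(m.group());
        return out;
    }

    static boolean isNumber(String token) {
        char c = token.charAt(0);
        return c >= '0' && c <= '9';
    }

    // Hypothetical comparison: numeric value first, leading zeros only as a tie-break.
    static int compare(String a, String b) {
        List<String> ta = tokens(a), tb = tokens(b);
        int tie = 0; // set by the left-most number whose leading zeros differ
        int n = Math.min(ta.size(), tb.size());
        for (int i = 0; i < n; i++) {
            String x = ta.get(i), y = tb.get(i);
            int c;
            if (isNumber(x) && isNumber(y)) {
                c = new BigInteger(x).compareTo(new BigInteger(y));
                if (c == 0 && tie == 0 && x.length() != y.length()) {
                    // equal value, so the longer token has more leading zeros and sorts first
                    tie = x.length() > y.length() ? -1 : 1;
                }
            } else {
                c = x.compareTo(y);
            }
            if (c != 0) return c < 0 ? -1 : 1;
        }
        if (ta.size() != tb.size()) return ta.size() < tb.size() ? -1 : 1;
        return tie; // 0 only when the inputs are identical
    }

    public static void main(String[] args) {
        List<String> dates = new ArrayList<>(Arrays.asList(
                "2009-1-5", "2009-01-5", "2009-1-2", "2009-1-05", "2009-01-05"));
        dates.sort(LeadingZeroTieBreakSketch::compare);
        // prints [2009-1-2, 2009-01-05, 2009-01-5, 2009-1-05, 2009-1-5]
        System.out.println(dates);
    }
}
```

Note that 2009-1-2 sorts first even though other entries have more leading zeros: the tie-break is consulted only when every number in the two inputs has the same value.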

Usage

Template code

The code below is a template for creating a Comparator that sorts numeric-based data (such as a date, a time, or a program version).

To make your own comparator, change regex to a regular expression that matches your input. Then set replacement to a replacement string in which the parts are separated by spaces and the most significant part comes first.

An example of how to use this template to sort dates can be found after the template.

   Comparator<String> comparator = new Comparator<String>() {

      /**
       * Regular expression that describes the input format
       */

      final String regex = "(?<part1>regex)[separators]"
            + "(?<part2>regex)";

     
      /**
       * Replacement string used to reformat input (if necessary)
       * to convert to a consistent format, with most significant part
       * first.
       *
       * <p><b>Note</b>: use spaces to separate parts</p>
       */

      final String replacement = "$<part1> $<part2>";

      /**
       * Flags used when compiling the <code>Pattern</code>
       */

      final int flags = 0;
     
      /**
       * Pattern that describes the input format
       */

      final Pattern pattern = Pattern.compile(regex, flags);

      public int compare(String o1, String o2)
      {
        // matchers to match the input
        Matcher matcher1 = pattern.matcher(o1);
        Matcher matcher2 = pattern.matcher(o2);

        // change format (for proper sorting)
        String value1 = matcher1.replaceFirst(replacement);
        String value2 = matcher2.replaceFirst(replacement);

        // compare the reformatted values using the natural ordering
        // (naturalCompareTo is the static method of the RegExPlus Pattern class)
        return Pattern.naturalCompareTo(value1, value2);
      }
    };

MMDDYYYY sorting (example)

The Comparator and test code below apply the template above to sort dates in MMDDYYYY form.

By modifying the regex and replacement values, you can sort values in the format of your choice.

   Comparator<String> comparator = new Comparator<String>() {

      /**
       * Date format (mmddyyyy)
       *
       * <ul>
       *
       * <li>For month and day, leading zeros are optional.</li>
       *
       * <li>Year must be four digits</li>
       *
       * <li>The separator between month/day and day/year can be one of
       * the following: - . /</li>
       *
       * <li>The separator between month/day and day/year can be
       * different.</li>
       *
       * </ul>
       */

      final String regex = "(?<month>\\d{1,2})[-/.]"
          + "(?<day>\\d{1,2})[-/.]"
          + "(?<year>\\d{4})";


      /**
       * Replacement string used to reformat input (if necessary)
       * to convert to a consistent format, with most significant part
       * first.
       *
       * <p><b>Note</b>: use spaces to separate parts</p>
       */

      final String replacement = "$<year> $<month> $<day>";

      /**
       * Flags used when compiling the <code>Pattern</code>
       */

      final int flags = 0;

      /**
       * Pattern that describes the input format
       */

      final Pattern pattern = Pattern.compile(regex, flags);

      public int compare(String o1, String o2)
      {
        // matchers to match the input
        Matcher matcher1 = pattern.matcher(o1);
        Matcher matcher2 = pattern.matcher(o2);

        // change format (for proper sorting)
        String value1 = matcher1.replaceFirst(replacement);
        String value2 = matcher2.replaceFirst(replacement);

        // compare the reformatted values using the natural ordering
        // (naturalCompareTo is the static method of the RegExPlus Pattern class)
        return Pattern.naturalCompareTo(value1, value2);
      }
    };

Test code:

    SortedSet<String> dates = new TreeSet<String>(comparator);

    dates.add("1.2.2009");
    dates.add("1-5-2010");
    dates.add("1-05-2009");
    dates.add("01.2.2010");

    /*
     * Outputs the dates in the correct order:
     *
     * 1) 1.2.2009
     * 2) 1-05-2009
     * 3) 01.2.2010
     * 4) 1-5-2010
     */
    System.out.println(dates);
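
The reformatting step that the comparator relies on can be checked in isolation. The sketch below uses the standard java.util.regex classes rather than RegExPlus, so it is an approximation: the class name ReformatStepSketch and the method toSortKey are hypothetical, and named groups are referenced in the replacement as ${name}, which is the java.util.regex syntax, instead of the $<name> form used in the examples above.

```java
import java.util.regex.Pattern;

class ReformatStepSketch {

    // Same date format as the MMDDYYYY example: month, day, four-digit year,
    // separated by '-', '/' or '.'. This Pattern is java.util.regex.Pattern.
    static final Pattern DATE = Pattern.compile(
            "(?<month>\\d{1,2})[-/.]"
            + "(?<day>\\d{1,2})[-/.]"
            + "(?<year>\\d{4})");

    // Hypothetical helper: rewrites a date so the most significant part comes first,
    // with the parts separated by spaces.
    static String toSortKey(String date) {
        return DATE.matcher(date).replaceFirst("${year} ${month} ${day}");
    }

    public static void main(String[] args) {
        System.out.println(toSortKey("1-05-2009")); // 2009 1 05
        System.out.println(toSortKey("01.2.2010")); // 2010 01 2
    }
}
```

Once the year leads the string, a natural comparison of the rewritten values orders the dates chronologically, which is why the test set above lists both 2009 dates before the 2010 dates.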