Sunday, August 15, 2021

How to split String in Java by WhiteSpace or tabs? Example Tutorial

You can split a String by whitespace or tabs in Java by using the split() method of the java.lang.String class. This method accepts a regular expression, so you can pass a regex that matches whitespace to split a String whose words are separated by spaces. This is not as straightforward as it seems, though, especially if you don't code in Java regularly. The input String may contain leading and trailing spaces, it may contain multiple whitespace characters between words, and words may also be separated by tabs. Your solution needs to handle all of these conditions if you want just the words and no empty Strings.

In this article, I am going to show you a couple of examples to demonstrate how you can split String in Java by space. By splitting I mean getting individual words as a String array or ArrayList of String, whatever you need.

In this Java String tutorial, I'll show you three ways to split a String where words are separated by whitespace or tabs, using only the JDK and no third-party library like Apache Commons or Google Guava. The first example is the ideal situation where each word in the String is separated by just one whitespace character.

In the second example, you will learn how to deal with multiple whitespaces or tabs by using a greedy regular expression in Java, e.g. "\\s+", which matches one or more whitespace characters. In the third example, you will learn how to deal with leading and trailing whitespace in the input String. You can even combine all these things in one solution, depending upon your requirements.

And if you are new to the Java world, then I also recommend you go through The Complete Java MasterClass on Udemy to learn Java in a better and more structured way. This is one of the best and most up-to-date courses to learn Java online.




1st Example - Splitting String where words are separated by regular whitespace

This is the simplest case. Usually, words are separated by just one whitespace character. To split the String and get the array of words, just call the split() method on the input String, passing a single space as the regular expression, i.e. " ". This matches a single space and splits the string accordingly.

String lineOfCurrencies = "USD JPY AUD SGD HKD CAD CHF GBP EURO INR";
String[] currencies = lineOfCurrencies.split(" ");

System.out.println("input string words separated by whitespace: "
                      + lineOfCurrencies);
System.out.println("output string: " + Arrays.toString(currencies));

Output:
input string words separated by whitespace: USD JPY AUD SGD HKD CAD CHF GBP EURO INR
output string: [USD, JPY, AUD, SGD, HKD, CAD, CHF, GBP, EURO, INR]

This is also the easiest way to convert a String to a String array. If you want to convert the String array into an ArrayList of String, you can see Java: How to Program by Deitel, one of the most comprehensive books for beginner and intermediate Java programmers.




2nd Example - Splitting String where words are separated by multiple whitespaces or tabs

To handle this scenario, you need a regular expression that matches any run of whitespace. You can use the "\\s+" regex for this purpose. It combines a predefined character class with a quantifier. \s matches any single whitespace character, including tabs, and because the backslash must itself be escaped inside a Java string literal, it is written as "\\s". On its own, it still matches only one character at a time.

So we add +, a greedy quantifier that matches one or more occurrences, which lets a whole run of spaces or tabs act as a single delimiter. To learn more about the "\\s+" regular expression to remove white spaces, see this tutorial.

Anyway here is the code in action:

// the words are separated by two tabs, written as \t\t so they stay visible
String lineOfPhonesWithMultipleWhiteSpace = "iPhone\t\tGalaxy\t\tLumia";
String[] phones = lineOfPhonesWithMultipleWhiteSpace.split("\\s+");

System.out.println("input string separated by tabs: "
                + lineOfPhonesWithMultipleWhiteSpace);
System.out.println("output string: " + Arrays.toString(phones));

Output:
input string separated by tabs: iPhone          Galaxy          Lumia
output string: [iPhone, Galaxy, Lumia]

You can see how we have converted a String to an array where three values are separated by multiple whitespaces. If you want to learn more about how regular expressions work in Java, I suggest you read the regular expression chapter from Java: How to Program by Deitel and Deitel.
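To make the difference between the first two examples easier to see, here is a minimal sketch that runs the same input through three delimiters. The class name SplitDelimiterComparison and the helper splitAndFormat are hypothetical names used only for illustration, and the input is an assumption: two spaces followed by two tabs, with the tabs written as \t escapes.

```java
import java.util.Arrays;

class SplitDelimiterComparison {

    // Splits the input with the given regex and renders the resulting array as text.
    static String splitAndFormat(String input, String regex) {
        return Arrays.toString(input.split(regex));
    }

    public static void main(String[] args) {
        // Two spaces between the first pair of words, two tabs between the second pair.
        String phones = "iPhone  Galaxy\t\tLumia";

        // " " matches only a single space: the second space yields an empty String
        // and the tabs are not treated as delimiters at all.
        System.out.println(splitAndFormat(phones, " "));

        // "\\s" matches one whitespace character at a time, tabs included,
        // so every doubled delimiter still produces an empty String.
        System.out.println(splitAndFormat(phones, "\\s"));  // [iPhone, , Galaxy, , Lumia]

        // "\\s+" consumes each whole run of whitespace as one delimiter.
        System.out.println(splitAndFormat(phones, "\\s+")); // [iPhone, Galaxy, Lumia]
    }
}
```

Only the last call returns exactly three words, which is why "\\s+" is the safer choice whenever the spacing of the input is not under your control.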




3rd Example - Splitting String with leading and trailing whitespace

Splitting a String by whitespace becomes tricky when your input string contains leading or trailing whitespace, because that whitespace also matches the \\s+ regular expression. A leading match puts an empty String at the start of the output array (split() drops trailing empty Strings by itself). To avoid that, trim the String before splitting it, i.e. call trim() before calling split().

Though you should remember that since String is immutable in Java, you either need to hold the output of trim or chain the trim() and split() together as shown in the following example:

String lineWithLeadingAndTrailingWhiteSpace = " Java C++ ";
// keep both results in separate arrays so each one can be printed
String[] withoutTrim = lineWithLeadingAndTrailingWhiteSpace.split("\\s+");
String[] languages = lineWithLeadingAndTrailingWhiteSpace.trim().split("\\s+");

System.out.println("input string with leading and trailing space: "
                               + lineWithLeadingAndTrailingWhiteSpace);
System.out.println("output string without trim: "
                               + Arrays.toString(withoutTrim));
System.out.println("output string after trim() and split: "
                                + Arrays.toString(languages));

Output:
input string with leading and trailing space:  Java C++ 
output string without trim: [, Java, C++]
output string after trim() and split: [Java, C++]

If you want an ArrayList of String instead of a String array then follow the steps given in this tutorial.
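One common way to do that conversion is to wrap the array with Arrays.asList() and copy it into a new ArrayList. The following is only a sketch of that idea; the class name SplitToList and the method toWordList are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitToList {

    // Trims the input, splits it on runs of whitespace, and copies the words
    // into a new, resizable ArrayList.
    static List<String> toWordList(String line) {
        String[] words = line.trim().split("\\s+");
        // Arrays.asList() alone returns a fixed-size view of the array,
        // so it is copied into an ArrayList to allow add() and remove().
        return new ArrayList<>(Arrays.asList(words));
    }

    public static void main(String[] args) {
        List<String> languages = toWordList(" Java C++ ");
        languages.add("Scala"); // works because the list is a real ArrayList
        System.out.println(languages); // prints [Java, C++, Scala]
    }
}
```

If you only need to read the words, Arrays.asList(words) by itself is enough and avoids the extra copy.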




Java Program to split string by spaces or tabs

Here is our sample Java program, which combines all these examples and scenarios to give you the complete idea of how to split a String by spaces in Java.


import java.util.Arrays;

public class StringSplitExample {

  public static void main(String[] args) {

    // You can split a String by space using the split()
    // function of java.lang.String class.
    // It accepts a regular expression and you just need to
    // pass a regular expression which matches with space
    // though space could be whitespace, tab etc
    // also words can have multiple spaces in between
    // so be careful.

    // Suppose we have a String with currencies separated by space
    String lineOfCurrencies = "USD JPY AUD SGD HKD CAD CHF GBP EURO INR";

    // Now, we will split this string and convert it into an array of String
    // we use regex " ", which will match just one whitespace
    String[] currencies = lineOfCurrencies.split(" ");

    System.out.println("input string words separated by whitespace: "
        + lineOfCurrencies);
    System.out.println("output string: " + Arrays.toString(currencies));

    // The regular expression above will not work as expected if there are
    // multiple spaces between two words, because the extra whitespace
    // produces empty Strings in the result. To solve this problem, we use
    // a greedy regular expression that matches any run of whitespace.
    // The words below are separated by two tabs, written as \t\t.

    String lineOfPhonesWithMultipleWhiteSpace = "iPhone\t\tGalaxy\t\tLumia";
    String[] phones = lineOfPhonesWithMultipleWhiteSpace.split("\\s+");

    System.out.println("input string separated by tabs: "
        + lineOfPhonesWithMultipleWhiteSpace);
    System.out.println("output string: " + Arrays.toString(phones));

    // The regular expression above cannot handle leading whitespace
    // on its own: it reports an empty String as the first word,
    // as shown below

    String lineWithLeadingAndTrailingWhiteSpace = " Java C++ ";
    String[] languages = lineWithLeadingAndTrailingWhiteSpace.split("\\s+");

    System.out.println("input string with leading and trailing space: "
        + lineWithLeadingAndTrailingWhiteSpace);
    System.out.println("output string: " + Arrays.toString(languages));

    // You can solve the problem above by trimming the string before
    // splitting it, i.e. call trim() before split() as shown below
    languages = lineWithLeadingAndTrailingWhiteSpace.trim().split("\\s+");
    System.out.println("input string: "
                     + lineWithLeadingAndTrailingWhiteSpace);
    System.out.println("output string after trim() and split: "
        + Arrays.toString(languages));
  }

}

This program has demonstrated all three ways which we have discussed earlier to split a String by single or multiple whitespaces or tabs in Java.
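If you want to combine all three cases in one reusable method, a sketch could look like the following. The class WhitespaceSplitter and the method splitWords are hypothetical names, and returning an empty array for null or blank input is an assumption of this example, not something split() does for you.

```java
import java.util.Arrays;

class WhitespaceSplitter {

    // Hypothetical helper: returns the words of a line, or an empty array
    // when the line is null or contains only whitespace.
    static String[] splitWords(String line) {
        if (line == null) {
            return new String[0];
        }
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            // "".split("\\s+") would return an array holding one empty String,
            // so the blank case is handled before splitting.
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        // Leading spaces, a tab, a run of spaces, and trailing spaces in one input.
        System.out.println(Arrays.toString(splitWords("  USD\tJPY   AUD  "))); // [USD, JPY, AUD]
        System.out.println(splitWords("   \t ").length); // 0
    }
}
```

The explicit check for an empty String matters because splitting an empty String still returns one element, which would otherwise show up as a phantom word.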



Important points about the split() method:

  1. Splits this string around matches of the given regular expression.
  2. Returns the array of strings computed by splitting this string around matches of the given regular expression.
  3. The split() method throws PatternSyntaxException if the regular expression's syntax is invalid.
  4. This method was added in Java 1.4, so it is not available in earlier versions, but you can use it in Java 5, 6, 7, 8, and later because Java is backward compatible.
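Two of these behaviors are easy to check yourself: the PatternSyntaxException for an invalid regex, and the fact that the single-argument split() drops trailing empty Strings while keeping a leading one. The sketch below uses a hypothetical class SplitBehaviorDemo and helper isInvalidRegex, and "[" is just one example of an invalid pattern.

```java
import java.util.Arrays;
import java.util.regex.PatternSyntaxException;

class SplitBehaviorDemo {

    // Returns true when split() rejects the regex with a PatternSyntaxException.
    static boolean isInvalidRegex(String regex) {
        try {
            "a b".split(regex);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // One leading space and two trailing spaces: the leading empty String
        // is kept, the trailing empty Strings are removed from the result.
        System.out.println(Arrays.toString(" Java C++  ".split("\\s"))); // [, Java, C++]

        System.out.println(isInvalidRegex("\\s+")); // false
        System.out.println(isInvalidRegex("["));    // true, unclosed character class
    }
}
```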

That's all about how to split a String by space in Java. In this tutorial, you have learned to split a String where words are separated by a single whitespace, multiple whitespaces, or tabs, and a String containing leading and trailing whitespace. I have also pointed you to a way to convert the String array you get from the split() method into an ArrayList of String. If you have any questions or doubts, feel free to ask.


Other Java programming tutorials on String you may like
  1. How to split String by comma in Java?
  2. How to split String using regular expressions in Java?
  3. 5 examples of splitting String in Java
  4. 2 ways to split String in Java by dot
  5. How to split a CSV String in Java?
  6. How to split String using StringTokenizer in Java?
  7. How to split String in JSP page using JSTL

5 comments :

Unknown said...

can some one please post how to split string without using split() library

Kumar.123 said...

Please fix the typo in String:
"How to split a CSV Strign in Java" ==> "How to split a CSV String in Java"

Unknown said...

So what's the difference between the first and second example?
The second one just as the first one, has at most one whitespace between words, not multiple.

javin paul said...

In the second example, there are two whitespaces between words. It's not clearly visible, as you said, but that's the intention. Maybe I can add more whitespace to clearly show the difference.

Unknown said...

the output of the third example could never be correct! the array 'languages' is [Java, C++] after the third line, so how could it ever output "[, Java, C++]"?
