Lecture 28 - Strings
Informally, a String is just a sequence of characters, such as "Hello, class" or "Java is a language". Strings have a length - the length of a string is just the number of characters that make up the string. So the string "Hello, class" has length 12 (counting the "space" and "comma") while "Java is a language" has length 18.
We have already been using several Java methods to output strings to the computer screen. The statement
System.out.println( "Java is a language" );
has been used to output a string to a "standalone" Java console window.
In Java, a String is actually a built-in class that contains a myriad of methods that operate on strings. What are some of the operations you might want to do with Strings?
You might want to find the length of a string or to convert a string to uppercase or to lowercase. You might want to concatenate two strings, i.e., to append one string to the end of another. For example, the concatenation of the string "Chubby" with "Hubby" is the string "ChubbyHubby".
You might want to compare two strings. When two strings are compared, the characters in the strings are compared one-by-one starting at the left. One string is less than another, for example, if the first appears before the second one in the dictionary. This is called a "dictionary" or "lexicographical" comparison. So, in this sense, we might want to compare two strings S and T to find out if "S equals T", if "S is less than T", or if "S is greater than T".
Given a string such as "To be or not to be" we might want to search the string for the first occurrence of a character or the first occurrence of a substring. If we find that a particular character is in a string, we might want to replace all occurrences of the letter with some other letter. Given the occurrence of a substring within a larger string, we might want to replace the substring with some other string or extract portions of the string, for example, the substring itself or everything before or after the substring.
Java contains methods to do all of these operations on strings. Below you will find examples of how to perform some of these activities; the text describes even more. We will also look at a rather nice application that makes use of these string methods.
Since String is a class, we can declare our own string variables in a program. So, for example, we could declare two String variables S1 and S2 by writing
String S1, S2;
The above only declares the variables. Java provides several constructors for creating and initializing String objects. The easiest to use is the constructor that takes a string literal:
S1 = new String( "Java is a language" );
The above creates a String variable called S1 whose initial value is "Java is a language". We could also do the same thing with
String S1 = "Java is a language";
The length() method (not to be confused with A.length for arrays) can always be used to determine the exact length of a string. For example,
S1.length()
will return the value 18 for the above string S1.
Strings in Java can be concatenated using the plus (+) operator. For example, given the declarations
String G = new String( "Rain" );
String C = new String( "Man" );
then
String D = G + C ;
would assign "RainMan" to string D. Strings can also be concatenated using the concat() method. For example, we could do the same with
String D = G.concat( C );
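The two forms of concatenation can be tried side by side. This is only a minimal sketch; the class name ConcatDemo and the two helper methods are illustrative names, not part of the lecture's programs.

```java
// Hypothetical demo class; the names here are for illustration only.
class ConcatDemo {
    // Joins two strings with the + operator.
    static String joinWithPlus(String first, String second) {
        return first + second;
    }

    // Joins two strings with the concat() method.
    static String joinWithConcat(String first, String second) {
        return first.concat(second);
    }

    public static void main(String[] args) {
        String G = "Rain";
        String C = "Man";
        System.out.println(joinWithPlus(G, C));    // prints RainMan
        System.out.println(joinWithConcat(G, C));  // prints RainMan
        System.out.println(G);                     // prints Rain (G itself is unchanged)
    }
}
```

Both forms build a new string and leave G and C as they were.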
In Java, the
equals(), equalsIgnoreCase(), and compareTo()
methods can be used to compare strings. The equals() and equalsIgnoreCase() methods return a boolean, while compareTo() returns 0 when the strings are equal, a negative number when the first is less than the second, and a positive number when the first is greater than the second.
So, given the declarations
String R = new String( "Fred" );
String S = new String( "Sue" );
String T = new String( "fish" );
then
R.equals( "Fred" ) would be true
R.equals( T ) would be false
R.equalsIgnoreCase( "FRED" ) would be true
R.compareTo( S ) would be negative
R.compareTo( "Fred" ) would be 0
T.compareTo( R ) would be positive
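The comparisons listed above can be checked with a short program. This is a minimal sketch; the class CompareDemo and the helper describe() are hypothetical names used only for illustration.

```java
// Hypothetical demo class; names are for illustration only.
class CompareDemo {
    // Turns the int returned by compareTo() into a readable phrase.
    static String describe(String a, String b) {
        int result = a.compareTo(b);
        if (result < 0) {
            return "less than";
        }
        if (result > 0) {
            return "greater than";
        }
        return "equal to";
    }

    public static void main(String[] args) {
        String R = "Fred";
        String S = "Sue";
        String T = "fish";
        System.out.println(R.equals("Fred"));            // prints true
        System.out.println(R.equals(T));                 // prints false
        System.out.println(R.equalsIgnoreCase("FRED"));  // prints true
        System.out.println(describe(R, S));              // prints less than
        System.out.println(describe(R, "Fred"));         // prints equal to
        System.out.println(describe(T, R));              // prints greater than
    }
}
```

Note the last line: "fish" compares as greater than "Fred" because compareTo() is case-sensitive, and a lowercase 'f' has a larger character code than an uppercase 'F'.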
Java also has several forms of the indexOf() method that make it easy to search a string for the first occurrence of a character or the first occurrence of a substring within a larger string.
Given the declarations
String Q = new String( "To be or not to be" );
String S = new String( "Virginia" );
String T = new String( "gin" );
then
Q.indexOf( (int) 'o' ) would be 1
Q.indexOf( (int) '*' ) would be -1 (for 'not found')
Q.indexOf( "be" ) would be 3
S.indexOf( T ) would be 3
Q.indexOf( "be", 5 ) would be 16 (begin the search with index 5)
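The two-argument form of indexOf() is handy for finding a later occurrence once the first is known. The following is a minimal sketch; the class IndexOfDemo and the helper secondIndexOf() are hypothetical names, not methods of the String class.

```java
// Hypothetical demo class; names are for illustration only.
class IndexOfDemo {
    // Returns the index of the second occurrence of piece in text,
    // or -1 if piece occurs fewer than two times.
    static int secondIndexOf(String text, String piece) {
        int first = text.indexOf(piece);
        if (first < 0) {
            return -1;
        }
        // Resume the search one position past the first match.
        return text.indexOf(piece, first + 1);
    }

    public static void main(String[] args) {
        String Q = "To be or not to be";
        System.out.println(Q.indexOf('o'));            // prints 1
        System.out.println(Q.indexOf('*'));            // prints -1
        System.out.println(Q.indexOf("be"));           // prints 3
        System.out.println(secondIndexOf(Q, "be"));    // prints 16
    }
}
```

The (int) cast used above is optional: a char argument is widened to int automatically.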
The methods toLowerCase() and toUpperCase() can be used to convert strings from one case to another. For example, if S contains the string "happy", then
S.toUpperCase() would return "HAPPY".
You can extract a portion of a string using the substring() method. For example, given the above String Q, then
Q.substring( 6,12 ) would return "or not"
Q.substring( 6 ) would return "or not to be"
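Combining indexOf() with substring() gives the "everything before" and "everything after" operations mentioned earlier. This is a minimal sketch; the class SplitDemo and the helpers before() and after() are hypothetical names, and the choice of what to return when the piece is not found is an assumption made here.

```java
// Hypothetical demo class; names are for illustration only.
class SplitDemo {
    // Returns the part of text before the first occurrence of piece,
    // or all of text if piece does not occur (an assumed convention).
    static String before(String text, String piece) {
        int pos = text.indexOf(piece);
        if (pos < 0) {
            return text;
        }
        return text.substring(0, pos);
    }

    // Returns the part of text after the first occurrence of piece,
    // or an empty string if piece does not occur (an assumed convention).
    static String after(String text, String piece) {
        int pos = text.indexOf(piece);
        if (pos < 0) {
            return "";
        }
        return text.substring(pos + piece.length());
    }

    public static void main(String[] args) {
        String Q = "To be or not to be";
        System.out.println("[" + before(Q, "or") + "]");  // prints [To be ]
        System.out.println("[" + after(Q, "or") + "]");   // prints [ not to be]
    }
}
```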
Converting a string to an integer - converting an integer to a string.
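The notes do not show these conversions here. A minimal sketch, assuming the standard library methods Integer.parseInt() and String.valueOf() are the ones intended; the class ConvertDemo and its helper names are hypothetical.

```java
// Hypothetical demo class; names are for illustration only.
class ConvertDemo {
    // String to int: throws NumberFormatException if digits is not a valid integer.
    static int toInt(String digits) {
        return Integer.parseInt(digits);
    }

    // String to int, returning fallback instead of throwing on bad input.
    static int toIntOrDefault(String digits, int fallback) {
        try {
            return Integer.parseInt(digits);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    // int to String.
    static String toText(int number) {
        return String.valueOf(number);
    }

    public static void main(String[] args) {
        System.out.println(toInt("128") + 1);           // prints 129
        System.out.println(toText(255).length());       // prints 3
        System.out.println(toIntOrDefault("12a", -1));  // prints -1
    }
}
```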
This by no means exhausts the string capabilities of Java, but I am sure you have the idea by now that Java has been designed with methods that can do just about anything you can think of with strings. What follows illustrates a very nice way to use these string methods.
Things You can do with Strings: Eliza & Palindromes
Demo of Eliza
Web Sites: www.gamelan.com
www.vperson.com/mlm/julia.html
www.cl.cam.ac.uk/users/mh10006/eliza.html
String courseName = "Data Structures";
int nameLength = courseName.length();
nameLength gets the value 15.
courseName.indexOf("S") returns the value 5. Note that indexing starts at 0, as it does with arrays.
Of course, we can also say
String searchString = "S";
and then
courseName.indexOf(searchString)
Another option is:
public int indexOf(String s, int startIndex)
which begins its search at the specified index.
What's the result of each of the following?
courseName.indexOf("Struct", 2)
courseName.indexOf("Struct", 8)
courseName.indexOf("t", 2)
courseName.indexOf("t", 6)
courseName.indexOf("t", 7)
Here is a method that searches for the number of occurrences of a String, word, in another String, text.
public int wordCount(String text, String word) {
    int count = 0;
    int pos = text.indexOf(word, 0);
    while (pos >= 0) {
        count++;
        pos = text.indexOf(word, pos + word.length());
    }
    return count;
}
What does the above method return for these calls?
wordCount("yabbadabbadoo","abba");
wordCount("scoobydoobydoo","oo");
How about this one? [Side note: I bet everyone knows where the strings above come from but what about this next one?]
wordCount("bubbabobobbrain","bob");
Is that what you want? How would you modify the method to include the overlapped "bob"s above?
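One possible modification is to resume the search one character past each match rather than past the whole word. This is a minimal sketch, not the only answer; the class OverlapCounter and the method overlappingCount() are hypothetical names, and the guard for an empty word is an addition made here to avoid an endless loop.

```java
// Hypothetical demo class; names are for illustration only.
class OverlapCounter {
    // Counts occurrences of word in text, including overlapping ones.
    static int overlappingCount(String text, String word) {
        if (word.isEmpty()) {
            return 0;  // assumed convention: an empty word is never counted
        }
        int count = 0;
        int pos = text.indexOf(word, 0);
        while (pos >= 0) {
            count++;
            // Step forward by one character only, so a match that
            // starts inside the previous match is still found.
            pos = text.indexOf(word, pos + 1);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(overlappingCount("bubbabobobbrain", "bob"));  // prints 2
        System.out.println(overlappingCount("yabbadabbadoo", "abba"));   // prints 2
    }
}
```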
Note that "s" and "S" are different strings.
public String toLowerCase()
public String toUpperCase()
A case-insensitive word counter/finder differs from the above code only by the added two lines in the following.
private int substringCounter(String text, String word) {
    int count = 0;
    int pos;
    text = text.toLowerCase();
    word = word.toLowerCase();
    pos = text.indexOf(word, 0);
    while (pos >= 0) {
        count++;
        pos = text.indexOf(word, pos + word.length());
    }
    return count;
}
Note that the assignment statements to text and word are required. String methods do not manipulate the given String. They make a brand new one. We say that Java Strings are immutable.
public String substring(int startIndex, int indexBeyond)
returns the characters from startIndex up to, but not including, indexBeyond. For example:
String countText = "3 Balls, 2 Strikes, 2 Outs";
String strikesOnly;
strikesOnly = countText.substring(9,18);
strikesOnly gets the value "2 Strikes".
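The effect of immutability is easy to see by calling a String method with and without keeping its result. This is a minimal sketch; the class ImmutableDemo and its two helper methods are hypothetical names used only for illustration.

```java
// Hypothetical demo class; names are for illustration only.
class ImmutableDemo {
    // Calls toUpperCase() but throws the result away; s is returned unchanged.
    static String discardResult(String s) {
        s.toUpperCase();
        return s;
    }

    // Calls toUpperCase() and keeps the new string it returns.
    static String keepResult(String s) {
        s = s.toUpperCase();
        return s;
    }

    public static void main(String[] args) {
        System.out.println(discardResult("happy"));  // prints happy
        System.out.println(keepResult("happy"));     // prints HAPPY
    }
}
```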
Link finder:
We want to write a program to find and extract all of the links in an HTML file. To do this, we need to know how a link is defined in an HTML file:
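<a href="the URL">link </a>
The URL sits inside the opening tag, so we need to find each "<a" and the ">" that closes that tag; the link text then runs up to "</a>".
Convert the string that is the HTML file to lowercase.
Start with an empty result: String links = "";
Find the first position of "<a".
While there is a link remaining (i.e., tagPos is not -1), find the ">" that ends the tag, add the tag to links, and find the position of the next "<a".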
// Extract all the links from a web page
private String findLinks(String fullpage) {
    int tagPos,   // Start of <A tag specification
        tagEnd;   // Position of first ">" after tag start

    // A lower case only version of the page for searching
    String lowerpage = fullpage.toLowerCase();

    // Text of one A tag
    String tag;

    // The A tags found so far
    String links = "";

    // Paste stuff on end of page to ensure searches succeed
    fullpage = fullpage + " >";

    tagPos = lowerpage.indexOf("<a ", 0);
    while (tagPos >= 0) {
        tagEnd = fullpage.indexOf(">", tagPos + 1);
        tag = fullpage.substring(tagPos, tagEnd + 1);
        links = links + tag + "\n";
        tagPos = lowerpage.indexOf("<a ", tagEnd);
    }
    return links;
}
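findLinks returns whole tags. If only the URL is wanted, indexOf() and substring() can pull it out of a single tag. This is a minimal sketch, assuming the href value is enclosed in double quotes; the class HrefDemo and the method hrefOf() are hypothetical names, not part of the lecture's program.

```java
// Hypothetical demo class; names are for illustration only.
class HrefDemo {
    // Returns the text between the quotes of href="..." in one tag,
    // or an empty string if the tag has no double-quoted href.
    static String hrefOf(String tag) {
        String marker = "href=\"";
        // Search a lowercase copy so that HREF and href both match,
        // but extract from the original to preserve the URL's case.
        int attrPos = tag.toLowerCase().indexOf(marker);
        if (attrPos < 0) {
            return "";
        }
        int start = attrPos + marker.length();
        int end = tag.indexOf("\"", start);
        if (end < 0) {
            return "";
        }
        return tag.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(hrefOf("<A HREF=\"http://www.gamelan.com\">"));  // prints http://www.gamelan.com
    }
}
```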
public boolean startsWith(String s)
    // true only if this string starts with s
public boolean endsWith(String s)
    // true only if this string ends with s
public boolean equals(Object o)
    // true only if o is a String with the same sequence of chars as this string
public boolean equalsIgnoreCase(String s)
    // true only if this string has same sequence of chars as s
    // except capital & lower case letters considered the same
public int lastIndexOf(String s)
public int lastIndexOf(String s, int startIndex)
    // return index of last occurrence of s (occurring at or
    // before startIndex) in this string, and -1 if no match.
public String replace(char oldChar, char newChar)
    // Returns a new string resulting from replacing all
    // occurrences of oldChar in this string with newChar.
public String trim()
    // Returns a new string with all leading and trailing whitespace removed.
public int compareTo(String s)
    // Returns negative int if string before s in case-sensitive
    // dictionary order;
    // returns 0 if equal
    // returns positive int if string after s in case-sensitive
    // dictionary order.
public char charAt(int index)
    // Returns the character at the specified index.
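A few of these methods in action, together with one use of charAt() that ties back to the palindromes mentioned earlier. This is a minimal sketch; the class MethodsDemo and the method isPalindrome() are hypothetical names, and the check is case-sensitive and treats spaces as ordinary characters.

```java
// Hypothetical demo class; names are for illustration only.
class MethodsDemo {
    // Returns true if s reads the same forwards and backwards.
    static boolean isPalindrome(String s) {
        int left = 0;
        int right = s.length() - 1;
        while (left < right) {
            if (s.charAt(left) != s.charAt(right)) {
                return false;
            }
            left++;
            right--;
        }
        return true;
    }

    public static void main(String[] args) {
        String Q = "To be or not to be";
        System.out.println(Q.startsWith("To"));       // prints true
        System.out.println(Q.endsWith("be"));         // prints true
        System.out.println(Q.lastIndexOf("be"));      // prints 16
        System.out.println(Q.lastIndexOf("be", 10));  // prints 3
        System.out.println(Q.replace('o', '0'));      // prints T0 be 0r n0t t0 be
        System.out.println("  hi  ".trim());          // prints hi
        System.out.println(Q.charAt(3));              // prints b
        System.out.println(isPalindrome("racecar"));  // prints true
    }
}
```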
Demo: String Demos
The above demo program demonstrates many of those methods.
Our Strings are made up of characters, so let's take a look at characters. In some sense, a Java String is really an array of characters, but we don't treat it like a regular array. In C, a string is actually nothing more than an array of characters.
A character is what you probably expect - roughly speaking, it is a keyboard character. This includes special function characters (like newline and tab), in addition to alphanumeric characters.
A character is represented internally as a number - an integer, in fact. There are various "universal" codes that can be used to represent characters, for example ASCII and Unicode. Java's char type uses Unicode.
To declare a variable to be of character type:
private char letter;
Use single quotes for character constants: 'H' is a char; "H" is a String:
char letter = 'H';
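Because a char is stored as a number, it can be cast to an int and used in arithmetic. This is a minimal sketch; the class CharCodeDemo and its helper methods are hypothetical names used only for illustration.

```java
// Hypothetical demo class; names are for illustration only.
class CharCodeDemo {
    // Returns the numeric code of a character.
    static int codeOf(char c) {
        return (int) c;
    }

    // Converts a digit character '0'..'9' to its numeric value 0..9.
    // This works because the digit characters have consecutive codes.
    static int digitValue(char digit) {
        return digit - '0';
    }

    public static void main(String[] args) {
        char letter = 'H';
        System.out.println(codeOf(letter));        // prints 72
        System.out.println(codeOf('A'));           // prints 65
        System.out.println((char) (letter + 1));   // prints I
        System.out.println(digitValue('7'));       // prints 7
    }
}
```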
Let's look at a program that might use this.
Demo: Color Mixer
It doesn't behave nicely if we enter things that aren't valid numbers into the text fields. But we can check for that. We can use the charAt method to make sure the characters are valid numbers.
Demo: Safer Color Mixer
Notice how it behaves if you include a non-digit in the input.
We do this in a private method, isInteger, that steps through the String and checks each character to make sure it's in the range '0' to '9'. We can do this in a nice, simple if statement because the characters are compared according to their code values, and their code values are consecutive.
// Checks whether a given string can be interpreted as an integer: i.e.,
// checks that it is made up of characters that are digits in the range
// 0-9. Returns true if and only if the string can be interpreted as
// an integer.
private boolean isInteger(String aString) {
    boolean allNumeric = true;
    for (int i = 0; (i < aString.length() && allNumeric); i++) {
        if (aString.charAt(i) < '0' || aString.charAt(i) > '9')
            allNumeric = false;
    }
    return allNumeric;
}
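One case worth noticing: for an empty string the loop never runs, so isInteger returns true. The following is a minimal sketch of a variant that also rejects empty input; the class DigitCheck and the method isNonEmptyInteger() are hypothetical names, and like the original it does not accept a leading sign.

```java
// Hypothetical demo class; names are for illustration only.
class DigitCheck {
    // Returns true only if aString is non-empty and every
    // character is a digit in the range '0' to '9'.
    static boolean isNonEmptyInteger(String aString) {
        if (aString.length() == 0) {
            return false;
        }
        for (int i = 0; i < aString.length(); i++) {
            char c = aString.charAt(i);
            if (c < '0' || c > '9') {
                return false;  // stop at the first non-digit
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isNonEmptyInteger("255"));  // prints true
        System.out.println(isNonEmptyInteger("25x"));  // prints false
        System.out.println(isNonEmptyInteger(""));     // prints false
    }
}
```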