Lecture 31 - Exceptions + Streams
When we looked at Strings, we got our first real glimpse at how Java processes textual information. In lab, you will be processing information contained in strings that were gathered from a web page. Now, we will take a look at how some of that works. We will consider both the example where we read input from a web page and how we can read text files in our Java programs.
But before we look at those examples, first think about reading and writing of textual data by a computer in more general terms. While we have not done any real reading of text, we have written out text when we used the System.out.println method. It turns out that System.out (and its counterpart for input, System.in) is an example of a stream.
A stream carries data from some source to some destination. In the case of System.out.println, the stream takes its input from the String given as the parameter to the method, and delivers its output to the console window.
It turns out that the input analog of this can be fairly complex with Java, and does not work well with applets, but rather with applications. We have not seen Java applications in this class, but if you go on to CS 136, you will do most of your programming in applications instead of applets. When we consider reading and writing of files, we will do it with a Java application instead of a Java applet.
What is a text file? We have used them all along - our Java source files are text files. What we think of as files are really collections of characters. In fact, we can think of it at a lower level than that. Computers store everything in binary - zeroes and ones - true and false - on and off. So our text file is really a bunch of 0's and 1's - binary digits - in a sequence. When we interpret groups of these binary digits as characters, we can in turn interpret groups of characters as the file.
Suppose we want to represent the number "17". I can think of at least three ways I might store it in Java:
private int ivalue = 17;
private double dvalue = 17;
private String svalue = "17";
The int results in a 32-bit value being stored in the computer's memory. It looks like this:
00000000000000000000000000010001
Each binary digit represents a power of 2, and each digit indicates whether the power of 2 corresponding to its position is included in the number. Here the digits for 16 and 1 are set, and 16 + 1 = 17. If you understand decimal, you can understand binary. You just have two digits instead of our usual 10.
Storing the 17 as a double isn't quite so simple. Here, we store it as a real number, or more accurately, as a floating-point number. We will not go into the details here. The underlying representation must still be binary, but since we need to be able to store the fractional parts of numbers, the straightforward "powers of 2" idea from ints will not suffice. Clearly, a representation of "17" in a double will be a different collection of binary digits. Take CS 237 for more details.
Finally, the String representation means we would store the "17" using the characters '1' and '7', each of which has its own char representation. The ASCII codes for '1' and '7' are 00110001 (decimal 49) and 00110111 (decimal 55), so at some level the computer might be storing 0011000100110111 to store our "17". Again, this is different from the other two options.
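To see the three representations side by side, here is a small sketch that prints the bits for each. The class name RepresentationDemo and its helper methods are made up for this illustration. The String version prints 8 bits per character to match the ASCII codes above, which is an assumption that only holds for plain ASCII text.

```java
public class RepresentationDemo {
    // Pad a binary string on the left with zeros to the given width.
    static String pad(String bits, int width) {
        StringBuilder sb = new StringBuilder();
        for (int i = bits.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(bits).toString();
    }

    // The 32 bits of an int.
    static String intBits(int value) {
        return pad(Integer.toBinaryString(value), 32);
    }

    // The 64 bits of a double (the floating-point format).
    static String doubleBits(double value) {
        return pad(Long.toBinaryString(Double.doubleToLongBits(value)), 64);
    }

    // The character codes of a String, shown as 8 bits per character
    // (an assumption that is only enough for ASCII text).
    static String charBits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            sb.append(pad(Integer.toBinaryString(s.charAt(i)), 8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(intBits(17));     // 00000000000000000000000000010001
        System.out.println(doubleBits(17));  // a very different 64-bit pattern
        System.out.println(charBits("17"));  // 0011000100110111
    }
}
```

The same number produces three unrelated bit patterns, which is why we need conversion methods to move between them.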
Back to our text file example, we said the text file is treated as a collection of characters. So we would likely encounter our "17" in a format like the last one. Fortunately, Java allows us to convert among the formats. We already saw the method Integer.parseInt that takes a String argument and returns its int equivalent (or throws an exception). We will see how to deal with different types in streams that read text files shortly.
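As a reminder of how that conversion and its exception fit together, here is a minimal sketch. The class ParseDemo and the method parseOrDefault are hypothetical names for this example; only Integer.parseInt and NumberFormatException come from Java itself.

```java
public class ParseDemo {
    // Convert text to an int, returning fallback if the text is not a valid integer.
    static int parseOrDefault(String text, int fallback) {
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException ex) {
            // parseInt throws this exception for input such as "seventeen" or "17.5"
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Once converted, the value can be used in arithmetic.
        System.out.println(parseOrDefault("17", -1) + 1);
        // Text that is not an integer falls back to the default.
        System.out.println(parseOrDefault("seventeen", -1));
    }
}
```

Returning a fallback value is only one way to respond to the exception. A program could just as well print a message and ask for the input again.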
Our example of how to read from a text file is a simple program that allows its user to select a file with a FileDialog and then reads the text it contains, counting up the four-letter words.
Demo: Short Words
This is an application, not an applet. Java's security features make it difficult to run an applet (especially from a web browser) that can access files on the hard disk (this is a Good Thing). So the above link is useful to look at the source code, but you will not be able to run the program from it.
Putting aside for a moment the fact that it is an application and needs some different syntax, let's look at the main loop that does the reading of the file.
words = new BufferedReader(new FileReader(fileName));

// get the first line of the file
curWord = words.readLine();
while ( curWord != null ) {
    if ( curWord.length() == 4 ) {
        count++;
        fourLetters = fourLetters + curWord + "\n";
    }
    // get the next line from the file
    curWord = words.readLine();
}
We first create a FileReader, passing it a String representing the name of a file we would like to read. The FileReader has a read method that we could use to get a character or group of characters from the file. But we would like something with more functionality, so we send the FileReader that we just created to the constructor of a BufferedReader, which allows us to read the input one line at a time. This is done by a call to readLine, which reads an entire line of the input (up to a new line character or to the end of the file) and returns it as a String. When there are no more lines, readLine returns null, which is what ends our loop. We use each String to decide if the line is a four-letter word or not.
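The loop above is only a fragment of the demo. A complete, stripped-down sketch of the same idea might look like the following. This is not the demo's actual source: the class name FourLetterLines, the helper countFourLetterLines, and taking the file name from the command line instead of a FileDialog are all assumptions made to keep the example short.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class FourLetterLines {
    // Count the lines of the input that are exactly four characters long.
    static int countFourLetterLines(BufferedReader words) throws IOException {
        int count = 0;
        String curWord = words.readLine();
        while (curWord != null) {          // readLine returns null at end of input
            if (curWord.length() == 4) {
                count++;
            }
            curWord = words.readLine();
        }
        return count;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.println("Usage: java FourLetterLines <file>");
            return;
        }
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader words = new BufferedReader(new FileReader(args[0]))) {
            System.out.println("Total = " + countFourLetterLines(words));
        } catch (IOException ex) {
            // covers both a missing file and an error while reading
            System.out.println("Could not read " + args[0] + ": " + ex);
        }
    }
}
```

Because the counting method accepts any BufferedReader, the same code works whether the reader wraps a file, a web page, or the keyboard.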
Other things to note:
public static void main(String[] args)
We can modify the example so it doesn't create any windows at all:
Demo: Short Words No Window
More things to note:
The starter for this week's lab includes the following method to convert the contents of a web page into a String:
// Download a Web page and return its contents as a String
// Parameters:
//   url - The URL of the page to download.
private String getWebPage(String url) {
    StringBuffer buildpage = new StringBuffer();
    try {
        BufferedInputStream page =
            new BufferedInputStream(new URL(url).openStream());
        for ( int input = page.read(); input != -1; input = page.read() ) {
            buildpage.append((char) input);
        }
    } catch ( Exception ex ) {
        System.out.println(ex);
    }
    return new String(buildpage);
}
Let's look at what is happening in this example. First of all, note the use of the StringBuffer class. It's like a String but can be modified. Remember that Java Strings are immutable, and if we are trying to "modify" them by, for example, appending to them, Java must construct a brand new String at each step. The StringBuffer allows this append operation to be more efficient. At the end of the method, we convert our StringBuffer to a String and return it.
This example opens a stream whose source is a web page on a web server (URL) and creates a BufferedInputStream. This is a lot like the FileReader above. It does not contain a readLine method, so we read the characters one at a time and append them to the StringBuffer.
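The character-at-a-time loop does not depend on where the stream comes from. Here is a sketch of the same technique that can be run without a network connection. The class ReadAllDemo and method readAll are hypothetical names, and a ByteArrayInputStream stands in for the stream that new URL(url).openStream() would provide.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ReadAllDemo {
    // Read every byte from the stream and collect the results as characters.
    static String readAll(InputStream source) throws IOException {
        StringBuffer contents = new StringBuffer();
        try (BufferedInputStream in = new BufferedInputStream(source)) {
            // read() returns the next byte as an int, or -1 at end of stream
            for (int input = in.read(); input != -1; input = in.read()) {
                contents.append((char) input);
            }
        }
        return contents.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a web page; a real program would use a URL's stream here.
        byte[] fakePage = "<html>hello</html>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(readAll(new ByteArrayInputStream(fakePage)));
    }
}
```

Casting each byte to a char, as the lab method also does, is only safe for plain ASCII content. Pages that use other characters need a Reader that knows the page's encoding.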
A simple use of this method is demonstrated in:
Demo: URL Reader 1
If we wanted to read the page a line at a time, we could convert our program to use an InputStreamReader where we had used a FileReader in previous examples, and use that to construct a BufferedReader, which has the readLine method.
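A sketch of that wrapping, again with an in-memory stream standing in for the web page so that it runs anywhere. The names ReadLinesDemo and readLines are invented for this example and are not taken from the demo.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ReadLinesDemo {
    // Wrap a byte stream so that it can be read one line at a time.
    static List<String> readLines(InputStream source) throws IOException {
        List<String> lines = new ArrayList<String>();
        // InputStreamReader turns bytes into characters;
        // BufferedReader adds the readLine method on top of that.
        try (BufferedReader page = new BufferedReader(new InputStreamReader(source))) {
            for (String line = page.readLine(); line != null; line = page.readLine()) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for new URL(url).openStream()
        byte[] fakePage = "first line\nsecond line\n".getBytes(StandardCharsets.US_ASCII);
        for (String line : readLines(new ByteArrayInputStream(fakePage))) {
            System.out.println(line);
        }
    }
}
```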
Demo: URL Reader 2
Suppose you want to read in words, not just characters or whole lines. The four-letter word counter was really a four-letter line counter. Here's a possibility for a program that works on all words in a file:
Demo: Tokenized Short Words
There are several interesting things happening here, some of which are noted in the comments. If you run this program on itself (provide ShortWords.java as the input), you get this output:
void
main
args
args
WORD
Total = 5
even though we know there are more than five four-letter words in the file. The problem is that the StreamTokenizer is trying to be smart and it ignores our comments! So the words in the comments are never returned, and hence never checked. The following version does not ignore comments (or string constants or some other things):
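The comment-skipping behavior is easy to reproduce on a two-line input. The following is a sketch and not the code of either demo: the class TokenDemo, the method countFourLetterWords, and the keepComments flag are hypothetical. It relies on the fact that a StreamTokenizer treats '/' as a comment character by default, and that ordinaryChar turns that treatment off.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;

public class TokenDemo {
    // Count the four-letter word tokens in the input.
    static int countFourLetterWords(Reader source, boolean keepComments) throws IOException {
        StreamTokenizer tokens = new StreamTokenizer(source);
        if (keepComments) {
            // By default '/' starts a comment that runs to the end of the line.
            // As an ordinary character it becomes a token of its own, and the
            // rest of the line is tokenized normally.
            tokens.ordinaryChar('/');
        }
        int count = 0;
        while (tokens.nextToken() != StreamTokenizer.TT_EOF) {
            // sval holds the text of the token when it is a word
            if (tokens.ttype == StreamTokenizer.TT_WORD && tokens.sval.length() == 4) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String text = "// four word line\nvoid main";
        // Default settings: the comment is skipped, so only "void" and "main" count.
        System.out.println(countFourLetterWords(new StringReader(text), false)); // 2
        // With '/' ordinary: "four", "word", and "line" are counted too.
        System.out.println(countFourLetterWords(new StringReader(text), true));  // 5
    }
}
```

String constants are handled in a similar way: the quote characters have special meaning by default, and calling ordinaryChar on them would make the words inside the quotes visible as well.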
Demo: Tokenized Short Words No Comments
Back to our example where we were reading a dictionary-like file with one word per line, suppose we want to read words from one file but then save those four-letter words out to another file.
Demo: Short Words Write File
The changes from our previous example:
new PrintWriter(new FileWriter(new File(args[1])));
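To show that expression in context, here is a sketch of a complete read-then-write program. It is not the demo's source. The class SaveFourLetterLines and the method copyFourLetterLines are hypothetical names, and it assumes the input and output file names arrive as args[0] and args[1].

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public class SaveFourLetterLines {
    // Copy each four-character line from words to out; return how many were copied.
    static int copyFourLetterLines(BufferedReader words, PrintWriter out) throws IOException {
        int count = 0;
        for (String word = words.readLine(); word != null; word = words.readLine()) {
            if (word.length() == 4) {
                out.println(word);   // same println we know from System.out
                count++;
            }
        }
        out.flush();                 // push any buffered text to the destination
        return count;
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.out.println("Usage: java SaveFourLetterLines <input> <output>");
            return;
        }
        // Both streams are closed automatically at the end of the try block.
        try (BufferedReader words = new BufferedReader(new FileReader(args[0]));
             PrintWriter out = new PrintWriter(new FileWriter(new File(args[1])))) {
            System.out.println("Total = " + copyFourLetterLines(words, out));
        } catch (IOException ex) {
            System.out.println("File problem: " + ex);
        }
    }
}
```

Closing (or at least flushing) the PrintWriter matters. Text written with println may sit in a buffer, and a program that exits without closing the writer can leave the output file incomplete.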
Finally, we modify the example to read the words from the keyboard and write the four-letter lines to a file, also specified by typing in the file name at a prompt.
Demo: Short Words From Keyboard
Instead of opening a file to create our BufferedReader, we create it from System.in, the standard input stream.
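A sketch of how that might look follows. The class KeyboardDemo and the method saveFromKeyboard are made-up names, and the prompt text is an assumption. The first line typed is taken as the output file name, and every later line is checked for length until the input ends.

```java
import java.io.BufferedReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;

public class KeyboardDemo {
    // Read a file name, then lines of text, from keyboard;
    // write the four-character lines to the named file.
    static int saveFromKeyboard(BufferedReader keyboard) throws IOException {
        System.out.print("Output file name: ");
        String fileName = keyboard.readLine();
        if (fileName == null) {
            return 0;                // input ended before a name was typed
        }
        int count = 0;
        try (PrintWriter out = new PrintWriter(new FileWriter(fileName))) {
            for (String line = keyboard.readLine(); line != null; line = keyboard.readLine()) {
                if (line.length() == 4) {
                    out.println(line);
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // System.in is a byte stream, so it is wrapped just like the web page stream.
        BufferedReader keyboard = new BufferedReader(new InputStreamReader(System.in));
        System.out.println("Saved " + saveFromKeyboard(keyboard) + " four-letter lines");
    }
}
```

When running this from a terminal, the end of keyboard input is signaled with Ctrl-D on Unix-like systems or Ctrl-Z followed by Enter on Windows.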