[Guava Tutorial] - I/O Utilities

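Byte Streams and Char Streams

Guava uses the term "stream" for a Closeable stream of I/O data that keeps positional state in its underlying resource. A "byte stream" is an InputStream or OutputStream, while a "char stream" is a Reader or Writer (although their supertypes Readable and Appendable are often used as method parameter types). The corresponding utilities are split between the ByteStreams and CharStreams utility classes.

Most of Guava's stream utilities process an entire stream at a time and handle buffering themselves for efficiency. Note also that Guava methods that take a stream as a parameter do not close it: closing a stream is generally the responsibility of the code that opened it.

Some of the methods these classes provide are listed below: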

| ByteStreams | CharStreams |
| --- | --- |
| [`byte[] toByteArray(InputStream)`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/io/ByteStreams.html#toByteArray(java.io.InputStream)) | `String toString(Readable)` |
| N/A | `List<String> readLines(Readable)` |
| `long copy(InputStream, OutputStream)` | `long copy(Readable, Appendable)` |
| [`void readFully(InputStream, byte[])`](http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/io/ByteStreams.html#readFully(java.io.InputStream,%20byte%5B%5D)) | N/A |
| `void skipFully(InputStream, long)` | `void skipFully(Reader, long)` |
| `OutputStream nullOutputStream()` | `Writer nullWriter()` |
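
As a rough illustration of the stream utilities above, the following sketch reads and copies in-memory data. It assumes Guava is on the classpath; the class name `StreamUtilitiesExample` and its helper methods are hypothetical names chosen for this example. Because Guava does not close streams passed to it, each helper closes its own stream with try-with-resources.

```java
import com.google.common.io.ByteStreams;
import com.google.common.io.CharStreams;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical example class; not part of Guava.
final class StreamUtilitiesExample {

    // Reads every remaining byte. The caller, not Guava, closes the stream.
    static byte[] readAllBytes(byte[] data) throws IOException {
        try (InputStream in = new ByteArrayInputStream(data)) {
            return ByteStreams.toByteArray(in);
        }
    }

    // Splits the character stream into lines, without line terminators.
    static List<String> readAllLines(String text) throws IOException {
        try (Reader reader = new StringReader(text)) {
            return CharStreams.readLines(reader);
        }
    }

    // Copies all bytes to the given output and returns how many were copied.
    static long copyBytes(byte[] data, ByteArrayOutputStream out) throws IOException {
        try (InputStream in = new ByteArrayInputStream(data)) {
            return ByteStreams.copy(in, out);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello\nworld".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAllBytes(data).length);
        System.out.println(readAllLines("hello\nworld"));
        System.out.println(copyBytes(data, new ByteArrayOutputStream()));
    }
}
```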

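Sources and Sinks

It is common to write I/O utility methods that spare you from dealing with streams directly for basic operations. For example, Guava has Files.toByteArray(File) and Files.write(byte[], File). However, this approach tends to leave you with similar methods scattered all over, each reading from a different kind of source or writing to a different kind of sink. For example, Resources.toByteArray(URL) does the same thing as Files.toByteArray(File), only with a URL rather than a file as the data source.

To address this, Guava provides a set of abstractions over different types of data sources and sinks. A source or sink is a resource of some kind that you know how to open a new stream to, such as a File or URL. A source is readable, while a sink is writable. In addition, sources and sinks are categorized by whether they deal with byte or char data.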

| Operations | Bytes | Chars |
| --- | --- | --- |
| Reading | ByteSource | CharSource |
| Writing | ByteSink | CharSink |

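The advantage of these APIs is that they provide a common set of operations. Once you have wrapped your data source as a ByteSource, for example, you get the same set of byte-oriented methods no matter what the underlying resource is.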
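
To make the point about a common set of operations concrete, here is a minimal sketch in which one method accepts any ByteSource and is called with both an in-memory source and a file-backed one. It assumes Guava is on the classpath; `UniformSourceExample`, `sha256Hex` and `sameDigestFromFileAndMemory` are hypothetical names, and the temporary file exists only for the demonstration.

```java
import com.google.common.hash.Hashing;
import com.google.common.io.ByteSource;
import com.google.common.io.Files;

import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical example class; not part of Guava.
final class UniformSourceExample {

    // Works for any ByteSource, whatever resource backs it.
    static String sha256Hex(ByteSource source) throws IOException {
        return source.hash(Hashing.sha256()).toString();
    }

    // Hashes the same bytes once from memory and once from a temporary file.
    static boolean sameDigestFromFileAndMemory(byte[] data) throws IOException {
        File temp = File.createTempFile("guava-io-example", ".bin");
        try {
            Files.write(data, temp);
            String fromMemory = sha256Hex(ByteSource.wrap(data));
            String fromFile = sha256Hex(Files.asByteSource(temp));
            return fromMemory.equals(fromFile);
        } finally {
            temp.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "same bytes".getBytes(StandardCharsets.UTF_8);
        System.out.println(sameDigestFromFileAndMemory(data));
    }
}
```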

Creating Sources and Sinks

Guava provides a number of source and sink implementations:

| Bytes | Chars |
| --- | --- |
| `Files.asByteSource(File)` | `Files.asCharSource(File, Charset)` |
| `Files.asByteSink(File, FileWriteMode...)` | `Files.asCharSink(File, Charset, FileWriteMode...)` |
| `MoreFiles.asByteSource(Path, OpenOption...)` | `MoreFiles.asCharSource(Path, Charset, OpenOption...)` |
| `MoreFiles.asByteSink(Path, OpenOption...)` | `MoreFiles.asCharSink(Path, Charset, OpenOption...)` |
| `Resources.asByteSource(URL)` | `Resources.asCharSource(URL, Charset)` |
| [`ByteSource.wrap(byte[])`](http://google.github.io/guava/releases/snapshot/api/docs/com/google/common/io/ByteSource.html#wrap-byte:A-) | `CharSource.wrap(CharSequence)` |
| `ByteSource.concat(ByteSource...)` | `CharSource.concat(CharSource...)` |
| `ByteSource.slice(long, long)` | N/A |
| `CharSource.asByteSource(Charset)` | `ByteSource.asCharSource(Charset)` |
| N/A | `ByteSink.asCharSink(Charset)` |

In addition, you can extend the source and sink classes yourself to create new implementations.
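
The factory methods listed above can be combined. The sketch below, which assumes Guava 20 or later on the classpath, builds in-memory sources with wrap, joins them with concat, views bytes as characters with asCharSource, and takes a sub-range with slice. The class name `CreateSourcesExample` and its methods are hypothetical.

```java
import com.google.common.io.ByteSource;
import com.google.common.io.CharSource;

import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical example class; not part of Guava.
final class CreateSourcesExample {

    // Concatenates a byte-backed and a char-backed source, then decodes the result.
    static String concatAndDecode() throws IOException {
        ByteSource hello = ByteSource.wrap("Hello, ".getBytes(StandardCharsets.UTF_8));
        ByteSource world = CharSource.wrap("world").asByteSource(StandardCharsets.UTF_8);
        return ByteSource.concat(hello, world)
            .asCharSource(StandardCharsets.UTF_8)
            .read();
    }

    // slice(offset, length): skip the first 7 bytes, then expose the next 5.
    static String sliceAndDecode() throws IOException {
        ByteSource source = ByteSource.wrap("Hello, world".getBytes(StandardCharsets.UTF_8));
        return source.slice(7, 5).asCharSource(StandardCharsets.UTF_8).read();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(concatAndDecode()); // Hello, world
        System.out.println(sliceAndDecode());  // world
    }
}
```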

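Note: While it can be tempting to create a source or sink that wraps an already open stream (such as an InputStream), this should be avoided. A source or sink implementation should instead create a new stream each time its openStream() method is called. This lets the source or sink manage the full lifecycle of the stream and allows openStream() to be called multiple times, with each call returning a usable stream. Also, if you open the stream before creating the source or sink, you still have to ensure yourself that the stream is closed when an exception is thrown, which defeats many of the advantages of using a source or sink in the first place.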
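
One way to follow this advice when writing your own implementation is sketched below: a ByteSource subclass that builds a fresh stream inside openStream() instead of holding one open. It assumes Guava is on the classpath; `FreshStreamSourceExample` and `RepeatedByteSource` are hypothetical names invented for this illustration.

```java
import com.google.common.io.ByteSource;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical example class; not part of Guava.
final class FreshStreamSourceExample {

    /** A source that yields the same byte value a fixed number of times. */
    static final class RepeatedByteSource extends ByteSource {
        private final byte value;
        private final int count;

        RepeatedByteSource(byte value, int count) {
            this.value = value;
            this.count = count;
        }

        @Override
        public InputStream openStream() {
            // A brand-new, independent stream on every call; no stream is kept as a field.
            byte[] bytes = new byte[count];
            Arrays.fill(bytes, value);
            return new ByteArrayInputStream(bytes);
        }
    }

    // read() opens, drains and closes a stream each time, so it can be called repeatedly.
    static boolean readsTwiceWithSameResult() throws IOException {
        ByteSource source = new RepeatedByteSource((byte) 'a', 3);
        byte[] first = source.read();
        byte[] second = source.read();
        return first.length == 3 && Arrays.equals(first, second);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readsTwiceWithSameResult());
    }
}
```

Had the class stored a single InputStream and returned it from openStream(), the second read() would have seen an already consumed and closed stream.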

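Using Sources and Sinks

Once you have a source or sink instance, you have access to a number of operations for reading or writing.

Common Operations

All sources and sinks provide the ability to open a new stream for reading or writing. By default, the other operations are all implemented by calling one of these methods to get a stream, doing some reading or writing, and then ensuring that the stream is closed. These methods are: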

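openStream(): Depending on the type of source or sink, returns an InputStream, OutputStream, Reader or Writer.
openBufferedStream(): Depending on the type of source or sink, returns an InputStream, OutputStream, BufferedReader or BufferedWriter. The returned stream is guaranteed to be buffered if necessary. For example, a source that reads from a byte array has no need for additional buffering in memory, which is why this method does not return a BufferedInputStream for byte sources. Char sources are the exception: they always return a BufferedReader, because readLine() is defined on BufferedReader.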
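
When you do need the stream itself, for instance to read line by line, the caller becomes responsible for closing it. A minimal sketch, assuming Guava is on the classpath (`BufferedStreamExample` and `secondLine` are hypothetical names):

```java
import com.google.common.io.CharSource;

import java.io.BufferedReader;
import java.io.IOException;

// Hypothetical example class; not part of Guava.
final class BufferedStreamExample {

    // Returns the second line of the source, or null if it has fewer than two lines.
    static String secondLine(CharSource source) throws IOException {
        // We opened the stream, so we close it: try-with-resources handles that.
        try (BufferedReader reader = source.openBufferedStream()) {
            reader.readLine(); // skip the first line
            return reader.readLine();
        }
    }

    // Convenience overload for in-memory text.
    static String secondLine(String text) throws IOException {
        return secondLine(CharSource.wrap(text));
    }

    public static void main(String[] args) throws IOException {
        System.out.println(secondLine("first\nsecond\nthird")); // second
    }
}
```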

Source Operations

| ByteSource | CharSource |
| --- | --- |
| [`byte[] read()`](http://google.github.io/guava/releases/snapshot/api/docs/com/google/common/io/ByteSource.html#read--) | `String read()` |
| N/A | `ImmutableList<String> readLines()` |
| N/A | `String readFirstLine()` |
| `long copyTo(ByteSink)` | `long copyTo(CharSink)` |
| `long copyTo(OutputStream)` | `long copyTo(Appendable)` |
| `Optional<Long> sizeIfKnown()` | `Optional<Long> lengthIfKnown()` |
| `long size()` | `long length()` |
| `boolean isEmpty()` | `boolean isEmpty()` |
| `boolean contentEquals(ByteSource)` | N/A |
| `HashCode hash(HashFunction)` | N/A |
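
A few of the source operations from the table, exercised on in-memory sources so that the sketch needs no files. It assumes Guava is on the classpath; `SourceOperationsExample` and its helper methods are hypothetical names.

```java
import com.google.common.collect.ImmutableList;
import com.google.common.io.ByteSource;
import com.google.common.io.CharSource;

import java.io.IOException;

// Hypothetical example class; not part of Guava.
final class SourceOperationsExample {

    // All lines of the text, without line terminators.
    static ImmutableList<String> lines(String text) throws IOException {
        return CharSource.wrap(text).readLines();
    }

    // The first line only, or null if the source is empty.
    static String firstLine(String text) throws IOException {
        return CharSource.wrap(text).readFirstLine();
    }

    // Total number of bytes the source would produce.
    static long sizeInBytes(byte[] data) throws IOException {
        return ByteSource.wrap(data).size();
    }

    // Byte-for-byte comparison of two sources.
    static boolean sameContent(byte[] a, byte[] b) throws IOException {
        return ByteSource.wrap(a).contentEquals(ByteSource.wrap(b));
    }

    public static void main(String[] args) throws IOException {
        System.out.println(lines("alpha\nbeta"));       // [alpha, beta]
        System.out.println(firstLine("alpha\nbeta"));   // alpha
        System.out.println(sizeInBytes(new byte[4]));   // 4
        System.out.println(sameContent(new byte[] {1}, new byte[] {2})); // false
    }
}
```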

Sink Operations

| ByteSink | CharSink |
| --- | --- |
| [`void write(byte[])`](http://google.github.io/guava/releases/snapshot/api/docs/com/google/common/io/ByteSink.html#write-byte:A-) | `void write(CharSequence)` |
| `long writeFrom(InputStream)` | `long writeFrom(Readable)` |
| N/A | `void writeLines(Iterable<? extends CharSequence>)` |
| N/A | `void writeLines(Iterable<? extends CharSequence>, String)` |
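
The sink operations can be tried out against a temporary file, as in the sketch below. It assumes Guava is on the classpath; `SinkOperationsExample` and `writeThenReadLines` are hypothetical names, and the file is created only for the demonstration. Note that a sink truncates the file by default, while `FileWriteMode.APPEND` adds to it.

```java
import com.google.common.io.CharSink;
import com.google.common.io.FileWriteMode;
import com.google.common.io.Files;

import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical example class; not part of Guava.
final class SinkOperationsExample {

    // Writes the lines, appends one more piece of text, then reads everything back.
    static List<String> writeThenReadLines(List<String> lines) throws IOException {
        File temp = File.createTempFile("guava-sink-example", ".txt");
        try {
            // Default mode: existing content is replaced.
            CharSink sink = Files.asCharSink(temp, StandardCharsets.UTF_8);
            // Each line is followed by the given separator.
            sink.writeLines(lines, "\n");

            // APPEND mode: new content is added after the existing content.
            Files.asCharSink(temp, StandardCharsets.UTF_8, FileWriteMode.APPEND)
                .write("tail");

            return Files.asCharSource(temp, StandardCharsets.UTF_8).readLines();
        } finally {
            temp.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeThenReadLines(Arrays.asList("a", "b"))); // [a, b, tail]
    }
}
```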

Examples

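```java
import com.google.common.base.CharMatcher;
import com.google.common.base.Charsets;
import com.google.common.base.Splitter;
import com.google.common.collect.HashMultiset;
import com.google.common.collect.ImmutableList;
import com.google.common.collect.Multiset;
import com.google.common.hash.HashCode;
import com.google.common.hash.Hashing;
import com.google.common.io.Files;
import com.google.common.io.Resources;

// The snippets below assume an existing java.io.File `file` and java.net.URL `url`.

// Read the lines of a UTF-8 text file
ImmutableList<String> lines = Files.asCharSource(file, Charsets.UTF_8)
    .readLines();

// Count distinct word occurrences in a file
Multiset<String> wordOccurrences = HashMultiset.create(
    Splitter.on(CharMatcher.whitespace())
        .trimResults()
        .omitEmptyStrings()
        .split(Files.asCharSource(file, Charsets.UTF_8).read()));

// SHA-1 a file
HashCode hash = Files.asByteSource(file).hash(Hashing.sha1());

// Copy the data from a URL to a file
Resources.asByteSource(url).copyTo(Files.asByteSink(file));
```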

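File Operations

In addition to methods for creating file sources and sinks, the Files class contains a number of convenience methods that you might be interested in.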

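| Method | Description |
| --- | --- |
| `createParentDirs(File)` | Creates any necessary parent directories of the file. |
| `getFileExtension(String)` | Returns the extension of the file described by the given path. |
| `getNameWithoutExtension(String)` | Returns the file name with its extension removed. |
| `simplifyPath(String)` | Normalizes the path. The result is not always consistent with your filesystem, so test carefully. |
| `fileTreeTraverser()` | Returns a TreeTraverser that can traverse file trees. Later Guava releases deprecate it in favor of `fileTraverser()`. |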
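
The string-based helpers in this table are easy to try without touching the filesystem. The sketch below assumes Guava is on the classpath; `FileHelpersExample` is a hypothetical class name and the paths are made-up sample values.

```java
import com.google.common.io.Files;

// Hypothetical example class; not part of Guava.
final class FileHelpersExample {

    public static void main(String[] args) {
        // Only the part after the last dot counts as the extension.
        System.out.println(Files.getFileExtension("archive.tar.gz"));                      // gz

        // Directory components and the last extension are stripped.
        System.out.println(Files.getNameWithoutExtension("/home/user/archive.tar.gz"));   // archive.tar

        // Purely lexical clean-up: "." is dropped and ".." cancels the previous component.
        System.out.println(Files.simplifyPath("a/./b/../c"));                              // a/c
    }
}
```

Because simplifyPath works on the string alone, it does not resolve symbolic links, which is one reason its result can differ from what the filesystem reports.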