Buffer:
instantiate using allocate, wrap & allocateDirect (can also be obtained by mapping a portion of a file into a MappedByteBuffer)
invariant: 0 <= mark <= position <= limit <= capacity
transferring data: relative (BufferOverflowException & BufferUnderflowException), absolute (IndexOutOfBoundsException) & bulk
clear (position = 0, limit = capacity), flip (limit = position, position = 0) & rewind (position = 0, limit unchanged)
Thread Safety: buffers are not safe for use by multiple concurrent threads (ByteBuffer provides no internal synchronization); access must be synchronized externally
Others: read-only buffers (asReadOnlyBuffer), direct buffers, compact, duplicate, slice
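A small sketch of how position and limit move through the calls listed above may help. The class and method names (BufferBasics, trace, overrun) are made up for illustration; only the java.nio calls are real API.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class BufferBasics {
    // Formats a buffer as "position/limit/capacity".
    static String state(ByteBuffer b) {
        return b.position() + "/" + b.limit() + "/" + b.capacity();
    }

    // Walks one buffer through put -> flip -> get -> rewind -> compact -> clear
    // and records its state after each step.
    static List<String> trace() {
        List<String> steps = new ArrayList<>();
        ByteBuffer buf = ByteBuffer.allocate(8);

        buf.put((byte) 1).put((byte) 2).put((byte) 3); // relative puts advance position
        steps.add("put " + state(buf));

        buf.flip();      // limit = position, position = 0: ready for reading
        steps.add("flip " + state(buf));

        buf.get();       // relative get: position moves to 1
        buf.get(2);      // absolute get: position stays at 1
        steps.add("get " + state(buf));

        buf.rewind();    // position = 0, limit unchanged: re-read the same data
        steps.add("rewind " + state(buf));

        buf.get();       // consume one byte, two remain unread
        buf.compact();   // move unread bytes to the front, ready for more puts
        steps.add("compact " + state(buf));

        buf.clear();     // position = 0, limit = capacity; contents are not erased
        steps.add("clear " + state(buf));
        return steps;
    }

    // Names the exception thrown when reading past the limit,
    // for a relative get versus an absolute get.
    static String overrun(boolean relative) {
        ByteBuffer buf = ByteBuffer.allocate(4);
        buf.flip();      // limit = 0, so nothing is readable
        try {
            if (relative) {
                buf.get();
            } else {
                buf.get(0);
            }
            return "no exception";
        } catch (BufferUnderflowException | IndexOutOfBoundsException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
        System.out.println("relative: " + overrun(true));
        System.out.println("absolute: " + overrun(false));
    }
}
```

Every recorded state satisfies the invariant position <= limit <= capacity; compact is the one call here that actually moves data.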
FileChannel:
fc.read(ByteBuffer)
fc.write(ByteBuffer)
fc.force(metaData) similar to flush for a stream; the boolean says whether file metadata must be forced to storage as well
fc.map(MapMode, position, size); MapModes are READ_ONLY, READ_WRITE & PRIVATE (copy on write)
fc.lock() -> exclusive lock on the whole file
fc.lock(position, size, shared) -> shared/exclusive lock on a portion of the file
pair every lock() with release() on the returned FileLock
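The FileChannel calls above can be tied together in one round trip. This is only a sketch under some assumptions: the class name FileChannelSketch, the method roundTrip and the temp-file prefix are invented for the example, and it relies on FileLock being AutoCloseable (Java 7+), so leaving the try block is what calls release().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class FileChannelSketch {
    // Writes text under an exclusive lock, then reads it back twice:
    // once with read(ByteBuffer) and once through a read-only mapping.
    static String[] roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("fc-sketch", ".bin"); // scratch file for the demo
            tmp.toFile().deleteOnExit();
            try (FileChannel fc = FileChannel.open(tmp,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {

                // Exclusive lock on the whole file; closing the FileLock releases it.
                try (FileLock lock = fc.lock()) {
                    ByteBuffer out = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
                    while (out.hasRemaining()) {
                        fc.write(out);   // a single write may not drain the buffer
                    }
                    fc.force(true);      // push content and metadata to the device
                }

                String viaRead;
                // Shared lock on a region (a platform without shared locks
                // may hand back an exclusive one instead).
                try (FileLock shared = fc.lock(0, fc.size(), true)) {
                    ByteBuffer in = ByteBuffer.allocate((int) fc.size());
                    fc.position(0);
                    while (in.hasRemaining() && fc.read(in) != -1) {
                        // keep reading until the buffer is full or EOF
                    }
                    in.flip();
                    viaRead = StandardCharsets.UTF_8.decode(in).toString();
                }

                // Map the same bytes into memory instead of copying them with read().
                MappedByteBuffer mapped =
                        fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
                String viaMap = StandardCharsets.UTF_8.decode(mapped).toString();
                return new String[] { viaRead, viaMap };
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] result = roundTrip("hello nio");
        System.out.println("read(): " + result[0]);
        System.out.println("map():  " + result[1]);
    }
}
```

The write loop is there because write(ByteBuffer) is allowed to write fewer bytes than remain in the buffer.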
*Interruptible: means a “blocking operation” can be interrupted using interrupt() method of the blocked thread.
Channels can be tested for interruptibility using instanceof java.nio.channels.InterruptibleChannel (means the channel can be closed asynchronously and can be interrupted).
if a thread is blocked in I/O on an interruptible channel, these things can happen:
1) Another thread can call channel.close(), which causes the blocked thread to receive AsynchronousCloseException.
2) Another thread can interrupt the blocked thread, which closes the channel, causes ClosedByInterruptException and sets the thread's interrupt status.
3) If the thread's interrupt status is already set, invoking a blocking I/O operation closes the channel and throws ClosedByInterruptException; the interrupt status remains set.
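The three cases can be tried out with a Pipe, whose source channel blocks on read while nothing has been written. This is a rough sketch: the names InterruptibleSketch, blockThenWake and readWithStatusAlreadySet are invented here, and the 200 ms sleep is a crude assumption that the reader has had time to block, not a proper synchronization.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousCloseException;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.InterruptibleChannel;
import java.nio.channels.Pipe;

class InterruptibleSketch {
    // Cases 1 and 2: block a reader on an empty pipe, then either
    // interrupt the reader or close the channel from this thread.
    static String blockThenWake(boolean interrupt) {
        try {
            Pipe pipe = Pipe.open();
            Pipe.SourceChannel source = pipe.source();
            String[] outcome = new String[1];
            Thread reader = new Thread(() -> {
                try {
                    source.read(ByteBuffer.allocate(16)); // blocks: nothing was written
                    outcome[0] = "read returned";
                } catch (ClosedByInterruptException e) {
                    outcome[0] = "ClosedByInterruptException, interrupted="
                            + Thread.currentThread().isInterrupted();
                } catch (AsynchronousCloseException e) {
                    outcome[0] = "AsynchronousCloseException";
                } catch (IOException e) {
                    outcome[0] = e.getClass().getSimpleName();
                }
            });
            reader.start();
            Thread.sleep(200);          // crude: give the reader time to block
            if (interrupt) {
                reader.interrupt();     // case 2
            } else {
                source.close();         // case 1
            }
            reader.join();
            pipe.sink().close();
            source.close();             // no-op if already closed
            return outcome[0];
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Case 3: the interrupt status is set before the blocking read starts.
    static String readWithStatusAlreadySet() {
        try {
            Pipe pipe = Pipe.open();
            Thread.currentThread().interrupt();
            try {
                pipe.source().read(ByteBuffer.allocate(16));
                return "read returned";
            } catch (ClosedByInterruptException e) {
                return "ClosedByInterruptException, interrupted="
                        + Thread.currentThread().isInterrupted();
            } finally {
                Thread.interrupted();   // clear the flag so the caller is unaffected
                pipe.sink().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Pipe probe = Pipe.open();
        System.out.println(probe.source() instanceof InterruptibleChannel);
        probe.source().close();
        probe.sink().close();
        System.out.println(blockThenWake(false));
        System.out.println(blockThenWake(true));
        System.out.println(readWithStatusAlreadySet());
    }
}
```

ClosedByInterruptException is a subclass of AsynchronousCloseException, so it has to be caught first to tell the two cases apart.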
Character Sets:
charset or charset encoding is a “named mapping between sequences of 16-bit Unicode code units and sequences of bytes” (now explain that!)
CharsetDecoder is used to convert a sequence of bytes in a given charset into char (Java primitive, 16-bit) values.
CharsetEncoder is used to convert char values into the sequence of bytes defined by the charset.
Charset.forName("UTF-8");
charset.newDecoder();
charset.newEncoder();
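To make the encoder/decoder pair concrete, here is a minimal round trip. The class name CharsetRoundTrip and its two helper methods are assumptions made for this example; the calls on Charset, CharsetEncoder and CharsetDecoder are the standard ones.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;

class CharsetRoundTrip {
    // chars -> bytes using a fresh encoder for the named charset.
    static byte[] encode(String text, String charsetName) {
        Charset charset = Charset.forName(charsetName);
        CharsetEncoder encoder = charset.newEncoder();
        try {
            ByteBuffer bytes = encoder.encode(CharBuffer.wrap(text));
            byte[] out = new byte[bytes.remaining()];
            bytes.get(out);             // bulk get into a right-sized array
            return out;
        } catch (CharacterCodingException e) {
            // a new encoder reports unmappable characters instead of replacing them
            throw new IllegalArgumentException("cannot encode as " + charsetName, e);
        }
    }

    // bytes -> chars using a fresh decoder for the named charset.
    static String decode(byte[] bytes, String charsetName) {
        Charset charset = Charset.forName(charsetName);
        CharsetDecoder decoder = charset.newDecoder();
        try {
            return decoder.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("malformed input for " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String text = "h\u00e9llo";     // 'e' with acute accent, written as an escape
        byte[] utf8 = encode(text, "UTF-8");
        byte[] latin1 = encode(text, "ISO-8859-1");
        System.out.println("UTF-8 bytes:      " + utf8.length);
        System.out.println("ISO-8859-1 bytes: " + latin1.length);
        System.out.println("round trip ok:    " + decode(utf8, "UTF-8").equals(text));
    }
}
```

The same five chars come out as a different number of bytes depending on the charset, which is the "named mapping" in practice.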
Bonus:
http://stackoverflow.com/questions/2533097/java-unicode-encoding
Variable Width Encodings: UTF-8 & UTF-16 (UTF-32 is fixed width). To make encoding and decoding easier, the code units are classified as singletons, lead units & trail units.
BOM (U+FEFF, zero-width no-break space): Now you think you know all about charsets, till you try to understand endianness and BOM. http://www.unicode.org/faq/utf_bom.html#BOM
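Dumping the bytes Java produces for a few characters shows both points above. This is a small sketch: the class name EncodingWidths and the hex helper are invented for the example, and the non-ASCII characters are written as \u escapes so the source file's own encoding does not matter.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingWidths {
    // Renders the encoded bytes of text as space-separated hex pairs.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String euro = "\u20AC";             // U+20AC, euro sign
        String emoji = "\uD83D\uDE00";      // U+1F600, needs a surrogate pair in UTF-16

        // UTF-8: 1 to 4 bytes per code point (singleton, or lead unit + trail units)
        System.out.println(hex("A", StandardCharsets.UTF_8));       // 41
        System.out.println(hex(euro, StandardCharsets.UTF_8));      // E2 82 AC
        System.out.println(hex(emoji, StandardCharsets.UTF_8));     // F0 9F 98 80

        // UTF-16: one 16-bit unit, or a lead + trail surrogate pair
        System.out.println(hex(euro, StandardCharsets.UTF_16BE));   // 20 AC
        System.out.println(hex(emoji, StandardCharsets.UTF_16BE));  // D8 3D DE 00

        // Byte order: Java's plain "UTF-16" encoder prepends a big-endian BOM,
        // the explicit BE/LE charsets write no BOM at all.
        System.out.println(hex("A", StandardCharsets.UTF_16));      // FE FF 00 41
        System.out.println(hex("A", StandardCharsets.UTF_16BE));    // 00 41
        System.out.println(hex("A", StandardCharsets.UTF_16LE));    // 41 00
    }
}
```

In the UTF-8 lines, 41 is a singleton, while E2 and F0 are lead units followed by trail units; in UTF-16, D83D is the lead surrogate and DE00 the trail surrogate.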
More to cover later:
Buffer Bulk Reads
Scatter, Gather IO
File Locking
Networking and AsynchronousIO