Blog: Archive

agsdix-fas fa-home

Blog: Home

agsdix-fas fa-pen-fancy

Blog: CEO's Corner

agsdix-fas fa-code

Blog: Tech Talk

Blog: Product Releases

agsdix-fas fa-chalkboard-teacher

Blog: Viewing

Blog: Conversion

JDBC Content Handler for VirtualViewer

by | Jun 26, 2008

One way the VirtualViewer Java servlet can be extended is by customizing the content handler on the backend of the servlet, allowing for tighter integration into content repositories or databases. PostgreSQL is an object-relational database management system licensed under the BSD license which has a strong industry backing and a number of different APIs including C/C++, Perl, Python, PHP, and JDBC which allow for database transactions to be written in your language of choice.

For this setup, I used PostgreSQL v8.3.0 with the JDBC3 PostgreSQL Driver v8.3-603 and ran the VV servlet under Apache Tomcat v6.06. Instructions for installing PostgreSQL with vary depending upon your platform of choice, but there are plenty of setup guides to be found– I personally find it easer to setup and interact with databases up under Linux Since the content handler will later be built against the JDBC PostgreSQL driver, I found it helpful to build this from source for debugging.

The default content handler shipped with VirtualViewer can be easily extended to save binary data into a PostgreSQL database using the saveFileBytes() and getFileBytes() methods. By default, this is where the file I/O occurs for all documents and metadata in the content handler. Instead of going to disk, we use JDBC calls to access a table in our database and retrieve the appropriate content.

Step 0: Get PostgreSQL or your DB of choice installed and all that good stuff.

Step 1: Create new table in the database to hold our documents and metadata. I choose to associate both with the bytea data type, so I created a table like so:

CREATE TABLE images (imgname text, img bytea);

Step 2: Create custom content handler. I created an insertViaJDBC() method to be called from saveFileBytes() which took in a file name and byte[] array to be inserted into the database:

DataInputStream dis = new DataInputStream(
new ByteArrayInputStream( filebytes ));
PreparedStatement ps =
conn.prepareStatement(“INSERT INTO images VALUES (?, ?)”);
ps.setString(1, filename);
ps.setBinaryStream(2, dis, filebytes.length);
ps.executeUpdate();

Likewise, I created a fetchViaJDBC() method to be called from getFileBytes() which took in a file

PreparedStatement ps =
conn.prepareStatement(“SELECT img FROM images WHERE imgname=?”);
ps.setString(1, filename);
ResultSet rs = ps.executeQuery();
if ( rs != null ) {
while( rs.next() )
filebytes = rs.getBytes(1);
rs.close();
}

This all standard JDBC code adapted from the manual pages. PostgreSQL can store binary data in two ways, with either the binary bytea data type or the large object feature. I did not implement the large object feature, but depending upon your document imaging needs, you should choose accordingly.

Step 3: Modify the servlet web.xml to point to your content handler. This is the only change to the web.xml you must make, but I also added in parameters for the database such as username, password and hostname. These parameters can then be referenced via your content handler and used in your JDBC methods. In my case, I grabbed these parameters in my content handler init() method. I also removed parameters for filePath and overPath from the web.xml since my content handler isn’t going to disk anymore for documents.

If you’ve implemented the content handler this far, you’ll noticed you’ve got a few problems wit annotations when you try and use it. This is because you’ll need to devise an alternative method for handling annotation layer names that doesn’t depend on file I/O. It’s fairly straightforward, basically just moving from files to String when you’re dealing with document paths. I ended up changing the method signature on both getFileBytes() and saveFileBytes() to accept a String for a file name instead of a File object– this also made for handling JDBC calls a little more logical.