Skip to Content

Ben Thompson: Informed Mix

Syndicate content Informed Mix
Random postings about IBM Informix
Updated: 1 hour 11 min ago

Informix or Client SDK install: No Java virtual machine could be found

Thu, 2017-01-05 13:44

This is a something of a note to self. For some time it has been been the case that you may see this message when attempting an Informix server or Client SDK install if there is a problem starting the installer’s Java runtime environment:

# LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64:/lib64 ./ids_install
Preparing to install...
Extracting the JRE from the installer archive...
Unpacking the JRE...
Extracting the installation resources from the installer archive...
Configuring the installer for this system's environment...
No Java virtual machine could be found from your PATH
environment variable. You must install a VM prior to
running this program.

To add insult to injury when this condition occurs the installer exits with status code zero, suggesting all is ok.

Now the obvious thing to do seems to be to install a Java package, wondering whether OpenJDK will suffice or the official Oracle version is needed. This is never the answer! The Informix installer comes bundled with its own Java run time environment (JRE) which gets extracted into /tmp/install.dir.X and your challenge is in fact to find out why it isn’t working as it should.

You can see in my attempt at installing the product I have already prefaced the command with LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64:/lib64. This is already a known way of fixing some installation problems. (For a 32-bit version you’d simply use /usr/lib:/lib.)

Everyone’s friend, strace, is a great way to start investigating this problem. In amongst the output I find this:

faccessat(AT_FDCWD, "/tmp/install.dir.12813/Linux/resource/jre/jre/bin/java", X_OK) = -1 EACCES (Permission denied)

So why is this? I am logged in as root so I ought not be running into permission denied issues.

The core problem here is the way /tmp, which is a separate filesystem on my machine, is mounted. From the mount command output:

tmpfs on /tmp type tmpfs (rw,nosuid,nodev,noexec,relatime)

The key part here is the noexec flag which is a security feature preventing execution of binary files residing on this filesystem.

The best way to fix this is to set the environment variable IATEMPDIR to a directory on a filesystem where execution is allowed. I usually use /root for this purpose. And success:

# export IATEMPDIR=/root
# LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64:/lib64 ./ids_install
Preparing to install...
Extracting the JRE from the installer archive...
Unpacking the JRE...
Extracting the installation resources from the installer archive...
Configuring the installer for this system's environment...

Launching installer...

Preparing CONSOLE Mode Installation...

While the above should be sufficient I have seen the server installer still fail to work even with this environment variable set as some files may still be placed in /tmp. In this situation you can temporarily remove the security restriction with:

mount -o remount,rw,nosuid,nodev,relatime,exec /tmp

and switch it back on again with:

mount -o remount,rw,nosuid,nodev,relatime,noexec /tmp

I suggest before running the above you check the existing mount options for your /tmp filesystem.


Intermittent “CSM: authentication error” with JDBC

Wed, 2016-12-21 19:24

This article will only concern you if:

  • you connect using JDBC.
  • you use simple password encryption.

Simple password encryption just does one thing: it encrypts the password sent to the database server in transit preventing it from being obtainable by network packet sniffing.

At the client end configuring simple password encryption can be done simply by adding
";SECURITY=PASSWORD"
to your JDBC connection string.

At the server end set up a DBSERVERALIAS and add
csm=(SPWDCSM)
to the fifth field in sqlhosts and set environment variable INFORMIXCONCSMCFG before starting the instance to point to a file containing something like:
SPWDCSM("/opt/informix/lib/csm/libixspw.so", "", "p=1")

This is covered in more detail elsewhere and I haven’t covered using CSDK but for JDBC connections it’s all there is to know.

Unfortunately there is a bug in JDBC 3.70.JC8W1 and JDBC 4.10.JC7 and below where every 100th connection attempt or so will fail randomly with this stack:

java.sql.SQLException: CSM: authentication error.
at com.informix.jdbc.IfxSqliConnect.(IfxSqliConnect.java:1337)
at sun.reflect.GeneratedConstructorAccessor3.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.informix.jdbc.IfxDriver.connect(IfxDriver.java:243)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:270)
at Connect.main(Connect.java:30)
Caused by: com.informix.asf.IfxASFRemoteException:
at com.informix.asf.Connection.recvBindResponse(Connection.java:1363)
at com.informix.asf.Connection.establishConnection(Connection.java:1619)
at com.informix.asf.Connection.(Connection.java:392)
at com.informix.jdbc.IfxSqliConnect.(IfxSqliConnect.java:1232)
... 7 more

You can see if you’re vulnerable by compiling this app and running it until it fails or you’re reasonably confident you don’t have a problem:

public class Connect
{
public static void main(String[] args)
{
Connection conn = null;
String url = "jdbc:informix-sqli://hostname:port/dbname:INFORMIXSERVER=informixserver;user=user;password=password;SECURITY=PASSWORD";
System.out.println(url);

try
{
Class.forName("com.informix.jdbc.IfxDriver");
}
catch (Exception e)
{
System.out.println("FAILED to load Informix JDBC driver.");
e.printStackTrace();
return;
}

int i=0;
while (true) {
i++;
try
{
conn = DriverManager.getConnection(url);
}
catch (SQLException e)
{
System.out.println("FAILED to connect! "+e);
e.printStackTrace();
}
System.out.println("Connected " + i);
if (conn != null) {
try {
conn.close();
}
catch (SQLException e) {
System.out.println("FAILED to disconnect! "+e);
e.printStackTrace();
}
}
}
}
}

If your application handles fails connections and retries automatically you might not have noticed this error or perhaps it was lost in the noise but for more simple applications it can be a pain.

Fortunately this is fixed in JDBC 4.10.JC8 and the fix works with 11.70 and 12.10 versions of the servers.

Interestingly the JDBC release notes for 4.10.JC8 are coy about this, showing just one fix.


Buffer waits

Mon, 2016-11-21 15:30

Is it really a year since I last wrote a blog post? It does mean I have had time to think of a few topics to write about.

Recently a desk visitor came to me at work about a performance issue. From what he said I got the impression that he thought that DBAs spend most of their time tuning SQL queries, something I spent little time on during a typical day, perhaps one of the advantages of working on a mature system.

Perhaps though he was kind of right. Many of the things I do are around making sure queries run reliably, consistently and in a scalable manner, I just don’t necessarily do this by looking at explain plans. Don’t get me wrong: these are very important; it’s just that once these are as good as they can be it doesn’t mean there aren’t other ways of finding bottlenecks or contention and tuning them out. For example when I was a more junior DBA I used to concern myself with buffer cache hits and, once I knew a little more, buffer turnover ratios, reducing I/O by allocating more memory as Moore’s Law provided rapid improvements in server CPU speed and memory size.

In the Oracle world DBAs have moved away from this measure and use the Oracle wait interface which allows you to see what operations the engine was doing. We can do this (slightly differently) in Informix too and it’s very useful. The simplest overview of waits is to look at position one of the flags column from “onstat -u”. The Informix documentation states:

flags
Provides the status of the session.
The flag codes for position 1:
B Waiting for a buffer
C Waiting for a checkpoint
G Waiting for a write of the logical-log buffer
L Waiting for a lock
S Waiting for mutex
T Waiting for a transaction
Y Waiting for condition
X Waiting for a transaction cleanup (rollback)

Most sessions will probably show Y which usually means they are waiting on TCP transit or are idle, waiting for the application or user’s session to do something.

As the manual says we can get more detail for buffers through onstat -b, -B and -X; for latches (mutexes) through onstat -s, (also -g lmx and -g wmx) and locks with onstat -k.

onstat -X is the most useful way to examine buffer waits but it is not as user-friendly as it could be, which is a shame because it often offers many clues to where problems lie:

Buffers (Access)
address owner flags pagenum memaddr nslots pgflgs scount waiter

Buffer pool page size: 2048
14700 modified, 16777216 total, 16777216 hash buckets, 2048 buffer size

Buffer pool page size: 4096
2443cd7e8 ffffffffffffffff 80 25:1523486 cce17b000 101 2801 0 0
267ca4628 0 0 47:1570054 105c3c5000 122 2890 1 0
53980 modified, 8388608 total, 8388608 hash buckets, 4096 buffer size

Buffer pool page size: 8192
59577 modified, 8388608 total, 8388608 hash buckets, 8192 buffer size

Buffer pool page size: 16384
3784a8188 ffffffffffffffff 80 162:18904760 4baadf4000 248 890 0 0
37854d188 ffffffffffffffff 80 162:24581408 4baeff4000 248 890 0 0
378ead5e8 ffffffffffffffff 80 124:25597240 4beb010000 187 2801 0 0
378f781a8 ffffffffffffffff 80 124:25597376 4bf0128000 710 890 0 0
3798d3388 ffffffffffffffff 80 124:25597176 4c2bf34000 710 890 0 595236d428
3799321a8 ffffffffffffffff 80 162:13196672 4c2e528000 248 890 0 624d39d668
37a353128 ffffffffffffffff 80 124:25597840 4c6f258000 197 801 0 0
37a4cefe8 ffffffffffffffff 80 168:32233760 4c78a50000 399 890 0 0
37c485d28 ffffffffffffffff 80 264:13942672 4d439d8000 319 890 0 0
37c5b45c8 ffffffffffffffff 80 162:24158848 4d4b2dc000 193 2801 0 0
37c80f368 ffffffffffffffff 80 168:33303832 4d5a400000 303 890 0 0
37caf6ce8 0 10 124:25597160 4d6cd70000 710 890 1 0
37ceaab28 ffffffffffffffff 80 166:8227296 4d84898000 332 890 0 0
37ceba8e8 ffffffffffffffff 80 124:25597648 4d84ef0000 710 890 0 0
37d70f4a8 ffffffffffffffff 80 124:25597208 4dba408000 193 801 0 0
37d891088 ffffffffffffffff 80 162:26376432 4dc3e54000 248 890 0 0
37dc9abe8 58cc3c7eb8 80 144:18435488 4dddbd0000 193 2801 0 0
87962 modified, 13762560 total, 16777216 hash buckets, 16384 buffer size

The key problem here from a usability point of view is that it is tedious to convert the chunk:pagenum format into an actual database object.

There is a similar problem with part numbers when deciphering output from, for example onstat -k that can be solved by downloading and compiling the ESQL/C utility partn from the IIUG software repository.

Loosely based on that here is my Perl script, chunkpg, which can provide friendly names for objects in chunk:pagenum format:

#!/usr/bin/perl

# Decipher chunk and page numbers

use strict;
use warnings;

&main;

sub main {

my $key = &check_params;

my $chunkno = 0;
my $nextinfo = 0;
my $pagesize = 0;
my $count = 0;
my $lastdbs = '';
my $syspagesize = 2; # change to 4 kb if required

my (%objs, %start, %end, %c, %ps, %dbs);

open (ONCHECKPE, "oncheck -pe |");
while () {
chomp;
if ($nextinfo == 1) {
$nextinfo = 0;
my ($blank, $path, $size, $used, $free);
($blank, $chunkno, $path, $pagesize, $size, $used, $free) = split / +/;
# print "$chunkno: $pagesize kb\n";
$count = 0;
$c{$chunkno} = 0;
$ps{$chunkno} = $pagesize;
}
elsif ($_ eq ' Chunk Pathname Pagesize(k) Size(p) Used(p) Free(p)') {
$nextinfo = 1;
}
elsif ($_ =~ /^DBspace Usage Report: /) {
my @parts = split / +/;
$lastdbs = $parts[3];
$dbs{$chunkno} = $lastdbs;
}
elsif ($_ =~ /^ \w+:\'\w+\'\.\w+ +/) {
$count++;
my ($blank, $obj, $offset, $size) = split / +/;
# printf ("%s: %d -> %d\n", $obj, $offset*2, $offset*2+$size*2);
$objs{$chunkno}{$count} = $obj;
$start{$chunkno}{$count} = $offset;
$end{$chunkno}{$count} = $size + $offset;
$c{$chunkno}++;
if (!$dbs{$chunkno}) {
$dbs{$chunkno} = $lastdbs;
}
}
}
close (ONCHECKPE);

while () {
chomp;
print "$_ ";
$_ =~ s/^ +//;
my @vals = split / +/;
if ($vals[$key-1] && $vals[$key-1] =~ /\d+\:\d+/) {
my ($chunk, $page) = split /\:/, $vals[$key-1];
for (my $i = 1; $i = $start{$chunk}{$i} && $end{$chunk}{$i} && $page/($ps{$chunk}/$syspagesize) <= $end{$chunk}{$i}) {
print $objs{$chunk}{$i}.','.$dbs{$chunk};
last;
}
}
}
print "\n";
}
}

sub check_params () {

die print "INFORMIXDIR is not set.\n" unless $ENV{'INFORMIXDIR'};
die print "INFORMIXDIR is not set to a valid directory (\'$ENV{'INFORMIXDIR'}\').\n" unless -d $ENV{'INFORMIXDIR'};
die print "INFORMIXSERVER is not set.\n" unless $ENV{'INFORMIXSERVER'};

if ($ARGV[0] && $ARGV[0] eq '-k') {
if ($ARGV[1] && $ARGV[1] !~ /\D/) {
return ($ARGV[1]);
}
else {
print "Invalid key number: $ARGV[1]\n";
exit 1;
}
}
elsif ($ARGV[0]) {
print "Invalid parameter: $ARGV[1]\n";
exit 1;
}
return (1);
}

You can then run as follows: onstat -X | chunkpg -k 4

If you were to run onstat -X repeatedly, perhaps at different times of day, you would begin to build a picture of where buffer waits are occurring.

Having identified buffer waits, what can be done to reduce them?

Ultimately it is going to come back to your database design, including its storage schema, the queries you run and maybe onconfig parameters.

Buffer waits on tables can be reduced using partitioning. Fragmentation by round-robin is effective for improving the rate of concurrent inserts (although it doesn’t facilitate fragment elimination in queries). Very large tables may require some form of partitioning to avoid reaching the 16.7m (2^24) page limit but even smaller tables with a large number of inserts and/or deletes can benefit.

It’s worth noting here that buffer waits occur in server RAM and so it’s not obvious at all that the storage schema should come into it. After it only directly affects what happens on disk, right? Not so: I can show that different storage schemas perform differently by running a concurrent insert test on a machine tuned to only flush data to disk at checkpoints. If a table has 10-way round-robin partitioning then it has ten different part numbers and is for many purposes ten different tables. For inserts we are always appending to the end of the table and so there will be contention on the last buffered page between sessions. Spreading this across ten different partitions reduces it.

Indices can also see a large number of buffer waits. In one stark example I found a large number of buffer waits were occurring on a large single-column index where every value in the column was null (nulls are indexed in Informix, unlike some other RDBMSs) and so there was a single leaf node pointing to all rows. Every insert and delete needed to modify this. A less extreme version of this might be seen where there are a limited number of values a field can take. In this case dropping the index or combining the index column with another to vastly increase the number of nodes would help throughput.

Would index fragmentation/partitioning help? It is probably less useful, in general because there is no 16.7m page limit for indices since version 11.70 and, in the case of fragmentation by expression, the same nodes could be equally congested.

In conclusion I think this method is a useful addition to your performance tuning armoury and by reducing contention you can improve the scalability of your system and increase throughput.


Copyright