toxi.in.process

Sunday, July 29, 2007

String based designs

As we delve deeper into the realms of applied generative design and deal with a whole population of possible design outcomes, we often find ourselves preferring certain outcomes over others and want to narrow down our explorations. The identity of each such design therefore plays an important role. Identity in this context can be defined by the set of input parameters used, but we also need to ensure the processing of these parameters is deterministic, meaning that even though we often use (pseudo)randomness as part of the algorithm, the outcome should be replicable for each set of parameters.

Most (if not all) pseudo-random generators use the concept of a random seed, which determines the (deterministic) sequence of "random" numbers produced. In Processing you can use randomSeed() and noiseSeed() to set it. Now while using numbers is all fine, and technically speaking all digital media is just numbers, there are use cases where it'd be nicer to use e.g. text as the seed directly. For example, the 20,000 designs of the Lovebytes fluffies are all based on their generated character name only. There are about 10 other parameters, but these too are chosen based on the random sequence seeded by the name.
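To make the seed idea concrete, here is a minimal sketch in plain Java rather than Processing. It uses java.util.Random directly as a stand-in for random(); the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.Random;

public class SeedDemo {
    // Returns the first `count` values drawn from a generator seeded with `seed`.
    static int[] draw(long seed, int count) {
        Random rnd = new Random(seed);
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = rnd.nextInt(100); // values in the range [0, 100)
        }
        return out;
    }

    public static void main(String[] args) {
        // Same seed twice: both lines print the same numbers.
        System.out.println(Arrays.toString(draw(23L, 5)));
        System.out.println(Arrays.toString(draw(23L, 5)));
        // Another seed starts another sequence.
        System.out.println(Arrays.toString(draw(42L, 5)));
    }
}
```

The point is only that the seed fully decides the sequence, so storing the seed is enough to recreate a design later.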

One way of turning a string into a number is by using message digests, like the popular MD5 or SHA1 algorithms. A message digest takes any number of bytes as input and calculates a fixed-length hash. MD5 results in a number 128 bits long and SHA1 in one of 160 bits. This is more data than we can cope with, since most common random number generators only accept up to 64 bits as seed. In Java/Processing this is equivalent to the long type.
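The size mismatch is easy to check from Java itself. The following is a small sketch (class and method names are hypothetical) that asks the standard MessageDigest class for the digest length of each algorithm and compares it with the width of a long.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestSizes {
    // Digest length in bits for the given algorithm name.
    static int digestBits(String algorithm) {
        try {
            return MessageDigest.getInstance(algorithm).getDigestLength() * 8;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown digest: " + algorithm, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("MD5:   " + digestBits("MD5") + " bits");   // 128
        System.out.println("SHA-1: " + digestBits("SHA-1") + " bits"); // 160
        System.out.println("long:  " + Long.SIZE + " bits");           // 64
    }
}
```

So whichever digest is used, only a 64 bit slice of it can end up in the seed.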

The following function takes a string as input, computes the hash and then returns the first 8 bytes as a long integer to be used as random seed. Because it doesn't use the full hash, it's possible in theory to end up with the same result for different inputs. However, I've not yet managed to come across a collision with the relatively short strings (names, sentences, phrases) used in my work.

import java.security.*;

/**
 * Calculates the message digest of the given string and
 * returns the first 8 bytes packed into a long
 *
 * @param msg string to form hash from
 * @param digest message digest ID (e.g. "MD5" or "SHA1")
 * @return zero if failed, else partial digest as type long
 */
long getLongHash(String msg, String digest) {
  long result = 0;
  try {
    MessageDigest md = MessageDigest.getInstance(digest);
    // use a fixed encoding so the bytes (and the seed) don't depend
    // on the platform's default charset
    md.update(msg.getBytes("UTF-8"));
    byte[] buffer = md.digest();
    // pack the first 8 bytes, most significant byte first
    for (int i = 0; i < 8; i++) {
      // mask with 0xff to treat the signed byte as unsigned
      result = (result << 8) | (buffer[i] & 0xffL);
    }
  }
  catch (Exception e) {
    e.printStackTrace();
  }
  return result;
}
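As a side note, the same byte packing can be written more compactly in plain Java with java.nio.ByteBuffer, which reads big-endian by default and so puts the first digest byte into the top 8 bits as well. This is only a sketch under a few assumptions: the class and method names are made up, it needs Java 7 or later for StandardCharsets, and it throws on an unknown digest name instead of returning zero.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashSeed {
    // First 8 digest bytes of `msg`, packed into a long (most significant byte first).
    static long longHash(String msg, String algorithm) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            byte[] hash = md.digest(msg.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(hash).getLong(); // reads bytes 0..7, big-endian
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown digest: " + algorithm, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(longHash("Hello world!", "MD5")));
        System.out.println(Long.toHexString(longHash("Hello world!", "SHA-1")));
    }
}
```

Throwing instead of returning zero is a design choice of this sketch: a silent zero would give every failed call the same seed.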

Use it like this:

long seed=getLongHash("Hello world!","MD5"); // "SHA1" as alternative
noiseSeed(seed);
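To show how a name alone can drive a whole set of parameters, here is a rough plain-Java sketch. It is not the actual fluffies code: the class, the three parameters and their ranges are invented for illustration, and java.util.Random stands in for Processing's random().

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Random;

public class NamedDesign {
    final int limbs;       // hypothetical parameter, 2..6
    final float hue;       // hypothetical parameter, 0..1
    final float fuzziness; // hypothetical parameter, 0..1

    NamedDesign(String name) {
        // All parameters come from one sequence seeded by the name.
        // The order of the calls matters: changing it changes the design.
        Random rnd = new Random(seedFor(name));
        limbs = 2 + rnd.nextInt(5);
        hue = rnd.nextFloat();
        fuzziness = rnd.nextFloat();
    }

    // First 8 bytes of the MD5 digest of the name, as a long.
    static long seedFor(String name) {
        try {
            byte[] hash = MessageDigest.getInstance("MD5")
                    .digest(name.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(hash).getLong();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        NamedDesign d = new NamedDesign("Hello world!");
        System.out.println(d.limbs + " limbs, hue " + d.hue + ", fuzziness " + d.fuzziness);
    }
}
```

Creating the design twice from the same name gives the same values, so the name is all that needs to be stored.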


Btw. The default Java Random generator is not guaranteed to produce a deterministic sequence across all platforms. This means that as long as you're using Processing's default random() or noise() functions, you're only guaranteed the same sequence if you stay on one of Windows, OSX or Linux. Last year Marius wrote a Processing wrapper for the famous Mersenne Twister generator, however this one can only be used as an alternative and in isolation. Processing's noise() function is hardcoded to use the default Java generator...
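If you'd rather not depend on any library generator at all, one option is to carry a tiny generator of your own whose arithmetic is spelled out in full. The sketch below is not the Mersenne Twister and has nothing to do with Marius' wrapper; it is Marsaglia's much simpler xorshift generator with the shift triple (13, 7, 17), swapped in here because it fits in a few lines. The class name and the zero-seed fallback constant are my own assumptions.

```java
public class XorShift64 {
    private long state;

    XorShift64(long seed) {
        // The state must never be zero, otherwise the generator only emits zeros.
        // Substitute an arbitrary non-zero constant in that case.
        state = (seed != 0) ? seed : 0x9E3779B97F4A7C15L;
    }

    // Next 64 bit value; pure integer arithmetic, defined by the Java language itself.
    long nextLong() {
        state ^= state << 13;
        state ^= state >>> 7;
        state ^= state << 17;
        return state;
    }

    // Float in [0, 1), built from the top 24 bits of the next value.
    float nextFloat() {
        return (nextLong() >>> 40) / (float) (1 << 24);
    }

    public static void main(String[] args) {
        XorShift64 rnd = new XorShift64(1L);
        System.out.println(rnd.nextLong()); // 1082269761
        System.out.println(rnd.nextFloat());
    }
}
```

The zero check matters when the seed comes from a hash function that returns zero on failure. Like the Mersenne Twister wrapper, this can only replace random(), not noise().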