• Welcome! The TrekBBS is the number one place to chat about Star Trek with like-minded fans.
    If you are not already a member then please register an account and join in the discussion!

Statistical analysis question

Robert Maxwell

memelord
Premium Member
At least, I think it is. Bear with me. I kinda suck at math. It's my secret shame.

I have a program I'm working on that generates a heightmap using perlin noise. Straightforward stuff, right? For simplicity's sake, let's just say it creates an image of arbitrary size where each pixel has a value between 0 and 255, which is its height value.

Since I intend to use this as a terrain generator, I want to be able to set the sea level, however I want to be able to set the sea level based on the percentage of land it will cover. It doesn't have to be exact, just fairly close.

What kind of analysis must I do on the pixels to determine how many of them fall below this arbitrary percentage? In other words, how do I find the height value that represents the nth percentile of the entire pixel population? I'm sure there is some really obvious answer but I am utterly failing to see it right now.

I know there are some math wonks around here so I'm hoping you guys can help me. :)
 
1) Generate a list of the coordinates of all pixels.
2) Sort the list of coordinates according to the height at that location.
3) Pick index floor(percentile * n_pixels/100), and check what the height is at the corresponding coordinate. This is your sea level.
 
Now that I think about it, a full sort isn't even required. For instance C++ provides the nth_element() algorithm which runs in O(n) rather than the O(n log n) required by sort(). The idea is to run the quicksort algorithm, except only descending recursively down the relevant side of the tree rather than both sides.
 
I'm using Python, so I just put every pixel value into an array and then sorted it in-place. That made it really easy to get at the index I want.

Not sure what the efficiency of Python's in-place array sorting is but it seems to be damned fast--certainly a lot faster than the actual perlin algorithm I'm using. :lol:
 
How are you interpolating your noise? eg, Linear / Cosine / Cubic Spline.

I can imagine cosine interpolation would be slow if you are calling the function several times for each pixel. Precalculating a cosine lookup table should be quicker.
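A minimal sketch of the lookup-table idea (the table size and the nearest-entry rounding are arbitrary choices here):

```python
import math

TABLE_SIZE = 4096
_TWO_PI = 2.0 * math.pi
# Precompute one full period of cosine; per-pixel calls become an index lookup.
_COS = [math.cos(_TWO_PI * i / TABLE_SIZE) for i in range(TABLE_SIZE)]

def fast_cos(x):
    """Approximate cos(x) by the nearest table entry (x in radians)."""
    return _COS[int(x / _TWO_PI * TABLE_SIZE) % TABLE_SIZE]

def cosine_interp(a, b, t):
    """Cosine interpolation between a and b, for t in [0, 1]."""
    f = (1.0 - fast_cos(t * math.pi)) / 2.0
    return a * (1.0 - f) + b * f
```

The error is bounded by the table granularity, which is plenty for heightmap work.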


What may be of interest to you: the distribution of heights in perlin noise roughly follows a normal distribution. So the most likely heights will be around 127, and the least likely heights will be at the extremities around either 0 or 255.

To turn an input of X% of land area under sea level into an output estimate of the sea level Y, you would need to integrate this normal curve from 0 up to Y and find the Y at which the integral reaches X.

That's not so straightforward to do analytically, but it would not be hard to do numerically. Literally Y summations.
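Numerically that's just a running sum of the bell curve until it reaches the target fraction. A sketch; the mean and sigma here are made-up illustration values, and in practice you'd measure them from your actual heightmap:

```python
import math

def estimate_sea_level(target_fraction, mean=127.0, sigma=40.0):
    """Smallest height Y whose accumulated normal mass reaches target_fraction."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for y in range(256):  # "literally Y summations"
        total += norm * math.exp(-((y - mean) ** 2) / (2.0 * sigma ** 2))
        if total >= target_fraction:
            return y
    return 255
```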


Also, what fall-off are you using at the moment? ie, what is amplitude as a function of frequency.
 
Linear interpolation. I'm pretty much just using a straight port of Ken Perlin's revised algorithm.

I do some additional calculations to add more variation to the landscape and that prevents a normal distribution. Sea level for 50% coverage is usually near 127 but is rarely right on the nose.

Amplitude and frequency start off identical and change with each octave of the perlin function: frequency is doubled and amplitude is multiplied by persistence, and I'm using 2/5 for persistence by default.

I admit I don't really understand the whole relationship between amplitude, frequency, and persistence. I just mess with the numbers to see what results I get.
 
Well the persistence is related to the fractal nature of the noise.

You're doubling frequency each iteration : f = 2^i
You're multiplying amplitude by p each iteration : A = p^i


Now consider zooming into your perlin noise by a factor of 2, and see there are two separate effects:

1. Horizontally, the frequencies would all appear to have been halved.
2. Vertically, the amplitudes would all appear to have been doubled.

If the noise looked to have the same smoothness/spikiness when zoomed, then we would say it is self similar noise, and you can see from the equations above that this happens if p=1/2. The zoom in is equivalent to stepping up one iteration from i to i-1.

When p takes a different value, it means that the zoomed noise will look distinctly different, becoming either smoother or spikier than at the original zoom level. p=2/5 (<1/2) means your curve will appear smoother as you zoom in, and the zoomed image would tend towards a straight line.

For values >1/2 , the curve would appear to become steeper and increasingly mobile as zoom increases. The image would tend to vertical lines which are so closely packed that they actually fill space to some extent.

This is one of the phenomena of fractals, where a one-dimensional curve can actually have some degree of area if it is infinitely mobile. The dimension of the thing is somewhere between 1 (a curve) and 2 (an area), which is the so-called fractional (fractal) dimension. :)
 
That actually explains a lot. I did notice low persistence makes for very smooth terrain whereas high persistence makes it really rough, fuzzy, and craggy. I haven't messed much with the frequency or amplitude but I do tweak the persistence some.

I've also got a smoothing function that basically layers zoomed-in versions of the terrain on top of itself, producing large scale variations in the heightmap. Without the smoothing function I just get a bunch of islands more or less evenly distributed. The smoothing function turned out to be the key to making continents.

Thanks for going into some detail about all this. :) I'm working on a whole procedural world generation system--terrain, climates, biomes, the whole deal. Actually got most of the details of the climate distribution nailed down today.
 
I've also got a smoothing function that basically layers zoomed-in versions of the terrain on top of itself, producing large scale variations in the heightmap. Without the smoothing function I just get a bunch of islands more or less evenly distributed. The smoothing function turned out to be the key to making continents.

What that has done is added harmonics below the base frequency.

If you got lots of small islands, it may be that your base frequency is simply too high. In a 1000px-wide image you can make the base wavelength as long as 1000px (i.e. the lowest possible base frequency), so in the first iteration you'd be picking random values for only the four corners and interpolating between them across the whole image.

I'm working on a whole procedural world generation system--terrain, climates, biomes, the whole deal. Actually got most of the details of the climate distribution nailed down today.

That sounds interesting. :) I remember a few years ago creating an erosion model that simulated rainfall over a terrain map. I got the water flow to collect in channels which gradually eroded the terrain, picking up solids in some places and depositing them in others, creating some nice river valleys and meanders.

It was in response to a theory I had read that the natural length of a mature river approaches π times the linear distance from source to mouth. I wanted to see if I could get that in a simulation.

The program was too slow to work at high resolution though, so I never really saw that happen.
 
It was in response to a theory I had read that the natural length of a mature river approaches π times the linear distance from source to mouth. I wanted to see if I could get that in a simulation.

That's not hard to rationalise if you imagine a meandering river as being composed piecewise of a number of smoothly joined, semicircular arcs of varying radii -- each segment will have a length that is pi times the radius. I don't know of a theoretical proof, however, nor how to consider segments of varying degrees of arc, nor why a random walk factor such as the square root of the number of "segments" doesn't appear. I guess the last two factors cancel out somehow.
 

I'm on the fence about whether I want to bother with erosion. I'm definitely going to do rivers, but erosion? I dunno. I'm generating this world just so I can populate it with people and simulate interactions, civilizations, wars, etc. So, the terrain just has to be plausible, not perfect. Rivers are a necessity since they have so much impact on civilizations. Realistic erosion is a marginal concern by comparison, at least to me.

My initial attempt at rivers will consist of picking random high points in climate regions with substantial precipitation, and have the water flow downhill from that point, allowing for the possibility of forking and so forth.

If I do erosion, I guess I'd have to model the water flows over the whole region, and have it erode grooves in the terrain, eventually forming up in "permanent" rivers. My fear is that I'd have to run a ton of iterations of this to make it look good, and I don't want it to take hours to generate a world. :lol:
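A first pass at that downhill trace (ignoring forking and lakes for now) could be a simple steepest-descent walk; the 8-neighbour choice and the stopping rule are just assumptions for this sketch:

```python
def trace_river(heightmap, start):
    """Walk downhill from `start`, always stepping to the lowest of the
    8 neighbours, until no neighbour is lower (a bowl or flat spot)."""
    h = len(heightmap)
    w = len(heightmap[0])
    x, y = start
    path = [(x, y)]
    while True:
        # Collect the in-bounds neighbours of the current pixel.
        neighbours = [
            (x + dx, y + dy)
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)
            if (dx or dy) and 0 <= x + dx < w and 0 <= y + dy < h
        ]
        nx, ny = min(neighbours, key=lambda p: heightmap[p[1]][p[0]])
        if heightmap[ny][nx] >= heightmap[y][x]:
            return path  # stuck: nowhere lower to go
        x, y = nx, ny
        path.append((x, y))
```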
 
My initial attempt at rivers will consist of picking random high points in climate regions with substantial precipitation, and have the water flow downhill from that point, allowing for the possibility of forking and so forth.

If there are any bowl shaped regions of land (and there will be with random generation), then your slope descending algorithm will get stuck if it runs into one. How would your rivers climb out of such a bowl?

In nature, these would correspond to lakes (above sea level) which would fill up over an area until they spill over so allowing the river to pass through.
 

At least initially, if a particular river can't go any lower it will just stop. Either it meets other water or it stops at the lowest point it can reach. From there, I can figure out a good mechanism for turning those endpoints into lakes (and allowing them to flow out as rivers again.)
 
The lakes could also flow out to other bowls, which may or may not have been resolved into lakes at that time.

You'd need to keep track of which bowls are unprocessed and which have been resolved into lakes with outflow. A stack/queue would be very useful to ensure that you don't end up with lakes with multiple outflows.
 
Actually, it just occurred to me how I could do lakes pretty easily.

1. Starting from an arbitrary high point, proceed downhill in the most direct path possible.
2. If the river reaches a pixel where there are no lower pixels adjacent, fill all of the lowest adjacent pixels. Since my river function will be recursive, each of those newly-filled pixels is a potential river (if it finds a downhill outlet.) Repeat this step until we find a way out!

I'll have to put some restrictions on how big a lake can get, though. Worst case, I could end up flooding the entire map! So there has to be some intelligence involved when building lakes. Maybe limit the recursions for filling nearby pixels so lakes will never get beyond a particular size. Guess I'll trial and error my way through that. :)
 
I can imagine that making diamond shaped lakes.

I like your idea, but I'd probably try a variation on it, by having two heightmaps. The original land height map a[x,y] plus a new land+water height map: h[x,y].

The algorithm you'd use would be something like:

If there is a downward gradient then
... descend
else
... consider this pixel [x,y] flooded :
... call function to flood fill from this pixel. It returns a new pixel for the lake outlet
endif


::flood fill function::
(1)
The idea here is to increase h[x,y] by one and let ffh = h[x,y]. Then increase the height of neighbouring points to match (=ffh) as long as either:
  • (i) the terrain a[x,y] is sloping down to the lake interior. ie a[x,y] is increasing in the direction of the expansion, or
  • (ii) h[x,y]>a[x,y], meaning this point is already part of the lake so it doesn't matter if we flood it deeper.
This expansion should take the form of a breadth-first search from [x,y]. Valid points in the search are only those with h[x,y] < ffh.

If a[x,y] is ever found to be decreasing in the direction of the expansion, and to a point where a[x,y]=h[x,y], then this is where the water level has reached a lip in the landscape and this is the point we will return as the source of the new river.

If we find such a lip, we should bail out of the search, and return the new pixel.

If the search terminates without finding a lip, however, then the flood fill has been unsuccessful, so we must raise the level of the lake by one and try again by going to (1).
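Here's one way to sketch the lake/lip search in Python. Rather than raising the level one unit at a time, this variant pops boundary pixels from a min-heap (a priority-queue version of the breadth-first expansion above): the water level is the highest terrain crossed so far, and the first pixel popped *below* that level sits past a lip, so it becomes the new river source.

```python
import heapq

def find_outlet(a, start):
    """Find where water poured into the bowl at `start` first spills out.

    a[y][x] is terrain height. Returns (outlet_xy, lake_level), or
    (None, level) if the whole map floods without finding a lip.
    """
    w, h = len(a[0]), len(a)
    x0, y0 = start
    level = a[y0][x0]
    heap = [(a[y0][x0], x0, y0)]
    seen = {(x0, y0)}
    while heap:
        height, x, y = heapq.heappop(heap)
        if height < level:
            return (x, y), level    # spilled over a lip: outlet found
        level = max(level, height)  # still climbing the bowl wall
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < w and 0 <= ny < h and (nx, ny) not in seen:
                seen.add((nx, ny))
                heapq.heappush(heap, (a[ny][nx], nx, ny))
    return None, level
```

Setting h[y][x] = lake_level for every lake pixel then gives the land+water map, and the river resumes from the returned outlet.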
 
Hmm, that's not a bad approach! What's "ffh"?

I'm already keeping multiple versions of the map so one more won't hurt. I discovered how easy it is to use pygame for graphics manipulation. :lol:
 
I've revised my algorithm. I think it would have flooded the whole map as it was. ffh is a local integer variable created in the flood fill function.
 
Ah, gotcha. Thank you!

Here are some examples of what I've got so far.

Just plain noise:

noise.png


Then the water mask based on the sea level:

watermask.png


Then the normalized map with water made pure black:

noise-normal.png


Terrain based on elevation (I don't care for how this works right now):

terrain.png


And finally, distributed climate:

climate.png


Climate distribution was surprisingly simple and I'd be happy to explain how I did that if anyone cares. :lol: Otherwise, just enjoy the pictures.
 