PyOgre and Stackless?

BerndWill

17-04-2006 20:22:40

Hi,

Has anybody successfully used Stackless (http://www.stackless.com) and/or Twisted (http://www.twistedmatrix.com) in combination with PyOgre?

Thanks in advance
Bernd

Istari

17-04-2006 21:02:58

http://www.ogre3d.org/phpBB2addons/viewtopic.php?t=344&highlight=stackless

The search page is your friend

pkdawson

20-04-2006 01:28:16

Is there any actual benefit to "Stackless" Python? It sounds like they decided to reinvent OS-level multiprogramming, which is fine... lightweight user-level threads can be quite useful (though it then becomes non-trivial to take advantage of multiple processors), but for game programming? I can't think of any reason it would be worthwhile to have hundreds of threads of execution. The idea of multi-threaded AI just makes me want to cry.

OvermindDL1

20-04-2006 04:29:04

If you use it for AI, it is useful for things like:

def someFunc():
    toPoint = GetPointToMoveTo()
    moveTo(toPoint)  # moveTo can be a blocking function, so it returns when the object gets there
    activateNearestSwitch()


or for triggers and the like:

def autoCloseDoor():
    try:
        playAnimationWithPhysics("Close")
        playSound("doorSlamShut")
    except PhysicsImpactException:
        # Door impacted something; something was in the way.
        callAutoOpenDoor()  # would not be called now, but would be added to the thread stack
        yield(5000)  # wait 5 seconds
        callAutoCloseDoor()  # adds this function to the thread stack, to try to close again


Not the best example, but you get the idea.

pkdawson

20-04-2006 05:33:02

Yeah, the code looks nice and elegant, at least if you compare it to poor use of traditional imperative idioms. But the overhead and synchronization issues get glossed over. Context switches aren't free, and throwing an exception is VERY not free ;)

(edit) On second thought, if Stackless Python is single-threaded at the OS level, I suppose synchronization isn't really an issue.

I've got some code that involves on the order of 100,000 actors moving around a map. I'll see how well it ports to Stackless, and if it runs decently, I'll shut up :)

OvermindDL1

21-04-2006 01:42:15

pkdawson wrote:
Yeah, the code looks nice and elegant, at least if you compare it to poor use of traditional imperative idioms. But the overhead and synchronization issues get glossed over. Context switches aren't free, and throwing an exception is VERY not free ;)

(edit) On second thought, if Stackless Python is single-threaded at the OS level, I suppose synchronization isn't really an issue.

I've got some code that involves on the order of 100,000 actors moving around a map. I'll see how well it ports to Stackless, and if it runs decently, I'll shut up :)


No context switches, no sync issues, etc. As it states, they are microthreads: threads which exist at program level, not kernel level, so they have an overhead of a couple of bytes instead of a couple of megabytes like a full thread, and they have no sync issues if you run cooperative multi-threading (the best way, imo).

As for the exception, as you see, you use it for *exceptional* events, events that happen rarely, such as when you are waiting for the door to close and someone runs underneath it; how often do you expect that to happen? Not to mention that it rudely interrupts the closing process. :)
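
To make that concrete, here is a minimal sketch of the cooperative style using Stackless's tasklet/channel API; the door/trigger functions are made-up names just for illustration:

import stackless

def door(ch):
    # receive() blocks this tasklet (cheaply) until a message arrives
    msg = ch.receive()
    print "door got:", msg

def trigger(ch):
    # send() hands the message over and lets the receiver run
    ch.send("open")

ch = stackless.channel()
stackless.tasklet(door)(ch)
stackless.tasklet(trigger)(ch)
stackless.run()  # cooperative scheduler; runs tasklets round-robin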

pkdawson

21-04-2006 12:33:50

OvermindDL1 wrote:
No context switches

I'm sorry, but I don't think you quite understand what a context switch is. Resuming execution of another 'thread' of any kind means you have to load data from memory into the CPU registers (i.e., a context switch). The load instructions themselves are pretty cheap, but I'll bet that finding the data to load from Stackless's execution tree takes some effort. The benefit of user-level threads is that they don't require an extra switch into kernel-space.

Like any threading model that isn't hopelessly broken, Stackless's implementation of tasklets requires storing the CPU state (128 bytes? I'm not familiar with i386 architecture) and an extra set of activation records. What Stackless does is store the activation records in the form of a tree (yay, pointers), rather than the usual contiguous block of memory in stack form. Dereferencing pointers to traverse a tree means you're potentially accessing memory in widely different locations, which limits the usefulness of the CPU cache.

Stackless has a great feature set, and I don't doubt that it can handle hundreds or even thousands of tasklets on modern hardware with negligible slowdown. Take it up to 10,000 if your program is I/O-bound.

But 100,000? The Stackless whitepaper says it requires about 400 bytes for a simple continuation, so that's about 38MB (400 bytes x 100,000 tasklets), all of which needs to be accessed on every iteration. That's a *lot* of overhead. Again, I'll do some benchmarking of my own and shut up if it turns out that I'm wrong.

pkdawson

21-04-2006 15:57:00

Sure enough, Stackless is about 25% slower than the usual iterative solution. Using 100,000 entities and doing ten iterations of some very simple calculations, Stackless starts showing its overhead.

stackless: 21.779
iterative: 17.263


In a similar test, I confirmed that it also has about 40MB of overhead.

I'm disappointed; I was hoping I could use Stackless in my project. But I think I might be able to fake some kind of cooperative scheduling without it. Something to work on when I have more time.

Anyway, here's the code I used:

from time import time
from random import randint

import stackless
from stackless import tasklet

class Group:
    def __init__(self):
        # Start at a random position on the map.
        self.pos = (
            randint(-1000, 1000),
            randint(-1000, 1000) )

    def run(self, dx):
        # One step of very simple movement logic.
        self.pos = (
            self.pos[0] + randint(-5, 5),
            self.pos[1] + randint(-5, 5) )

class CoopGroup(Group):
    def run(self, dx):
        # Ten steps, yielding to the scheduler after each one.
        for i in range(10):
            Group.run(self, dx)
            stackless.schedule()

def test_stackless():
    stackless.run()

def test_iterate(map):
    for i in range(10):
        for g in map:
            g.run(0)

def main():
    # Stackless version: one tasklet per entity.
    for n in range(100000):
        g = CoopGroup()
        t = tasklet(g.run)(0)
    t = time()
    test_stackless()
    print "stackless:", (time() - t)

    # Iterative version: a plain loop over the same entities.
    map = []
    for n in range(100000):
        g = Group()
        map.append(g)
    t = time()
    test_iterate(map)
    print "iterative:", (time() - t)

main()

OvermindDL1

21-04-2006 17:18:05

For context switching, I was referring to the hardware-level switch done by the kernel.

Yes, Stackless will be slower, but that is because it uses channels for messages: tasklets can be yielded, can pass messages, can do things like the examples above, and, not to mention, you can serialize them to disk/network even while they are in the middle of running, which is just not possible in normal Python.
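
For example, per the Stackless docs, a tasklet paused mid-run can be pickled and resumed later; a rough sketch (the counter function is just an illustration):

import pickle
import stackless

def counter(name):
    for i in range(3):
        print name, "tick", i
        stackless.schedule()  # pause here, cooperatively

t = stackless.tasklet(counter)("saved")
t.run()                    # run until the first schedule()
data = pickle.dumps(t)     # serialize the half-finished tasklet
t.remove()                 # drop the original from the scheduler

t2 = pickle.loads(data)    # rebuild it (could be on another machine)
t2.insert()                # put it back on the runnables queue
stackless.run()            # it resumes right after the schedule() call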

BerndWill

22-04-2006 19:51:34

Hi,

@pkdawson: do you think it is possible to "emulate" Stackless tasklet switching (reactivating objects through channel messages) using observer patterns coded in "pure" Python?

Regards
Bernd

OvermindDL1

22-04-2006 21:58:15

Actually, Python 2.5 (a public beta is out) has much better support for suspending functions, so you can get pretty close using it. For suspending functions in 2.4 and lower, generators are about as near as you can get, though they are nowhere near as easy to use.
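
A rough sketch of the generator approach in pure Python (names are made up; works on 2.4, no Stackless needed):

from collections import deque

def actor(name, steps):
    # Each yield is the generator's equivalent of stackless.schedule().
    for i in range(steps):
        print name, "step", i
        yield None

def run(tasks):
    # A tiny round-robin cooperative scheduler over plain generators.
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            task.next()         # resume until the next yield
            queue.append(task)  # still alive, so reschedule it
        except StopIteration:
            pass                # this generator has finished

run([actor("door", 3), actor("guard", 2)])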

futnuh

08-08-2007 07:00:35

Is anyone running Stackless 2.5 able to run pkdawson's simple test code? Is there much improvement over the 2.3 results?