Python-Ogre has wrong-sized pointers on 64-bit Linux

virtual

25-06-2011 05:55:12

More details on my experience:

Due to the vagaries of memory allocation, things worked okay for me with a toy-sized example because small allocations resulted in addresses that fit in 32 bits, but a large allocation resulted in an address greater than 32 bits. (I understand that there might be big holes in a process's address space due to large reservations by the kernel, video card, libraries, etc., but the memory allocater on my machine with 3GB of RAM and a 512MB GeForce 8800GT chose an address of 0x7f4313fed010L, a value greater than 130 GB, which seems a little excessive. I allocated 555696 vertices of 36 bytes each.)

The specific error was:

OverflowError: bad numeric conversion: positive overflow

I had converted the ctypes pointer to an integer with a simple C function:



#include <stdint.h>



uint64_t pointer_to_integer(void *p)

{

   return (uint64_t) p;

}

I think there might be functions in ctypes and/or Python-Ogre that do the same thing, but I didn't use them. I'm already writing custom C code for parts of this project, so it wasn't too much trouble to add this function.

virtual

25-06-2011 06:35:23

Some thoughts on possible fixes:

Quick fix #1: Change the pointer type to 'unsigned long' in Python-Ogre (I think that's 32 bits on most 32-bit compilers and 64 bits on most 64-bit compilers, but I'm not 100% sure, might want to check that on 32-bit and 64-bit gcc and Microsoft compilers before you patch it into svn trunk).

Quick fix #2: Since PyOgre has a rather finicky many-step manual-intervention-intensive, buggy build process (I recall at least having to manually edit one script to tone down the number of resource-intensive g++ processes it launches) and it's been months since I did it, I really don't want to rebuild it if I don't have to. So I'm not going to change PyOgre. Instead, I'm going to essentially manually do what SWIG does automatically. That is, write a C++ function that does nothing but pass its parameters to writeData(), then export it with extern 'C' linkage and call it with ctypes. Wash, rinse, repeat for every function I need that takes a buggy pointer.

Real fix #1: IMHO a more 'proper' fix would be, when compiling Ogre and generating the wrapper, first compile a test program with the same compiler that will be used to compile Ogre. The test program prints out the sizeof() a pointer and various integer types. Then pick an integer type whose size matches the size of the pointer type.

Real fix #2: It seems that, in the dim and distant past, Python-Ogre may have used a dedicated type to wrap pointers. Following the suggestions of a Google groups thread, I tried using ogre.castAsVoidPtr (in the thread, it's called CastVoidPtr, but my version of Python-Ogre, a fairly recent svn, doesn't have a function of that name). But I got this:



Boost.Python.ArgumentError: Python argument types in

    HardwareVertexBuffer.writeData(HardwareVertexBuffer, int, int, void*, bool)

did not match C++ signature:

    writeData(Ogre::HardwareBuffer {lvalue} inst, unsigned long offset, unsigned long length, unsigned int pSource, bool discardWholeBuffer=False)

I think that it's cleaner to have different types for pointers and integers, with support for casting between. Number one, it's safer; if you accidentally use a pointer where you need an integer or vice versa it will fail immediately with an informative error message instead of blithely calling Ogre with an incorrect address so the C++ code writes stuff into an arbitrary locations and causes a much more problematic memory corruption bug. Number two, it matches the C model. There's a lot to be said for consistency.

But apparently someone, at some point between the 2007 thread and my last SVN checkout a couple months ago, made a conscious decision that 'integers-are-pointers' is better for Python-Ogre. I respectfully disagree, and I'd even go so far as to say that automatically building correctly on platforms with all pointer sizes might be a reason to switch back to this model, if you're willing to break backward compatibility, make both wrappers available, or add some kind of compatibility layer (e.g. Python code to check if the parameter is whatever Python type castAsVoidPtr() returns or a Python integer, and convert the one that's not used by the underlying code to the one that is). Bonus points if the compatibility layer accepts ctypes pointers as well, and the support for integers-as-pointers can be turned on or off by the application (so integer pointers can be used by apps that need them for backwards compatibility or whatever unfathomable reasons were provided for making the switch to integers-as-pointers in the first place, while people like me can switch the feature off to catch bugs or leftover integers-as-pointers usages if we're migrating our code base away from them).

BTW, here is the thread:

http://groups.google.com/group/python-ogre-developers/browse_thread/thread/398643b83c1cc56c?hl=en

Finally, I'd like to say that, just because the two quick workarounds exist, doesn't mean one of the two real fixes isn't necessary or appreciated.

virtual

08-07-2011 09:02:26

I ended up not using any of the workarounds I suggested in my previous post.

The way I made it work is by using lock() to get a void pointer and calling ogre.castAsUnsignedLong on the result to convert the pointer into a Python integer, then passing it to ctypes.memmove. Of course you have to unlock it, and I haven't yet looked into properly handling the exception the docs claim unlock() may throw in rare circumstances. I suspect I might be re-implementing writeData() in Python so maybe I should look at the Ogre source for that method...

The source data for the memmove is vertex and index data generated by a C library that I wrote and loaded with ctypes. The C library just computes the data and stores it in memory.

Of course, the bug still really needs to be fixed, even though I was able to successfully work around it:

1) For this particular API call I lucked out because the API supports a "back-door" approach that doesn't trigger the Python-Ogre bug (lock / unlock instead of writeData). There might be other Ogre API methods that take a pointer and don't have a corresponding "back-door" way to accomplish the same thing without using a pointer.

2) Even if writeData() is the *only* Ogre function that suffers from this bug (doubtful), and even if no current or future Ogre version or extension supported by Python-Ogre has, or will ever have, another function that takes a raw pointer (this is treading on truly fantastical ground), the bug should *still* be fixed, because writeData() itself is useful. Quoting from the docs:

Locking and unlocking a buffer can, in some rare circumstances such as switching video modes whilst the buffer is locked, corrupt the contents of a buffer. This is pretty rare, but if it occurs, this method will throw an exception, meaning you must re-upload the data.

Note that using the 'read' and 'write' forms of updating the buffer does not suffer from this problem, so if you want to be 100% sure your data will not be lost, use the 'read' and 'write' forms instead.

So the quickest, easiest code with lock() -- that is, using lock() and not bothering to write an exception handler -- will probably work in initial testing, but ultimately result in a rare bug that probably depends heavily on timing, hardware, and drivers. The correct code lock/unlock is longer and harder to test, and I might be somewhat in doubt about it until and unless I was able to observe correct behavior after the exception handler was triggered in actual usage.

But writeData() is a single function call that handles the corner case in the way you almost certainly want it to be handled.

To me, this says that writeData() is useful enough in its own right to justify squashing the bug, EVEN IF it's the only function that is, or ever will be, affected by it, and EVEN THOUGH the lock/unlock workaround exists.