Guile Mailing List Archive


SCM_ASSIGN( Re: gc notes available )



> > In fact, if we go for SCM_ASSIGN(loc, val) then
> > we can throw out stack scanning then and there because the C user
> > is implicitly announcing the existence of a stack variable as soon
> > as he/she assigns anything to it.
> 
> And if the stack shrinks and grows during that period? Then you get
> much futzing about to figure out what does and doesn't require
> unprotecting, and so on. It is a hassle. 

OK, to properly replace stack scanning the user would have to announce
when they were finished with the SCM value, which is a major annoyance.
Otherwise the same stack memory might be used for something else by the
time gc comes around. Again, targeting the C++ user is easier than the
C user because the compiler knows when the local scope is complete --
dunno about the situation with longjmp() and C++ compilers, my guess
is that the destructors are not called (look at the way g++ does exception
handling, it's a nightmare).
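
A minimal sketch of what announcing and un-announcing a stack variable
might look like in plain C. Everything here is hypothetical: the SCM
typedef is a stand-in rather than Guile's real type, and SCM_RELEASE,
root_table and the helper functions are invented for illustration. Only
the name SCM_ASSIGN comes from this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for Guile's SCM type; hypothetical, for illustration only. */
typedef uintptr_t SCM;

#define MAX_ROOTS 64

/* Addresses of C variables that currently hold SCM values. */
static SCM *root_table[MAX_ROOTS];
static size_t root_count = 0;

/* Return the index of loc in the table, or root_count if it is absent. */
static size_t root_find(SCM *loc)
{
    size_t i;
    for (i = 0; i < root_count; i++)
        if (root_table[i] == loc)
            break;
    return i;
}

/* Record loc as a root (if not already recorded), then store val. */
static void scm_assign(SCM *loc, SCM val)
{
    if (root_find(loc) == root_count) {
        assert(root_count < MAX_ROOTS);
        root_table[root_count++] = loc;
    }
    *loc = val;
}

/* The user must call this before loc goes out of scope. */
static void scm_release(SCM *loc)
{
    size_t i = root_find(loc);
    if (i < root_count)
        root_table[i] = root_table[--root_count];
}

#define SCM_ASSIGN(loc, val) scm_assign(&(loc), (val))
#define SCM_RELEASE(loc)     scm_release(&(loc))
```

A caller would write SCM_ASSIGN(x, val) in place of x = val and
SCM_RELEASE(x) before x goes out of scope. Forgetting the release, or
leaving the function via longjmp(), leaves a stale address in the table,
which is exactly the annoyance described above.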

> This is just proposing that users change x = y to SCM_ASSIGN(x,
> y). You could train a text editor to do this. 

Reading everything in capitals gets tiring.

> Even better would be to require that a user either use an abstracted
> interface in the places where they would currently be directly
> modifying things (better for more reasons than making the write
> barrier work, IMO)

Well a suitable abstract layer works as its own documentation system
(to some extent), presuming you can build in ways to make the compiler
warn about misuse of the interface functions.

> This isn't what I was talking about. If you are messing with chunks of
> scheme values in such a way, it can be assumed that you'll have an
> idea of what you need to do to keep the gc sane. This could involve
> something like scm_memmove_protected, or just a function to let the gc
> know that its view of that particular piece of the world is a bit
> screwed up. The gc at this point already knows what it considers
> important to that block, but if you move it behind its back, it can't
> be expected to keep up.

I'm happy enough with the idea of some scheme functions that replace
memmove, etc -- that shouldn't be too rough. If you are sure that the
memory contains no SCM values then use the normal memmove, otherwise
use the special one. A little bit intrusive but livable.
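
As a rough sketch of the "special" memmove, assuming the gc exposes some
way to be told about a relocation: the name scm_memmove_protected is only
the one floated in the quoted text, and gc_note_moved and the bookkeeping
struct are hypothetical stand-ins, not anything Guile actually provides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record of the last relocation the collector was told about. */
struct gc_move_note {
    const void *from;
    void *to;
    size_t nbytes;
};

static struct gc_move_note last_move;
static unsigned long moves_reported = 0;

/* Hypothetical hook: a real collector would update its own view of the
   moved block here. This stub only records the call. */
static void gc_note_moved(void *to, const void *from, size_t nbytes)
{
    last_move.from = from;
    last_move.to = to;
    last_move.nbytes = nbytes;
    moves_reported++;
}

/* Like memmove(), for memory that may contain SCM values. */
static void *scm_memmove_protected(void *to, const void *from, size_t nbytes)
{
    assert(to != NULL && from != NULL);
    memmove(to, from, nbytes);       /* overlap-safe copy */
    gc_note_moved(to, from, nbytes); /* tell the gc what just happened */
    return to;
}
```

Memory known to hold no SCM values would keep using the ordinary
memmove() and skip the notification entirely.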

> There are lots of ways you can abuse C, but I don't see any
> need to accommodate them.

That's what Niklaus Wirth said when he made Pascal.
Thankfully, the world got over it.

> The assurance it should be giving is that, if the original smob dies,
> previous returns from copy(original) won't be affected. As long as
> that holds, the smob can do whatever it wants to oblige.

OK, that sounds clear enough, just wanted to see it in writing.

> > Yes, ideally it would be nice to move things around to repack memory.
> > That would give scheme one solid advantage over just about every compiled
> > language and would give a massive improvement to the generational GC
> > performance. It would also be an incredible number of pointers to track.
> 
> I'm not big on copying, period.

Try using memcpy() and memmove(), enjoy the difference :-)

My B-trees use blocks of items so that a small tree may actually
fit inside a single block and not be a tree at all. Inserting within
a block is a copying operation more like an array than a tree. The
block size is a compile time constant. Thus, trees made from big blocks
do more array operations and a few big mallocs, trees made from small
blocks do more pointer operations and many small mallocs.
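
To make the "copying operation" concrete, here is a sketch of inserting
into one sorted block. It is not my actual B-tree code: the struct layout,
the use of long for items, and the name block_insert are assumptions made
for the example, and splitting a full block is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 100   /* items per block; a compile-time constant */

struct block {
    size_t count;            /* number of slots in use */
    long items[BLOCK_SIZE];  /* kept in ascending order */
};

/* Insert item at its sorted position. Returns 0 on success, or -1 if the
   block is full (a real tree would split the block at that point). */
static int block_insert(struct block *b, long item)
{
    size_t lo = 0, hi;

    assert(b != NULL);
    if (b->count == BLOCK_SIZE)
        return -1;

    /* Binary search for the first slot whose item is >= the new one. */
    hi = b->count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (b->items[mid] < item)
            lo = mid + 1;
        else
            hi = mid;
    }

    /* Shift the tail up by one slot: this is the copying operation. */
    memmove(&b->items[lo + 1], &b->items[lo],
            (b->count - lo) * sizeof b->items[0]);
    b->items[lo] = item;
    b->count++;
    return 0;
}
```

With a large BLOCK_SIZE most inserts are one memmove() over contiguous
memory; with a small one the same data is spread over many blocks reached
through pointers.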

I tried some speed tests on a Cyrix processor and found that a block
size of 100 items is considerably better than small blocks of 5 or 10
items. Performance stays pretty flat from 100 to 300 and is getting
notably worse around 500 items per block (but 500 is still better than 5).

The implication of this is that you can do quite a lot of copying
and still be faster than tracing through pointer lists. Remember that
most of the big memory usage is inside SMOBs but each SMOB still has
a cons cell that acts as scheme's handle to that SMOB. Only the cons cell
needs shuffling, the main bulk of SMOB memory can sit where it likes.
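
A small sketch of that idea, under heavy assumptions: struct cell is a
made-up stand-in for the handle (not Guile's real cell layout), and
compact_cells is a hypothetical helper. It packs live handles toward the
front of an array while the bulk memory they point at stays put. Fixing up
everything that refers to a moved cell, which is the hard part, is not
shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handle: small, and the only thing that gets shuffled. */
struct cell {
    int live;       /* nonzero while the cell is still in use */
    void *payload;  /* bulk memory owned by the SMOB; never moved here */
};

/* Slide live cells toward the front of the array, preserving order.
   Returns the number of cells kept. Only the handles are copied; the
   payload pointers are carried along unchanged. */
static size_t compact_cells(struct cell *cells, size_t n)
{
    size_t src, dst = 0;

    assert(cells != NULL || n == 0);
    for (src = 0; src < n; src++) {
        if (cells[src].live) {
            if (dst != src)
                cells[dst] = cells[src];
            dst++;
        }
    }
    return dst;
}
```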

> > I'm running B-trees containing approx 10000 SCM data values which are
> > a mix of symbols, integers and floats (and the odd string and SMOB).
> > I'm hoping to push this further by a factor of at least 10 before I could
> > consider the system getting into the workable region.

I made it to a bit past 20000 data values yesterday, some of which are
themselves large matrices containing several thousand elements --
total disc file length of 15M, consuming 30M when loaded into core.
Unfortunately it takes over a minute to load it all up, but once in
memory, operations are quite quick. Even GC is less than a second.
The matrices don't store their contents as SCM values so from the GC
point of view each matrix is one item.

> > As I said above, if we really do go with SCM_ASSIGN() then we have
> > already declared every live stack value so we can even move objects that are
> > pointed to from the stack (dangerous but fun!)
> 
> Not really, we wouldn't know if they were dead or alive, even if we
> were treating stack values with something like SCM_ASSIGN, which isn't
> too likely.

Yup, you are right, items pointed to by the stack can't be moved,
there is just no way to know when the stack is no longer using the item.

> The user almost certainly won't be able to add a hook to the fault
> handling, partly because that would be a bit of a mess (yeah, it might
> be already, but it's not a mess that gives me the willies), and partly
> because there probably isn't a sensible way that they can handle the
> guts of the write barrier if it's possible to use one or the other.

I agree that user fault handling is a real mess. I was kinda thinking about
the idea of using mmap() to save my database directly in tree form.
Then I could access database files directly rather than have the time
taken to load and save. I suspect that if I did that then I couldn't
use SCM values in the tree anymore because they would be all different
next time the database was loaded, so I haven't considered it too hard.
Anyhow, I just might need access to memory protection in order to
implement such a device (it might be better off in its own process space
since copying each item would probably be inevitable).

> Actually, the point is that smobs actually know what smobs are doing,
> and whether any particular bit of memory is (or can be) a scheme
> value.  I'm making a (possibly too simplistic, there are probably
> places where they overlap) distinction between end-user code, and smob
> code. I think that smob code can be considered enough of a
> modification to the core of guile that knowing a bit about how the gc
> wants things to work isn't an awful lot to ask.

A lot of users want guile as an extension language and really do
need SMOBs. Expect users to mostly write SMOBs. Knowing a bit about
gc is OK provided there are fill-in-the-blanks style examples
for people to follow. Most people don't get too shirty about
using an abstraction layer provided it's not too cumbersome and
restrictive. Some people even prefer the feeling of safety from
doing things through an abstraction layer.

	- Tel
