TurtleWare

Tagged as lisp

Written on 2026-05-27 by Daniel Kochmański

Common Lisp is renowned for its excellent object system CLOS. Its implementation is often accompanied by the Metaobject Protocol that, while it is not part of the standard, allows programmers to customize the system underpinnings in numerous interesting ways. This level of customization doesn't come without a cost – some CLOS code paths will be slower compared to open-coding equivalent solutions without the use of standard objects.

The purpose of this blog post is to build an intuition for the differences between structure objects and standard objects when it comes to accessing their slots. From now on I'm going to refer to structure objects as structures, and standard objects as instances.

We could imagine that a structure is represented in memory as a tuple (CLASS SLOTS), while an instance is represented as a tuple (CLASS STAMP SLOTS). Redefining a structure class has undefined consequences, while an instance's class may be redefined at any time. This is why the instance needs to track whether it is up-to-date or obsolete. In our simple scheme that information is carried by a stamp that identifies the class generation.
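To make the stamp idea concrete, here is a toy model of the (CLASS STAMP SLOTS) tuple. All names in it (TOY-CLASS, TOY-INSTANCE, MAKE-TOY and so on) are made up for illustration; real implementations use their own internal layouts and this sketch does not mirror any of them.

```lisp
;; Toy model only: every name here is hypothetical.
(defstruct toy-class
  (stamp 0)            ; class generation, bumped on every redefinition
  (slot-names '()))    ; current layout: position in the list = index in the vector

(defstruct toy-instance
  owner                ; the toy-class this instance belongs to
  stamp                ; class generation the slot vector was laid out for
  slots)               ; simple-vector of slot values

(defun make-toy (class &rest values)
  "Create an instance laid out for the current generation of CLASS."
  (make-toy-instance :owner class
                     :stamp (toy-class-stamp class)
                     :slots (coerce values 'simple-vector)))

(defun toy-up-to-date-p (instance)
  "True when INSTANCE was laid out for the current class generation."
  (eql (toy-instance-stamp instance)
       (toy-class-stamp (toy-instance-owner instance))))

(defun redefine-toy-class (class new-slot-names)
  "Change the layout of CLASS and start a new generation."
  (setf (toy-class-slot-names class) new-slot-names)
  (incf (toy-class-stamp class))
  class)

(defparameter *toy-class* (make-toy-class :slot-names '(a b)))
(defparameter *toy* (make-toy *toy-class* 1 2))

(toy-up-to-date-p *toy*)                    ; true
(redefine-toy-class *toy-class* '(b a c))
(toy-up-to-date-p *toy*)                    ; false, the instance is now obsolete
```

Comparing two stamps is cheap, but it is still work that a structure accessor never has to do.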

Tracking whether the instance is obsolete is important, because the memory layout of slots may change - they may be deleted, added, or moved to different positions. This is convenient for long-running programs without downtime, for incremental development and for image-based workflows - the program may be modified at any time to account for changing requirements, without recompiling it from scratch.
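A short session using only standard operators shows what this buys us. The class ACCOUNT and its slots are invented for this example.

```lisp
(defclass account ()
  ((owner :initarg :owner :reader account-owner)))

(defparameter *account* (make-instance 'account :owner "alice"))

;; Redefine the class while the program is running: a new slot is added
;; in front of the old one, so the layout of instances may change.
(defclass account ()
  ((balance :initform 0 :reader account-balance)
   (owner :initarg :owner :reader account-owner)))

;; The existing instance is updated no later than the next slot access.
(account-balance *account*)   ; => 0, taken from the initform of the new slot
(account-owner *account*)     ; => "alice", the old value is preserved
```

The program kept running and the object kept its identity across the redefinition.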

But this doesn't come without a downside. The implementation may conformingly assume that structure accessors won't ever change and therefore they can be inlined. In this case, structure access is a simple memory reference.

(declaim (inline structure-reader-a))
(defun structure-reader-a (object)
  (svref (%slots object) 3))

On the other hand, this can't be assumed for instances: they must be checked for obsolescence (at the very least), and their readers are generic functions, which is another level of flexibility. Inlining generic functions is hard because new methods may be added at runtime and the effective method can change. Moreover, different classes may share the same reader name, so we need to include a piece of code that picks the correct class layout for an instance.
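As a small illustration of the last point, here are two unrelated classes that share one reader name. The classes and the reader EXTENT are invented for this example, and where each slot ends up in memory is up to the implementation.

```lisp
(defclass circle ()
  ((radius :initarg :radius :reader extent)
   (label  :initform "circle")))

(defclass square ()
  ((label :initform "square")
   (side  :initarg :side :reader extent)))

;; EXTENT is a single generic function with one reader method per class.
(typep #'extent 'generic-function)            ; true
(extent (make-instance 'circle :radius 2))    ; => 2
(extent (make-instance 'square :side 3))      ; => 3
```

A call to EXTENT has to find out which class it is dealing with before it knows which slot to read.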

This is why calling instance readers involves:

calling a function (can't be inlined)
finding the memory layout (dispatch)
verifying whether the instance is up-to-date

That is exemplified by the following pseudocode, which ignores other intricacies of generic functions. Depending on the implementation of generic functions, the test for obsolete instances may be skipped when instances are not obsolete.

(declaim (notinline instance-reader-a))
(define-reader-function instance-reader-a (object)
  (unless (%up-to-date-p object)
    ;; Among other things updates indexes for memory accesses. 
    ;; This is a slow path.
    (%recompile-reader-function #'instance-reader-a)
    (return-from instance-reader-a (instance-reader-a object)))
  (typecase object
    (standard-class-a (svref (%slots object) 3))
    (standard-class-b (svref (%slots object) 4))
    (custom-class-c (slot-value object 'a))
    (custom-class-d (slot-value object 'a))
    (otherwise (no-applicable-method #'instance-reader-a object))))

All this is assuming that we're dealing with standard readers. Using the metaobject protocol it is possible to store slot values anywhere - most notably, not in a vector bundled with the instance - or to add additional preprocessing. I'm not going to touch on MOP much here; this is just to signify that standard readers for standard classes may directly access the slot vector.

At minimum, assuming a single reader and a clever dispatch algorithm:

(declaim (notinline instance-reader-a))
(define-reader-function instance-reader-a (object)
  (if (eql (stamp object) 42)
      (svref (%slots object) 3)
      (if (%up-to-date-p object)
          (no-applicable-method #'instance-reader-a object)
          (progn
            (%recompile-reader-function #'instance-reader-a)
            (return-from instance-reader-a (instance-reader-a object))))))

In other words, comparing structure access with instance readers is comparing apples to oranges, because the former is a memory access, while the latter is a function call.

SLOT-VALUE will be even slower, because this function is a trampoline to a more involved SLOT-VALUE-USING-CLASS, and to do that we need to:

read the object class
find the slot definition in the class
invoke a generic function SLOT-VALUE-USING-CLASS

The generic function SLOT-VALUE-USING-CLASS may be similar to the reader defined above, with the caveat that it has more arguments to dispatch on (so the dispatch procedure may be more involved). In any case, it is at least as slow as the optimal reader defined above (a single reader for the standard class).

(defun slot-value (object slot-name)
  (let* ((class (class-of object))
         (slots (mop:class-slots class))
         (slot (find slot-name slots :key #'mop:slot-definition-name)))
    (mop:slot-value-using-class class object slot)))

Tim Bradshaw recently made a blog post that claims that instance slot access is around 38x slower than structure access, but he compares inlined memory access to generic function dispatch. A fair comparison would use the operator STANDARD-INSTANCE-ACCESS.

The metaobject protocol defines MOP:STANDARD-INSTANCE-ACCESS, an optimized way to access instance slots that does not incur the overhead associated with dispatching generic functions. This function may be inlined and is similar to structure object accessors. A possible definition would look like this:

(declaim (inline mop:standard-instance-access))
(defun mop:standard-instance-access (object location)
  (svref (%slots object) location))

The argument LOCATION is technically an opaque object, but for illustration purposes we assume that it is an index (it usually is!). Its value may be read using the function SLOT-DEFINITION-LOCATION.
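Here is a minimal sketch of how the two operators work together. It assumes SBCL, where the MOP symbols live in the SB-MOP package; other implementations export them from other packages. The class PROBE and the helper PROBE-SLOT-LOCATION are made up for this example.

```lisp
;; Assumption: SBCL. Other implementations keep these symbols elsewhere.
(defclass probe ()
  ((first-slot  :initform 10)
   (second-slot :initform 20)))

(defun probe-slot-location (slot-name)
  "Return the location of SLOT-NAME in instances of the class PROBE."
  (let ((class (find-class 'probe)))
    ;; Slot locations are known only after the class is finalized.
    (unless (sb-mop:class-finalized-p class)
      (sb-mop:finalize-inheritance class))
    (sb-mop:slot-definition-location
     (find slot-name (sb-mop:class-slots class)
           :key #'sb-mop:slot-definition-name))))

(defparameter *probe* (make-instance 'probe))

;; Reads the same value as (slot-value *probe* 'second-slot).
(sb-mop:standard-instance-access *probe* (probe-slot-location 'second-slot))
;; => 20
```

The location is looked up once and can then be reused for every access, which is what makes this path cheap. It stays valid only as long as the class is not redefined.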

Let's dig into benchmarks! We will measure access time to slots in an equivalent structure and instance, each containing ten untyped slots initialized with fixnums.

(defpackage "FAR-FROM-MOP"
  (:import-from #+ccl "CCL"
                #+ecl "MOP"
                #+lispworks "CLOS"
                #+sbcl "SB-MOP"
                #-(or ccl ecl lispworks sbcl) "MOP"
                "FINALIZE-INHERITANCE"
                "CLASS-SLOTS"
                "SLOT-DEFINITION-LOCATION"
                "SLOT-DEFINITION-NAME"
                "STANDARD-INSTANCE-ACCESS"
                #+lispworks "FAST-STANDARD-INSTANCE-ACCESS")
  (:export "FINALIZE-INHERITANCE" "CLASS-SLOTS" "SLOT-DEFINITION-LOCATION"
           "SLOT-DEFINITION-NAME" "STANDARD-INSTANCE-ACCESS"
           #+lispworks "FAST-STANDARD-INSTANCE-ACCESS"))

(defpackage "EU.TURTLEWARE.SLOT-BENCH"
  (:use "CL")
  (:local-nicknames ("MOP" "FAR-FROM-MOP")))
(in-package "EU.TURTLEWARE.SLOT-BENCH")

(declaim (optimize (speed 3) (safety 0) (debug 0)))

(eval-when (:compile-toplevel :load-toplevel :execute)
  (defclass a ()
    ((a :initform (random 10) :reader a-a)
     (b :initform (random 10) :reader a-b)
     (c :initform (random 10) :reader a-c)
     (d :initform (random 10) :reader a-d)
     (e :initform (random 10) :reader a-e)
     (f :initform (random 10) :reader a-f)
     (g :initform (random 10) :reader a-g)
     (h :initform (random 10) :reader a-h)
     (i :initform (random 10) :reader a-i)
     (j :initform (random 10) :reader a-j)))

  (defstruct b
    (a (random 10)) (b (random 10)) (c (random 10)) (d (random 10)) (e (random 10))
    (f (random 10)) (g (random 10)) (h (random 10)) (i (random 10)) (j (random 10)))

  (defparameter *o1* (make-instance 'a))
  (defparameter *o2* (make-b))


  (defparameter *locations*
    (mapcar (lambda (slot-name)
              (let ((class (find-class 'a)))
                (mop:finalize-inheritance class)
                (mop:slot-definition-location
                 (find slot-name (mop:class-slots class)
                       :key #'mop:slot-definition-name))))
            '(a b c d e f g h i j))))

We will measure four slot reading patterns:

structure: structure reader
instance : reader, SLOT-VALUE and MOP:STANDARD-INSTANCE-ACCESS

Moreover, to put some pressure on a hypothesized method cache, we will randomize access to slots. The macro expand-body generates consecutive access forms:

(defmacro expand-body (type n-access)
  (flet ((random-a () (nth (random 10) '(a-a a-b a-c a-d a-e a-f a-g a-h a-i a-j)))
         (random-b () (nth (random 10) '(b-a b-b b-c b-d b-e b-f b-g b-h b-i b-j)))
         (random-s () (nth (random 10) '(a b c d e f g h i j)))
         (random-l () (nth (random 10) *locations*)))
    (ecase type
      (:reader
       `(progn
          ,@(loop repeat n-access
                  for read = `(,(random-a) object)
                  collect `(incf count (the fixnum ,read)))))
      (:slot-value
       `(progn
          ,@(loop repeat n-access
                  for read = `(slot-value object ',(random-s))
                  collect `(incf count (the fixnum ,read)))))
      (:instance-access
       `(progn
          ,@(loop repeat n-access
                  for read = #+lispworks `(mop:fast-standard-instance-access object ',(random-l))
                             #-lispworks `(mop:standard-instance-access object ',(random-l))
                  collect `(incf count (the fixnum ,read)))))
      (:structure-access
       `(progn
          ,@(loop repeat n-access
                  for read = `(,(random-b) object)
                  collect `(incf count (the fixnum ,read))))))))

Now our "benchmark tool" and the tests. It is a simple measurement that compares internal real times before and after the computation.

(defmacro do-bench (() &body body)
  `(let ((now (get-internal-real-time))
         (cnt (progn ,@body)))
     (values (- (get-internal-real-time) now) cnt)))

(macrolet ((frob (name object access-type)
             `(defun ,name (n &aux (object ,object))
                (declare (fixnum n)
                         (optimize (speed 3) (safety 0) (debug 0)))
                (do-bench ()
                  (let ((count 0))
                    (declare (fixnum count))
                    (dotimes (v n count)
                      (expand-body ,access-type 100)))))))
  (frob test-object-v1 *o1* :reader)
  (frob test-object-v2 *o1* :slot-value)
  (frob test-object-v3 *o1* :instance-access)
  (frob test-object-v4 *o2* :structure-access))

(defun test-batch (n)
  (list (test-object-v1 n)
        (test-object-v2 n)
        (test-object-v3 n)
        (test-object-v4 n)))

(defun do-benchmarks ()
  (list* (list (lisp-implementation-type)
               (lisp-implementation-version)
               (machine-type)
               internal-time-units-per-second)
         (loop for e from 17 upto 26
               for n = (expt 2 e)
               collect (let (b)
                         (format t "... (expt 2 ~a):~%" e)
                         (setf b (test-batch n))
                         (format t "~a~%" b)
                         b))))

I've run these tests on four implementations. This table presents ratios of the access pattern compared to the best result. Absolute timings are not included.

Implementation    reader / best   slot-value / best   instance-access / best   struct / best
CCL 1.12.2        17              12                  2                        1
ECL 26.5.5        616             719                 1                        175
LispWorks 8.1.2   22              79                  1                        1
SBCL 2.4.2        10              9                   1                        1
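Ratios of this kind can be derived from one batch of raw timings, such as the list returned by TEST-BATCH. The helper RATIOS-TO-BEST is hypothetical and rounding to the nearest integer is an assumption of this sketch; the numbers in the usage line are invented, not measured.

```lisp
(defun ratios-to-best (timings)
  "Divide each timing by the smallest one and round to an integer."
  ;; MAX guards against a division by zero when the fastest run
  ;; took less than one internal time unit.
  (let ((best (max 1 (reduce #'min timings))))
    (mapcar (lambda (timing) (round timing best)) timings)))

;; Invented timings in internal time units.
(ratios-to-best '(1000 500 100 100))   ; => (10 5 1 1)
```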

Edit: I've been asked a few times for a comparison between implementations, so I'm also including a bar chart comparing absolute timings between them:

Y-axis is in seconds and each bar represents 2^26 x 100 slot accesses in randomized order.

Conclusions:

Accessing slots using generic functions is indeed slower than a single memory access. This is because we can't inline these functions, and we must take care of many possibilities - most notably dispatching arguments of different classes and redefinitions of both the instance class and the reader generic function. All this cost buys us extensibility and runtime flexibility of the program.

Readers, under certain circumstances, can be optimized better than SLOT-VALUE, because they don't have to go through another function and look up the slot definition in the class. CCL and SBCL don't exploit this optimization opportunity: in the table above their readers are no faster than SLOT-VALUE.

Instance memory access and structure memory access times are roughly the same on SBCL and LispWorks, while instance access is two times slower on CCL.

ECL does a peculiar thing where structure readers are not inlined for some reason. That needs investigating, but hey, instance access is 175x faster ;-)! Instance access is also abnormally fast compared to other implementations and that also calls for investigation.

Notes:

To avoid external dependencies, I've defined a very basic time measurement and used MOP operators directly defined by a few hand-picked implementations. For more complete solutions look into "trivial-benchmark" by Yukari Hafner and "closer-mop" by Pascal Costanza.

LispWorks' CLOS::STANDARD-INSTANCE-ACCESS does not conform to the MOP specification and signals an error when supplied with the slot location (it expects the slot name). That severely impacts the performance of instance access. The correct function to call is, for some reason, CLOS::FAST-STANDARD-INSTANCE-ACCESS.

ECL performance is poor in comparison, but I have good news! I'm implementing the Fast Generic Function Dispatch algorithm and it will get better.

As a point of interest, some implementations specialize SLOT-VALUE-USING-CLASS and other CLOS protocols on structure classes too.

Plots were generated with Polyclot, a work-in-progress McCLIM implementation of the Grammar of Graphics.

I'd like to thank modula t. for reviewing this post and suggesting improvements.

A brief note about slot access cost in Common Lisp