
[LONG] ModuLite: Scheme Module System proposal



You can also view this proposal at:
	http://www-swiss.ai.mit.edu/~jaffer/modulite_toc.html

			      -=-=-=-=-

		   ModuLite Scheme Module Proposal

* Table of Contents:

* Acknowledgments::
* Background::
* The Problem::
* Nomenclature::
* Creating Modules::
* Using Modules::
* Gilding::
* Anticipated Questions::

Acknowledgments
***************

  Many people contributed to the May 1994 discussion of modules and
integration through the Usenet group comp.lang.scheme and subsequent
e-mail (which form the basis for this idea).  Tommy Thorn sparked my
interest by suggesting ways to reconcile "closing the world" with an
interpreted dynamic top-level environment.

  Matthias Blume pointed out the desirability of unnamed modules, showed
that the interpreted vs. compiled issue was also true of changes in the
order of files loaded, and elucidated the thorny area of macro exports.
He also wrote convincingly of the usefulness of loading more than one
version of a module.  The modules outlined here allow this.

  Paul Stodghill pointed out that first class environments can
ameliorate some of the problems I had with no-reserved-identifiers.
Mikael Pettersson suggested the analogy between identifier scope being
static and the `load' environment.  Michael Blair improved my
understanding of what a compiler can and can't know about identifier and
macro scopes (and integration) without declarations.

  Tom Lord, Marc Singer, and Miles Bader designed and implemented a
local environment module system for GUILE.  Marc Singer, in an
insightful aside, pointed out that modules are just scoping constructs,
after all.

  Per Bothner pointed out connections with other implementations' module
systems and helped refine the current proposal.

  The module system for GUILE is more powerful than the one presented
here.  It is capable of dynamically creating and extending binding
contours.  While reading through some old messages on this subject, I
realized that loading modules into a clean IEEE/R4RS-Scheme environment
simultaneously solved the "closing the world", no-reserved-identifiers,
and interpreted vs. compiled fidelity requirements.

Background
**********

  SLIB is a portable Scheme library meant to provide compatibility and
utility functions for all R4RS Scheme implementations.  Since 1991,
more than sixty contributors have grown SLIB's source and documentation
to over one megabyte.  Experience with a large corpus of shared code
should make clear the goals and constraints a module system should
have.  From my experience with SLIB I conclude that a code "library"
should:

  1. Contain portable code for useful procedures;

  2. Organize into logical groups ("modules") those procedures likely to
     be used together;

  3. Allow coders to specify modules using mnemonic names which are
     independent of, and unaffected by, the conventions and constraints
     of file and operating systems;

  4. Document each module and its procedures;

  5. Allow modules to be used in any combination which makes sense;

  6. Have small runtime resource overhead;

  7. Consume runtime resources only for those modules in use;

  8. Not replicate sharable modules in multi-user implementations.

  Goal 3 is to create an abstraction barrier between module (feature)
names and library internal details.  This seems a natural goal, yet the
C library does not satisfy it, and almost none of the discussion of
module systems for Scheme has addressed this issue.  Since there has
been little controversy over SLIB's feature-to-file translation (and no
interest in standardizing SLIB), I will take as given that:

   * There is some (invisible to me) natural distinction between module
     systems and libraries;

   * RRRS-authors feel that modules are in the purview of the report,
     but libraries like SLIB aren't.

  Most of the module discussions deal with properties which are nice,
but not strictly necessary for library operation:

  9. In order to fully utilize a library, compilers for the language
     should resolve module references at compile time.

 10. Module code and data built by module code should be safe from
     mutation except by module procedures.

     One way to protect module data is to export only procedures and
     macros.  The opacity of procedures allows module procedures to
     control access to data.

 11. Only documented procedures and syntax should be accessible to
     module users.

 12. Modules should be a first class datatype.

 13. The module system should work with existing code.

The Problem
***********

  Let us look at the types of dependencies which arise in SLIB:

  `(require 'collect)' causes a macro package to be loaded (if it is not
already loaded) in order to `MACRO:LOAD' `collect.scm'.

`macwork.scm'
     requires `common-list-functions'

`collect.scm'
     requires `yasos', which causes `yasyn.scm' to be loaded.

`yasyn.scm'
     requires `object' and `format'.

`format.scm'
     requires `string-case', `string-port', `rev4-optional-procedures',
     and `pretty-print'

`pp.scm'
     requires `generic-write'

  We already have a conflict.  Both `collect.scm' and `comlist.scm'
(`common-list-functions') define `reduce', differently.  A module
system should prevent such a conflict.  `comlist.scm' was loaded by the
macro expander, and procedures involved in macro expansion should not
be visible in the expanded code's name space.  This example shows that
modularity is not so much about sharing identifiers as it is about
*not* sharing identifiers.
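  To make the point concrete, here is a minimal sketch in which each
`reduce' lives in its own local scope, with a `let' body standing in for
a module.  It assumes R7RS-small Scheme (the proposal itself targets
R4RS).  The names `expander-sum' and `collect-sum', and both `reduce'
definitions, are hypothetical illustrations, not SLIB's actual code.
Because neither `reduce' is bound at top level, the two cannot collide.

```scheme
(import (scheme base))

;; Hypothetical stand-in for the macro expander's use of comlist.scm:
;; its two-argument REDUCE is private to this LET body.
(define expander-sum
  (let ()
    (define (reduce f lst)            ; combine elements left to right
      (let loop ((acc (car lst)) (rest (cdr lst)))
        (if (null? rest)
            acc
            (loop (f acc (car rest)) (cdr rest)))))
    (lambda (lst) (reduce + lst))))

;; Hypothetical stand-in for collect.scm: a different REDUCE taking
;; an explicit seed, equally private to its own LET body.
(define collect-sum
  (let ()
    (define (reduce f seed lst)       ; fold LST into SEED
      (if (null? lst)
          seed
          (reduce f (f seed (car lst)) (cdr lst))))
    (lambda (lst) (reduce + 0 lst))))

(expander-sum '(1 2 3))   ; => 6
(collect-sum '())         ; => 0
```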

Nomenclature
************

  A "binding contour" is a set of one or more variable bindings.  An
"environment" is an ordered set of binding contours.  If a variable is
bound by more than one binding contour in an environment, the variable
in the lower binding contour is visible from the environment and the
lower binding is said to "occlude" the higher bindings of that variable.

  There are two varieties of binding contours, corresponding to the
binding constructs `let' and `letrec'.  A "module" is a `letrec' variety
binding contour.  An identifier bound in a module is said to be
"exported" by that module.
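  These terms can be modelled as data.  The sketch below assumes a
representation the proposal does not prescribe: a binding contour is an
association list of (identifier . value) pairs, and an environment is a
list of contours with the lowest contour first.  Lookup searches from
the lowest contour upward, so the first binding found occludes any
higher one.  The procedure names are hypothetical, and the code assumes
R7RS-small Scheme.  Under this model, adding a module's bindings to an
environment is simply consing its contour onto the front.

```scheme
(import (scheme base))

;; A binding contour: an alist of (identifier . value) pairs.
(define (make-contour . bindings) bindings)

;; Environments list their contours lowest (innermost) first, so
;; extending an environment puts the new contour in front.
(define (extend-environment contour env) (cons contour env))

;; Search from the lowest contour upward; the first binding found
;; occludes any higher bindings of the same identifier.
(define (environment-lookup env id)
  (cond ((null? env) (error "unbound identifier" id))
        ((assq id (car env)) => cdr)
        (else (environment-lookup (cdr env) id))))

(define higher (make-contour (cons 'reduce 'comlist-reduce)
                             (cons 'member 'report-member)))
(define lower (make-contour (cons 'reduce 'collect-reduce)))
(define env (extend-environment lower (list higher)))

(environment-lookup env 'reduce)   ; => collect-reduce (occludes comlist-reduce)
(environment-lookup env 'member)   ; => report-member
```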

  A "scheme report environment" is an environment that contains the
set of bindings specified in the `Revised^n Report on Scheme' that the
implementation supports, as well as the procedures and syntax specified
here.  An implementation is free to include any other bindings which do
not occlude report bindings.

  I am trying to create a module system which meets the stated goals and
can be implemented with little effort by diverse implementations,
hopefully by using their existing module facilities.

Creating Modules
****************

 - syntax: module <body>
 - procedure: make-module EXPRESSION
 - procedure: load-module FILENAME
     (1)

     <Body> should be a sequence of one or more definitions and
     expressions.  `Module' sequentially evaluates the expressions and
     definitions of <body> in a scheme report environment.  Definitions
     occurring at the top level cause bindings to be created in the
     binding contour which the call to `module' creates and returns.

     `Make-module' evaluates EXPRESSION in a scheme report environment.
     Definitions occurring at the top level cause bindings to be
     created in the binding contour which the call to `make-module'
     creates and returns.

     FILENAME should be a string naming an existing file containing
     Scheme source code. The `load-module' procedure reads expressions
     and definitions from the file and evaluates them sequentially in a
     scheme report environment.  Definitions occurring at the top level
     cause bindings to be created in the binding contour which the call
     to `load-module' creates and returns.

     The `load-module' procedure does not affect the values returned by
     `current-input-port' and `current-output-port'.

     *Rationale:* For portability, `load-module' must operate on source
     files.  Its operation on other kinds of files necessarily varies
     among implementations.

        * The expressions evaluated in these calls are evaluated in a
          fresh scheme report environment.  The current environment has
          no effect on the module.

        * If an (evaluated) expression executes a call to `load', any
          top level bindings made in the loaded code are exported
          (included in the binding contour returned).

        * If an expression executes a call to `import-module', any top
          level bindings imported by `import-module', although visible
          to the module code, are not exported.

        * Loading source for a new module into a fresh scheme report
          environment ensures that every procedure defined in the
          module is integrable unless source analysis reveals
          otherwise.  Source not found during lexical analysis will
          not affect the module's operation.  Hence, a compiler can
          "close the world" on every module.

        * Each module must import all the modules it needs.

        * Since identifiers bound to modules will not be visible while
          evaluating module source, the mechanism for translating
          module names to filenames should be part of the scheme report
          environment (although not specified in the report).

        * Each of `module', `make-module', and `load-module' can be
          defined in terms of either of the other two.
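  A rough approximation of `make-module' can be written in R7RS-small
Scheme, whose `eval' and `environment' procedures (from the
`(scheme eval)' library) supply a fresh environment much like a scheme
report environment.  This is only a sketch under stated assumptions:
the hypothetical `make-module-sketch' accepts a list of `define' forms
only (no bare expressions or `load' calls), and it represents the
returned binding contour as an association list rather than a disjoint
module type.

```scheme
(import (scheme base) (scheme eval))

;; Identifier bound by a definition form:
;;   (define (f . args) ...) => f,   (define x ...) => x
(define (definition-name form)
  (let ((target (cadr form)))
    (if (pair? target) (car target) target)))

;; Hypothetical MAKE-MODULE-SKETCH: FORMS must all be DEFINE forms.
;; They become the body of a LET that is evaluated in a fresh
;; (scheme base) environment, so the caller's bindings are not visible,
;; and the LET returns the new binding contour as an alist.
(define (make-module-sketch forms)
  (let ((entries (map (lambda (name)
                        (list 'cons (list 'quote name) name))
                      (map definition-name forms))))
    (eval (append (list 'let '()) forms (list (cons 'list entries)))
          (environment '(scheme base)))))

(define m
  (make-module-sketch
   '((define base 10)
     (define (twice x) (* 2 x)))))

((cdr (assq 'twice m)) (cdr (assq 'base m)))   ; => 20
```

Because the forms are evaluated in `(environment '(scheme base))',
top-level bindings of the calling program are invisible to the module
code, which mirrors the first rationale point above.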

  ---------- Footnotes ----------

  (1)  Proffering three ways to create modules (roughly analogous to
`lambda', `eval', and `load') should forestall argument over which
method is better.  Use your favorite and tolerate others' choices.

Using Modules
*************

 - procedure: import-module MODU
     MODU should be a module as returned by `module', `make-module', or
     `load-module'.  The `import-module' procedure adds the bindings
     from MODU to the top level environment.  If an identifier bound in
     MODU already has a top level binding, that binding is occluded.

 - syntax: with-module EXPR <body>
     EXPR is an expression which should evaluate to a module as returned
     by `module', `make-module', or `load-module'.  After evaluating
     EXPR, <body> is evaluated in the current environment extended by
     the bindings from that module, and the value of the last
     expression of <body> is returned.
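  For a first-class module, `with-module' cannot readily be written with
portable `syntax-rules', because the identifiers to bind are not known
at expansion time.  The sketch below therefore assumes a variant, the
hypothetical `with-module*', in which the caller lists the identifiers
to bring into scope.  Modules are again assumed to be association lists
of (identifier . value) pairs, and the code assumes R7RS-small Scheme.

```scheme
(import (scheme base))

;; Fetch ID's value from MODU, an alist of (identifier . value) pairs.
(define (module-ref modu id)
  (let ((binding (assq id modu)))
    (if binding
        (cdr binding)
        (error "identifier not exported by module" id))))

;; Hypothetical WITH-MODULE variant: the caller lists the identifiers
;; (ID ...) to bind.  EXPR is evaluated once; BODY runs in the current
;; environment extended by those bindings, and its last value is returned.
(define-syntax with-module*
  (syntax-rules ()
    ((_ expr (id ...) body1 body2 ...)
     (let ((modu expr))
       (let ((id (module-ref modu 'id)) ...)
         body1 body2 ...)))))

;; Usage: a small hand-built module.
(define geometry
  (list (cons 'scale 3)
        (cons 'triple (lambda (x) (* 3 x)))))

(define offset 1)   ; an ordinary top-level binding

(with-module* geometry (triple scale)
  (+ offset (triple scale)))   ; => 10
```

The body still sees the surrounding environment (here `offset'),
since `with-module*' only extends it.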

Gilding
*******

  The module system outlined above should work well for SLIB.  Using
`load-module', most SLIB modules will work without modification.

  Correspondents have expressed concern over a variety of issues.

   * Making modules a disjoint datatype involves only writing the usual
     boilerplate about `module?'.

   * `define-internal' is not strictly necessary and can be emulated
     with convoluted coding.  But it is convenient and directly
     satisfies one of the capabilities people expect from a module
     system.  `define-syntax-internal' is not necessary because
     `letrec-syntax' will suffice.

      - syntax: define-internal
          A `define-internal' behaves identically to top level `define'
          except that an identifier defined with `define-internal' will
          not be exported.

   * Per Bothner has pointed out that this proposal is quite different
     from other module proposals.  He suggests it would be less
     wrenching for users if a declarative `export' syntax were
     supported.

      - syntax: export IDENT1 IDENT2 ...
          IDENT1, IDENT2, ... should be identifiers.  If an `export'
          form occurs at top level in code for a module (argument to
          `module', `make-module', or `load-module'), then the returned
          binding contour is modified as follows: Only IDENT1, IDENT2,
          ... are exported.

          It is an error if a module has more than one `export'
          statement.  It is an error if an identifier occurs more than
          once in an `export' statement.  It is an error if IDENT1,
          IDENT2, ... are not all defined at top level in the code for
          the module containing the `export' statement.

   * At least one Scheme compiler gives only negligible speed
     improvement over interpretation when "integrable" declarations are
     lacking.  I believe all Scheme compilers could compile modules of the
     kind discussed above to good effect because of the INTEGRABLE
     guarantees.

     Given that compilers could produce reasonable code without platform
     specific declarations, and given that the extent of the source
     (which code gets compiled) is well delineated by the module
     boundary, it seems reasonable to also propose a uniform compiler
     interface.

      - procedure: compile-module-file FILENAME
          `compile-module-file' compiles the module specified by its
          string FILENAME argument and returns a value suitable for
          passing as an argument to `import-compiled-module', or `#f'
          if the compilation was not successful.

          If the value returned by `compile-module-file' is a string,
          then it can name a file produced by `compile-module-file',
          which can be passed to `import-compiled-module' in the future
          without re-compilation.

      - procedure: import-compiled-module COMPILED-MODULE
          Behaves as `import-module' does, but for a compiled module.
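  The `export' and `define-internal' points can be illustrated
together.  In the sketch below (R7RS-small Scheme, an assumption), the
hypothetical macro `module/export' evaluates its body as the internal
definitions of a `let' and returns an association list holding only the
identifiers named in its `export' clause; any other definition in the
body therefore behaves like a `define-internal'.  The sketch does not
check the error conditions listed for `export', such as an identifier
listed twice.

```scheme
(import (scheme base))

;; Hypothetical MODULE/EXPORT: FORM ... become the body of a LET (so,
;; in R7RS, definitions must precede any expressions), and only the
;; identifiers named in the EXPORT clause appear in the returned alist.
(define-syntax module/export
  (syntax-rules (export)
    ((_ (export id ...) form ...)
     (let ()
       form ...
       (list (cons 'id id) ...)))))

(define counter-module
  (module/export (export next!)
    (define count 0)              ; not exported: acts like DEFINE-INTERNAL
    (define (next!)
      (set! count (+ count 1))
      count)))

(define next! (cdr (assq 'next! counter-module)))
(next!)                        ; => 1
(next!)                        ; => 2
(assq 'count counter-module)   ; => #f
```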

Anticipated Questions
*********************

Q
     How will SLIB use these modules?

A
     I envision that SLIB's `require' will call `load-module' with the
     appropriate file if that file has not already been `load-module'd;
     then invoke `import-module'.  Since (in most situations) we don't
     want to create more than one module from a given source file,
     `require' will cache library modules.

     For each of SLIB's `macro:LOAD' procedures there will be a
     corresponding `macro:LOAD-MODULE'.
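  The caching behavior described for `require' might be sketched as
follows, assuming R7RS-small Scheme.  `require-module', `*module-cache*',
and the loader argument are all hypothetical: the loader stands in for
SLIB's feature-to-file translation followed by `load-module', and the
subsequent `import-module' step is omitted.

```scheme
(import (scheme base))

;; Hypothetical cache mapping feature symbols to modules already created.
(define *module-cache* '())

;; Return the module for FEATURE, calling LOADER (a stand-in for
;; feature-to-file translation plus LOAD-MODULE) only the first time.
(define (require-module feature loader)
  (cond ((assq feature *module-cache*) => cdr)
        (else
         (let ((modu (loader feature)))
           (set! *module-cache* (cons (cons feature modu) *module-cache*))
           modu))))

;; Usage with a dummy loader that counts how often it is called.
(define load-count 0)
(define (dummy-loader feature)
  (set! load-count (+ load-count 1))
  (list (cons 'feature feature)))

(define m1 (require-module 'format dummy-loader))
(define m2 (require-module 'format dummy-loader))
;; m1 and m2 are the same module object; dummy-loader ran once.
```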

Q
     How can I debug my module if I can't redefine identifiers in a
     module?  Do I have to reload my whole program every time I trace a
     different procedure?

A
     The only difference between creating a module and loading a Scheme
     file as you do now is the single call to `load-module' instead of
     `load'; all of the subsidiary files included in the module are
     brought in using `load'.  Therefore, `load' rather than
     `load-module' the module you are debugging.

     Scheme code currently must avoid naming conflicts, so doing this
     with current code would not introduce new name collisions; in the
     future, however, having more than one module loaded (rather than
     `load-module'd) may introduce collisions.

Q
     If I can't export data types such as vectors, how can I access a
     table from a module?  If I can't set an imported identifier, how
     can I change a shared resource?

A
     Modifying the values or structure of exported identifiers directly
     might corrupt the module for another user, so it is an error to do
     so.  The module author can provide a shared resource to importers
     by exporting identifiers bound to procedures which access and
     modify the shared resource.
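  As a sketch of that advice (R7RS-small Scheme assumed; all names
hypothetical), the module below keeps its table, a vector, private and
exports only `table-ref' and `table-set!', again representing the
module as an association list.  Because every access passes through
these procedures, the module can validate indices, and importers never
hold a reference to the vector itself.

```scheme
(import (scheme base))

(define table-module
  (let ()
    (define table (make-vector 4 #f))   ; private: never exported
    (define (check-index i)
      (unless (and (exact-integer? i) (< -1 i (vector-length table)))
        (error "table index out of range" i)))
    (define (table-ref i)
      (check-index i)
      (vector-ref table i))
    (define (table-set! i value)
      (check-index i)
      (vector-set! table i value))
    (list (cons 'table-ref table-ref)
          (cons 'table-set! table-set!))))

(define table-ref (cdr (assq 'table-ref table-module)))
(define table-set! (cdr (assq 'table-set! table-module)))

(table-set! 2 'shared)
(table-ref 2)   ; => shared
(table-ref 0)   ; => #f
```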

Q
     Why do you use files as the organizing structure for modules?  What
     if I want to have several modules defined in one file?  What if I
     want to define modules from an interactive session?

A
     Files are already the organizing structure for `load'.
     `make-module' and `module' allow you to define any number of
     modules in a single file (or multiple files).

     If you want to extend and modify modules from an
     interactive session, you will need a more powerful system.  I
     believe the GUILE module system is capable of this.  My reasons for
     not incorporating that capability in this Scheme module system are:

        * it is complicated;

        * it is more than is necessary; and

        * the analogous capability, that of being able to extend and
          modify internal definitions of a closure, is not mandated by
          R4RS.



-- 
			     -=-=-=-=-=-
I am a guest and *not* a member of the MIT Artificial Intelligence Lab.
      My actions and comments do not reflect in any way on MIT.