Briefly about different ML implementations

>> I hear it's used in practice somewhere… but Alice definitely needs
>> updating to modern realities and bug fixes.
>> Yes, it could be worse… and yes, it's an interesting environment.
> So what implementation of SML would you recommend?

As others have already said, it depends on your needs.
I'm not an expert… but I can recall something about all the ML versions
(not CAML, which has a somewhat different target audience).

Hmm, it seems I know quite a few… seven!… implementations besides Alice…
Looks like with ML you are definitely NOT vendor-locked, as you are with GNAT or
the ever-growing GHC…

0) So, Alice… It's a bytecode implementation and purposely always will be (one of the project's goals is code migration). So it is somewhat slow (though it has a JIT) and more or less portable. As I remember, it has some problems with non-Linux OSes (like the BSDs). The base SML language in it is greatly extended with parallel and distributed features, constraint programming (sometimes very useful), etc. These extensions are less well-founded and less narrowly scoped than those in SML#.
This implementation is general and «powerful», of course. But you don't always need power; sometimes predictability and reliability are preferable. Choosing one solution rules out the others. Formally defined standards (Standard ML '97) are a good thing… I hear rumors that the Alice developer now works at Google on the V8 JavaScript engine… Meanwhile, JavaScript is a bad language; it can't grow.

1) SML/NJ, the so-called «standard implementation of ML». I use it sometimes; it's average. It is oriented toward dynamic, exploratory, interactive use. Memory images can be saved to a file and run on another computer with the same runtime version (or packed together with the runtime into a standalone, somewhat big, executable). But SML/NJ is not really meant for producing binaries. It's a big, elaborate, interactive environment for small-to-medium programs.
Memory consumption is moderate. It generates average native code in memory (MLRISC backend), so it compiles more slowly than Moscow ML and uses somewhat more memory, but the resulting programs run faster (on par with MLKit). SML/NJ is not intended for really big heaps.
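A minimal sketch of the heap-image workflow, assuming a recent SML/NJ. `SMLofNJ.exportFn` expects an entry point of type `string * string list -> OS.Process.status` (program name and command-line arguments); the function name `main` and the greeting are only illustrative.

```sml
(* Entry point with the type SMLofNJ.exportFn expects:
   (program name, command-line arguments) -> exit status.
   "main" and the message are illustrative names only. *)
fun main (_ : string, args : string list) : OS.Process.status =
  (print ("hello, " ^ String.concatWith " " args ^ "\n");
   OS.Process.success)
```

After loading this into the `sml` REPL, evaluating `SMLofNJ.exportFn ("hello", main)` writes a heap image named like `hello.amd64-linux` (the suffix depends on architecture and OS) and exits. It can then be started with `sml @SMLload=hello.amd64-linux world` on any machine with the same runtime version; the `heap2exec` script from the SML/NJ distribution, if installed, can glue heap and runtime into one executable.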

2) MLton, an optimizing compiler. I'd say it optimizes «the whole picture, not small peepholes»: optimization-in-the-large. I think it does fewer small-scale optimizations than GCC, but applies large-scale ones more thoroughly (some peepholes are left unoptimized, SSE is not used, etc.). It supports threads, as Poly/ML does, but in a different way: MLton's threads are lightweight user-level threads and do not run in parallel on several cores. The resulting binaries are self-contained and relatively small (whatever isn't used isn't included).
…One BIG problem with it: it's a real memory and CPU hog, and always will be. MLton can easily eat gigabytes of memory because of its «whole-programness» (meaning it includes all the libraries used) and its multiple passes of optimizing transformations in memory. Because of that, it sometimes just can't be used on old desktops or netbooks. I recommend setting up zswap (a compressed swap cache) on your Linux box for MLton.
People recommend using MLton for compiling release binaries only. Some game developers and cross-developers use it that way, as a kind of Wunderwaffe. But there is one possible trap in «use it for the final release only»: whole-program compilation somewhat changes the semantics of some SML constructs (like the scope of definitions). This can be good (you get efficient functors for free!) or bad. It depends… be warned.
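A small illustration of the «efficient functors» point. The assumption here is how MLton gets there: because it sees the whole program, it expands each functor application (defunctorization), so `IntSort.sort` below can end up calling `Int.compare` directly, whereas a separately compiling implementation typically passes the argument structure at run time. The names `ORD`, `InsertSort` and `IntSort` are made up for this example.

```sml
(* An ordering, used as the functor argument. *)
signature ORD =
sig
  type t
  val compare : t * t -> order
end

(* Insertion sort, parameterised by an ordering. *)
functor InsertSort (O : ORD) =
struct
  (* insert x into an already sorted list *)
  fun insert (x, []) = [x]
    | insert (x, y :: ys) =
        (case O.compare (x, y) of
           GREATER => y :: insert (x, ys)
         | _ => x :: y :: ys)

  (* foldl feeds each element into the growing sorted list *)
  fun sort (xs : O.t list) : O.t list = List.foldl insert [] xs
end

(* With whole-program compilation this application can be specialised,
   so O.compare becomes a direct call to Int.compare. *)
structure IntSort = InsertSort (struct type t = int val compare = Int.compare end)

val sorted = IntSort.sort [3, 1, 2]  (* [1, 2, 3] *)
```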

3) Poly/ML, a compact interactive system: two files, easily distributed with an application. It has parallelism. I think internally it resembles SML/NJ (it generates native code in memory), but its code generator is smaller and simpler. Moderate sequential performance, but people say it has good parallel scalability. Its main application is the Isabelle theorem prover. I don't know whether the Poly/ML GC is parallelized now (if not, that limits the speedup to about 4-6 times in applications other than symbolic computations with small shared state), but it is actively supported.

UPD: the Poly/ML GC is parallelized in recent versions, so the speedup is no longer so limited. With a parallelized GC, ML may well be faster than Haskell. Haskell will always have more sequential parts in its runtime (because of its lazy-by-default semantics), and so Haskell's speedup will always be more limited than ML's.
Meanwhile, Poly/ML (along with MLton and SML#) is one of the three more or less actively developed ML implementations.

4) MLKit, a simple native batch compiler (MLRISC-based code generator, as in SML/NJ; but see the UPD below). Its backend optimizer is simple and produces moderate-quality code. It's certainly not as good at optimization as MLton, but it is more predictable and has a smaller runtime. The really great things about this compiler are:
a) Region-based memory management (the compiler takes over some or all of the GC's work, depending on the program, which makes it possible to implement real-time and system programs).
I think regions would be very useful if someone tried to implement a barrier-based parallel ML in the style of JoCaml. The possible speedup would be better, I think.
b) Separate module compilation! This is a really good thing for incremental development of a big project by several programmers. Memory consumption is also much smaller than MLton's.
Try using it in the middle of development, before releasing with MLton.
…It's bad, bad that this compiler is somewhat forgotten and left in the shadows ((
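A toy program showing where regions pay off (point a above). The list `xs` is built and consumed inside `sumTo` and never escapes it, so a region-inferring compiler such as MLKit can, in principle, put it in a region that is released all at once when `sumTo` returns, without the GC having to trace it. `upto` and `sumTo` are hypothetical names, and the exact region placement depends on MLKit's inference.

```sml
(* upto (i, n) builds the list [i, i+1, ..., n]. *)
fun upto (i, n) = if i > n then [] else i :: upto (i + 1, n)

(* xs is created and consumed here and never escapes, so a
   region-inferring compiler can allocate it in a region local to
   this call and free the whole region when sumTo returns. *)
fun sumTo n =
  let
    val xs = upto (1, n)
  in
    List.foldl (op +) 0 xs
  end

val total = sumTo 100  (* 5050 *)
```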

UPD: forgive me for some misinformation. MLKit has its own, simpler backend, not MLRISC.
Too bad… it supports only i386 and HP-PA native targets, plus KAM bytecode generation. Maybe somebody wants to bolt MLRISC onto MLKit? With some tweaking of the runtime, that would make it possible to run MLKit on amd64…
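For MLKit's separate compilation (point b above): MLKit, like MLton, reads ML Basis (`.mlb`) files that list a project's sources. A minimal one might look like the sketch below; `util.sml` and `main.sml` are placeholder names. As far as I know, `mlkit project.mlb` then recompiles only the parts that changed during development, while `mlton project.mlb` builds the same project as a whole for the release.

```
(* project.mlb: util.sml and main.sml are placeholder file names *)
$(SML_LIB)/basis/basis.mlb
util.sml
main.sml
```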

5) Moscow ML: a simple bytecode interpreter on the old Caml Light runtime (which came before OCaml). Fast compiles, slow run times (bytecode, no JIT). Compact. Has separate module compilation!
As I hear, it has some problems with floating-point numbers (because of tagging?). Some admins use it as a scripting language and for building small web interfaces. In the past, some people used it for theorem provers.
Small memory consumption is the main selling point of this implementation.

6) HaMLet: an interpreter of full SML written in SML. Useful for teaching and experimenting with the language; it can be built as one big SML file. Slow, of course. Compile it with MLton.

7) SML#: an interesting native-code ML compiler (x86 only… that's bad). I think the project's roots go back to MLKit (it's not related to .NET and C#), but I'm not sure.
Its characteristics are similar to MLKit's. It has good C interoperability. The fresh release version is fairly polished. The developers say it has a parallelized GC now.
Its ML language is extended with some enterprise-y constructs like record polymorphism and such. Unlike Alice's, these extensions were _formally verified_ (personally, I don't like OOP and stupid inheritance, but managers like it). It's developed by a Japanese government-funded project; they are trying to create a language that is useful for ordinary commercial development and specifically targeted at low maintenance costs (as Ada was). I asked the developers whether it will have a JVM code generator in the future (for more of the «enterpriseness» it targets). They replied: «maybe». An interesting project; I'll keep one eye on its future…

Resulting runtime performance, from fast to slow:
1) MLton (on par with C or even faster)
2) MLKit, SML#, SML/NJ (acceptable, 1.5-2 times slower than C)
3) Poly/ML (more or less acceptable, 2-3 times slower than C, though I also hear rumors that _in symbolic computations_ and with big heaps Poly/ML is sometimes very good)
4) Alice (I think somewhere between Poly/ML and Moscow ML)
5) Moscow ML (5-10 times slower than C; normal for bytecode)
6) HaMLet (slow, you bet)

