New language for AI?
The following is the list of language requirements I thought an AI would minimally need.
- Object Oriented.
- All Classes, Methods, Properties, Objects can be created and changed easily by the program itself.
- Efficiently run many programs at the same time.
- Fully extensible.
- Built-in IDE (Integrated Development Environment) that allows programs to keep running concurrently with program edit, compile and run.
- As simple as possible.
- Quick programming turnaround time.
- Fast where most of the processing is done.
- Simple hierarchy of Classes and of Objects.
- Simple Class inheritance.
- Simple external file architecture.
- Finest possible edit and compile without any linking of object modules.
- Scalable to relatively large size.
- Built in SQL, indexes, tables, lists, stacks and queues.
- Efficient vector handling of all data types.
- Internet interface.
- Runs on Windows PC machines.
- Can run as multiple separate systems on the same computer or in the background.
Reasons for the above list of features:
- Object Oriented.
- The OOPS model of programming combines the 2 areas that programmers deal
with all the time: code and data.
- Keeping the data and code together to minimize errors and modularize the
design is a good idea.
- Many programmers did this even before the OOPS model was popularized.
- Some parts of the OOPS model are overdone and contribute to the difficulty
many programmers have with OOPS languages. (e.g. protected and private properties
and methods, overloaded methods, etc.)
- The object model has many analogies to real life and therefore makes the
model fit many different problems nicely.
- All Classes, Methods, Properties, Objects can be created and changed easily
by the program itself.
- Having the ability to change any data and code in the system seems like
an absolute requirement if the AI is ever to experiment or grow on its
own.
- This capability is rarely allowed in any programming language. (LISP and
I think PROLOG are exceptions.)
- Efficiently run many programs at the same time.
- If I want to run 100 programs at the same time, what kind of performance
will I get from using operating system threads?
- What kind of overhead will be incurred trying to prevent all these programs
from using the same code at the same time?
- What synchronization overhead will this entail?
- Needless to say, nobody tries to run 100 threads on today’s PCs, regardless
of how much memory or how many gigahertz they have.
- I have developed a way of running 100 or more programs at the same time
with minimal overhead (6%) and because any internal code can only be executed
by one program at a time, I eliminate the overhead of locking and synchronizing
all the program code. (Objects can still be locked but this is only when
required rather than all the time.)
- I tried using a MUTEX to eliminate conflict for my socket routines and
when I implemented it in my program loop, the program ran up to 10x slower.
- This was just the result of looking after the operating system MUTEX
lock and not the enormous overhead of 100's or 1000's of threads.
- I have recently tested my system with 2,500 simultaneous programs running
without problems.
- Fully extensible.
- Many languages like C++ are old languages with an OOPS added on. E.g. the
data definition of a class is defined in a “.h” source file but the actual
methods of those classes are defined in a separate “.c” source file. (So
much for putting the code and data together in the same place!)
- Many languages can have programmer written routines that look and act
like the ones that are built in but most don’t allow the built-in routines
to be overwritten and enhanced by the independent programmer.
- Built-in IDE (Integrated Development Environment)
- IDE that allows programs to be run concurrently with program edit, compile
and run.
- I know of no language where you can sit in the IDE and program (edit,
compile and run) while the engine runs other code at the same time.
- A normal programming environment sits in a disk file and waits for the
program to run.
- A good IDE might allow interactive editing and some debugging but it still
must build the whole system for each change made. (A make file and linking
makes this less painful but this assertion still holds.)
- I want a system that gets programmed from the inside so that I can interact
directly with the objects and data structures while the program runs.
- Would a person like to be killed (even if temporarily) while they learn
something new? Would the AI? (My analogy is to stopping the program every
time you make a change to the source code.)
- As simple as possible.
- Making a complex program is quite easy. Making a program that solves complex
problems in a simple way is sublime.
- In all the programming I have done in C, I found an acceptable way to
indent my code and I have used that exact method fastidiously ever since.
- If one way of programming works, and other ways of doing the exact same
thing are provided, then unnecessary complexity has been created. (C requires
a “;” at the end of each line because it allows multiple lines to be treated
as if they were one line even though this is rarely ever used. Think of
the number of unnecessary keystrokes from around the world that have occurred
because of this “feature”.)
- Most modern programming languages provide many methods of doing the same
thing where each method isn’t better or worse than the other. They are just
different!
- I believe that programming an AI is complex and difficult enough without
adding complexity that isn’t absolutely necessary.
- Quick programming turnaround time.
- Today’s PCs are fast enough to provide the traditional edit, compile,
link and run method of programming with speed it didn’t use to have.
- However, it is still a cumbersome and slow method of creating very large
programming systems.
- In the C++ IDE that I am using to create this language, I have 23 (for
now) different source files. As these increase in number and size, it will
be more and more difficult to know in what modules what routines are located.
That doesn’t mean they aren’t organized and that I don’t have a global search
capability but sometimes you want something that you don’t exactly know
the name of. Or you get search hits from many source files and you have
to look through them all to find which one you are looking for.
- These source files are organized into groups of methods not necessarily
by class. (I have programmed only the IDE in C++ and the rest of the program
in C. This was done to facilitate migrating the program to other operating
systems in the future and because I dislike having the class definition
in the “.h” file and the methods in the “.c” file.)
- Fast where most of the processing is done.
- The decision to make an interpreter/compiler made execution speed
an issue.
- A fully compiled optimized language has the fastest execution times.
- However, it has many bad side effects.
- I decided to avoid those side effects, increase my flexibility and minimize
the performance penalty inherent in interpreter designs.
- If the major work of a program is in doing things like indexing, searches
etc, then I have optimized those routines in C for execution times no slower
than fully compiled C.
- I have built the language so that if some required language feature is
used often, then it can be coded into the language (binary compiled) without
any code or compatibility problems. (These kinds of changes would be only
for essential language elements that are used all the time.)
- Most code runs instantaneously whether it is compiled or interpreted.
Only a small amount of code is run often enough for it to matter how fast
it runs, and I have implemented that code in my compiled C or
found other ways to make it fast.
- Simple hierarchy of Classes and of Objects.
- Most OOPS languages don’t provide for a hierarchy of classes. (This is
not the same as inheritance.)
- I wanted this feature so that I could keep even very large projects organized
and at the same time keep the names of classes and objects short.
- I think using large names like “MyMostFavoriteMethod” for method names
wastes a lot of programming time and detracts from the readability.
- I saw COBOL for the first time in 1975 and one of its “features” was
very long variable names. (No one ever paid me enough money to program in
such a bad language.)
- Simple Class inheritance.
- The biggest problem I have with accessing inherited properties and methods
is that it isn’t obvious (in the code) that you are accessing an inherited
property/method.
- In some languages (Powerbuilder), changes to parent classes are not
always propagated automatically to the child classes.
- I have a problem with the complexity of “protected”, “public” and “private”
properties and methods. (The default in C++ is to protect/hide everything
which means you have to make the decision to show properties/methods to
other objects every time you make a single class!)
- So long as you can override any inherited method, the fact that it can
be executed is no reason that you must call it!
- Simple external file architecture.
- I look at the installation of many quite simple programs and they have
many nested directories filled with data files, scripts, setup files and
programs.
- To work in conjunction with the idea of an integrated IDE and running
program environment, I wanted to make just a single operating system file
and put all classes, and objects in that.
- In C you have to worry about memory allocation.
- In Java they included automatic heap garbage collection.
- But in all the programming languages I have used, I haven't seen a single
language (other than APL) where both memory and disk storage are automatically
handled for you.
- Memory and disk management are two issues that programmers of an AI shouldn’t
have to think about.
- If the language handles small mundane things for the programmer then the
programmer can spend more time on the code that really matters.
- I studied a language called APL in 1975. It had no concept of memory allocation
or disk storage. You had a workspace and in it you had data (scalars, vectors,
matrixes of any dimension) of simple types (number, character) and programs.
I liked the language a lot but it had a peculiar right to left execution
direction and a huge number of special symbols that made up its built-in
methods. It had no “if” or “while” statements but you could simulate these
with some of the many operators (something like 65 operators, e.g. +/#&%$* etc.). (The
strange syntax reminds me of LISP.) Memory management and disk management
were not issues you had to deal with at all.
- Finest possible edit and compile without any linking of object modules.
- I have always programmed on PCs (except for the short time I was at University,
I have programmed on PCs exclusively since 1976) and I normally only program
10 lines or so before I run the compiler. (Sometimes after only a single
byte change.)
- Most code depends on other code to work correctly and I use the computer
as a tool in my programming process.
- I never want to change too many things without knowing that the computer
agrees with my recent changes.
- This also tends to isolate where a new bug has come from.
- If I change a single method, I only want to compile that method. (C compiles
at least the source file you changed and recompiles all modules if you change
an “.h” file that is included in all source modules.)
- Scalable to relatively large size.
- A program for an AI would start small but well before it had much intelligence
it would be a very large project.
- I needed a language with built-in structures that could handle large sets
of data (millions of records) efficiently, and be able to access huge amounts
of disk space (hundreds of gigabytes at least). Both of these criteria can
already be handled with ease in my implementation.
- Would the list structure of LISP handle such extreme requirements?
- Could a very large system be managed by nonstandard data structures (linked
lists with no overall structure)?
- My answer to the above 2 questions was no and I designed a database at
the core of the new language.
- Built in SQL, indexes, tables, lists, stacks and queues.
- I consider the above entities to be the power tools of a language.
- I got my degree in Computer Science without using more than a single sequential
file access.
- In my career, I have programmed almost nothing without creating and manipulating
many files indexed in many different ways.
- Oracle has made a billionaire out of Larry Ellison just selling the database
part of normal programming.
- The problem with all such “external” database programs is that they have
a different language for triggers and stored procedures and they are not
integrated as a normal part of the underlying language that is used to access
them.
- Money is only part of my complaint about the current model of “a language
by one company” and “a database by another”.
- I want to work with database files just as I would with any variable in
my program.
- If I want triggers and stored procedures, then I want to use the same
language that I program anything else in.
- Efficient vector handling of all data types.
- Java took a step in the right direction by eliminating the pointers that
are so much a part of C and allowing only length checked vectors and matrixes
instead.
- Being able to run past the end of a vector without any complaint from
the language is an unfixable design mistake in C. (C++ as well.)
- Many bugs in C are related to out of bound pointers.
- For any use I could make of a pointer, I could use a vector or matrix
instead.
- Internet interface.
- I have programmed a few commercial web sites with active pages and a database
on the back end and I know that a browser interface is not the best in the
world.
- However, the advantages of using a browser interface include:
- You don’t have to make all the code to create all the visual objects
that people would expect from a modern language.
- You can access the server from many different PCs, including Apples.
- You can access the server without regard for distance.
- Processing can be much more efficient when done in close proximity to
the source of the data. No matter how fast network bandwidth gets, a local
program will always be able to process local data faster than having to
get it from somewhere else.
- Runs on Windows PC machines.
- Many programmers think that LINUX is the operating system that will send
them to heaven.
- I don’t like Windows much but LINUX (and UNIX before it) is cryptic,
old and very complex, even if much of the software is free. (What difference
does free make when I had to buy books half a meter high just to get started?
Forget all the different versions and hacks to make anything work and then
there are the people that hack into your server every night!)
- Even LINUX programmers normally have a PC with Windows around so that
they can minimally fit in to the rest of the world computing scene.
- I don’t exclude the possibility of porting the language to LINUX in the
future but the Windows version would be my priority until it is finished.
- Can run as multiple separate systems
- Can run as multiple separate systems on the same computer or in the background
- Having the ability to run multiple small footprint (10M of memory) database
backed servers on the same PC would be a nice plus.
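To make the second requirement above concrete (classes, methods and properties that the running program can create and change by itself), here is a minimal sketch. It is written in Python, not in the new language, and only illustrates the general idea. The names Customer, greet, loud_greet and credit_limit are invented for this example.

```python
# Hypothetical names throughout; this shows the general idea only.

# Create a class at run time instead of declaring it in source code.
Customer = type("Customer", (object,), {"name": "unknown"})

def greet(self):
    return "Hello, " + self.name

# Attach a method to the existing class while the program keeps running.
Customer.greet = greet

c = Customer()
c.name = "Ann"
first = c.greet()

def loud_greet(self):
    return "HELLO, " + self.name.upper()

# Replace the method; objects that already exist pick up the new behaviour.
Customer.greet = loud_greet
second = c.greet()

# Add a new property to the class after objects have been created.
Customer.credit_limit = 500

print(first)
print(second)
print(c.credit_limit)
```

The object c is never destroyed or rebuilt while its class is being changed, which is the property the "Would a person like to be killed" analogy asks for.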
Additional Comments:
- I looked at many languages including LISP, PHP, C, C++, Java, Python, Powerbuilder, Visual FoxPro etc.
- All of these languages failed at least 1 of the first 3 requirements.
- None even came close on the whole list.
- There is a lot of risk and effort involved to create a new language.
- I personally designed and wrote a dBase compiler/interpreter and sold over 30,000 copies in the 1980s.
- I have firsthand knowledge of how difficult it is to create a new bug-free language.
- The biggest reason to create a new language is that in building an AI, you don’t know what language features you might absolutely require.
- A language like LISP might be ok for a year or two but what if the performance, or feature set just can’t make it?
- If you have the source code and design rights to the language, you have choices you wouldn’t otherwise have.
Compiler advantages/disadvantages
- Advantages:
- Speed of execution.
- Checks existence and type of variables and method parameters at compile
time.
- Disadvantages:
- Complex and time consuming edit, compile, link and run cycle.
- Code must define all variables and have a “fixed” memory structure at execution time.
- Errors produced are “Invalid memory accessed at XXXX:YYYYY”. How does this tell you where to fix the problem in your source code?
- Environment only comes alive when the compiled program is running. (Other people can’t be accessing the data at the same time you are changing that program.)
- No live updates to code are permitted. (It is a good thing to have a test
system that isn’t production but to always require a dichotomy of program
and production is not always useful!)
- Writing a compiler that produces binary code is a very technical and difficult job. (Many times more complex and difficult than producing an interpreter.)
Interpreter advantages/disadvantages
- Advantages:
- Can run program and change code at the same time.
- Each method that is changed can be compiled by itself and linking is totally eliminated.
- Efficient multiple program execution is possible because cooperative multitasking
can be used instead of preemptive multitasking. This doesn't work very well
for all types of programs (e.g. the old Apple OS) but can work much more
efficiently than preemptive if all programs conform to your requirements.
- You can interactively work with and check any persistent data structures (objects) while editing a method.
- After changing a class property, you could view the effect immediately.
- Errors in the program can be shown immediately in the editor, changed and compiled just by saving the source code. (Even execution time errors)
- An interpreter can be debugged much more easily than a compiler and the problems
are at a much higher level. (Even compiler code is just a set of called
procedures much like what happens in the interpreter. Once the execution
is inside the functions then the speed of execution is the same. The biggest
difference is the inherent quickness of machine code branching versus interpreted
branching. The interpreter can check some things at execution time that
compiled code normally doesn't. If it did then the execution time for the
compiled code would approach that of the interpreter anyway.)
- Disadvantages:
- Slower execution than a compiled program. (How much slower depends on
the interpreter and the mix of code executed.)
- Much of this speed disadvantage can be minimized.
- In general, the speed difference occurs from the design choice of what
to check at execution time. More checking at execution time means more flexibility
but it also means that you get somewhat slower execution speeds.
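The cooperative multitasking point above can be illustrated with a small round-robin scheduler. This is a generic Python sketch built on generators, not the scheduler used in the language described here. The names program and run_all are invented for the example. Each "program" runs until it voluntarily yields, so only one of them executes at any moment and the shared log needs no lock.

```python
from collections import deque

def program(name, steps, log):
    """A hypothetical 'program' that does a little work, then gives up control."""
    for i in range(steps):
        log.append((name, i))   # one slice of work on shared data, no lock needed
        yield                   # voluntarily hand control back to the scheduler

def run_all(programs):
    """Round-robin scheduler: exactly one program runs at a time."""
    ready = deque(programs)
    while ready:
        current = ready.popleft()
        try:
            next(current)           # run until the program's next yield
            ready.append(current)   # still has work: back of the queue
        except StopIteration:
            pass                    # program finished; drop it

log = []
run_all([program(f"p{n}", 3, log) for n in range(100)])
print(len(log))
```

As the text notes, the weakness of this style is that it only works well if every program cooperates: a program that never yields would stall all the others.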
Conclusion:
- The flexibility and advantages of the interpreter far outweigh the disadvantage of slower execution speed.
- Some of the speed lost can be gotten back by:
- Compiling the source to byte code. (Done in my implementation.)
- Writing well used code (indexes, stacks etc) in compiled code in the interpreter
as needed. (Also done.)
- Provide a “Block” structure that would automatically loop through a table
or matrix in a single compiled function to enhance the speed of some low
level routines. (This eliminates the looping speed problem even if it doesn't
stop certain extra type checking at execution time.)
- Provide the ability of having quick procedures inside methods. (This technique
compiles a procedure’s source code efficiently into the calling method without
the overhead of creating a new symbol table and pushing return information
on the stack.) It would also help organize blocks of code in a method.
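As a rough illustration of the first point, compiling source to byte code, the sketch below translates an ordinary infix expression into a nested "function followed by its parameters" form and then interprets it. It is written in Python and relies on Python's standard ast module for the parsing. The opcode names (ADD, GT, LOAD, CONST) are assumptions made for this example and are not the language's actual byte code, which the text does not show.

```python
import ast
import operator

# Hypothetical opcode names; the real byte code is not described in the text.
BINARY = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}
COMPARE = {ast.Gt: "GT", ast.Lt: "LT"}
RUN = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul,
       "GT": operator.gt, "LT": operator.lt}

def to_prefix(node):
    """Translate a parsed infix expression into nested (function, args...) tuples."""
    if isinstance(node, ast.Expression):
        return to_prefix(node.body)
    if isinstance(node, ast.BinOp):
        return (BINARY[type(node.op)], to_prefix(node.left), to_prefix(node.right))
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        return (COMPARE[type(node.ops[0])],
                to_prefix(node.left), to_prefix(node.comparators[0]))
    if isinstance(node, ast.Name):
        return ("LOAD", node.id)
    if isinstance(node, ast.Constant):
        return ("CONST", node.value)
    raise ValueError("unsupported syntax")

def evaluate(code, variables):
    """Interpret the prefix form: the first element names the function to apply."""
    op = code[0]
    if op == "CONST":
        return code[1]
    if op == "LOAD":
        return variables[code[1]]
    left, right = (evaluate(arg, variables) for arg in code[1:])
    return RUN[op](left, right)

source = "a + 5 > 10"
code = to_prefix(ast.parse(source, mode="eval"))
print(code)
print(evaluate(code, {"a": 7}))
```

The programmer writes the readable infix form, and only the machine ever sees the prefix form.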
Additional Points:
I have been asked why I wouldn't just use XYZ version of Lisp. Although I have
put forward many reasons why Lisp is not an adequate language for AI and other
uses, I will offer the following points specifically about Lisp.
1. Lisp is NOT object oriented. It does have a kind of object add-on but the
language is NOT OOPS. Its syntax is ('function name' 'parm1' 'parm2' ...) which
is a prefix notation for the standard form of function call which is 'function
name'('parm1','parm2' ...). This is NOT OOPS. There is no direct association
between the data and methods that change that data as in the object model. In
Lisp's defense, it was created in 1958 and object oriented code only became
popular on microcomputers by about 1990.
2. Lisp has a strange prefix notation.
- I have used a right to left notation in APL at University which was different
from any other language I have used, but it didn't add anything to that language's
use. If it had been just left to right instead, nothing would have been lost
to the language except that I wouldn't have had to write my programs like
someone writes in Arabic.
- This point is exactly one of my problems with Lisp. What does the prefix
notation add to the usefulness of the program?
- If someone doesn't work in Lisp all the time, how does experience with other
programming languages help with programming in Lisp? A programming
language should not get in the way of the programmer's code, so how do all
the idiosyncrasies of Lisp help someone learn or leverage their knowledge
of other programming languages to code in Lisp?
- The "Byte" code my language is automatically compiled into, is
essentially prefix notation. It has a function followed by its parameters.
My compiler was designed to take normal looking code and translate it into
this prefix notation exactly because normal looking code is more readable and easier
to program. Lisp's prefix notation almost looks like the compiler writers
were just plain lazy and that is why they required that the source code be
in prefix notation which has only advantages for the computer and not the
people that are programming it.
3. Lisp has much extra detail that isn't needed and adds to its complexity
without providing extra benefit.
The following was taken from a Lisp beginner help session. It shows the 5 different
syntaxes of the loop function. (Used for looping)
(1) (loop for I from N1 to N2 do ...)
Also by and downto variations
(2) (loop for Entry in List do ...)
(3) (loop repeat X do ...)
(4) (loop until Condition do ...)
(5) (loop while Condition do ...)
My language has only 3 kinds of loops.
1. WHILE 'condition' ... ENDW
2. FOR i=1 to 10 ... NEXT
3. BLOCK 'matrix, varchar, list, table, index etc' ... ENDB
I don't support a post condition (where the test for continuing the loop is
at the bottom of the loop) for looping because it is seldom used.
It can be easily simulated with a WHILE _TRUE at the top of the loop and an
IF 'condition' BREAK at the bottom.
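The same simulation of a post-condition loop works in most languages. Here is a minimal sketch in Python (not HAL), where `while True` plays the role of WHILE _TRUE and the `if ... break` at the bottom plays the role of IF 'condition' BREAK. The function count_digits is invented for this example.

```python
def count_digits(n):
    """Count the decimal digits of a non-negative integer with a bottom-tested loop."""
    digits = 0
    while True:          # plays the role of WHILE _TRUE at the top
        digits += 1      # the body always runs at least once, even for n == 0
        n //= 10
        if n == 0:       # plays the role of IF 'condition' BREAK at the bottom
            break
    return digits

print(count_digits(0))
print(count_digits(2500))
```

The n == 0 case is why the test belongs at the bottom here: zero still has one digit, so the body must run before the condition is checked.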
The BLOCK structure makes defining what is to be done with each member of a
matrix, table etc both efficient and easy to read.
Even having this looping mechanism and making it execute a function somewhere
else like 'loop for Entry in List do' just doesn't cut it.
My language has only IF ELSE ENDIF, ELSE IF, and the SWITCH command to allow conditional branching.
The computer chips that we all use actually have only GOTO and conditional GOTO commands to accomplish conditional branching.
Any type of program branching can be accomplished with the small set I have implemented.
The following is an excerpt from a beginner's guide to Lisp.
This is a shorthand for a nested if. The entire form consists of a set of clauses.
Each clause consists of a condition and then any number of actions.
The condition is evaluated, and if true (non-NIL), all the actions are evaluated,
the value of the last one is returned, and the cond exits. Otherwise the process
is repeated for the next clause. Since T is always non-NIL, it is often used as
the condition for an "otherwise" clause.
Example:
(setq X 4)
(cond
((oddp X) (+ X 3))
((evenp X) (+ X 2))
((< X 9) 999) ) ; This would be true, but is never reached
==> 6
Contrast this with the exact same code in HAL.
int x, ans
x=4
if x.odd()
ans=x + 3
else if x.even()
ans=x + 2
else if x < 9
ans=999
endif
? ans
Are these two code samples equivalent in terms of understanding what they do?
You have never seen HAL before but I would guarantee that all programmers would
understand the HAL code. Would everyone understand the brackets, the mixing
of functions and variables or even that 'cond' means that a 'condition' is to
be evaluated?
Here is another example from Lisp:
-
In my last message, I mentioned two variables with slightly odd names:
*print-pretty* and *Self-Eval*. "*" is a perfectly legal part of a
variable name in Lisp, and the convention is that you put "*"s around
GLOBAL variable names in order to make them stand out in your code.
Lisp does not enforce this; it is just a programmer convention. But it
is a widespread convention, so do try to use *Foo* for the names of
global variables, and just Foo for local variables. If you declare
global variables with DEFVAR before setting them, the compiler will
not warn you about using undeclared global variables.
A similar convention is to put "+" around constants. Constants are
declared via DEFCONSTANT, cannot be changed at run-time, and must be
declared before they are used, since Lisp replaces all the occurrences
with the value. You may never use them this semester, but just in case
you are interested:
(defconstant +Half-Pi+ (/ pi 2.0))
or (with an optional documentation string)
(defconstant +Half-Pi+ (/ pi 2.0)
"Pi/2, to avoid doing this division repeatedly in my trig routines")
Why would there be '*' around a global variable? The whole point of an object
oriented language is to match data with the functions that work on it. What
does a 'global variable' mean from an OOPS point of view?
if a + 5 > 10 // 'a' is obviously a local variable because you weren't told it wasn't
if cust.a + 5 > 10 // 'a' in this case is a property of the object cust
There is no need to put '*' around the variable name and it is obvious what
object the variable 'a' belongs to in the second example. Why put '+' symbols
around a variable you want to be a constant? It's your program, just put the
value in the variable and don't change it. What could be more simple?
Why make variable names case sensitive when most people's eyes view variables
like 'foo' and 'Foo' as the same? Why set the programmer up for these silly little
mistakes when making variables case insensitive costs nothing for the language?
Why does a person have to count a bunch of '(' to make sure they are all balanced?
Here is an example of inputting a list of numbers 1, 2, 3, 4 into a list. This
is rarely done. Most lists are gathered interactively or from a file and are
therefore shown in a program by just a variable name. The ' before the bracketed
list is also quite archaic.
(if (find 3 '(1 2 3 4)) 'Yes 'No) ==> YES
(find 4 '(1 2 3 4 5)) ==> 4
The following is the equivalent in HAL. No part of HAL doesn't look familiar
to any programmer. (Excuse the double negative!)
class noname inherit list {
int var
} mlist // create the list
mlist='1,2,3,4,5' // store 5 numbers into the list
if mlist.find(3) // find element of list
? 'Yes'
else
? 'No'
endif
? mlist.look(4) // would display the number 4