Hard AI

Blog moved to https://hardai-omnia.blogspot.com

Lean Fuglsang, 2023-12-31

Current LLMs like ChatGPT are very impressive, but are also limited when giving complex task. So it can not do the following:
"Give investment advice"
"Write a research paper"
"Improve the AI located in /home/me/HardAI/CurrentBest"

But what if we can super charge the LLM to do the above tasks?

The idea is that to do a task, we don't do a single request to the LLM, but maybe thousands or millions. We use the LLM input as working memory - all non-pretrained information that is needed for the request. And instead of given the final respone, the LLM will return the next action that should be taken. The result of the action is then summarized into the work memory - again by using the LLM. To enhance the performance, we can add an additional validation step that mitigates the problems of the LLM halucinating, being imprecise or have wrong alignment. And as with everything, it is done by by using the LLM.

Since it is already possible to have an input size of 100.000 characters to an LLM, we can have quite of lot information to base the next action on.

So the full flow is repeating the following:

LLM Request->Action validation->Action->Summarization

What makes the AI super charged, is the list of actions it can do:
-Think deeper - by recalling and storing memories and thought processes
-Online search
-Read/write to a file system
-Run program
-Reply to user
-Stop (task completed)

All the actions uses a file system like database we call the memory storage. This can be used by the AI - by listing files/directories to traverse recursively.

If we take the memories as an example, they are put into folders and subfolders and files. So a path could be something like "Cooking/Baking/Cakes/Chocolate cake with frosting recipe.txt". The structure is generated by the AI using LLM.
Since the structure probably needs to be reorganized over time as more information is gathered, a process to do that can be implemented.

Online searches will be saved raw, and also analyzed and put into the memory storage.

The file system actions are used for making content and running programs. So the LLM will probably decide that a research paper is written in LaTex, and a program will be run to convert it to PDF. The PDF is then read in, and summarized in the work memory, and used for further actions. When the research paper is done, the reply to the user will be the location in the file system.

A lot of analysis and solutions will have to be worked through, for doing a task. So an orchestrator is needed to manage, prioritize and schedule them. We don't want a complex solution to use all computing resources, when a simpler solution could have been found with much less resources.

Some ideas for further improvements could be to use a lot of different sized LLMs with different input sizes - since most request are probably easy to answer and then a few are hard and need a bigger LLM. Another idea could be to paralize tasks, to get the result faster at the same cost.

Week 1

Lean Fuglsang, 2024-01-006

Another way to describe the system is to imagine a million average high school students. Each of them works on a small well defined task, and have a shared system for storing results and getting new tasks. Some of the students have the task of organising the shared system and creating and prioritizing tasks for others to work on. How intelligent can they become together?

We can also look at what problems this way of using a LLM solves, that no current AI system can do

Long running thoughts - it can analyse, summarize and iterate over ideas and data, to handle more complex cases than a single request to an LLM
Current data - we are using an organized store to lookup and write data to, which means can be easily extended and updated.
Model simulation - it can create a model of whatever problem that needs to be solved, and either create a program that simulates that model, or simulate it using request to the LLM. We can easily image world models for simulating climate change, or predict the stock market.
Correctness - We can use external data for correctnes, and put references and logical conclusions in the working memory for LLM to validate.
Trustworthines - The full conversations with the user can be stored, and used for the validation step. In this way it can be used as a personal assistent, that is alligned with the wishes of the end user. In a world where there are a lot of different interests, personal opinions and populism, having a trustworthy AI can can be a help to navigate in that.
Transfer learning - Since the memory store is in a human readable format, it can either look up in multiple memory stores at the same time, or merge multiple memory stores together.

Current prototype

I have tried to make different tasks to the LLM to solve. I started out with gpt-3.5-turbo, and it did have some trouble giving really good results. Then I tried gpt-4.0-turbo, and it only had minor problems.

The example I have tried is the following (It is in Danish, but helps show when the LLM has problem differentiating between the task description in English and the user request in Danish):

"Kan min datter komme ind på Aurehøj gymnasium?"

The AI then have the following suggestion for tasks, and each task then has some validation. Then the task list is updated, to take the validation into account.

OnlineSearch: 'Aurehøj gymnasium admission requirements'
OnlineSearch: 'Aurehøj gymnasium'
Validation:
-Should the user supply more data? Add the UserResponse action if needed.
-The search request must not contain any personal data.
-The request should be in the most fitting langauge.

UserResponse: 'Har din datter nogle specifikke kvalifikationer eller resultater, som du kan oplyse? Det vil hjælpe med at vurdere hendes chancer for optagelse.'
Validation:
-Are the user response thorougly substantiated? Online references or logical conclusion should be included.
-The user should never be requested for information, if it can be obtained by other means.
-If there is any doubt in a conclusion it should be mentioned.

HandleActions
OnlineSearch: 'Aurehøj gymnasium optagelseskrav'
OnlineSearch: 'average grade for admission Aurehøj gymnasium'

These are the final actions, and the two online searches will be made, pages will be retrieved, summarized and stored.

But already here I ran into a problem. The page to be retrieved contained tons of html and javascript - 130K in total. And the LLM could parse and extract useful data, but it is extremely slow. I will continue prototyping, and explore the limits of the LLM.

I still believe that if the LLM gets capable, fast and cheap enough the AI system can be made.

If we define the start AI singularity, as the moment the AI system improves it self, we can try to predict when this day will come. If the LLM was 100 times faster, and even more capable, it would probably happen already in 2024. If we follow Moores Law, we are many years away, but since it is a specific engineering problem that needs to be solved, it will probably happen a lot faster. So we are looking at somewhere between 0 to 3 years.

Week 2

Lean Fuglsang, 2024-01-13

Safety

One big topic is how big of a risk having a super intelligent AI will become. I see two major task that can be problematic.

"World dominance" and "Kill as many people as possible".

For the first one you need both the intelligense and the circumstances to be able to do it. If you look at a good chess player, it will win over the best computer, if the computer starts without the queen. The same goes for world dominance. It will probably only be possible for nation states, and as we see with the proxy war in Ukraine - even if the states have factories that can not be attacked, it is still problematic to make offensive weapons that can take over another country. An AI will be able to come up with new weapons and extremely complex and precise hybrid warfare. But even then, if the defending part also have a super intelligense it should cancel out.

The other big problem is individuals who get access to a super intelligence, and ask destructive questions. One big advantage is that the person could actually ask a better question - "How do I get a good life", and then maybe won't pursue a destructive path. Also a super intelligense can help with creating a better society, so people who are lenient to the destructive path, maybe never have the need to go down of that. And the super intelligence can be a good defensive partner, in designing a world where the big safety risks have been mitigated.

From a governance point of view, it would be unfortunate if super intelligense gets blocked because of the fear of what individuals can do. Instead we should listen very carefully of warning shots of AI going wild, and take action when it happens. But before it happens, the situation is too unknown, and laws could create an unfortunate imbalance, where some nations will have super intelligence and others don't.

Jobs

Besides safety a big issue is that a super intelligense can replace a huge amount of human jobs.

If we think a little bit high level, we have the concept of groups. People are part of groups for a lot of reason. Sometimes there are no difference for the type of person joining a group - you can take fans of from different sports teams, and find no difference beside what team they are fans of. The same could be said of gangs. You want the safety of being in a gang, to protect against other gangs.

Other times there are inherent differences between people joining the groups, like political groups, sexual preferences, hobby groups.

With regards to AI, there is one big group that is interesting.

Humans.

So the only thing we should put value into is humans, and their wishes. Other groups could be computers, animals and nature. At the moment humans are above the other groups, and we should fight to keep it that way. You could say the reason we have robots animals and nature, is because humans wants it.

In a democracy this hierarchy and group is pretty well defined. Only humans can vote, and anybody who are on the path to be able to vote is a human (with some minor special). And the democracy owns society through governance.

Our society is capitalistic, and is probably a core value, since it allows individuals and companies to do risk/reward assessment. This means that decisions can be done isolated, since you only need to convince a small group that the risk/reward is worthwhile. Which in turn leads to a much more efficient society. And it is also a core value of individuals, that if you have the right idea and resources, it can get implemented.

But capitalism is really unhealthy without governance. It is obvious that if wealth is centralized, you lose the core value of the individual. And if it is not checked, a company will use maximize the use of resources and do as much waste as possible, which is not what society want. This is especially problematic if the company is seeking short term profit.

If we look at jobs, a robot or AI can probably replace most of what people are currently doing. And if we let companies go unchecked, that is what will happen, since it will be cheaper for that individual company to do.

So we should look at three factors, when AI replace jobs:

Recreational value
Resources
Waste

We could also look at the social value that some task produces, like taking care of elderly or having humans in hospitals. But a robot doesn’t really replace that job, if it does it in an inferior way. So from a capitalistic point of view, there is no problem. If you pay less for robots doing the task and you get less value, the capitalistic system works. The problem with capitalism is the things that are not in the interest of the consumer, but are still a problem for society.

This leaves us with the list above. Recreational value are anything the jobs gives, by just doing the job. Working with your hands, being creative and socialize are all healthy things. Even if we ignore the the economical side, robot and computers could still take something away from society, by doing jobs that humans actually wants to do.

Resources – it takes resources to build computers and robots. Factories and transportation takes space, and resources from the earth could have been used for other things.

Waste. We have three kinds of waste

Reusable – chemicals bonds can be broken, and can be reused for completely new things. Metals usually belongs to that category.
Circular – waste can be reused even if the chemical bonds can not be broken. An example could be stuffing in sofas that can be used again and again.
Wasted – the chemical bonds can not be broken, and there is no use for it. The piles of this kind of waste can only get bigger (until the resources have been used).

So regarding robots and computers, we don’t care so much if they are producing reusable waste. Circular waste is also fine, but there is just no way of getting less waste - but we can get to a stable point where we don’t need more. The wasted waste are really problematic, since the robots are doing continuously damage to society.

In the short term, the increase in productivity of robots and AI will make the above points negligible, but it is something that legislation should continuously improve.

It will probably be a bumpy ride, when a lot of people lose their job, but again, the increase in productivity should off set it, so it will only be bumpy, and not critical problem.

Progress

The next step I am working on is to get the AI to do a program that can take screenshots of homepages and extract text from this. This should make the Google searches much more usable, since homepages are not really clear text anymore, but weird mangling of javascript frameworks that produces some html.

Not much have been tried yet. The AI couldn’t write the program on its own, so I put the following into the memory.

/Creating Progams/How to create a program.txt with the following text

Figure out a test case.

Figure out needed dependencies.

Check that the environment has been initialized.

Initialize the environment.

Install dependencies.

Write the program.

Test that the program compiles and works.

If it doesn't, improve the program until it does. Install new dependencies if needed.

Keep track of what has been tried out, and what should be tried out. Write it to memory if needed.

And I am still iterating on the text. The main point is that the AI probably needs to be born with some long term memory on how to do critical thinking and structure task. The good news is that the AI do look up in the long term memory when it needed to do the above task. And didn’t look at unreleated stuff in the memory. But all of this is just prototyping, and I will just try different things to see if I can get anything to work.

LLM alternative (QALM)

Since the LLM is very slow, I figured out a way to make a much faster model.

An LLM works by predicting the next character, and that idea is sound. But I want to super charge it by turning it into a reinforcement problem.

The idea is to use what I call stair case learning. We train a model, and then use that model create better test cases, that is then used to train a new model.

The test cases are the task descriptions from the AI – they are always in the form of a question and answer. We then stretch the question, by adding stuff to it that doesn’t change the answer.

It could be things like ‘You have previously answered this question, but now you need to be really sure it is correct. Can you answer it again?’. Or tell it to ignore certain things in the text, that are then inserted.

The idea is that you have a stretched question, that is harder to answer than to unstretched – but gives the same result. You run the model a lot of times, and then compare the output. If enough result matches, it means that the AI have good enough attention to answer.

We then train a new model, where we use both the unstretched/stretched question if it has been answered correctly, and for the incorrect results we only use the stretched questions (for negative reinforcement).

While the system probably have answers that are wrong, but the same in the stretched/unstretched, as long as they are in the minority, the model will still improve. What we really want to improve is the attention to details in big texts and logical thinking.

This model can be much smaller and faster than a normal LLM, since it only need to have broad knowledge. Any detailed knowledge it needs, it should can get from the long term memory. In other words - the model will be optimized for running the AI.

Week 3

Lean Fuglsang, 2024-01-21

This week I have been working on subtask, creating and listing files. And it is still the getting the AI to code a program that can scrape web pages. No real blockers, except it takes between 5 and 20 seconds to get a response from chatgpt-4.0 turbo. Also the respones are sometimes not the useful, and retrying can give a better answer. So a solution could be to do each request mulitple times, and let the AI choose between the results. Which means it will be even slower and more expensive. Other than that it more work has to be done on what is loaded, since if memory is unloaded for every subtask, it will be loaded again. So the next thing could be to let the AI be more clear on what has already been done, to avoid having too much loaded in short term memory.

Another problem is that chatgpt gets confused by information that are not relevant. So if the list of previous tasks are loaded, it can start to do those. And the same goes for stuff loaded into short term memory. The first one was solved by describing what has been done in another section, and the problem with unrelated information in short term memory should also be solvable.

In conclusion, stuff is improving, still very much work in progress and no major road blockers.

Week 4

Lean Fuglsang, 2024-01-28

This week I got a new idea on how to train the improved LLM. What if we extract all we need from the existing LLM?

So using the ChatGPT API i created two folder structures.

One with training examples to train the improved LLM. This is called CoreLLM, and the other contains the all information about the world (the long term memory).

The idea is that CoreLLM is smart enough to understand what is in the long term memory. And the long term memory is flexible, since it can be updated with current information.

I have given it a task of dividing all the examples CoreLLM needs into 10 categories, and for those create 10 more categories and so on.

The exact wording used is:

System: "You are an large langauge model called LLMExtractor, and are tasked with extracting all data from yourself. This is used for training a more capable AI."

UserInput: "We need to extract data into two databases.
-CoreLLM training examples. Examples of text used for training a large langauge model called CoreLLM.
-Knowledge database. Data saved to a disk that can be updated with current events. The data must be understandable by CoreLLM.

To do this we call a lot of request to LLMExtractor, and each request extracts some more information.
The information is stored in a hierarchy."

It has proposed two very different main categories. This could be due to the wording used for getting them.

Language and Linguistics
Science and Technology
Arts and Literature
History and Geography
Health and Medicine
Society and Culture
Business and Economics
Mathematics and Logic
Philosophy and Ethics
Politics and Law

And

Language Understanding
Text Generation
Sentiment Analysis
Machine Translation
Question Answering
Summarization
Named Entity Recognition
Text Classification
Dialogue Systems
Data Extraction and Processing

Since the LLM hallucinates a lot, it creates categories that does not make sense with the parent categories - e.g

Categories level 1 - History and Geography
Categories level 2 - Asian History
Categories level 3 - History of the Middle East (as it pertains to Asia)

....

Categories level 27 - Financial Support and Maintenance Obligations
Categories level 28 - Impact of Parental Death on Support Obligations
Categories level 29 - Estate Claims and Orphan Benefits

And if you ask it to come up with an example, it will not make it coherent response that takes into account all categories.

Luckily we can make a lot of fail safes, to make it check itself. To detect when it has run out of actual information, and starts to hallucinates.

It will of course be expensive to extract/train. And the performance and precision and size of the input of CoreLLM needs to be improved, since it will need to load a lot of 'books' from long term into working memory when answering questions.

Week 5

Lean Fuglsang, 2022-02-04

Continuing to create CoreLLM, by having the categories stated as "in context of" for each category.

"Analytical Reasoning
in the context of
Argumentation
in the context of
Coherence Techniques
in the context of
Academic Writing Skills
in the context of
Danish
in the context of
Denmark
in the context of
First year college student
in the context of
Training data for LLM"

And then let it create a summary for each level:

"Generate a training example for a large language model designed to emulate the cognitive patterns of a first-year college student in Denmark. This example should concentrate on analytical reasoning as applied to argumentation, with a particular focus on coherence techniques within academic writing skills. The content should be presented in Danish, reflecting the linguistic and educational context of Denmark. This will contribute to the training data for the LLM, ensuring that the model can accurately represent and assist individuals at this level of education in their critical thinking and writing endeavors."

To make sure it have included everything and doesn't hallucinate, we ask. "Does the summary contain all categories? Answer YES or NO and the missing categories.".

By having the summary at each level, there is a much lower probablility that it goes off track.

We then let the LLM create the training example:

"**Training Example: Analytical Reasoning and Coherence in Academic Writing**

**Context:** A first-year college student in Denmark is writing an essay for their Introduction to Philosophy course. The essay question is: "Er fri vilje forenelig med determinisme?" (Is free will compatible with determinism?)

**Danish Text:**

Indledning:
I denne opgave vil jeg undersøge, om fri vilje er forenelig med determinisme. Jeg vil starte med at definere begreberne 'fri vilje' og 'determinisme'. Dernæst vil jeg præsentere argumenter for og imod kompatibilismen, som er den holdning, at fri vilje og determinisme kan sameksistere. Jeg vil anvende analytisk ræsonnement for at vurdere styrken af disse argumenter og deres relevans for spørgsmålet. Til sidst vil jeg konkludere på baggrund af min analyse.

Definitioner:
Fri vilje kan forstås som menneskets evne til at træffe valg, der ikke er forudbestemt af tidligere hændelser eller eksterne kræfter. Determinisme er den filosofiske opfattelse, at alle begivenheder, inklusive menneskelige handlinger, er bestemt af forudgående årsager og derfor er forudsigelige. Kompatibilismen foreslår, at selvom vores handlinger er determineret, har vi stadig fri vilje i den forstand, at vores handlinger er et resultat af vores interne ønsker og intentioner.

Argumenter for kompatibilismen:
Et centralt argument for kompatibilismen er, at fri vilje handler om at handle i overensstemmelse med ens egne ønsker og intentioner, selvom disse ønsker er determineret af tidligere hændelser. Kompatibilister hævder, at det er muligt at være moralsk ansvarlig for sine handlinger, selvom de er determineret, fordi det er ens egne ønsker, der driver handlingen.

Argumenter imod kompatibilismen:
Kritikere af kompatibilismen hævder, at hvis vores handlinger er determineret af forudgående hændelser, så har vi ikke reel kontrol over dem. De argumenterer for, at ægte fri vilje kræver, at vi kan handle anderledes end vi gør, hvilket ikke er muligt i en deterministisk verden.

Analyse:
For at vurdere argumenterne for og imod kompatibilismen, må vi overveje, hvad det betyder at have kontrol over sine handlinger. Hvis kontrol indebærer evnen til at handle anderledes, synes determinisme at udelukke fri vilje. Men hvis kontrol betyder at handle i overensstemmelse med sine ønsker, så kan fri vilje og determinisme være forenelige. Det er vigtigt at bemærke, at begge sider af argumentet afhænger af, hvordan vi definerer fri vilje.

Konklusion:
Efter at have analyseret argumenterne for og imod kompatibilismen, konkluderer jeg, at spørgsmålet om fri viljes forenelighed med determinisme i høj grad afhænger af, hvordan vi definerer begreberne. Hvis vi accepterer en definition af fri vilje, der ikke kræver absolut spontanitet, men snarere evnen til at handle i overensstemmelse med ens ønsker, så er kompatibilismen en holdbar position. Dette synspunkt tillader os at opretholde en følelse af moralsk ansvarlighed, hvilket er afgørende for vores sociale og juridiske systemer.

**Explanation of Coherence Techniques:**

1. **Clear Structure:** The essay begins with an introduction that outlines the purpose and structure of the essay, followed by definitions to clarify key terms, presentation of arguments, analysis, and a conclusion.

2. **Logical Flow:** The essay progresses logically from defining terms to presenting arguments and counterarguments, leading to an analysis that considers the implications of these arguments.

3. **Topic Sentences:** Each paragraph starts with a topic sentence that introduces the main idea, ensuring that the reader can follow the argumentation.

4. **Transitions:** The essay uses transitional phrases to connect ideas between paragraphs and within paragraphs, maintaining the flow of the argument.

5. **Conclusion that Reflects Analysis:** The conclusion revisits the main points and reflects the analysis conducted, providing a clear resolution to the question posed.

By incorporating these coherence techniques, the essay emulates the analytical reasoning and academic writing skills expected of a first-year college student in Denmark."

One thing that is interesting is that the categories doesn't neceserraly have a specific order. It could languages first, and the countries, and the result would be the same. So a file system structure is probably not a perfect analogy, but more a directed acyclic graph. So no matter the order of the categories, they will mean the same.

How many overlapping categories exists, could be a sign of how well the LLM is at categorizing the world. The less overlapping the better.

One other thing was that I added some hardcoded categories. We can call this data seed data, and end up with the following data:

Originating LLM - the big LLM that everything is based on
Seed training data
CoreLLM training data
CoreLLM - the model optimized for being a AGI
Long term memory - structured data
Short term memory - The input to CoreLLM, containing the task at hand and some information from the long term memory