Voyager

This document is analysis results of the project Voyager.

Pseudo Code

You can check the pseudo code in the paper, but I write the new version of the pseudo code here.

// Pseudo code of Voyager learn function

// Prepare agents
action_agent = new Action_Agent()
curriculm_agent = new Curriculm_Agent()
critic_agent = new Critic_Agent()
skill_manager = new Skill_Manager()
env = new Minecraft_Env()
event = new EventManger()
recorder = new Recorder()

function rollout(task, context) {
    action_context = construct_action_context(task, context, [])
    while (true) {
        // Get action which is the javascript code to execute
        action = action_agent.generate_action(action_context)

        // Execute the action
        event = env.step(action)

        // Recrod the event and task
        recorder.record(event, task)
    
        // Updat the chest(inventory) memory
        action_agent.update_memory(event)

        // Check the success of the task
        success, critique = critic_agent.check_success(event, task, context, env)

        if !success {
            // Revert all the placing event in this step
            env.step("undo")
        }

        // Retrieve the skill from db
        skills = skill_manager.retrieve_skills(task, context, action_agent.summarize_log())

        // Update new action context
        action_context = construct_action_context(task, context, skills)
        
        // Determine the termination of the task
        done = determine_termination(success)

        if done {
            return construct_info(success, critique, task, context, recorder)
        }
    }
}

function learn() {
    env.reset()
    event.reset()
    max_iteration = ...
    current_iteration = 0
    
    while (current_iteration < max_iteration) {
        // Propose the next task
        task, context = curriculm_agent.propose_next_task_from_current(env, event)

        // Rollout the task
        info = rollout(task, context)
        
        // Update the skills from rollout
        skill_manager.update_skills(info)

        current_iteration += 1
    }
}

learn()

Dependency of LLM

Voayger uses the LLM at the following places in the pseudo code:

curriculm_agent.propose_next_task_from_current
- render_humange_message: make a questino, answer pair
  - run_qa1: Geneate the questions to find the blocks, items, and mobs
  - run_qa2: Answer the questions from run_qa1
- propose_next_ai_task: Generate the next task from the render_humna_message
action_agent.generate_action: Generate the action(javascript code) from the task
critic_agent.check_success: Check the success of the task

So if Voyager run the 1 cycle of the learn function, it will use the LLM 5 times sequentially.

run_qa1 from curriculm agent
run_qa2 from curriculm agent
propose_next_ai_task from curriculm agent
generate_action from action agent
check_success from critic agent

Voyager

Pseudo Code​

Dependency of LLM​

Pseudo Code

Dependency of LLM