Board 8 > This "AI Learns A Video Game" video is on another level. (Pokemon Red)

Topic List
Page List: 1
Forceful_Dragon
10/13/23 12:10:38 PM
#1:


https://www.youtube.com/watch?v=DcYLT37ImBY

The quality and process of this video are head and shoulders above similar videos, and watching an AI get trained on something as complex as a Pokemon game is crazy. Highly recommend watching if you have a half hour to spend.

---
~C~ FD
http://i.imgur.com/dGDfxaw.png
MrSmartGuy
10/13/23 12:56:48 PM
#2:


That was fascinating. I love that the first iteration just stopped and people-watched because it was secretly being rewarded to do so.

---
Xbox GT/PSN name/Nintendo ID: TatteredUniform
http://www.scuffletown.org/wp-content/uploads/2010/05/tRBE1.gif
Forceful_Dragon
10/13/23 12:59:40 PM
#3:


I think my favorite part was how it was given so much PTSD from the pokemon center computer that it absolutely refused to heal until the rewards got adjusted.

I can't believe this is this youtuber's first ever video; it has so much polish. I'm guessing we won't see a TON of new content because I'm sure videos like this take a lot of time. But Quality > Quantity and this guy deserves to blow up if this is the level we can expect.

---
~C~ FD
http://i.imgur.com/dGDfxaw.png
Lopen
10/13/23 1:03:14 PM
#4:


Yeah that was hilarious

It's like "this is how to make your Pokémon Trainer become an NPC"

Pretty interesting video. I feel like there might be other ways to train it to actually understand combat, rather than randomly stumbling on the right solution, like rewarding it more for combats won with fewer attacks, but that's not something I'd ever want to delve into.

---
No problem!
This is a cute and pop genocide of love!
Forceful_Dragon
10/13/23 1:06:03 PM
#5:


I think that would be pretty simple with weighting on receiving the "super effective" and "not very effective" screens? But then you would still need to balance actual game progression versus spamming gust on bug pokemon in the forest.

---
~C~ FD
http://i.imgur.com/dGDfxaw.png
Lopen
10/13/23 1:27:26 PM
#6:


Yeah.

My thinking is you'd have to weight things by orders of magnitude so game objectives count for far more, to prevent grinding super effectives from being a way to maximize score.

Beating Brock should be worth like 1000 points cause you need to do it to beat the game. Getting super effective in a fight would be worth like 1 point. Winning a fight would be 10, winning a trainer fight is worth 100. This is super dumbed down but I would think you could train based on getting items like badges and stuff since in theory that's memory space that can be accessed.

You could also have scores within categories kinda capped as a percentage. Like say the AI has a "combat" score worth 10 points that can go up or down based on what percentage of "effective" moves it uses
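
A rough Python sketch of that kind of weighting, for what it's worth. The event names are made up and the point values are just the numbers from this post; none of it comes from the video's actual code.

```python
# Hypothetical event weights using the numbers from the post above;
# this is NOT the reward function from the video.
EVENT_POINTS = {
    "super_effective_hit": 1,
    "wild_battle_won": 10,
    "trainer_battle_won": 100,
    "brock_defeated": 1000,
}

COMBAT_SCORE_CAP = 10  # most the "combat" category can ever contribute


def event_score(events):
    """Sum the fixed point values for a list of event names."""
    return sum(EVENT_POINTS[name] for name in events)


def combat_score(effective_moves, total_moves):
    """Capped category score that scales with the share of effective moves."""
    if total_moves == 0:
        return 0.0
    return COMBAT_SCORE_CAP * effective_moves / total_moves


# Grinding 50 super effective hits still loses to beating Brock once.
grinder = event_score(["super_effective_hit"] * 50)
progress = event_score(["brock_defeated"])
print(grinder, progress)   # 50 1000
print(combat_score(3, 4))  # 7.5
```

Because the combat category is a percentage of a fixed cap, spamming more moves can't push it past 10 points, while the big one-time events stay far ahead.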

I dunno how exactly the algorithm works though even though he explained it pretty well. I've never used machine learning for anything near that complicated so yeah I'm kinda just guessing on what you can implement or whether that's a different algorithm entirely.

It's definitely cool to think about though. Would watch more videos by this guy for sure.

---
No problem!
This is a cute and pop genocide of love!
PrivateBiscuit1
10/13/23 2:12:09 PM
#7:


I can accept AI taking our jobs, but I draw the line at AI taking our gamer status.

---
I stream sometimes. Check it out!
www.twitch.tv/heroicbiz/
Fiop
10/13/23 2:19:10 PM
#8:


Tag to watch later. I haven't played any Pokemon game more than a few minutes, but this sounds interesting.

---
"so is my word...It will not return to me empty, but will accomplish what I desire and achieve the purpose for which I sent it." - Isaiah 55:11
Forceful_Dragon
10/13/23 3:27:33 PM
#9:


Lopen posted...
Beating Brock should be worth like 1000 points cause you need to do it to beat the game. Getting super effective in a fight would be worth like 1 point. Winning a fight would be 10, winning a trainer fight is worth 100. This is super dumbed down but I would think you could train based on getting items like badges and stuff since in theory that's memory space that can be accessed.

So it would be like the Pokecenter PC Trauma, but in reverse? Where eventually it would stumble into a relatively massive reward and then assign a ridiculously high priority to those tasks after it "finds" them?

I wonder if it would create a situation where it happens to defeat Misty, only to start to beeline towards Misty before it's strong enough to win chasing that prior high.

---
~C~ FD
http://i.imgur.com/dGDfxaw.png
Steiner
10/13/23 3:29:03 PM
#10:


Lopen posted...
getting items like badges and stuff since in theory that's memory space that can be accessed.

maybe i missed something saying otherwise but i'm under the impression this was all driven by visuals and not accessing actual data

---
Born to bear and bring to all the details of our ending
To write it down for all the world to see
Lopen
10/13/23 3:49:31 PM
#11:


Steiner posted...
maybe i missed something saying otherwise but i'm under the impression this was all driven by visuals and not accessing actual data

I think visual was just for exploration. I think it specifically referenced accessing non-visual data with the Bill's PC problem and fix but maybe I'm mistaken.

Forceful_Dragon posted...
So it would be like the Pokecenter PC Trauma, but in reverse? Where eventually it would stumble into a relatively massive reward and then assign a ridiculously high priority to those tasks after it "finds" them?

I wonder if it would create a situation where it happens to defeat Misty, only to start to beeline towards Misty before it's strong enough to win chasing that prior high.

I think that's OK because beelining towards Misty won't work generally and on a successful run it will get a winning strategy. It will get a bad habit and then correct it because eventually it'll fluke into a way to beat her at low level or fluke into accidentally grinding and realizing that beats her more reliably. Also if you keep incentive of fighting trainers relatively high vs super effective hits/killing wild pokemon it may continue to use those to level.


---
No problem!
This is a cute and pop genocide of love!
Forceful_Dragon
10/13/23 3:51:13 PM
#12:


Oh and for reference this guy had less than 2k subs and less than 40k views earlier today. Currently sitting at over 4k subs and closing in on 120k views.

The algorithm found this guy (probably shortly before I got offered the video) and it's been popping off. Feels good to be "on the ground floor" of discovering a quality youtuber.

---
~C~ FD
http://i.imgur.com/dGDfxaw.png
Lopen
10/13/23 3:53:48 PM
#13:


Also technically even if it is visual you need to just specifically capture the "you defeated brock" screen as the +1000 or whatever.

---
No problem!
This is a cute and pop genocide of love!
#14
Post #14 was unavailable or deleted.
NeatoAnAccount
10/13/23 5:38:30 PM
#15:


this is a bananas first video for his channel lmao
algorithm showed me this before i saw it on b8 but i didn't click it until now
good for him :D

---
Neato, an account
tcaz2
10/13/23 5:38:52 PM
#16:


Very fun and informative video. Glad you shared it.
foolm0r0n
10/13/23 5:53:13 PM
#17:


Steiner posted...
maybe i missed something saying otherwise but i'm under the impression this was all driven by visuals and not accessing actual data
He explains the full input later, it's the last 3 frames of visuals, plus some encoded game data. So he could encode more game data in there.

Technically though, you should be able to train a model using this method purely with visuals, and a really high-level goal like "reach the end screen". You would just need to run the training for millions or even billions of iterations so that the randomness has a chance to get to the end. Adding intermediate goals helps training massively since the randomness can latch onto something exponentially quicker.
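
To make the "last 3 frames plus some encoded game data" input concrete, here's a minimal Python sketch. The class name, the padding on reset, and the example data fields (level, HP, badges) are all assumptions for illustration, not the video's actual code.

```python
from collections import deque

FRAME_HISTORY = 3  # "last 3 frames" per the post; everything else is assumed


class ObservationBuilder:
    """Keeps the most recent frames and pairs them with extra game data."""

    def __init__(self, history=FRAME_HISTORY):
        self.frames = deque(maxlen=history)

    def reset(self, first_frame):
        # Fill the history with the first frame so the observation
        # has a fixed size from the very first step.
        self.frames.clear()
        for _ in range(self.frames.maxlen):
            self.frames.append(first_frame)

    def step(self, frame, game_data):
        """Add the newest frame; return (stacked frames, encoded data)."""
        self.frames.append(frame)  # oldest frame falls off automatically
        return list(self.frames), list(game_data)


builder = ObservationBuilder()
builder.reset("frame0")
frames, data = builder.step("frame1", [5, 12, 0])  # hypothetical: level, HP, badges
print(frames)  # ['frame0', 'frame0', 'frame1']
print(data)    # [5, 12, 0]
```

Encoding more game data would just mean passing a longer `game_data` list; the frame history stays the same.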

---
_foolmo_
he says listen to my story this maybe are last chance
Forceful_Dragon
10/13/23 6:06:32 PM
#18:


So you'd just set exponentially more valuable goals and try to figure out which lower value goals to include and how to format them, such as adjusting the threshold to prevent water animation from triggering a goal. In theory you could structure the entire game's goals into a brand new AI and have it reach a game winning state within a certain number of generations?

10^1 Points:
Find new screens
Level up pokemon
Heal pokemon (but only when they are injured somehow?)

10^2 Points:
Defeat trainers

10^3 Points:
Defeat Brock
Exit Mt Moon ?

10^4 Points:
Defeat Misty
Talk to Bill

10^5 Points:
Acquire Cut
Teach Cut to a Pokemon
Enter Vermilion Gym (Speak to a trash can in the gym for the first time?)

Something like that? But then when you start to get to things like acquiring and using HMs you start to introduce new mandatory mechanics that will require inputs that are pretty unlike anything that it has had to learn how to do before, right? Random exploration will probably get it through the diglett cave and to receive Cut, but without specific programming how will an iteration that has 'learned' enough to reach that point in the game go through the steps necessary to teach the HM to a pokemon? It just spent however many hours learning to AVOID opening its menu so it could actually get stuff done.
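
On the threshold point from the top of this post, here's a toy Python sketch of a "find new screens" reward with a difference threshold. It assumes frames are flat lists of pixel values and that "new" means differing from every stored frame by at least `threshold` pixels; the names and the tiny 8-pixel frames are invented for illustration.

```python
def pixel_difference(frame_a, frame_b):
    """Count positions where two equal-length flat frames differ."""
    return sum(1 for a, b in zip(frame_a, frame_b) if a != b)


class NoveltyTracker:
    """Rewards a frame only if it differs enough from every stored frame."""

    def __init__(self, threshold, reward=10):
        self.threshold = threshold
        self.reward = reward
        self.seen = []

    def score(self, frame):
        for old in self.seen:
            if pixel_difference(frame, old) < self.threshold:
                return 0  # too similar to something already seen
        self.seen.append(frame)
        return self.reward


grass = [0, 0, 0, 0, 1, 1, 1, 1]
water_a = [2, 3, 2, 3, 1, 1, 1, 1]
water_b = [3, 2, 2, 3, 1, 1, 1, 1]  # same spot, next animation tick

tracker = NoveltyTracker(threshold=3)
print([tracker.score(f) for f in (grass, water_a, water_b)])  # [10, 10, 0]
```

With `threshold=1` the animation tick would count as a brand new screen and pay out again, which is the farming problem the threshold is meant to stop.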

---
~C~ FD
http://i.imgur.com/dGDfxaw.png
Lopen
10/13/23 7:24:46 PM
#19:


Yeah HMs were my concern. It seems difficult to randomly stumble into how they work. I suppose you could give the AI points for learning HMs and then also for successfully using HM moves from distinct squares?

I think tiering the points too hard is bad though. You would push the AI away from generally good play by mistake.

So beating gym leaders and achieving one time goals would be the higher point stuff. You can't value teaching cut higher than defeating Brock otherwise it gets HM and learns to get Pokémon, teach them cut, then release them for many points and win the game by repeatedly catching nidorans and teaching them cut. Teaching cut can have a small incentive and the AI will probably do it, but it can't be so great to learn cut that it doesn't want to defeat the next gym leader.
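
One way to close that catch/teach/release loophole, on top of keeping the value low, would be to pay certain events only once per run. That's an extra assumption, not something the post or the video spells out, and the event names and values below are hypothetical.

```python
# Hypothetical reward tables; values are illustrative only.
ONE_TIME = {"defeat_brock": 1000, "teach_cut": 100}  # paid once per run
REPEATABLE = {"catch_pokemon": 10}                   # paid every time


class RewardLedger:
    """Tracks which one-time events have already been paid out."""

    def __init__(self):
        self.claimed = set()

    def reward(self, event):
        if event in ONE_TIME:
            if event in self.claimed:
                return 0  # already paid, repeating it earns nothing
            self.claimed.add(event)
            return ONE_TIME[event]
        return REPEATABLE.get(event, 0)


ledger = RewardLedger()
loop = ["catch_pokemon", "teach_cut", "release_pokemon"] * 5
print(sum(ledger.reward(e) for e in loop))  # 150
```

Five trips around the loop pay 150 total (one teach plus five catches), which stays well under a single Brock win at 1000.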

---
No problem!
This is a cute and pop genocide of love!
Lopen
10/13/23 7:45:55 PM
#20:


My feeling is like

10^0 - Use super effective moves, recover PP/HP (one point per HP recovered, 10 per PP)
10^1 - Win Battles, Find New Screens, Catch new pokemon, level up Pokemon
10^2 - Win Trainer Battles, Teach HM move to Pokemon, Use HM move from distinct square
10^5 - Storyline one time stuff such as beating a gym leader etc
10^8 - Beat game

Giving all the one time storyline stuff significantly higher point value than everything else even if it gives linear points rather than exponential means the AI SHOULD attempt to speedrun the game once it figures out how to achieve each objective especially if you train the AI with a variety of time limits. The rest are just weighted to encourage good play with the more spammy stuff having less value so the AI doesn't overdo them.
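
A quick Python check of how far apart those tiers are. The point values come straight from the list above; the event names and the helper function are made up for illustration.

```python
# Point tiers copied from the list in this post; event names are made up.
TIER_POINTS = {
    "super_effective_move": 10**0,
    "battle_won": 10**1,
    "trainer_battle_won": 10**2,
    "storyline_goal": 10**5,
    "game_beaten": 10**8,
}


def repeats_to_match(small_event, big_event):
    """How many small events add up to one big event (rounded up)."""
    small = TIER_POINTS[small_event]
    big = TIER_POINTS[big_event]
    return -(-big // small)  # ceiling division on ints


print(repeats_to_match("battle_won", "storyline_goal"))          # 10000
print(repeats_to_match("trainer_battle_won", "storyline_goal"))  # 1000
print(repeats_to_match("storyline_goal", "game_beaten"))         # 1000
```

So under these numbers it would take 10,000 wild battle wins to equal one gym leader, which is the gap that should make grinding for points a losing strategy within any sane time limit.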

But again I'm pretty casual with my AI experience so I may be missing something.

---
No problem!
This is a cute and pop genocide of love!
Forceful_Dragon
10/13/23 11:36:02 PM
#21:


Forceful_Dragon posted...
Oh and for reference this guy had less than 2k subs and less than 40k views earlier today. Currently sitting at over 4k subs and closing in on 120k views.

6k / 220k
Still going crazy

---
~C~ FD
http://i.imgur.com/dGDfxaw.png
Aecioo
10/14/23 8:03:51 AM
#22:


Sorry but I have brain rot from watching dougdougs AI.

If I'm not getting shouted at by an AI Napoleon, I'm out

---
http://28.media.tumblr.com/tumblr_lcb35gGx0t1qailr4o1_500.gif
http://www.megavideo.com/?v=57N0YAEJ
Forceful_Dragon
10/14/23 4:39:06 PM
#23:


Someone in a comment just mentioned the Surf HM in the Safari Zone.

How on earth will the AI randomly navigate through the safari zone even one time to trigger the positive feedback? I guess you put a point reward on "enter safari zone" to teach it that attempting the SZ is a good idea. But other than directly programming it to go the right way I'm not sure how it will be able to develop a habit for reaching the right spot in the zone.

It's just about 300 steps exactly to reach the right spot and moving correctly 300/500 times feels near impossible. I guess you would have to remove the step limit like they did in Twitch Plays Pokemon?
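
To put a rough number on "near impossible", here's a back-of-the-envelope Python sketch. It assumes a purely random agent picking uniformly among 4 directions each step, which is a big simplification (a trained agent isn't uniform, and in the real game wrong steps also have to be undone, which this ignores).

```python
import math


def log10_prob_exact_path(steps, choices=4):
    """log10 of the chance that every one of `steps` random moves is right."""
    return -steps * math.log10(choices)


def prob_at_least(k, n, p):
    """Binomial tail: chance of at least k 'right' moves out of n tries."""
    return sum(
        math.comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(k, n + 1)
    )


# 300 correct moves in a row: about 10^-180.6
print(round(log10_prob_exact_path(300), 1))   # -180.6

# Looser version: at least 300 right moves somewhere in a 500-step budget
print(prob_at_least(300, 500, 0.25) < 1e-50)  # True
```

Either way the odds are effectively zero for pure randomness, so the agent would need intermediate rewards inside the zone or no step limit to ever see the payoff.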

---
~C~ FD
http://i.imgur.com/dGDfxaw.png
Paratroopa1
10/14/23 4:41:41 PM
#24:


Forceful_Dragon posted...
Someone in a comment just mentioned the Surf HM in the Safari Zone.

How on earth will the AI randomly navigate through the safari zone even one time to trigger the positive feedback? I guess you put a point reward on "enter safari zone" to teach it that attempting the SZ is a good idea. But other than directly programming it to go the right way I'm not sure how it will be able to develop a habit for reaching the right spot in the zone.

It's just about 300 steps exactly to reach the right spot and moving correctly 300/500 times feels near impossible. I guess you would have to remove the step limit like they did in Twitch Plays Pokemon?
I'm not even all that confident the AI can learn how to cut a bush
tcaz2
10/14/23 4:44:28 PM
#25:


I think cut MIGHT be possible but yeah the Safari Zone would be so close to impossible it might as well be.

Also things like buying a lemonade and giving it to the thirsty guy, etc.
Forceful_Dragon
10/14/23 4:47:46 PM
#26:


Paratroopa1 posted...
I'm not even all that confident the AI can learn how to cut a bush

I'm pretty confident it can do this. You just have to put a juicy enough reward on "obtain cut" and "teach cut to a pokemon" and it will be able to do those things with some regularity.

Although in pokemon red it doesn't prompt you to use Cut when you A-press a cuttable tree, does it? So it would have to go back into the menu and activate cut while standing in the right spot? Okay yeah maybe impossible.

---
~C~ FD
http://i.imgur.com/dGDfxaw.png
Paratroopa1
10/14/23 4:48:47 PM
#27:


Forceful_Dragon posted...
I'm pretty confident it can do this. You just have to put a juicy enough reward on "obtain cut" and "teach cut to a pokemon" and it will be able to do those things with some regularity.

Although in pokemon red it doesn't prompt you to use Cut when you A-press a cuttable tree, does it? So it would have to go back into the menu and activate cut while standing in the right spot? Okay yeah maybe impossible.
That's correct

Maybe if you fudge the rewards enough it'll get it, but I feel like the point of the whole thing is that you're not teaching it how to beat the game; this video stuck to very general rewards.
Forceful_Dragon
10/14/23 4:54:07 PM
#28:


Yeah and you can't reward it for "using cut" regardless of where it uses it or it will just spam it to farm points.

And you can't punish it for using cut when you aren't facing a tree or then it will never ever try.

Maybe a reward for attempting cut, but only if it's taken at least X number of steps since the last time it attempted cut? But then you are approaching very non-general reward conditions.
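
The "at least X steps since the last attempt" condition could look something like this in Python. It's only a sketch: the class name, the reward value of 5, and the choice to reset the counter on every attempt (paid or not) are assumptions.

```python
class CutAttemptReward:
    """Pays for a Cut attempt only after enough steps since the last one."""

    def __init__(self, min_steps_between, reward=5):
        self.min_steps_between = min_steps_between
        self.reward = reward
        self.steps_since_attempt = None  # None means no attempt yet

    def on_step(self):
        """Call once per walking step."""
        if self.steps_since_attempt is not None:
            self.steps_since_attempt += 1

    def on_cut_attempt(self):
        """Return the reward for this attempt and restart the counter."""
        ready = (
            self.steps_since_attempt is None
            or self.steps_since_attempt >= self.min_steps_between
        )
        self.steps_since_attempt = 0  # every attempt resets the cooldown
        return self.reward if ready else 0


shaper = CutAttemptReward(min_steps_between=3)
payouts = [shaper.on_cut_attempt()]      # first attempt pays
payouts.append(shaper.on_cut_attempt())  # immediate spam pays nothing
for _ in range(3):
    shaper.on_step()
payouts.append(shaper.on_cut_attempt())  # 3 steps later pays again
print(payouts)  # [5, 0, 5]
```

There's no punishment for a failed attempt here, so the agent is never taught to avoid Cut; the cooldown just limits how fast it can farm the attempt reward.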

But the AI learned to recognize that geodude and onix are good bubble targets, so maybe it will recognize cuttable trees at some point once it actually cuts a tree on accident?

---
~C~ FD
http://i.imgur.com/dGDfxaw.png
NeatoAnAccount
10/14/23 4:58:24 PM
#29:


Total HP should probably be part of the reward function. And maybe they could hook the whole thing up to an LLM to ask it to do some reasoning for things like HMs etc.

---
Neato, an account
Paratroopa1
10/14/23 5:23:00 PM
#30:


NeatoAnAccount posted...
Total HP should probably be part of the reward function.
time to rng manipulate six chanseys out of safari zone
Forceful_Dragon
10/15/23 9:47:09 PM
#31:


Forceful_Dragon posted...
6k / 220k
Still going crazy

16k subs
900k views
Over 7 days (though almost all of it in the past 2)

---
~C~ FD
http://i.imgur.com/dGDfxaw.png
DeepsPraw
10/15/23 11:05:46 PM
#32:


Forceful_Dragon posted...
I'm pretty confident it can do this. You just have to put a juicy enough reward on "obtain cut" and "teach cut to a pokemon" and it will be able to do those things with some regularity.

they should just put a big reward on "beat the game"
why has no one thought of this?

---
pepsi for tv-game
HaRRicH
10/16/23 9:17:44 AM
#33:


Cool video! Legend of Zelda on the NES next please.

---
O P E R A T I O N O U S T : Nominate SHEIK!
https://i.imgur.com/OpudFxm.jpg
Forceful_Dragon
10/18/23 1:30:26 AM
#34:


This video is gonna hit 2M views tomorrow. Has anyone seen a channel blow up like this from its first video? Really curious to see what his timeline is gonna be for continued content creation; that will have a big impact on the sort of momentum he can carry.

---
~C~ FD
http://i.imgur.com/dGDfxaw.png
Forceful_Dragon
10/23/23 9:01:16 AM
#35:


3.5M and it's finally slowing down. Any stragglers should give this video a chance.

---
~C~ FD
http://i.imgur.com/dGDfxaw.png
HaRRicH
10/23/23 9:52:28 AM
#36:


I saw this video the other day:

https://youtu.be/5BmET_okYVk?si=JK1zTqtmNFOlwJT2

Not as good of a video, but seeing a Super Meat Boy-take on AI learning from a year ago was interesting.

---
O P E R A T I O N O U S T : Nominate SHEIK!
https://i.imgur.com/OpudFxm.jpg
Forceful_Dragon
10/23/23 10:22:18 AM
#37:


Yeah I think I've seen that one before, but definitely not one of the better ones out there.

I actually like the progress made on some racing games. Yosh made a pretty good video training an AI for Trackmania:

https://www.youtube.com/watch?v=SX08NT55YhA

The thing I like is that because the AI is receiving information about its current angle/velocity as well as the next couple turns, it's more adaptable to being put into a brand new track and executing maneuvers that will be better than pure random.

That video was from March 2022, but he actually released an update a few weeks ago:

https://www.youtube.com/watch?v=Dw3BZ6O_8LY

And here's a similar concept applied to Mario Kart where the focus is training it on some maps with the intent to play on brand new maps the AI hasn't seen yet based upon behaviors developed on other maps.

https://www.youtube.com/watch?v=lnnHmVNO07Q

---
~C~ FD
http://i.imgur.com/dGDfxaw.png