estirose: A pixel portrait of a woman (Default)
[personal profile] estirose
My audience probably isn't much into having AI help code software, but I ran an experiment just out of curiosity. Basically, I asked 3 different AIs (ChatGPT, Claude, and Llama) to code a Stardew Valley Content Patcher mod. I am very familiar with coding these things and can easily spot errors. None of them got the coding right.


The mod request was simple: create the code for a machine that would output 1 random item per day, acting similarly to the Worm Bin and/or the Statue of Endless Fortune. (These actually work slightly differently in their coding despite essentially doing the same output-at-start-of-day thing.) Most of the time, the player should get common farm items - stone, wood, fiber. But on occasion, the player should be able to get either the Dish of the Day, a random seasonal vanilla item, or a random object in one of a couple of categories.

(It was fun to play with, but I initially put the percentages up too high for the random items and so I was getting stuff that was way out of balance with what I really wanted. Not that I really needed this; I just like random generators!)

All of them were able to write a good manifest.json, which is kind of the introductory file that all Stardew Valley mods use.

ChatGPT 5.2: Mostly decent code, including finding DISH_OF_THE_DAY and RANDOM_BASE_SEASON_ITEM (the saloon's Dish of the Day and the pool for when you break mine barrels/fish in garbage cans, respectively), but it refused to recognize that you could concisely put percentages in; it insisted on multiple entries for wood/stone/fiber to weight them higher.

Claude Sonnet 4.5: Pulled a similar odd way of weighting choices, though when I gave it an example from one of my other mods, it did fix the code. However, it failed to recognize RANDOM_BASE_SEASON_ITEM when I gave it the example of "trash/barrels"; it created a list of the trash items, which works but isn't really what I wanted. It did write a nice recipe entry to make the machine, though. (Mine I just set to buy from Robin.)

Llama 4 Maverick: I have no idea what in the heck this was doing. It got the structure right, but not much else.

ETA: I forgot Gemini!
Google Gemini 3: Gemini, like Claude, failed to find RANDOM_BASE_SEASON_ITEM; it recognized the idea of random chances and mostly got the idea correct but had the code for them wrong (see below on how to write that in Content Patcher). Even when I linked it to the wiki page that included that, it insisted I was wrong.
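
To illustrate why weighting by repeated entries is clumsy, here is a minimal Python sketch of the odds that a duplicated list implies. The entry counts are made up for illustration (none of the AIs' actual output is reproduced here), and the sketch assumes one entry is picked uniformly at random from the list, which may not be exactly how the game resolves it.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical duplicated list: these counts are invented for illustration,
# not copied from any of the generated mods.
duplicated_entries = ["wood", "wood", "wood", "stone", "stone", "fiber"]


def implied_chances(entries):
    """Chance of each item if one list entry is picked uniformly at random."""
    counts = Counter(entries)
    total = len(entries)
    return {item: Fraction(n, total) for item, n in counts.items()}


chances = implied_chances(duplicated_entries)
for item, chance in chances.items():
    print(f"{item}: {chance} ({float(chance):.1%})")
```

With this approach, changing any one item's odds means adding or removing copies and recomputing everything else, whereas a single "Condition": "RANDOM 0.1" line states the chance directly.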

Here's the meat of the mod:

"OutputItem": [
{
"CustomData": null,
"ItemId": "(O)388",
"MaxItems": null,
"MinStack": 1,
"MaxStack": 3,
"Quality": 0,
"ModData": null,
},
The section above is my default. Apparently I decided that if nothing else applied, I would get 1-3 wood. (O)388 is the Item ID for wood.
{
"CustomData": null,
"ItemId": "DISH_OF_THE_DAY",
"MaxItems": null,
"MinStack": 1,
"MaxStack": 1,
"Quality": 0,
"ModData": null,
"Condition": "RANDOM 0.1",
},
This section tells the game that there's a 10% chance that the machine will give me the Dish of the Day instead.
{
"CustomData": null,
"ItemId": "RANDOM_BASE_SEASON_ITEM",
"MaxItems": null,
"MinStack": 1,
"MaxStack": 1,
"Quality": 0,
"ModData": null,
"Condition": "RANDOM 0.1",
},
This section tells the game that there's a 10% chance that the machine will give me a random item from the list of base season items (that is, the ones you get from breaking barrels and digging through trash cans).
{
"CustomData": null,
"ItemId": "RANDOM_ITEMS (O)",
"PerItemCondition": "ITEM_CATEGORY Target -2 -5 -6 -12 -15 -16 -20 -21",
"MaxItems": null,
"MinStack": 1,
"MaxStack": 1,
"Quality": 0,
"ModData": null,
"Condition": "RANDOM 0.1",
}
This is another 10% chance. This one tells the game to choose an item out of one of 8 categories: gems, eggs, milk, minerals, metal resources, building resources, junk, or bait.
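
Since the four entries above are split up by commentary, here is a small Python sketch that stitches them back together and prints them as one well-formed JSON fragment. The helper name output_entry and its parameters are hypothetical conveniences, not part of Content Patcher; the field names and values are copied from the entries above, and the wood entry is assumed to have no Condition because it is described as the default. Where this fragment sits inside the full content.json is not shown.

```python
import json


def output_entry(item_id, condition=None, per_item_condition=None, max_stack=1):
    """Build one OutputItem entry with the same fields used in the post.

    This helper is hypothetical; it only saves retyping the shared fields.
    """
    entry = {"CustomData": None, "ItemId": item_id}
    if per_item_condition is not None:
        entry["PerItemCondition"] = per_item_condition
    entry.update(
        {
            "MaxItems": None,
            "MinStack": 1,
            "MaxStack": max_stack,
            "Quality": 0,
            "ModData": None,
        }
    )
    if condition is not None:
        entry["Condition"] = condition
    return entry


output_items = [
    # Default: 1-3 wood, assumed to carry no Condition.
    output_entry("(O)388", max_stack=3),
    output_entry("DISH_OF_THE_DAY", condition="RANDOM 0.1"),
    output_entry("RANDOM_BASE_SEASON_ITEM", condition="RANDOM 0.1"),
    output_entry(
        "RANDOM_ITEMS (O)",
        condition="RANDOM 0.1",
        per_item_condition="ITEM_CATEGORY Target -2 -5 -6 -12 -15 -16 -20 -21",
    ),
]

# None becomes null in the JSON output.
fragment = json.dumps({"OutputItem": output_items}, indent=2)
print(fragment)
```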

Assuming I've coded this correctly (I'm still testing it and wouldn't put it in a serious game), the game would choose an item for each entry that is random (that is, not the Dish of the Day or the wood), then check whether any of those entries rolled under 10%. It then chooses one of the entries that passed at random and spawns the item. The order might be slightly different (the game might check whether the percentage applies first, then roll the random item). I'm not sure, but it shouldn't change the result.
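
For a rough idea of the resulting odds, here is a Python sketch that works out the exact chance of each outcome under the behaviour described above. It is a model of that description, not of the game's real selection code: it assumes the three 10% rolls are independent, that one passing entry is picked uniformly, and that wood is used only when none of them pass. If the game instead treats the wood entry as just another candidate, the numbers would differ.

```python
from itertools import product

# Chances copied from the "Condition": "RANDOM 0.1" lines in the post.
CONDITIONAL = {
    "DISH_OF_THE_DAY": 0.1,
    "RANDOM_BASE_SEASON_ITEM": 0.1,
    "RANDOM_ITEMS (O)": 0.1,
}
# Wood: assumed here to be a fallback used only when no roll passes.
FALLBACK = "(O)388"


def outcome_distribution(conditional, fallback):
    """Exact chance of each entry being the day's output under the assumed model."""
    names = list(conditional)
    dist = {name: 0.0 for name in names}
    dist[fallback] = 0.0
    # Enumerate every pass/fail combination of the conditional entries.
    for rolls in product([True, False], repeat=len(names)):
        p = 1.0
        for name, passed in zip(names, rolls):
            p *= conditional[name] if passed else 1.0 - conditional[name]
        winners = [name for name, passed in zip(names, rolls) if passed]
        if not winners:
            dist[fallback] += p
        else:
            # Assumption: one passing entry is chosen uniformly at random.
            for name in winners:
                dist[name] += p / len(winners)
    return dist


distribution = outcome_distribution(CONDITIONAL, FALLBACK)
for name, chance in distribution.items():
    print(f"{name}: {chance:.2%}")
```

Under these assumptions wood comes out about 72.9% of the time (0.9 × 0.9 × 0.9) and each of the three special entries about 9% of the time, which shows how quickly the common result shrinks if the percentages are raised.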

Date: 2026-03-12 12:14 pm (UTC)
mara: (Girls can do)
From: [personal profile] mara
I 100% wouldn't go to AI for coding! Since you tried this, though, I'm curious...do you feel like any of these would have saved you any time if you'd done this seriously? I'm hearing *some* people say "oh, I just edit it and it's fine" but it seems to me like for complicated coding it's the same problem as letting genAI write your paper for you. Like, if you didn't write it yourself, can you really know how to fix it?

Not sure that makes any sense, but I hope it does :D

Date: 2026-03-12 03:12 pm (UTC)
brightknightie: Stonetree and Norma looking at a CRT monitor (Computer)
From: [personal profile] brightknightie
Definitely, for sure, don't let any LLM or AI agent run around unsupervised! :-) The best widely available at this time are like an eager intern on her first day, very useful if you explain absolutely everything and check the work, but cannot even find the restroom without directions. ;-)

I work on related stuff... may I share an opinion?

FWIW, the paid versions are more effective than the free versions right now, and I think this may be an important thing to be aware of. I expect that advances in the paid tiers will eventually filter down into the free tiers.

A key is to run everything (whether coding or report analysis or content generation) through multiple times (as you tried to do!!!) and/or get it to run tests/comparisons to get it to find its own errors and correct itself. Feeding examples of what one wants helps a lot ("few-shot prompting" v. "zero-shot prompting"). I have access at my workplace to paid tiers of Gemini and they can give much better output for the same guidance, and much much better for better guidance. I'm not a SWE, myself, but the engineers as a group seem to have widely varied feelings about integrating the tools into their workflows. They don't seem, overall, as worried about being replaced by the tools, as such, as about being pressured to let the tools do the fun stuff for them instead of the tedious stuff, because management doesn't always understand that humans really must do the fun (thoughtful, puzzling, innovative) stuff: it gives better results, and not every human wants to be a Program Manager in her heart of hearts. ;-)

Date: 2026-03-13 02:38 pm (UTC)
brightknightie: Stonetree and Norma looking at a CRT monitor (Computer)
From: [personal profile] brightknightie
Understood!

I have to learn (and prove I'm using) vibe coding by mandate of my work leadership.

FWIW, you might want to dip your toe in again on a very small scale in 6 or 9 months? I understand that the models are getting exponentially better at the high end, though it takes a while to filter down. ~shrug~
