use AI to rewrite all the spells from all the books, then try to see if AI can detect the rewritten ones. This will ensure it's not pulling from its training data set.
It's a test. Like all tests, it's more or less synthetic and focused on specific expected behavior. I am pretty far from LLMs now, but this seems like a very good test of how genuine this behavior actually is (or repeat it 10x with some scrambling to go deeper).
This thread is about the find-and-replace, not the evaluation. Gambling on whether the first AI replaces the right spells just so the second one can try finding them is unnecessary when find-and-replace is faster, easier, and works 100% of the time.
... I'm not sure if you're trolling or if you missed the point again. The point is to test the contextual ability and correctness of the LLM when performing actions that are hopefully guaranteed not to be in its training data.
It has nothing to do with the performance of the string replacement.
The initial "find" is to see how well it actually finds all the "spells" in this case, and then replaces them. Then, perhaps in a separate context, evaluate whether the results are the same or skewed in favour of the training data.
Playing a musical instrument also achieves this: it makes a game out of the little things the brain (an organ we all love) does, like sequencing, motor skills, etc.
Cannot emphasize this enough.
I literally built my own typing app for this reason - I plateaued on monkeytype and wanted more detailed stats, so I built typequicker.com
> Better when you have experiences to relate back with.
100% this. I've been teaching software design since the 1990s, and it's so much easier when the audience has enough experience that I can wave my hands and say "you know when things go like this ... ?" and they do, and then we can get on with how to think about it and do better.
Without that, it's tedious. Folks with less experience have the brainpower to understand but not the context. I try to create the context with a synthetic example, but (again, waving hands...) you know how much richness your current system has compared to any example I can put on a page or a slide.
I have groups like feature requests and backlog for catch-all tasks. Then for deadlines I use "Mid January" or "End of January" and move tasks into them based on a rough hour estimate. Story points don't matter because I'm not quantifying complexity across team members. It's just me working on it.
I work on the tasks that have the highest priority, build locally, test them in a QA instance, and then ship them to prod. I usually run smoke tests in prod just to make sure I don't break something.
I used to use GitHub tickets and JIRA, but over time I needed organization that didn't cause a lot of busy work. For myself, Todoist (paid plan) works.
If I need to add members in the future, I'll go back to using GitHub (kanban, labels, milestones). Most likely my marketing team will need a Kanban before I add another developer.
Other notes:
Use tools that get objectives done and don't create micro tasks.
Finally, create tasks during off time: during lunch, while working out, or on a walk break. I generally plan 2-3 days in advance on the exact work I need to get done. Don't overwork yourself.
Right now most chatbot widgets are scripted and require forethought from the admin or business owner. So it's pretty hard to figure out the common questions on your own or formulate them over time. It's even harder when you don't know your customer journey or haven't solidified the ideal customer fit.
This is exactly why I created a bot [1] that blends chatbots and Stack Overflow together. You can have an automated first-line support system without scripting or conversational dances.
- Instead of conversations, you can reuse previous question/answers.
- Instead of guessing FAQs, you can let your users define them for you.
- When questions don't have answers yet, they get caught by the traditional support flow (email). Then, over time, this effort declines (the sketch below shows the rough shape of this loop).
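To make that first-line support loop concrete, here is a minimal sketch. It assumes a trivial keyword lookup; handle_question, knowledge_base, and the email address are illustrative placeholders, not how the bot actually matches questions.

```python
# Hypothetical sketch of the fallback flow described above; the matching
# logic, names, and email address are placeholders, not the actual product.
from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    source: str  # "knowledge_base" or "email_fallback"

def handle_question(question: str, knowledge_base: dict[str, str]) -> Answer:
    """Reuse a previous answer if one matches; otherwise fall back to email."""
    for known_question, answer in knowledge_base.items():
        # Naive substring match stands in for whatever matching the real bot does.
        if known_question.lower() in question.lower():
            return Answer(answer, "knowledge_base")
    # Unmatched questions go to the traditional support flow; once answered,
    # they can be added to the knowledge base so the effort declines over time.
    return Answer("Forwarded to support@example.com", "email_fallback")
```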