Good article. It should have at least mentioned Claude 2, which has a 100k context window.
This is an example of why "mastering" a high technology doesn't have the same value as doing so for an old-fashioned skill in an area that changes less rapidly.
Good point!
I haven't been able to play around with Claude 2, but I have with GPT-4 32k. To me, 32k sounded like a lot, but in practice it fit only a few of the dozen or so files I wanted to work with. I still had to splice in only the most relevant context.
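For what it's worth, that "splice in only the most relevant context" step is basically a token-budget packing problem. A minimal sketch of how I approached it, assuming the files are already ranked by relevance and using tiktoken for counting (the budget reservation number is made up):

    import tiktoken

    # Assumptions: files come pre-ranked, most relevant first, and part of
    # the 32k window is reserved for the prompt and the model's reply.
    CONTEXT_WINDOW = 32_000
    RESERVED_FOR_PROMPT_AND_REPLY = 4_000  # hypothetical reservation

    enc = tiktoken.encoding_for_model("gpt-4")

    def pack_files(ranked_files: list[tuple[str, str]]) -> list[str]:
        """Greedily include whole files, most relevant first,
        until the token budget runs out."""
        budget = CONTEXT_WINDOW - RESERVED_FOR_PROMPT_AND_REPLY
        chunks = []
        for path, text in ranked_files:
            cost = len(enc.encode(text))
            if cost > budget:
                continue  # this file no longer fits; try smaller ones
            chunks.append(f"# file: {path}\n{text}")
            budget -= cost
        return chunks

Even with something like this, the hard part is the ranking itself, and anything that doesn't make the cut is just invisible to the model.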
I think we're going to have a lot of the same problems until context lengths get several orders of magnitude longer.