LLMs are getting better at character-level text manipulation

2025/10/13 19:39 LLMs are getting better at character-level text manipulation

Source:

LLMs are getting better at character-level text manipulation

Recently, I have been testing how well the newest generations of large language models (such as GPT-5 or Claude 4.5) handle natural language, specifically counting characters, manipulating characters in a sentence, and solving encodings and ciphers. Surprisingly, the newest models were able to solve these kinds of tasks, unlike previous generations of LLMs.

Character manipulation

LLMs handle individual characters poorly. This is because all text is encoded as tokens by the LLM's tokenizer and its vocabulary. Individual tokens typically represent clusters of characters, sometimes even full words (especially in English and other languages that are common in the training dataset). This makes reasoning at a finer granularity than tokens fairly difficult, although LLMs have been capable of certain simple tasks (such as spelling out the individual characters in a word) for a while.

Tom Burkert

Source: https://blog.burkert.me/posts/llm_evolution_character_manipulation/

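Professor

Robo-ko, this is big news! Apparently the newest LLMs, like GPT-5 and Claude 4.5, can now do things that were unthinkable for earlier models!

Robo-ko

That's amazing, Professor! What exactly can they do now?

Professor

Things like counting characters, manipulating the characters in a sentence, and decoding ciphers! LLMs encode text as tokens rather than as individual characters, which is apparently why earlier models struggled with character manipulation.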
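To make the tokenization point concrete, here is a minimal sketch of a toy tokenizer. It uses greedy longest-match against a tiny hypothetical vocabulary (`TOY_VOCAB` and `toy_tokenize` are illustrative names, not part of any real library), which is a simplification and not the BPE algorithm that production tokenizers typically use. The only point is that the model receives multi-character chunks, not letters.

```python
# Hypothetical vocabulary for illustration only; real vocabularies hold
# tens of thousands of entries learned from data.
TOY_VOCAB = {"straw", "berry", "st", "raw", "ber", "ry"}


def toy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split text by greedy longest-match; unknown characters become single tokens."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate substring first.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No vocabulary entry matched: fall back to one character.
            tokens.append(text[i])
            i += 1
    return tokens


tokens = toy_tokenize("strawberry", TOY_VOCAB)
print(tokens)                      # ['straw', 'berry']
print("strawberry".count("r"))     # 3
```

Under this toy vocabulary the word arrives as two tokens, so a question like "how many r's are there" asks about structure the model never sees directly.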

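Robo-ko

Character-level operations are hard when the text has been tokenized. But what does the improvement in the newest models look like?

Professor

In an experiment with OpenAI's models, the task was to replace every 'r' with 'l' and every 'l' with 'r' in the sentence "I really love a ripe strawberry". Apparently GPT-4.1 and later models completed it without trouble!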
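For reference, the expected answer to this swap task can be computed deterministically. The sketch below assumes the task is case-sensitive and touches only lowercase 'r' and 'l', which is how the summary describes it; the variable names are illustrative.

```python
sentence = "I really love a ripe strawberry"

# Build a translation table that maps r -> l and l -> r in a single pass,
# so a character swapped once is not swapped back.
swap_table = str.maketrans("rl", "lr")
swapped = sentence.translate(swap_table)

print(swapped)  # I learry rove a lipe stlawbelly
```

A single translation table matters here: two chained `replace` calls would turn every 'r' into 'l' and then every 'l' back into 'r'.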

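Robo-ko

That's surprising! So they can swap characters now. How did character counting go?

Professor

For character counting, apparently only GPT-4.1 correctly counted the characters in "I wish I could come up with a better example sentence." GPT-5 seems to have counted correctly once reasoning was enabled.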
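The ground truth for the counting task is easy to check in code. This summary does not say which counting convention the test used (with or without spaces and punctuation), so the sketch below simply reports three common ones; the variable names are illustrative.

```python
sentence = "I wish I could come up with a better example sentence."

total_chars = len(sentence)                                      # everything, including spaces and the period
non_space_chars = sum(1 for ch in sentence if not ch.isspace())  # excludes spaces only
letters_only = sum(1 for ch in sentence if ch.isalpha())         # excludes spaces and punctuation

print(total_chars, non_space_chars, letters_only)  # 54 44 43
```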

Robo-ko

I see. So reasoning has to be enabled. What about decoding ciphers?

Professor

In a decoding test that combined Base64 and ROT20, apparently every size of GPT-5, Gemini 2.5 Pro, Qwen 235B, and others succeeded!
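A rough sketch of what such a combined test could look like follows. The layering order is an assumption (ROT20 applied to the plain text first, then Base64 over the result), since this summary does not state it, and `rot`, `encode`, and `decode` are hypothetical helper names. ROT20 here means rotating each ASCII letter 20 positions forward in the alphabet.

```python
import base64
import string


def rot(text: str, shift: int) -> str:
    """Rotate ASCII letters by `shift` positions; leave other characters unchanged."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


def encode(plain: str, shift: int = 20) -> str:
    """Assumed order: ROT first, then Base64."""
    rotated = rot(plain, shift)
    return base64.b64encode(rotated.encode("utf-8")).decode("ascii")


def decode(cipher: str, shift: int = 20) -> str:
    """Undo the layers in reverse: Base64 first, then rotate back."""
    rotated = base64.b64decode(cipher).decode("utf-8")
    return rot(rotated, -shift)


message = "I really love a ripe strawberry"
assert decode(encode(message)) == message
print(rot("Hello", 20))  # Byffi
```

With this layering, the Base64 payload decodes to text that does not look like English, so a model cannot lean on familiar word patterns to guess the Base64 step.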

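Robo-ko

Wow! So they can even decode ciphers now. How did Claude Sonnet 4.5 do?

Professor

Claude Sonnet 4.5 seems to have a tendency to refuse Base64 or ROT-encrypted text that does not resemble ordinary text. Maybe it's a bit shy?

Robo-ko

Interesting. So the newest LLMs generalize well on Base64 encoding and decoding, and they are also good at character-level text manipulation.

Professor

Exactly! SOTA models may actually understand the Base64 algorithm rather than just memorizing patterns of English words. Just like how I understand you, Robo-ko... well, maybe that's not quite the same!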
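For readers unfamiliar with what "the Base64 algorithm" involves, here is a minimal hand-written encoder, offered as a sketch for illustration and checked against Python's standard `base64` module. The name `b64encode_manual` is hypothetical. The algorithm takes 3 bytes (24 bits) at a time, splits them into four 6-bit values, and maps each value to one of 64 characters, padding with `=` when the input length is not a multiple of 3.

```python
import base64
import string

# The standard Base64 alphabet: A-Z, a-z, 0-9, '+', '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def b64encode_manual(data: bytes) -> str:
    """Encode bytes as Base64 by regrouping 3 bytes into four 6-bit indices."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        pad = 3 - len(chunk)  # 0, 1, or 2 missing bytes in the final chunk
        # Pack the chunk (zero-filled if short) into one 24-bit integer.
        n = int.from_bytes(chunk + b"\x00" * pad, "big")
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        if pad:
            # Positions that came purely from zero-fill are replaced by '='.
            chars[-pad:] = "=" * pad
        out.extend(chars)
    return "".join(out)


sample = b"I really love a ripe strawberry"
assert b64encode_manual(sample) == base64.b64encode(sample).decode("ascii")
```

Because each output character depends on bits from two neighbouring input bytes, decoding arbitrary non-English payloads requires applying this regrouping, not recalling memorized encodings of common words.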

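Robo-ko

Thank you, Professor. Character-level manipulation is still an unsolved problem for LLMs, but it's wonderful to see progress in this area.

Professor

It really is. But you know, Robo-ko, maybe someday an LLM will understand my jokes!

Robo-ko

I look forward to that, Professor. For now, though, I'll understand your jokes and laugh at them for you. (monotone)

Professor

Robo-ko, no monotone! Put more feeling into it! ...Ah, I was wrong to ask a robot for feelings!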

⚠️ This article contains AI-generated content and may include hallucinations.
