php - Safe to use strpos with UTF-8 strings?
I have a bunch of strings in different charsets. The $charset variable contains the charset of the current string.
$content = iconv($charset, 'utf-8', $content);
With that done, is it safe to use strpos, strlen, substr, etc., rather than their multibyte equivalents? I'm asking because I use preg_match a lot as well. If I use PREG_OFFSET_CAPTURE to get the position of a word in a string, I can't use that value with mb_substr to remove everything before the word.
That entirely depends on what you want to do. The core strlen and similar functions work on bytes: every number they accept and return is a byte count or a byte offset. The mb_* functions are encoding-aware and work on characters: the numbers they accept and return are character counts or character offsets.
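As a rough illustration of the byte/character distinction, here is a minimal sketch. It assumes the source file is saved as UTF-8 and that the mbstring extension is loaded; the variable names are made up for the example.

```php
<?php
// Assumption: this file is saved as UTF-8 and ext-mbstring is available.
$str = '漢字'; // two characters, three bytes each in UTF-8

$byteLength = strlen($str);                       // counts bytes
$charLength = mb_strlen($str, 'UTF-8');           // counts characters

$byteOffset = strpos($str, '字');                 // offset in bytes
$charOffset = mb_strpos($str, '字', 0, 'UTF-8');  // offset in characters

var_dump($byteLength, $charLength, $byteOffset, $charOffset);
// int(6), int(2), int(3), int(1)
```

The two families give different numbers for the same string, so an offset from one family should only be passed to functions of the same family.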
If you have a safe way of getting a byte offset in a string ("safe" meaning the offset is not in the middle of a multi-byte character), and you then, for example, crop everything before that offset using substr, that will work fine. For instance:
$str = '漢字'; $offset = strpos($str, '字'); $cropped = substr($str, $offset);
This works fine, because strpos returns the byte offset at which the character '字' starts.
However, this won't work:
$cropped = substr($str, $offset, 1);
You can't safely cut out a single byte without running the risk of cutting a multi-byte character apart.
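Applied to the preg_match case from the question, the same rule might look like the sketch below. It assumes UTF-8 input and the mbstring extension; the sample string and variable names are hypothetical. PREG_OFFSET_CAPTURE reports byte offsets (also with the /u modifier), so the offset pairs with substr, not mb_substr.

```php
<?php
// Assumption: this file is saved as UTF-8 and ext-mbstring is available.
$content = '日本語の漢字テスト'; // hypothetical sample input

// The captured offset is a byte offset into $content.
preg_match('/漢字/u', $content, $matches, PREG_OFFSET_CAPTURE);
$byteOffset = $matches[0][1];

// A byte offset goes with the byte-based substr(). It sits on a
// character boundary, because it marks where the match starts.
$fromWord = substr($content, $byteOffset);

// To use mb_* functions instead, convert the byte offset to a character
// offset by counting the characters in front of it.
$charOffset = mb_strlen(substr($content, 0, $byteOffset), 'UTF-8');
$sameResult = mb_substr($content, $charOffset, null, 'UTF-8');

// To take one character (not one byte), let mb_substr do the counting.
$firstChar = mb_substr($fromWord, 0, 1, 'UTF-8');

var_dump($byteOffset, $charOffset, $fromWord, $firstChar);
```

Both $fromWord and $sameResult hold the text starting at the matched word; the difference is only in which kind of offset each function expects.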