PHP: Is it safe to use strpos with UTF-8 strings?
I have a bunch of strings in different charsets. The $charset
variable contains the charset of the current string, and I convert each one to UTF-8:
$content = iconv($charset, 'utf-8', $content);
With that done, is it safe to use strpos, strlen, substr, etc., rather than their multibyte equivalents? I'm asking because I use preg_match a lot as well. If I use PREG_OFFSET_CAPTURE to get the position of a word in the string, I can't use that value with mb_substr to remove everything before the word.
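To make the problem concrete, here is a minimal sketch. The sample string is made up, and it assumes the mbstring extension is loaded and the file is saved as UTF-8. PREG_OFFSET_CAPTURE reports a byte offset, so passing it to mb_substr, which expects a character offset, cuts in the wrong place:

```php
<?php
// Hypothetical sample text: 'ü' and 'ß' are two bytes each in UTF-8.
$content = 'Grüße aus München';

// PREG_OFFSET_CAPTURE reports byte offsets, even with the /u modifier.
preg_match('/München/u', $content, $m, PREG_OFFSET_CAPTURE);
$byteOffset = $m[0][1]; // 12 bytes, although "München" starts at character 10

// mb_substr interprets the same number as a character offset and overshoots.
$wrong = mb_substr($content, $byteOffset, null, 'UTF-8'); // "nchen"
```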
That entirely depends on what you want to do. The core strlen, strpos and similar functions work on bytes: every number they accept or return is a byte count or a byte offset. The mb_* functions are encoding-aware and work on characters: the numbers they accept and return are character counts or character offsets.
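A minimal sketch of that difference, assuming the mbstring extension is available and the source file is saved as UTF-8. Each of the two CJK characters below takes three bytes in UTF-8:

```php
<?php
$str = '漢字';

// Core functions count and index bytes.
$bytes   = strlen($str);            // 6
$bytePos = strpos($str, '字');      // 3

// mbstring functions count and index characters (encoding passed explicitly).
$chars   = mb_strlen($str, 'UTF-8');            // 2
$charPos = mb_strpos($str, '字', 0, 'UTF-8');   // 1

printf("bytes=%d bytePos=%d chars=%d charPos=%d\n", $bytes, $bytePos, $chars, $charPos);
```

The rule of thumb is to never mix the two families: a number produced by one should only be fed to functions from the same family.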
If you have a safe way of getting a byte offset in the string ("safe" meaning the offset is not in the middle of a multi-byte character), then you can, for example, crop everything before that offset using substr, and that'll work fine. For instance:
$str = '漢字';
$offset = strpos($str, '字');
$cropped = substr($str, $offset);
This works fine.
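Applied to the PREG_OFFSET_CAPTURE case from the question, a minimal sketch (hypothetical sample string, UTF-8 source file assumed) would pair the byte offset with substr instead of mb_substr. The strpos variant relies on a property of UTF-8: continuation bytes never look like the first byte of a character, so a valid UTF-8 needle can only match a valid UTF-8 haystack at a character boundary.

```php
<?php
$content = 'Grüße aus München';

// Byte offset of the match (12 for this sample string).
preg_match('/München/u', $content, $m, PREG_OFFSET_CAPTURE);
$byteOffset = $m[0][1];

// substr also works in bytes, so the offset is interpreted consistently.
$rest = substr($content, $byteOffset); // "München"

// strpos on valid UTF-8 with a valid UTF-8 needle only matches at a
// character boundary, so its result is equally safe to pass to substr.
$rest2 = substr($content, strpos($content, 'München')); // "München"
```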
However, this won't work:
$cropped = substr($str, $offset, 1);
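You can't safely cut out a single byte without running the risk of cutting a multi-byte character in half. Here '字' is three bytes long in UTF-8, so this returns only its first byte, which is not valid UTF-8 on its own.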
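If you do need exactly one character at a known byte offset, two possible approaches are sketched below. This assumes the mbstring extension is available and the file is saved as UTF-8; neither approach is the only way to do it. The first converts the byte offset into a character offset for mb_substr; the second stays in bytes and lets PCRE match one whole code point starting at that offset.

```php
<?php
$str = '漢字';
$offset = strpos($str, '字'); // byte offset 3

// substr($str, $offset, 1) keeps only the first of the three bytes of '字'.
$broken = substr($str, $offset, 1);
$brokenIsValid = mb_check_encoding($broken, 'UTF-8'); // false

// Option 1: count the characters before the byte offset, then use mb_substr.
$charOffset = mb_strlen(substr($str, 0, $offset), 'UTF-8'); // 1
$char1 = mb_substr($str, $charOffset, 1, 'UTF-8');          // '字'

// Option 2: with /u, "." matches a full code point; preg_match's last
// argument is a byte offset at which matching starts.
preg_match('/./su', $str, $m, 0, $offset);
$char2 = $m[0]; // '字'
```

Option 2 also acts as a sanity check: in UTF-8 mode, preg_match returns false if the offset points into the middle of a character.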