php - Safe to use strpos with UTF-8 strings?


I have a bunch of strings in different charsets. The $charset variable contains the charset of the current string.

$content = iconv($charset, 'utf-8', $content); 

With that done, is it safe to use strpos, strlen, substr, etc., rather than their multibyte equivalents? I'm asking because I use preg_match a lot as well. If I use PREG_OFFSET_CAPTURE to get the position of a word in a string, I can't use that value with mb_substr to remove everything before the word.

That entirely depends on what you want to do. The core strlen and similar functions work on bytes. Every number they accept and return is a byte count or a byte offset. The mb_* functions are encoding-aware and work on characters. The numbers they accept and return are character counts or character offsets.
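As a rough illustration of the difference between byte counts and character counts, here is a minimal sketch. It assumes the source file is saved as UTF-8 and that the mbstring extension is enabled; the variable names are only illustrative.

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is available.
$str = '漢字';

// Byte-based functions: each of these characters takes 3 bytes in UTF-8.
$byteLength = strlen($str);        // number of bytes
$byteOffset = strpos($str, '字');  // offset in bytes

// Encoding-aware functions: the same questions, answered in characters.
$charLength = mb_strlen($str, 'UTF-8');
$charOffset = mb_strpos($str, '字', 0, 'UTF-8');

printf("bytes: %d, characters: %d\n", $byteLength, $charLength);
printf("byte offset: %d, character offset: %d\n", $byteOffset, $charOffset);
```

The two families give different numbers for the same string, so an offset from one family should only be passed to functions of the same family.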

If you have a safe way of getting a byte offset in a string ("safe" meaning the offset is not in the middle of a multi-byte character) and you then, for example, crop everything before that offset using substr, that'll work fine. For instance:

$str     = '漢字';
$offset  = strpos($str, '字');
$cropped = substr($str, $offset);

This works fine.

However, this won't work:

$cropped = substr($str, $offset, 1); 

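You can't safely cut out a single byte without running the risk of cutting a multi-byte character. Here the third argument is a length in bytes, and '字' occupies three bytes in UTF-8, so taking one byte leaves an incomplete character.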
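For the preg_match case from the question, the same reasoning applies: PREG_OFFSET_CAPTURE reports byte offsets, so they pair with substr rather than mb_substr. The following is only a sketch under assumptions: the sample text and variable names are made up, the file is assumed to be saved as UTF-8, and mbstring is assumed to be enabled for the validity check.

```php
<?php
// Assumption: UTF-8 source file, mbstring enabled. Sample text is made up.
$content = 'これは漢字のテストです';

// PREG_OFFSET_CAPTURE returns each match as [matched text, byte offset].
if (preg_match('/漢字/u', $content, $matches, PREG_OFFSET_CAPTURE)) {
    [$word, $byteOffset] = $matches[0];

    // A byte offset pairs with the byte-based substr():
    // this drops everything before the matched word.
    $fromWord = substr($content, $byteOffset);

    // To cut out only the word, pass its length in bytes.
    $onlyWord = substr($content, $byteOffset, strlen($word));

    // Cutting a single byte splits the first character of the match.
    $broken = substr($content, $byteOffset, 1);

    echo $fromWord, "\n";
    echo $onlyWord, "\n";
    var_dump(mb_check_encoding($broken, 'UTF-8')); // false: not valid UTF-8
}
```

Using strlen($word) for the length keeps the cut on a character boundary, because both ends of the match are themselves character boundaries.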

