ruby - Can't figure out why match result is nil
>> "<img src=\"https://filin.mail.ru/pic?width=90&height=90&email=multicc%40multicc.mail.ru&version=4&build=7\" style=\"\">".match(Regexp.new("<a href=\"http(s?):\/\/(?:\w+\.)+\w{1,5}.+?\">|<img src=\"http(s?):\/\/(?:\w+\.)+\w{1,5}.+?\"(?: style=\".+\")?>"))
=> nil
I can't understand why testing in Rubular says the string should be matched, but in Ruby it is not.
Regex is the wrong tool for handling HTML (or XML) 99.9% of the time. Instead, use a parser, such as Nokogiri:
    require 'nokogiri'

    html = '<img src="https://filin.mail.ru/pic?width=90&height=90&email=multicc%40multicc.mail.ru&version=4&build=7" style="">'
    doc = Nokogiri::HTML(html)

    url = doc.at('img')['src'] # => "https://filin.mail.ru/pic?width=90&height=90&email=multicc%40multicc.mail.ru&version=4&build=7"
    doc.at('img')['style'] # => ""
Once you've retrieved the data you want, such as the src, use the "right" tool for it, such as URI:
    require 'uri'

    scheme, userinfo, host, port, registry, path, opaque, query, fragment = URI.split(url)
    scheme   # => "https"
    userinfo # => nil
    host     # => "filin.mail.ru"
    port     # => nil
    registry # => nil
    path     # => "/pic"
    opaque   # => nil
    query    # => "width=90&height=90&email=multicc%40multicc.mail.ru&version=4&build=7"
    fragment # => nil

    query_parts = Hash[URI.decode_www_form(query)]
    query_parts # => {"width"=>"90", "height"=>"90", "email"=>"multicc@multicc.mail.ru", "version"=>"4", "build"=>"7"}
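As for why the original call returned nil while Rubular reported a match: one likely explanation, assuming the pattern really was passed to Regexp.new as a double-quoted string as shown in the question, is that Ruby processes the backslashes in the string before the regex engine ever sees them. In a double-quoted string, "\w" is just "w" and "\." is just ".", so the engine looks for literal "w" characters. Rubular takes the pattern as a regex, so the escapes survive there. The sketch below uses only the img half of the pattern for brevity; the variable names are illustrative.

```ruby
html = '<img src="https://filin.mail.ru/pic?width=90&height=90&email=multicc%40multicc.mail.ru&version=4&build=7" style="">'

# Double-quoted string: \w, \. and \/ lose their backslashes before
# Regexp.new ever receives the pattern.
from_double_quoted = Regexp.new("<img src=\"http(s?):\/\/(?:\w+\.)+\w{1,5}.+?\"(?: style=\".+\")?>")
from_double_quoted.source
# => "<img src=\"http(s?)://(?:w+.)+w{1,5}.+?\"(?: style=\".+\")?>"
from_double_quoted.match(html) # => nil

# Single-quoted string: backslashes are kept, so \w and \. reach the
# regex engine intact.
from_single_quoted = Regexp.new('<img src="http(s?)://(?:\w+\.)+\w{1,5}.+?"(?: style=".+")?>')
matched = !from_single_quoted.match(html).nil? # => true
```

Writing the pattern as a regex literal such as %r{...}, or doubling each backslash inside the double-quoted string, should have the same effect. None of this changes the advice above: a parser remains the more robust choice for HTML.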