Python - extract price from HTML tag



<dt class="col2"> <p>rs. 2691.00 </p> </dt> 

From the HTML code above, I need to extract the price using regular expressions. I used BeautifulSoup for parsing.

Can anyone propose a regular expression for the above?

If you're trying to get "2691.00", use:

(?<=rs\.)\s*(\d+\.\d{2}) 

Most regex engines can't use * in a lookbehind, so to keep the pattern flexible enough not to fail when there is more than one space, I left \s* in the main match. You can either use the whole match and trim off the excess spaces, or use capture group 1.
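As a minimal sketch of both options, assuming Python's standard re module and the HTML snippet from the question as a plain string (the variable names here are illustrative only):

```python
import re

# The snippet from the question, held as a plain string for this sketch.
html = '<dt class="col2"> <p>rs. 2691.00 </p> </dt>'

pattern = re.compile(r'(?<=rs\.)\s*(\d+\.\d{2})')
match = pattern.search(html)

# Option 1: take the whole match and trim the leading whitespace.
whole = match.group(0)
trimmed = whole.strip()

# Option 2: take capture group 1, which never includes the whitespace.
price = match.group(1)

# Python's re module is one of the engines that rejects a
# variable-width lookbehind such as (?<=rs\.\s*).
try:
    re.compile(r'(?<=rs\.\s*)\d+\.\d{2}')
    variable_lookbehind_ok = True
except re.error:
    variable_lookbehind_ok = False

print(repr(whole), trimmed, price, variable_lookbehind_ok)
```

Capture group 1 is usually the simpler choice, since it needs no post-processing.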

(?<= ) is a positive lookbehind. It tells the regex engine that whatever is inside it has to match immediately before the main match, but is not included in the match.

rs\. matches "rs.". In regex, the . character matches any character, so you have to escape it to match a literal period.

\s matches whitespace characters.

* matches the preceding item between 0 and infinity times.

\d matches digits.

+ matches the preceding item between 1 and infinity times. It is similar to *, but it has to find at least 1 successful match.

{2} means it has to find exactly 2 of whatever comes before it. \d{2} is the same as \d\d.
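The individual pieces can be checked in isolation. The following is a small sketch using Python's re.fullmatch; the test strings are made up for illustration:

```python
import re

# \d{2} and \d\d accept exactly the same strings.
samples = ('00', '7', '123', 'ab')
same_result = all(
    bool(re.fullmatch(r'\d{2}', s)) == bool(re.fullmatch(r'\d\d', s))
    for s in samples
)

# * accepts zero occurrences, + requires at least one.
star_accepts_empty = re.fullmatch(r'\s*', '') is not None
plus_accepts_empty = re.fullmatch(r'\s+', '') is not None

# An unescaped . matches any character; \. matches only a period.
unescaped_matches_x = re.fullmatch(r'rs.', 'rsx') is not None
escaped_matches_x = re.fullmatch(r'rs\.', 'rsx') is not None
escaped_matches_dot = re.fullmatch(r'rs\.', 'rs.') is not None

print(same_result, star_accepts_empty, plus_accepts_empty)
print(unescaped_matches_x, escaped_matches_x, escaped_matches_dot)
```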

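I put parentheses around the price part of the pattern to create a capture group. This lets you extract that group from the entire match. You can take this further if you want to extract the "dollar" amount or the change separately, by replacing the group with: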

((\d+)\.(\d{2})) 

Then, because groups are numbered by the order of their opening parentheses, capture group 1 will contain 2691.00, capture group 2 will contain 2691, and capture group 3 will contain 00.
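A short sketch of the nested groups in Python, assuming the nested group is dropped into the same lookbehind pattern as before and applied to the text of the p tag (variable names are illustrative):

```python
import re

# Text content of the <p> tag from the question.
text = 'rs. 2691.00 '

# Same pattern as before, with the price group split into nested groups.
nested = re.compile(r'(?<=rs\.)\s*((\d+)\.(\d{2}))')
m = nested.search(text)

# Groups are numbered left to right by their opening parenthesis.
full_price, whole_part, fraction = m.group(1, 2, 3)

print(full_price, whole_part, fraction)
```

The groups come back as strings, so convert them (for example with int or decimal.Decimal) if you need to do arithmetic on the price.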

