Python - extract price from HTML tag
<dt class="col2"> <p>rs. 2691.00 </p> </dt>
From the above HTML code, I need to extract the price using regular expressions. I used BeautifulSoup for parsing.
Can anyone propose a regular expression for the above?
If you're trying to get "2691.00", use:
(?<=rs\.)\s*(\d+\.\d{2})
Most regex engines can't use * inside a lookbehind, so to make the pattern dynamic enough not to fail if there's more than one space, the spaces are left in the main group. You can either use the main match and trim off the excess spaces, or use capture group 1.
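A minimal sketch of both options using Python's standard `re` module, assuming the HTML snippet from the question is already available as a plain string (for example, the text you got out of BeautifulSoup). It also shows that Python's `re` rejects a variable-width lookbehind, which is why `\s*` sits outside the `(?<=...)`.

```python
import re

# Sample input: the HTML snippet from the question.
html = '<dt class="col2"> <p>rs. 2691.00 </p> </dt>'

# Fixed-width lookbehind: "rs." must precede the match but is not part of it.
pattern = re.compile(r"(?<=rs\.)\s*(\d+\.\d{2})")
match = pattern.search(html)

whole = match.group(0)    # ' 2691.00' (leading space is part of the main match)
trimmed = whole.strip()   # option 1: trim the main match -> '2691.00'
price = match.group(1)    # option 2: capture group 1 -> '2691.00'
print(repr(whole), trimmed, price)

# Python's re module refuses a variable-width lookbehind like (?<=rs\.\s*).
try:
    re.compile(r"(?<=rs\.\s*)\d+\.\d{2}")
    variable_lookbehind_ok = True
except re.error:
    variable_lookbehind_ok = False
print(variable_lookbehind_ok)
```

Using group 1 avoids the extra `strip()` call, but both approaches give the same price string here.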
(?<= )
Positive lookbehind. It tells the regex engine that whatever is inside it has to be matched right before the main matching group, but not to include it in the match.
rs\.
Matches "rs.". In regex the . character matches any character, so you have to escape it to match a literal period.
\s
Matches whitespace (a space, tab, newline, etc.).
*
Matches the preceding item between 0 and infinity times.
\d
Matches a digit.
+
Matches the preceding item between 1 and infinity times. Similar to *, but it has to find at least one successful match.
{2}
Means it has to find exactly 2 of whatever comes before it. \d{2} is the same as \d\d.
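The token breakdown above can be checked with a few small probes, again a sketch using only Python's standard `re` module; the short test strings are made up for illustration.

```python
import re

# Lookbehind: "rs." must come first, but it is not part of the match.
lookbehind = re.search(r"(?<=rs\.)\d+", "rs.2691").group(0)

# An unescaped "." matches any character; "\." only matches a literal period.
unescaped = bool(re.fullmatch(r"rs.", "rsX"))
escaped = bool(re.fullmatch(r"rs\.", "rsX"))

# \s matches whitespace in general, e.g. a tab, not only the space character.
tab_is_whitespace = bool(re.fullmatch(r"\s", "\t"))

# * accepts zero repetitions; + needs at least one.
star_accepts_empty = bool(re.fullmatch(r"\d*", ""))
plus_accepts_empty = bool(re.fullmatch(r"\d+", ""))

# {2} requires exactly two of the preceding item, the same as writing \d\d.
two_digits = re.findall(r"\d{2}", "2691.00")
same_as_dd = two_digits == re.findall(r"\d\d", "2691.00")

print(lookbehind, unescaped, escaped, tab_is_whitespace)
print(star_accepts_empty, plus_accepts_empty, two_digits, same_as_dd)
```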
And I have parentheses around the price's match to create a group. This allows you to extract just the group instead of the entire match. It can be taken further if you want to extract the "dollar" amount or the change, with:
((\d+)\.(\d{2}))
Capture groups are numbered by the position of their opening parenthesis, so capture group 1 contains 2691.00, capture group 2 contains 2691, and capture group 3 contains 00.
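A minimal sketch of the nested groups, assuming the inner pattern is dropped into the original lookbehind regex in place of the single price group; the variable names `whole_part` and `fraction_part` are just illustrative.

```python
import re

html = '<dt class="col2"> <p>rs. 2691.00 </p> </dt>'

# Group 1: the full price; group 2: digits before the period; group 3: the two decimals.
match = re.search(r"(?<=rs\.)\s*((\d+)\.(\d{2}))", html)
price, whole_part, fraction_part = match.group(1, 2, 3)
print(price, whole_part, fraction_part)
```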