performance - Randomize timestamp column in large MySQL table -
i have test database table ~100m rows generated cloning original 3k rows multiple times. let's table describes events have timestamps. due cloning have ~10m events per day far real cases. i'd randomize date column , scatter records several days. here procedure i've come with:
drop procedure if exists `randomizedates`; delimiter // create procedure `randomizedates`(in `daterange` int) begin declare id int unsigned; declare buf timestamp; declare done int default false; declare cur1 cursor select event_id events; declare continue handler not found set done = true; open cur1; the_loop: loop fetch cur1 id; if done leave the_loop; end if; set buf = (select now() - interval floor(rand() * daterange) day); update events set starttime = buf event_id = id; end loop the_loop; close cur1; end // delimiter ;
on 3k table executes ~6 seconds assuming linear сomplexity take ~50 hours applied on 100m table. there way speed up? or maybe procedure incorrect @ all?
just do:
set @datarange = 7; update `events` set starttime = now() - interval floor(rand()) * @datarange day;
databases not @ fetching , processing single rows in lopp, used in procedural languages (iterators, each loops, arrays etc), best at, , optimized processing sql, essetially declarative language - declare want without specyfying how it, in contrast procedural languages, used specify steps program must do.
remember - row row = slow slow.
@ simple example simulates table , compares procedure update:
drop table `events`; create table `events` select * information_schema.tables 1=0; alter table `events` add column event_id int primary key auto_increment first; alter table `events` change column create_time starttime timestamp; insert `events` select null, t.* information_schema.tables t cross join ( select 1 information_schema.tables limit 100 ) xx mysql> select count(*) `events`; +----------+ | count(*) | +----------+ | 17200 | +----------+
we created table 17 thousand rows. call procedure:
mysql> call `randomizedates`(7); query ok, 0 rows affected (34.26 sec)
and update command:
mysql> set @datarange = 7; query ok, 0 rows affected (0.00 sec) mysql> update `events` -> set starttime = now() - interval floor(rand()) * @datarange day; query ok, 17200 rows affected (0.23 sec) rows matched: 17200 changed: 17200 warnings: 0
as see - 34 seconds / 0.23 second = 14782 % faster - it's huge difference !!!
Comments
Post a Comment