The survey from Issue #56 is still open. I encourage you to answer the survey so we can find ways to better support the PostgreSQL community.
You can use POSIX regular expression pattern matching in PostgreSQL queries. Regular expressions give you a much wider range of pattern matching than either LIKE or SIMILAR TO, and they are the familiar ones you use with grep and other Unix tools. There is a nice explanation in the 7.4 PostgreSQL manual in section 9.6.3, POSIX Regular Expressions, and a quick web search can give you more explanations and tutorials. To match a regular expression in a PostgreSQL query, use the appropriate operator:

   ~    matches the regular expression, case sensitive
   ~*   matches the regular expression, case insensitive
   !~   does not match the regular expression, case sensitive
   !~*  does not match the regular expression, case insensitive
This is very handy when you want to do something like selecting the rows whose column value begins with a non-alphabetic character. These two queries give identical results. The first does a case insensitive search for values which do not start with the characters a through z; the second does it in the more traditional case sensitive way.

SELECT * FROM table_name WHERE col !~* '^[a-z]';
SELECT * FROM table_name WHERE col !~ '^[a-zA-Z]';
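The positive-match operators work the same way. A small sketch, assuming a hypothetical customers table with phone and name columns:

-- case sensitive: phone numbers of the form 555-1234
SELECT * FROM customers WHERE phone ~ '^[0-9]{3}-[0-9]{4}$';
-- case insensitive: names beginning with smith, Smith, SMITH, ...
SELECT * FROM customers WHERE name ~* '^smith';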
In 7.4 it is easy to turn a list into a set of rows. There are two parts to this problem: handling the comma separated list, and writing a function which returns a set. Two suggestions were made for splitting the text apart: using split_part(string text, delimiter text, field integer), which extracts a field from a delimited string, and using the string_to_array(string text, delimiter text) function. For plpgsql functions I prefer the array functions; however, if I were writing this in plperl, I would use the native Perl functions. This example is in plpgsql. The comma delimited string is passed to the function unravel(). The delimiter is hard coded as a comma. A holding array is declared, then the string is converted to an array. The FOR loop uses the array_lower() and array_upper() functions to determine the index range of the array. The second parameter to these functions is the dimension of the array--in this case, it is a one dimensional array.
create or replace function unravel(text) returns SETOF int as '
declare
   v_list alias for $1;
   v_delim text := '','';
   v_arr text[];
begin
   v_arr := string_to_array(v_list, v_delim);
   for i in array_lower(v_arr, 1)..array_upper(v_arr, 1) loop
      return next v_arr[i]::int;
   end loop;
   return;
end;
' language plpgsql;

regression=# select * from unravel('1,2,3,4,5');
 unravel
---------
       1
       2
       3
       4
       5
(5 rows)
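A handy use for a set returning function like this is feeding a list of ids to an IN clause. A small sketch, assuming a hypothetical orders table with an integer id column:

SELECT * FROM orders WHERE id IN (SELECT * FROM unravel('1,2,3'));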
MySQL has a feature where the first column of type timestamp in a table is updated whenever the row is inserted or updated; you have to use datetime fields if you do not want the column to auto-update. This auto-updating of timestamps is helpful for cases where you want the system to track the last modified time of a row for, say, auditing purposes. In order to do this in PostgreSQL in the smallest, most regular way, I add a column called modified_timestamp to any table whose updates I want to track. This allows me to write one function for all of them.

CREATE OR REPLACE FUNCTION setmodtime() RETURNS TRIGGER AS '
BEGIN
   NEW.modified_timestamp = now();
   RETURN NEW;
END
' LANGUAGE 'plpgsql';
I usually have two variations on this function that operate on two
other standard columns: modified_user and create_timestamp.
These track the user who changed the row and when the row was created.
To track the modified user and the creation time, add similar functions and triggers (a sketch follows below). Having created the function, for each table that I want updated with the modification time, I add the triggers:

CREATE TRIGGER tbl_upd BEFORE UPDATE ON tbl
   FOR EACH ROW EXECUTE PROCEDURE setmodtime();
CREATE TRIGGER tbl_ins BEFORE INSERT ON tbl
   FOR EACH ROW EXECUTE PROCEDURE setmodtime();

Be careful with these columns: if you try to change modified_timestamp yourself, it will be overwritten with the system time from now(). Also remember that the special constant 'now' is being deprecated in favor of the now() function.
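The variations mentioned above follow the same pattern. A minimal sketch, assuming the columns are named modified_user and create_timestamp; the function names setmoduser() and setcreatetime() and the use of current_user are illustrative assumptions, and the setcreatetime() trigger would be attached to INSERT only:

CREATE OR REPLACE FUNCTION setmoduser() RETURNS TRIGGER AS '
BEGIN
   -- record the user making the change
   NEW.modified_user = current_user;
   RETURN NEW;
END
' LANGUAGE 'plpgsql';

CREATE OR REPLACE FUNCTION setcreatetime() RETURNS TRIGGER AS '
BEGIN
   -- stamp the row with its creation time
   NEW.create_timestamp = now();
   RETURN NEW;
END
' LANGUAGE 'plpgsql';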
The question was raised about PostgreSQL's performance with massive simultaneous inserts into a heavily indexed table. It was conjectured that the CPU could not handle this case. In fact it is unlikely to be CPU bound; instead it is much more likely to be I/O bound. The money to be spent on multiple CPUs would probably be much better spent on good disks.

The discussion from there veered over to which disk configurations might be best under the circumstances. Tom Lane suggested that, "If your insert load exceeds the available disk bandwidth, I think you have little choice but to figure out a way to divide the workload among multiple independent databases."

Alex Satrapa writes, "Properly set up in a RAID-1/0 array, you'll get much better 'bandwidth' out of those drives. Whether you RAID in software or hardware is up to you and your budget - but if you choose hardware, make sure you actually get a hardware RAID controller, not one of the cheapies which are just multiple-IDE-controllers-with-special-drivers."

Greg Stark wrote, "I would agree and if you really need the I/O bandwidth you can go to much larger stripe sets than even this. The documentation I've seen before suggested there were benefits up to stripe sets as large as twelve disks across. That would be 24 drives if you're also doing mirroring. Ideally separating WAL, index, and heap files is good, but you would have to experiment to see which works out fastest for a given number of drives."

It was also noted that indexes are not built at the end of the transaction, but rather as the inserts proceed.

Vivek Khera pointed out that he did a bunch of testing with a 14 disk SCSI array and found that RAID5 was best over RAID10 and RAID50. He added, "I found that putting the WAL on its own array (in my case a mirror on the other RAID controller channel) helped quite a bit. I don't think it is easy to split off index files to alternate locations with Postgres." "Increasing the number of checkpoint segments was one of the biggest improvements I observed for mass-insert performance (as tested while doing a restore on a multi-million row database.)" "The combination of having the WAL on a separate disk, and letting that grow to be quite large has been very good for my performance and also for reducing disk bandwidth requirements."
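For reference, the checkpoint and WAL advice above maps onto a couple of server settings. A minimal sketch, assuming 7.4-era postgresql.conf parameter names and purely illustrative values:

# postgresql.conf -- illustrative values only
checkpoint_segments = 30    # default is 3; more segments means less frequent checkpoints
wal_buffers = 16            # default is 8; buffers for WAL data not yet written to disk

# Relocating WAL to its own disk is usually done by moving pg_xlog and
# symlinking it back, with the server stopped:
#   mv $PGDATA/pg_xlog /waldisk/pg_xlog
#   ln -s /waldisk/pg_xlog $PGDATA/pg_xlog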
Comments and Corrections are welcome. Suggestions and contributions of items are also welcome. Send them in! Copyright A. Elein Mustain 2003, 2004, 2005, 2006, 2007, 2008, 2009