Why Isn’t SQL More Refactorable? [closed]Why SQL is not so widespread in large desktop applications?Why NoSQL over SQL?Sanity of design for my in-memory object representations of database rowsWhy are some languages (e.g Visual Basic) considered more enterprise ready than others (Python)why are noSQL databases more scalable than SQL?Refactorable God v.s. too many classeSQL Query syntax - formatting?Why is most SQL written in YELLING?Replacing Dynamic SQL w/ a Filtering Pipeline?SQL table with many columns or more 1:1 tables?
Would an 8% reduction in drag outweigh the weight addition from this custom CFD-tested winglet?
Ubuntu won't let me edit or delete .vimrc file
Is it a Munchausen Number?
Delta TSA-Precheck status removed
How does Howard Stark know this?
Noob at soldering, can anyone explain why my circuit won't work?
Why do Thanos's punches not kill Captain America or at least cause some mortal injuries?
Renting a house to a graduate student in my department
Is there any evidence to support the claim that the United States was "suckered into WW1" by Zionists, made by Benjamin Freedman in his 1961 speech?
stdout and stderr redirection to different files
Understanding basic photoresistor circuit
Is the homebrew weapon attack cantrip 'Arcane Strike' balanced?
Why does a C.D.F need to be right-continuous?
Was the Highlands Ranch shooting the 115th mass shooting in the US in 2019
Why was this sacrifice sufficient?
Can I use my laptop, which says 240V, in the USA?
Guns in space with bullets that return?
What does it mean with the ask price is below the last price?
Unit Test - Testing API Methods
Why do unstable nuclei form?
We are two immediate neighbors who forged our own powers to form concatenated relationship. Who are we?
Limit of an integral vs Limit of the integrand
Why in a Ethernet LAN, a packet sniffer can obtain all packets sent over the LAN?
LocalDate.plus Incorrect Answer
Why Isn’t SQL More Refactorable? [closed]
Why SQL is not so widespread in large desktop applications?Why NoSQL over SQL?Sanity of design for my in-memory object representations of database rowsWhy are some languages (e.g Visual Basic) considered more enterprise ready than others (Python)why are noSQL databases more scalable than SQL?Refactorable God v.s. too many classeSQL Query syntax - formatting?Why is most SQL written in YELLING?Replacing Dynamic SQL w/ a Filtering Pipeline?SQL table with many columns or more 1:1 tables?
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;
Everyone knows that new developers write long functions. As you progress, you get better at breaking your code into smaller pieces and experience teaches you the value of doing so.
Enter SQL. Yes, the SQL way of thinking about code is different from the procedural way of thinking about code, but this principle seems just as applicable.
Let’s say I have a query that takes the form:
select * from subQuery1 inner join subQuerry2 left join subquerry3 left join join subQuery4
Using some IDs or dates etc.
Those subqueries are complex themselves and may contain subqueries of their own. In no other programming context would I think that the logic for complex subqueries 1-4 belongs in line with my parent query that joins them all. It seems so straightforward that those subqueries should be defined as views, just like they would be functions if I were writing procedural code.
So why isn’t that common practice? Why do people so often write these long monolithic SQL queries? Why doesn’t SQL encourage extensive view usage just like procedural programming encourages extensive function usage. (In many enterprise environments, creating views isn’t even something that’s easily done. There are requests and approvals required. Imagine if other types of programmers had to submit a request each time they created a function!)
I’ve thought of three possible answers:
This is already common and I’m working with inexperienced people
Experienced programmers don’t write complex SQL because they prefer to solve hard data processing problems with procedural code
Something else
refactoring sql enterprise-development
closed as primarily opinion-based by amon, Christophe, Robbie Dee, gnat, Thomas Owens♦ May 2 at 9:07
Many good questions generate some degree of opinion based on expert experience, but answers to this question will tend to be almost entirely based on opinions, rather than facts, references, or specific expertise. If this question can be reworded to fit the rules in the help center, please edit the question.
|
show 5 more comments
Everyone knows that new developers write long functions. As you progress, you get better at breaking your code into smaller pieces and experience teaches you the value of doing so.
Enter SQL. Yes, the SQL way of thinking about code is different from the procedural way of thinking about code, but this principle seems just as applicable.
Let’s say I have a query that takes the form:
select * from subQuery1 inner join subQuerry2 left join subquerry3 left join join subQuery4
Using some IDs or dates etc.
Those subqueries are complex themselves and may contain subqueries of their own. In no other programming context would I think that the logic for complex subqueries 1-4 belongs in line with my parent query that joins them all. It seems so straightforward that those subqueries should be defined as views, just like they would be functions if I were writing procedural code.
So why isn’t that common practice? Why do people so often write these long monolithic SQL queries? Why doesn’t SQL encourage extensive view usage just like procedural programming encourages extensive function usage. (In many enterprise environments, creating views isn’t even something that’s easily done. There are requests and approvals required. Imagine if other types of programmers had to submit a request each time they created a function!)
I’ve thought of three possible answers:
This is already common and I’m working with inexperienced people
Experienced programmers don’t write complex SQL because they prefer to solve hard data processing problems with procedural code
Something else
refactoring sql enterprise-development
closed as primarily opinion-based by amon, Christophe, Robbie Dee, gnat, Thomas Owens♦ May 2 at 9:07
Many good questions generate some degree of opinion based on expert experience, but answers to this question will tend to be almost entirely based on opinions, rather than facts, references, or specific expertise. If this question can be reworded to fit the rules in the help center, please edit the question.
12
There are organisations which only let you query a database through views and modify it through stored procedures.
– Pieter B
May 1 at 12:51
3
SQL got a lot more enjoyable for me when I finally accepted that it was never going to be as DRY as my normal procedural code.
– Graham
May 1 at 13:23
1
4. SQL is really old and hasn't materially been updated in decades. For super complex stuff, a lot of teams opt for stored procedures. You can add different clauses for that. Sometimes you just have to run jobs to stage data in a temp table and then join on that. Behold how different declarative and procedural languages are.
– Berin Loritsch
May 1 at 13:42
8
Also one reason is that there is a horrible performance issue called a "triangular join" which can happen when you use views (quite by accident of course). If your query joins View A and View B, but View A also in its implementation re-uses View B, you start to see that problem. So folks often start off by writing a single monolithic query to be able to see what would actually work best in terms of refactoring to views, and then their deadline hits, and the monolith goes to production. Kind of like 98% of all software dev, really :) :)
– Stephen Byrne
May 1 at 14:51
3
"Imagine if other types of programmers had to submit a request each time they created a function" ... umm. You don't do code reviews?
– svidgen
May 1 at 21:14
|
show 5 more comments
Everyone knows that new developers write long functions. As you progress, you get better at breaking your code into smaller pieces and experience teaches you the value of doing so.
Enter SQL. Yes, the SQL way of thinking about code is different from the procedural way of thinking about code, but this principle seems just as applicable.
Let’s say I have a query that takes the form:
select * from subQuery1 inner join subQuerry2 left join subquerry3 left join join subQuery4
Using some IDs or dates etc.
Those subqueries are complex themselves and may contain subqueries of their own. In no other programming context would I think that the logic for complex subqueries 1-4 belongs in line with my parent query that joins them all. It seems so straightforward that those subqueries should be defined as views, just like they would be functions if I were writing procedural code.
So why isn’t that common practice? Why do people so often write these long monolithic SQL queries? Why doesn’t SQL encourage extensive view usage just like procedural programming encourages extensive function usage. (In many enterprise environments, creating views isn’t even something that’s easily done. There are requests and approvals required. Imagine if other types of programmers had to submit a request each time they created a function!)
I’ve thought of three possible answers:
This is already common and I’m working with inexperienced people
Experienced programmers don’t write complex SQL because they prefer to solve hard data processing problems with procedural code
Something else
refactoring sql enterprise-development
Everyone knows that new developers write long functions. As you progress, you get better at breaking your code into smaller pieces and experience teaches you the value of doing so.
Enter SQL. Yes, the SQL way of thinking about code is different from the procedural way of thinking about code, but this principle seems just as applicable.
Let’s say I have a query that takes the form:
select * from subQuery1 inner join subQuerry2 left join subquerry3 left join join subQuery4
Using some IDs or dates etc.
Those subqueries are complex themselves and may contain subqueries of their own. In no other programming context would I think that the logic for complex subqueries 1-4 belongs in line with my parent query that joins them all. It seems so straightforward that those subqueries should be defined as views, just like they would be functions if I were writing procedural code.
So why isn’t that common practice? Why do people so often write these long monolithic SQL queries? Why doesn’t SQL encourage extensive view usage just like procedural programming encourages extensive function usage. (In many enterprise environments, creating views isn’t even something that’s easily done. There are requests and approvals required. Imagine if other types of programmers had to submit a request each time they created a function!)
I’ve thought of three possible answers:
This is already common and I’m working with inexperienced people
Experienced programmers don’t write complex SQL because they prefer to solve hard data processing problems with procedural code
Something else
refactoring sql enterprise-development
refactoring sql enterprise-development
edited May 1 at 14:42
Robbie Dee
8,13011548
8,13011548
asked May 1 at 12:07
ebrtsebrts
32938
32938
closed as primarily opinion-based by amon, Christophe, Robbie Dee, gnat, Thomas Owens♦ May 2 at 9:07
Many good questions generate some degree of opinion based on expert experience, but answers to this question will tend to be almost entirely based on opinions, rather than facts, references, or specific expertise. If this question can be reworded to fit the rules in the help center, please edit the question.
closed as primarily opinion-based by amon, Christophe, Robbie Dee, gnat, Thomas Owens♦ May 2 at 9:07
Many good questions generate some degree of opinion based on expert experience, but answers to this question will tend to be almost entirely based on opinions, rather than facts, references, or specific expertise. If this question can be reworded to fit the rules in the help center, please edit the question.
12
There are organisations which only let you query a database through views and modify it through stored procedures.
– Pieter B
May 1 at 12:51
3
SQL got a lot more enjoyable for me when I finally accepted that it was never going to be as DRY as my normal procedural code.
– Graham
May 1 at 13:23
1
4. SQL is really old and hasn't materially been updated in decades. For super complex stuff, a lot of teams opt for stored procedures. You can add different clauses for that. Sometimes you just have to run jobs to stage data in a temp table and then join on that. Behold how different declarative and procedural languages are.
– Berin Loritsch
May 1 at 13:42
8
Also one reason is that there is a horrible performance issue called a "triangular join" which can happen when you use views (quite by accident of course). If your query joins View A and View B, but View A also in its implementation re-uses View B, you start to see that problem. So folks often start off by writing a single monolithic query to be able to see what would actually work best in terms of refactoring to views, and then their deadline hits, and the monolith goes to production. Kind of like 98% of all software dev, really :) :)
– Stephen Byrne
May 1 at 14:51
3
"Imagine if other types of programmers had to submit a request each time they created a function" ... umm. You don't do code reviews?
– svidgen
May 1 at 21:14
|
show 5 more comments
12
There are organisations which only let you query a database through views and modify it through stored procedures.
– Pieter B
May 1 at 12:51
3
SQL got a lot more enjoyable for me when I finally accepted that it was never going to be as DRY as my normal procedural code.
– Graham
May 1 at 13:23
1
4. SQL is really old and hasn't materially been updated in decades. For super complex stuff, a lot of teams opt for stored procedures. You can add different clauses for that. Sometimes you just have to run jobs to stage data in a temp table and then join on that. Behold how different declarative and procedural languages are.
– Berin Loritsch
May 1 at 13:42
8
Also one reason is that there is a horrible performance issue called a "triangular join" which can happen when you use views (quite by accident of course). If your query joins View A and View B, but View A also in its implementation re-uses View B, you start to see that problem. So folks often start off by writing a single monolithic query to be able to see what would actually work best in terms of refactoring to views, and then their deadline hits, and the monolith goes to production. Kind of like 98% of all software dev, really :) :)
– Stephen Byrne
May 1 at 14:51
3
"Imagine if other types of programmers had to submit a request each time they created a function" ... umm. You don't do code reviews?
– svidgen
May 1 at 21:14
12
12
There are organisations which only let you query a database through views and modify it through stored procedures.
– Pieter B
May 1 at 12:51
There are organisations which only let you query a database through views and modify it through stored procedures.
– Pieter B
May 1 at 12:51
3
3
SQL got a lot more enjoyable for me when I finally accepted that it was never going to be as DRY as my normal procedural code.
– Graham
May 1 at 13:23
SQL got a lot more enjoyable for me when I finally accepted that it was never going to be as DRY as my normal procedural code.
– Graham
May 1 at 13:23
1
1
4. SQL is really old and hasn't materially been updated in decades. For super complex stuff, a lot of teams opt for stored procedures. You can add different clauses for that. Sometimes you just have to run jobs to stage data in a temp table and then join on that. Behold how different declarative and procedural languages are.
– Berin Loritsch
May 1 at 13:42
4. SQL is really old and hasn't materially been updated in decades. For super complex stuff, a lot of teams opt for stored procedures. You can add different clauses for that. Sometimes you just have to run jobs to stage data in a temp table and then join on that. Behold how different declarative and procedural languages are.
– Berin Loritsch
May 1 at 13:42
8
8
Also one reason is that there is a horrible performance issue called a "triangular join" which can happen when you use views (quite by accident of course). If your query joins View A and View B, but View A also in its implementation re-uses View B, you start to see that problem. So folks often start off by writing a single monolithic query to be able to see what would actually work best in terms of refactoring to views, and then their deadline hits, and the monolith goes to production. Kind of like 98% of all software dev, really :) :)
– Stephen Byrne
May 1 at 14:51
Also one reason is that there is a horrible performance issue called a "triangular join" which can happen when you use views (quite by accident of course). If your query joins View A and View B, but View A also in its implementation re-uses View B, you start to see that problem. So folks often start off by writing a single monolithic query to be able to see what would actually work best in terms of refactoring to views, and then their deadline hits, and the monolith goes to production. Kind of like 98% of all software dev, really :) :)
– Stephen Byrne
May 1 at 14:51
3
3
"Imagine if other types of programmers had to submit a request each time they created a function" ... umm. You don't do code reviews?
– svidgen
May 1 at 21:14
"Imagine if other types of programmers had to submit a request each time they created a function" ... umm. You don't do code reviews?
– svidgen
May 1 at 21:14
|
show 5 more comments
6 Answers
6
active
oldest
votes
I think the main problem is that not all databases support Common Table Expressions.
My employer uses DB/2 for a great many things. The latest versions of it support CTEs, such that I'm able to do things like:
with custs as (
select acct# as accountNumber, cfname as firstName, clname as lastName,
from wrdCsts
where -- various criteria
)
, accounts as (
select acct# as accountNumber, crBal as currentBalance
from crzyAcctTbl
)
select firstName, lastName, currentBalance
from custs
inner join accounts on custs.accountNumber = accounts.accountNumber
The result is that we can have heavily abbreviated table / field names and I'm essentially creating temp views, with more legible names, which I can then use. Sure, the query gets longer. But the result is that I can write something which is pretty clearly separated (using CTEs the way you'd use functions to get DRY) and end up with code that's quite legible. And because I'm able to break out my subqueries, and have one subquery reference another, it's not all "inline." I have, on occasion, written one CTE, then had four other CTEs all reference it, then had the main query union the results of those last four.
This can be done with:
- DB/2
- PostGreSQL
- Oracle
- MS SQL Server
- MySQL (latest version; still kinda new)
- probably others
But it goes a LONG way toward making the code cleaner, more legible, more DRY.
I've developed a "standard library" of CTEs that I can plug-in to various queries, getting me off to a flying start on my new query. Some of them are starting to be embraced by other devs in my organization, too.
In time, it may make sense to turn some of these into views, such that this "standard library" is available without needing to copy / paste. But my CTEs end up getting tweaked, ever so slightly, for various needs that I've not been able to have a single CTE get used SO WIDELY, without mods, that it might be worth creating a view.
It would seem that part of your gripe is "why don't I know about CTEs?" or "why doesn't my DB support CTEs?"
As for updates ... yeah, you can use CTEs but, in my experience, you have to use them inside the set clause AND in the where clause. It would be nice if you could define one or more ahead of the whole update statement and then just have the "main query" parts in the set / where clauses but it doesn't work that way. And there's no avoiding obscure table / field names on the table you're updating.
You can use CTEs for deletes. It may take multiple CTEs to determine the PK / FK values for records you want to delete from that table. Again, you can't avoid obscure table / field names on the table you're modifying.
Insomuch as you can do a select into an insert, you can use CTEs for inserts. As always, you may be dealing with obscure table / field names on the table you're modifying.
SQL does NOT let you create the equivalent of a domain object, wrapping a table, with getters / setters. For that, you will need to use an ORM of some kind, along with a more procedural / OO programming language. I've written things of this nature in Java / Hibernate.
4
We had Mr. Big CTE man be the man writing the worst SQL. The problem was the CTEs were poor abstraction choices and the optimizer can't undo every boneheaded algorithm you put in.
– Joshua
May 1 at 15:57
3
Also ORM can do some pretty heinous things performance-wise, too...especially when you are just using getters and setters to fetch a bunch of data. Hibernate is notorious for using hundreds of individual queries instead of one big joined up query, which is a problem when there is overhead on each query.
– user3067860
May 1 at 17:04
2
@Joshua You can write bad code in any language. Including SQL. But refactoring to CTEs, done properly, can create bottom-up designs which are easier for humans to parse. I tend to view that as a desirable trait, regardless of what language I'm dealing with :-)
– Meower68
May 1 at 17:43
2
The other answers are great, but this is what I was personally looking for. 'Why don't I know about CTEs' was the majority of my problem.
– ebrts
May 2 at 0:07
2
@Meower68 Isn't there a risk that extensive use of CTE's stops people from learning joins properly and learning about good database design? I support the value of CTE's but it also makes it too easy working with subqueries, where you shouldn't.
– Pieter B
May 2 at 8:16
|
show 5 more comments
Locking down the creation of database views is often done by organizations paranoid of performance problems in the database. This is an organizational culture issue, rather than a technical issue with SQL.
Beyond that, large monolithic SQL queries are written many times, because the use case is so specific that very little of the SQL code can be truly reused in other queries. If a complex query is needed, it is usually for a much different use case. Copying the SQL from another query is often a starting point, but due to the other sub queries and JOINs in the new query, you end up modifying the copied SQL just enough to break any sort of abstraction that a "function" in another language would be used for. Which brings me to the most important reason why SQL is hard to refactor.
SQL only deals with concrete data structures, not abstract behavior (or an abstraction in any sense of the word). Since SQL is written around concrete ideas, there is nothing to abstract away into a reusable module. Database views can help with this, but not to the same level as a "function" in another language. A database view isn't so much an abstraction as it is a query. Well, actually, a database view is a query. It's essentially used like a table, but executed like a sub query, so again, you are dealing with something concrete, not abstract.
It is with abstractions that code becomes easier to refactor, because an abstraction hides implementation details from the consumer of that abstraction. Straight SQL provides no such separation, although procedural extensions to SQL like PL/SQL for Oracle or Transact-SQL for SQL Server start to blur the lines a little.
"SQL only deals with concrete data structures, not abstract behavior (or an abstraction in any sense of the word)." This is a strange statement, as from my point of view SQL deals entirely with abstract behavior and not concrete programming in any sense of the word! Just consider all of the massive degrees of complexity that are abstracted into the simple word "JOIN": you say you want a merged result drawn from two different data sets, and leave it up to the DBMS to determine the concrete techniques involved, deal with indexing, handle the difference between tables and subqueries, etc...
– Mason Wheeler
May 1 at 18:36
5
@MasonWheeler: I guess I was thinking of SQL more from the standpoint of the data it works on, not the implementation of the language features. Tables in a database do not seem like an abstraction. They are concrete, as in a table called "phone_numbers" contains phone numbers. A phone number is not an abstract concept.
– Greg Burghardt
May 1 at 19:06
add a comment |
The thing that I think you might be missing from your question / point of view is that SQL executes operations on sets (using set operations etc.).
When you operate on that level you, naturally, give up certain control over to the engine. You can still force some procedural style code using cursors but as experience shows 99/100 times you shouldn't be doing so.
Refactoring SQL is possible but it's not using the same code refactoring principles like we're used to in application level code. Instead you optimize how you use the SQL engine itself.
This can be done in various ways. If you're using Microsoft SQL Server you can use SSMS to provide you with an approximate execution plan and you can use that to see which steps you can do to tune your code.
In the case of splitting code out into smaller modules, as @greg-burghardt mentioned, SQL is generally a purpose built piece of code and as a result. It does that one thing you need it to do and nothing else. It's adhering to the S in SOLID, it has only one reason to be changed / affected and that's when you need that query to do something else. The rest of the acronym (OLID) doesn't apply here (AFAIK there's no dependency injection, interfaces or dependencies as such in SQL) depending on the flavor of the SQL you're using you might be able to extend certain queries by wrapping them in a stored procedure / table function or using them as sub-queries so, I'd say the open-closed principle would still apply, in a way. But I digress.
I think you need to shift your paradigm in terms of how you're viewing SQL code. Due to the set nature of it it can't provide a lot of the features application level languages can (generics etc.). SQL was never designed to be anything like that, it's a language to query sets of data, and each set is unique in its own way.
That being said, there are ways in which you can make your code look nicer, if readability is a high priority within the organization. Storing bits of frequently used SQL blocks (common data sets that you use) into stored procedures / table value functions and then querying and storing them in temporary tables / table variables, followed by using those to join up the pieces together into the one massive transaction that you'd otherwise write is an option. IMHO it's not worth doing something like that with SQL.
As a language it's designed to be easily readable and understandable by anyone, even non-programmers. As such, unless you're doing something very clever, there's no need to refactor SQL code into smaller byte size pieces. I've, personally, written massive SQL queries whilst working on a data warehouse ETL / Reporting solution and everything was still very clear in terms of what was going on. Anything that might have looked a bit weird to anyone else would get a brief set of comments alongside it to provide a brief explanation.
I hope this helps.
add a comment |
I'm am going to focus on the "subqueries" in your example.
Why are they used so often? Because they use the natural way of thinking of a person:
I have this set of data, and want to do an action on a subset of it and join that with a subset of other data.
9 out of 10 times that I see a subquery, it's used wrong. My running joke about subqueries is: people who are afraid of joins use subqueries.
If you see such subqueries it's also often a sign of non-optimal database design.
The more Normalized your Database is, the more joins you get, the more your database looks like a big excel-sheet, the more subselects you get.
Refactoring in SQL is often with a different goal: get more performance, better query times, "avoiding table scans". Those may even make the code less readable but are very valuable.
So why do you see so many huge monolithic non-refactored queries?
- SQL, in many ways is not a programming language.
- Bad database design.
- People not really fluent in SQL.
- No power over the database (for instance not being allowed to use views)
- Different goals with refactoring.
(for me, the more experienced I get with SQL, the less big my queries get, SQL has ways for people of all skill levels to get their jobs done no-matter what.)
6
"Subqueries" are just as likely to be some aggregation of a properly normalised db as they are to be ad-hoc normalisation of a non-normalised db
– Caleth
May 1 at 14:21
@Caleth that's so true.
– Pieter B
May 1 at 14:23
5
Even in well normalized databases it's still often necessary to join with subqueries, rather than joining directly with tables. E.g. if you need to join with grouped data.
– Barmar
May 1 at 18:07
1
@Barmar definitely, hence my 9 out of 10 comment. Sub-queries have their place but I see them overused by inexperienced people.
– Pieter B
May 1 at 22:28
I like your metric of "number of subqueries" as an indication of database normalization (or lack thereof).
– Jason
May 1 at 23:46
add a comment |
Segregation of duties
In the SQL spirit, the database is a shared asset that contains the company's data, and protecting it is of vital importance. Enters the DBA as guardian of the temple.
Creating a new view in the database is understood to serve a lasting purpose and to be shared by a community of users. In the DBA view, this is acceptable only if the view is justified by the structure of the data. Every change of a view is then associated with risks for all its current users, even those not using the application but who have discovered the view. Finally, creation of new objects requires manage authorisations, and in the case of view, consistently with the authorisations of the underlying tables.
All this explains why DBAs don't like adding views that are just for the code of some individual application.
SQL design
If you decompose one of your nice complex query, you might find out that the subqueries will often need a parameter that depends on another subquery.
So transforming subqueries in view is not necessarily as simple as stated. You must isolate the variable parameters, and design your view so that the parameters can be added as selection criteria on the view.
Unfortunately, in doing so, you sometimes impose to access more data and less effectively than in a tailored query.
Proprietary extensions
You could hope some refactoring, by transfering some responsibilities to procedural extensions of SQL, like PL/SQL or T-SQL. However, these are vendor dependent and create a technological dependency. In addition, these extension execute on the database server, creating more processing load on a resource that is much more difficult to scale than an application server.
But what's the problem in the end ?
Finally, are segregation of duties and the SQL design with its strength and limitations a real problem ? In the end, these databases proved to successfully and reliably handle very critical data including in mission critical environments.
So in order to achieve a successful refactoring:
consider a better communication. Try to understand your DBA's constraints. If you prove to a DBA that a new view is justified by the data structures, that it is not a throw-away workaround, and that it doesn't have a security impact, he/she will certainly agree to let it be created. Because, then it would be a shared interest.
clean your own house first: Nothing forces you to generate a lot of SQL in a lot of places. Refactor your application code, to isolate the SQL accesses, and to create the classes or functions to provide reusable subqueries, if these are frequently used.
improve team-awareness: make sure that your application is not performing tasks that could be performed more efficiently by the DBMS engine. As you rightly pointed out, the procedural approach and the data oriented approach are not equally mastered by different members of the team. It depends on their background. But in order to optimize the system as a whole, your team need to understand it as a whole. So create awareness, so to be sure that less experienced players do not reinvent the wheel and share their DB thoughts with more experienced members.
+1 Some great points here. Given how bad some SQL is, the reticence of DBAs to allow views is often entirely understandable. Also, SQL can definitely benefit from peer review if it is resource hungry and/or it is going to be run frequently.
– Robbie Dee
May 3 at 13:09
add a comment |
Re points 1 & 3: Views aren't the only way. There are also temporary tables, marts, table variables, aggregated columns, CTEs, functions, stored procedures and possibly other constructs depending on the RDBMS.
DBAs (and I'm speaking as someone who has been both DBA and developer) tend to view the world in a pretty binary way so are often against things like views and functions due to the perceived performance penalty.
Latterly, the need for complex joins has reduced with the recognition that denormalised tables despite being sub-optimal from a NF point of view, are highly performant.
There is also the trend for doing queries client side with technologies like LINQ which you raise in point 2.
While I agree that SQL can be challenging to modularise, great strides have been made although there will always be a dichotomy between client side code and SQL - although 4GL has blurred the lines somewhat.
I guess it really depends on how far your DBAs/architects/tech leads are willing to cede in this regard. If they refuse to allow anything but vanilla SQL with lots of joins, huge queries could result. If you're stuck with this, don't bang your head on a brick wall, escalate it. There are generally better ways of doing things with a bit of compromise - especially if you can prove the benefits.
1
I've never heard of a "mart" construct. What is that?
– bishop
May 1 at 17:47
1
Marts are just a subset of the repository (master database). If there are specific complex queries that need to run, a special database can be created specifically to service those requests. A very common example is a reporting mart.
– Robbie Dee
May 1 at 20:30
Confused why this was downvoted. Doesn't directly answer the question, but gives a fairly clear implicit answer of "option 3: there are many ways of handling this, which are widely used".
– Dewi Morgan
May 1 at 21:48
TIL about data marts. Have a +1!
– bishop
May 1 at 22:44
add a comment |
6 Answers
6
active
oldest
votes
6 Answers
6
active
oldest
votes
active
oldest
votes
active
oldest
votes
I think the main problem is that not all databases support Common Table Expressions.
My employer uses DB/2 for a great many things. The latest versions of it support CTEs, such that I'm able to do things like:
with custs as (
select acct# as accountNumber, cfname as firstName, clname as lastName,
from wrdCsts
where -- various criteria
)
, accounts as (
select acct# as accountNumber, crBal as currentBalance
from crzyAcctTbl
)
select firstName, lastName, currentBalance
from custs
inner join accounts on custs.accountNumber = accounts.accountNumber
The result is that we can have heavily abbreviated table / field names and I'm essentially creating temp views, with more legible names, which I can then use. Sure, the query gets longer. But the result is that I can write something which is pretty clearly separated (using CTEs the way you'd use functions to get DRY) and end up with code that's quite legible. And because I'm able to break out my subqueries, and have one subquery reference another, it's not all "inline." I have, on occasion, written one CTE, then had four other CTEs all reference it, then had the main query union the results of those last four.
This can be done with:
- DB/2
- PostGreSQL
- Oracle
- MS SQL Server
- MySQL (latest version; still kinda new)
- probably others
But it goes a LONG way toward making the code cleaner, more legible, more DRY.
I've developed a "standard library" of CTEs that I can plug-in to various queries, getting me off to a flying start on my new query. Some of them are starting to be embraced by other devs in my organization, too.
In time, it may make sense to turn some of these into views, such that this "standard library" is available without needing to copy / paste. But my CTEs end up getting tweaked, ever so slightly, for various needs that I've not been able to have a single CTE get used SO WIDELY, without mods, that it might be worth creating a view.
It would seem that part of your gripe is "why don't I know about CTEs?" or "why doesn't my DB support CTEs?"
As for updates ... yeah, you can use CTEs but, in my experience, you have to use them inside the set clause AND in the where clause. It would be nice if you could define one or more ahead of the whole update statement and then just have the "main query" parts in the set / where clauses but it doesn't work that way. And there's no avoiding obscure table / field names on the table you're updating.
You can use CTEs for deletes. It may take multiple CTEs to determine the PK / FK values for records you want to delete from that table. Again, you can't avoid obscure table / field names on the table you're modifying.
Insomuch as you can do a select into an insert, you can use CTEs for inserts. As always, you may be dealing with obscure table / field names on the table you're modifying.
SQL does NOT let you create the equivalent of a domain object, wrapping a table, with getters / setters. For that, you will need to use an ORM of some kind, along with a more procedural / OO programming language. I've written things of this nature in Java / Hibernate.
4
We had Mr. Big CTE man be the man writing the worst SQL. The problem was the CTEs were poor abstraction choices and the optimizer can't undo every boneheaded algorithm you put in.
– Joshua
May 1 at 15:57
3
Also ORM can do some pretty heinous things performance-wise, too...especially when you are just using getters and setters to fetch a bunch of data. Hibernate is notorious for using hundreds of individual queries instead of one big joined up query, which is a problem when there is overhead on each query.
– user3067860
May 1 at 17:04
2
@Joshua You can write bad code in any language. Including SQL. But refactoring to CTEs, done properly, can create bottom-up designs which are easier for humans to parse. I tend to view that as a desirable trait, regardless of what language I'm dealing with :-)
– Meower68
May 1 at 17:43
2
The other answers are great, but this is what I was personally looking for. 'Why don't I know about CTEs' was the majority of my problem.
– ebrts
May 2 at 0:07
2
@Meower68 Isn't there a risk that extensive use of CTE's stops people from learning joins properly and learning about good database design? I support the value of CTE's but it also makes it too easy working with subqueries, where you shouldn't.
– Pieter B
May 2 at 8:16
|
show 5 more comments
I think the main problem is that not all databases support Common Table Expressions.
My employer uses DB/2 for a great many things. The latest versions of it support CTEs, such that I'm able to do things like:
with custs as (
select acct# as accountNumber, cfname as firstName, clname as lastName,
from wrdCsts
where -- various criteria
)
, accounts as (
select acct# as accountNumber, crBal as currentBalance
from crzyAcctTbl
)
select firstName, lastName, currentBalance
from custs
inner join accounts on custs.accountNumber = accounts.accountNumber
The result is that we can have heavily abbreviated table / field names and I'm essentially creating temp views, with more legible names, which I can then use. Sure, the query gets longer. But the result is that I can write something which is pretty clearly separated (using CTEs the way you'd use functions to get DRY) and end up with code that's quite legible. And because I'm able to break out my subqueries, and have one subquery reference another, it's not all "inline." I have, on occasion, written one CTE, then had four other CTEs all reference it, then had the main query union the results of those last four.
This can be done with:
- DB/2
- PostGreSQL
- Oracle
- MS SQL Server
- MySQL (latest version; still kinda new)
- probably others
But it goes a LONG way toward making the code cleaner, more legible, more DRY.
I've developed a "standard library" of CTEs that I can plug-in to various queries, getting me off to a flying start on my new query. Some of them are starting to be embraced by other devs in my organization, too.
In time, it may make sense to turn some of these into views, such that this "standard library" is available without needing to copy / paste. But my CTEs end up getting tweaked, ever so slightly, for various needs that I've not been able to have a single CTE get used SO WIDELY, without mods, that it might be worth creating a view.
It would seem that part of your gripe is "why don't I know about CTEs?" or "why doesn't my DB support CTEs?"
As for updates ... yeah, you can use CTEs but, in my experience, you have to use them inside the set clause AND in the where clause. It would be nice if you could define one or more ahead of the whole update statement and then just have the "main query" parts in the set / where clauses but it doesn't work that way. And there's no avoiding obscure table / field names on the table you're updating.
You can use CTEs for deletes. It may take multiple CTEs to determine the PK / FK values for records you want to delete from that table. Again, you can't avoid obscure table / field names on the table you're modifying.
Insomuch as you can do a select into an insert, you can use CTEs for inserts. As always, you may be dealing with obscure table / field names on the table you're modifying.
SQL does NOT let you create the equivalent of a domain object, wrapping a table, with getters / setters. For that, you will need to use an ORM of some kind, along with a more procedural / OO programming language. I've written things of this nature in Java / Hibernate.
4
We had Mr. Big CTE man be the man writing the worst SQL. The problem was the CTEs were poor abstraction choices and the optimizer can't undo every boneheaded algorithm you put in.
– Joshua
May 1 at 15:57
3
Also ORM can do some pretty heinous things performance-wise, too...especially when you are just using getters and setters to fetch a bunch of data. Hibernate is notorious for using hundreds of individual queries instead of one big joined up query, which is a problem when there is overhead on each query.
– user3067860
May 1 at 17:04
2
@Joshua You can write bad code in any language. Including SQL. But refactoring to CTEs, done properly, can create bottom-up designs which are easier for humans to parse. I tend to view that as a desirable trait, regardless of what language I'm dealing with :-)
– Meower68
May 1 at 17:43
2
The other answers are great, but this is what I was personally looking for. 'Why don't I know about CTEs' was the majority of my problem.
– ebrts
May 2 at 0:07
2
@Meower68 Isn't there a risk that extensive use of CTE's stops people from learning joins properly and learning about good database design? I support the value of CTE's but it also makes it too easy working with subqueries, where you shouldn't.
– Pieter B
May 2 at 8:16
|
show 5 more comments
I think the main problem is that not all databases support Common Table Expressions.
My employer uses DB/2 for a great many things. The latest versions of it support CTEs, such that I'm able to do things like:
with custs as (
select acct# as accountNumber, cfname as firstName, clname as lastName,
from wrdCsts
where -- various criteria
)
, accounts as (
select acct# as accountNumber, crBal as currentBalance
from crzyAcctTbl
)
select firstName, lastName, currentBalance
from custs
inner join accounts on custs.accountNumber = accounts.accountNumber
The result is that we can have heavily abbreviated table / field names and I'm essentially creating temp views, with more legible names, which I can then use. Sure, the query gets longer. But the result is that I can write something which is pretty clearly separated (using CTEs the way you'd use functions to get DRY) and end up with code that's quite legible. And because I'm able to break out my subqueries, and have one subquery reference another, it's not all "inline." I have, on occasion, written one CTE, then had four other CTEs all reference it, then had the main query union the results of those last four.
This can be done with:
- DB/2
- PostGreSQL
- Oracle
- MS SQL Server
- MySQL (latest version; still kinda new)
- probably others
But it goes a LONG way toward making the code cleaner, more legible, more DRY.
I've developed a "standard library" of CTEs that I can plug-in to various queries, getting me off to a flying start on my new query. Some of them are starting to be embraced by other devs in my organization, too.
In time, it may make sense to turn some of these into views, such that this "standard library" is available without needing to copy / paste. But my CTEs end up getting tweaked, ever so slightly, for various needs that I've not been able to have a single CTE get used SO WIDELY, without mods, that it might be worth creating a view.
It would seem that part of your gripe is "why don't I know about CTEs?" or "why doesn't my DB support CTEs?"
As for updates ... yeah, you can use CTEs but, in my experience, you have to use them inside the set clause AND in the where clause. It would be nice if you could define one or more ahead of the whole update statement and then just have the "main query" parts in the set / where clauses but it doesn't work that way. And there's no avoiding obscure table / field names on the table you're updating.
You can use CTEs for deletes. It may take multiple CTEs to determine the PK / FK values for records you want to delete from that table. Again, you can't avoid obscure table / field names on the table you're modifying.
Insomuch as you can do a select into an insert, you can use CTEs for inserts. As always, you may be dealing with obscure table / field names on the table you're modifying.
SQL does NOT let you create the equivalent of a domain object, wrapping a table, with getters / setters. For that, you will need to use an ORM of some kind, along with a more procedural / OO programming language. I've written things of this nature in Java / Hibernate.
I think the main problem is that not all databases support Common Table Expressions.
My employer uses DB/2 for a great many things. The latest versions of it support CTEs, such that I'm able to do things like:
with custs as (
select acct# as accountNumber, cfname as firstName, clname as lastName,
from wrdCsts
where -- various criteria
)
, accounts as (
select acct# as accountNumber, crBal as currentBalance
from crzyAcctTbl
)
select firstName, lastName, currentBalance
from custs
inner join accounts on custs.accountNumber = accounts.accountNumber
The result is that we can have heavily abbreviated table / field names and I'm essentially creating temp views, with more legible names, which I can then use. Sure, the query gets longer. But the result is that I can write something which is pretty clearly separated (using CTEs the way you'd use functions to get DRY) and end up with code that's quite legible. And because I'm able to break out my subqueries, and have one subquery reference another, it's not all "inline." I have, on occasion, written one CTE, then had four other CTEs all reference it, then had the main query union the results of those last four.
This can be done with:
- DB/2
- PostGreSQL
- Oracle
- MS SQL Server
- MySQL (latest version; still kinda new)
- probably others
But it goes a LONG way toward making the code cleaner, more legible, more DRY.
I've developed a "standard library" of CTEs that I can plug-in to various queries, getting me off to a flying start on my new query. Some of them are starting to be embraced by other devs in my organization, too.
In time, it may make sense to turn some of these into views, such that this "standard library" is available without needing to copy / paste. But my CTEs end up getting tweaked, ever so slightly, for various needs that I've not been able to have a single CTE get used SO WIDELY, without mods, that it might be worth creating a view.
It would seem that part of your gripe is "why don't I know about CTEs?" or "why doesn't my DB support CTEs?"
As for updates ... yeah, you can use CTEs but, in my experience, you have to use them inside the set clause AND in the where clause. It would be nice if you could define one or more ahead of the whole update statement and then just have the "main query" parts in the set / where clauses but it doesn't work that way. And there's no avoiding obscure table / field names on the table you're updating.
You can use CTEs for deletes. It may take multiple CTEs to determine the PK / FK values for records you want to delete from that table. Again, you can't avoid obscure table / field names on the table you're modifying.
Insomuch as you can do a select into an insert, you can use CTEs for inserts. As always, you may be dealing with obscure table / field names on the table you're modifying.
SQL does NOT let you create the equivalent of a domain object, wrapping a table, with getters / setters. For that, you will need to use an ORM of some kind, along with a more procedural / OO programming language. I've written things of this nature in Java / Hibernate.
edited May 1 at 20:06
answered May 1 at 14:51
Meower68Meower68
43635
43635
4
We had Mr. Big CTE man be the man writing the worst SQL. The problem was the CTEs were poor abstraction choices and the optimizer can't undo every boneheaded algorithm you put in.
– Joshua
May 1 at 15:57
3
Also ORM can do some pretty heinous things performance-wise, too...especially when you are just using getters and setters to fetch a bunch of data. Hibernate is notorious for using hundreds of individual queries instead of one big joined up query, which is a problem when there is overhead on each query.
– user3067860
May 1 at 17:04
2
@Joshua You can write bad code in any language. Including SQL. But refactoring to CTEs, done properly, can create bottom-up designs which are easier for humans to parse. I tend to view that as a desirable trait, regardless of what language I'm dealing with :-)
– Meower68
May 1 at 17:43
2
The other answers are great, but this is what I was personally looking for. 'Why don't I know about CTEs' was the majority of my problem.
– ebrts
May 2 at 0:07
2
@Meower68 Isn't there a risk that extensive use of CTE's stops people from learning joins properly and learning about good database design? I support the value of CTE's but it also makes it too easy working with subqueries, where you shouldn't.
– Pieter B
May 2 at 8:16
|
show 5 more comments
4
We had Mr. Big CTE man be the man writing the worst SQL. The problem was the CTEs were poor abstraction choices and the optimizer can't undo every boneheaded algorithm you put in.
– Joshua
May 1 at 15:57
3
Also ORM can do some pretty heinous things performance-wise, too...especially when you are just using getters and setters to fetch a bunch of data. Hibernate is notorious for using hundreds of individual queries instead of one big joined up query, which is a problem when there is overhead on each query.
– user3067860
May 1 at 17:04
2
@Joshua You can write bad code in any language. Including SQL. But refactoring to CTEs, done properly, can create bottom-up designs which are easier for humans to parse. I tend to view that as a desirable trait, regardless of what language I'm dealing with :-)
– Meower68
May 1 at 17:43
2
The other answers are great, but this is what I was personally looking for. 'Why don't I know about CTEs' was the majority of my problem.
– ebrts
May 2 at 0:07
2
@Meower68 Isn't there a risk that extensive use of CTE's stops people from learning joins properly and learning about good database design? I support the value of CTE's but it also makes it too easy working with subqueries, where you shouldn't.
– Pieter B
May 2 at 8:16
4
4
We had Mr. Big CTE man be the man writing the worst SQL. The problem was the CTEs were poor abstraction choices and the optimizer can't undo every boneheaded algorithm you put in.
– Joshua
May 1 at 15:57
We had Mr. Big CTE man be the man writing the worst SQL. The problem was the CTEs were poor abstraction choices and the optimizer can't undo every boneheaded algorithm you put in.
– Joshua
May 1 at 15:57
3
3
Also ORM can do some pretty heinous things performance-wise, too...especially when you are just using getters and setters to fetch a bunch of data. Hibernate is notorious for using hundreds of individual queries instead of one big joined up query, which is a problem when there is overhead on each query.
– user3067860
May 1 at 17:04
Also ORM can do some pretty heinous things performance-wise, too...especially when you are just using getters and setters to fetch a bunch of data. Hibernate is notorious for using hundreds of individual queries instead of one big joined up query, which is a problem when there is overhead on each query.
– user3067860
May 1 at 17:04
2
2
@Joshua You can write bad code in any language. Including SQL. But refactoring to CTEs, done properly, can create bottom-up designs which are easier for humans to parse. I tend to view that as a desirable trait, regardless of what language I'm dealing with :-)
– Meower68
May 1 at 17:43
@Joshua You can write bad code in any language. Including SQL. But refactoring to CTEs, done properly, can create bottom-up designs which are easier for humans to parse. I tend to view that as a desirable trait, regardless of what language I'm dealing with :-)
– Meower68
May 1 at 17:43
2
2
The other answers are great, but this is what I was personally looking for. 'Why don't I know about CTEs' was the majority of my problem.
– ebrts
May 2 at 0:07
The other answers are great, but this is what I was personally looking for. 'Why don't I know about CTEs' was the majority of my problem.
– ebrts
May 2 at 0:07
2
2
@Meower68 Isn't there a risk that extensive use of CTE's stops people from learning joins properly and learning about good database design? I support the value of CTE's but it also makes it too easy working with subqueries, where you shouldn't.
– Pieter B
May 2 at 8:16
@Meower68 Isn't there a risk that extensive use of CTE's stops people from learning joins properly and learning about good database design? I support the value of CTE's but it also makes it too easy working with subqueries, where you shouldn't.
– Pieter B
May 2 at 8:16
|
show 5 more comments
Locking down the creation of database views is often done by organizations paranoid of performance problems in the database. This is an organizational culture issue, rather than a technical issue with SQL.
Beyond that, large monolithic SQL queries are written many times, because the use case is so specific that very little of the SQL code can be truly reused in other queries. If a complex query is needed, it is usually for a much different use case. Copying the SQL from another query is often a starting point, but due to the other sub queries and JOINs in the new query, you end up modifying the copied SQL just enough to break any sort of abstraction that a "function" in another language would be used for. Which brings me to the most important reason why SQL is hard to refactor.
SQL only deals with concrete data structures, not abstract behavior (or an abstraction in any sense of the word). Since SQL is written around concrete ideas, there is nothing to abstract away into a reusable module. Database views can help with this, but not to the same level as a "function" in another language. A database view isn't so much an abstraction as it is a query. Well, actually, a database view is a query. It's essentially used like a table, but executed like a sub query, so again, you are dealing with something concrete, not abstract.
It is with abstractions that code becomes easier to refactor, because an abstraction hides implementation details from the consumer of that abstraction. Straight SQL provides no such separation, although procedural extensions to SQL like PL/SQL for Oracle or Transact-SQL for SQL Server start to blur the lines a little.
"SQL only deals with concrete data structures, not abstract behavior (or an abstraction in any sense of the word)." This is a strange statement, as from my point of view SQL deals entirely with abstract behavior and not concrete programming in any sense of the word! Just consider all of the massive degrees of complexity that are abstracted into the simple word "JOIN": you say you want a merged result drawn from two different data sets, and leave it up to the DBMS to determine the concrete techniques involved, deal with indexing, handle the difference between tables and subqueries, etc...
– Mason Wheeler
May 1 at 18:36
5
@MasonWheeler: I guess I was thinking of SQL more from the standpoint of the data it works on, not the implementation of the language features. Tables in a database do not seem like an abstraction. They are concrete, as in a table called "phone_numbers" contains phone numbers. A phone number is not an abstract concept.
– Greg Burghardt
May 1 at 19:06
add a comment |
Locking down the creation of database views is often done by organizations paranoid of performance problems in the database. This is an organizational culture issue, rather than a technical issue with SQL.
Beyond that, large monolithic SQL queries are written many times, because the use case is so specific that very little of the SQL code can be truly reused in other queries. If a complex query is needed, it is usually for a much different use case. Copying the SQL from another query is often a starting point, but due to the other sub queries and JOINs in the new query, you end up modifying the copied SQL just enough to break any sort of abstraction that a "function" in another language would be used for. Which brings me to the most important reason why SQL is hard to refactor.
SQL only deals with concrete data structures, not abstract behavior (or an abstraction in any sense of the word). Since SQL is written around concrete ideas, there is nothing to abstract away into a reusable module. Database views can help with this, but not to the same level as a "function" in another language. A database view isn't so much an abstraction as it is a query. Well, actually, a database view is a query. It's essentially used like a table, but executed like a sub query, so again, you are dealing with something concrete, not abstract.
It is with abstractions that code becomes easier to refactor, because an abstraction hides implementation details from the consumer of that abstraction. Straight SQL provides no such separation, although procedural extensions to SQL like PL/SQL for Oracle or Transact-SQL for SQL Server start to blur the lines a little.
"SQL only deals with concrete data structures, not abstract behavior (or an abstraction in any sense of the word)." This is a strange statement, as from my point of view SQL deals entirely with abstract behavior and not concrete programming in any sense of the word! Just consider all of the massive degrees of complexity that are abstracted into the simple word "JOIN": you say you want a merged result drawn from two different data sets, and leave it up to the DBMS to determine the concrete techniques involved, deal with indexing, handle the difference between tables and subqueries, etc...
– Mason Wheeler
May 1 at 18:36
5
@MasonWheeler: I guess I was thinking of SQL more from the standpoint of the data it works on, not the implementation of the language features. Tables in a database do not seem like an abstraction. They are concrete, as in a table called "phone_numbers" contains phone numbers. A phone number is not an abstract concept.
– Greg Burghardt
May 1 at 19:06
add a comment |
Locking down the creation of database views is often done by organizations paranoid of performance problems in the database. This is an organizational culture issue, rather than a technical issue with SQL.
Beyond that, large monolithic SQL queries are written many times, because the use case is so specific that very little of the SQL code can be truly reused in other queries. If a complex query is needed, it is usually for a much different use case. Copying the SQL from another query is often a starting point, but due to the other sub queries and JOINs in the new query, you end up modifying the copied SQL just enough to break any sort of abstraction that a "function" in another language would be used for. Which brings me to the most important reason why SQL is hard to refactor.
SQL only deals with concrete data structures, not abstract behavior (or an abstraction in any sense of the word). Since SQL is written around concrete ideas, there is nothing to abstract away into a reusable module. Database views can help with this, but not to the same level as a "function" in another language. A database view isn't so much an abstraction as it is a query. Well, actually, a database view is a query. It's essentially used like a table, but executed like a sub query, so again, you are dealing with something concrete, not abstract.
It is with abstractions that code becomes easier to refactor, because an abstraction hides implementation details from the consumer of that abstraction. Straight SQL provides no such separation, although procedural extensions to SQL like PL/SQL for Oracle or Transact-SQL for SQL Server start to blur the lines a little.
Locking down the creation of database views is often done by organizations paranoid of performance problems in the database. This is an organizational culture issue, rather than a technical issue with SQL.
Beyond that, large monolithic SQL queries are written many times, because the use case is so specific that very little of the SQL code can be truly reused in other queries. If a complex query is needed, it is usually for a much different use case. Copying the SQL from another query is often a starting point, but due to the other sub queries and JOINs in the new query, you end up modifying the copied SQL just enough to break any sort of abstraction that a "function" in another language would be used for. Which brings me to the most important reason why SQL is hard to refactor.
SQL only deals with concrete data structures, not abstract behavior (or an abstraction in any sense of the word). Since SQL is written around concrete ideas, there is nothing to abstract away into a reusable module. Database views can help with this, but not to the same level as a "function" in another language. A database view isn't so much an abstraction as it is a query. Well, actually, a database view is a query. It's essentially used like a table, but executed like a sub query, so again, you are dealing with something concrete, not abstract.
It is with abstractions that code becomes easier to refactor, because an abstraction hides implementation details from the consumer of that abstraction. Straight SQL provides no such separation, although procedural extensions to SQL like PL/SQL for Oracle or Transact-SQL for SQL Server start to blur the lines a little.
answered May 1 at 12:30
Greg BurghardtGreg Burghardt
13.7k53260
13.7k53260
"SQL only deals with concrete data structures, not abstract behavior (or an abstraction in any sense of the word)." This is a strange statement, as from my point of view SQL deals entirely with abstract behavior and not concrete programming in any sense of the word! Just consider all of the massive degrees of complexity that are abstracted into the simple word "JOIN": you say you want a merged result drawn from two different data sets, and leave it up to the DBMS to determine the concrete techniques involved, deal with indexing, handle the difference between tables and subqueries, etc...
– Mason Wheeler
May 1 at 18:36
5
@MasonWheeler: I guess I was thinking of SQL more from the standpoint of the data it works on, not the implementation of the language features. Tables in a database do not seem like an abstraction. They are concrete, as in a table called "phone_numbers" contains phone numbers. A phone number is not an abstract concept.
– Greg Burghardt
May 1 at 19:06
add a comment |
"SQL only deals with concrete data structures, not abstract behavior (or an abstraction in any sense of the word)." This is a strange statement, as from my point of view SQL deals entirely with abstract behavior and not concrete programming in any sense of the word! Just consider all of the massive degrees of complexity that are abstracted into the simple word "JOIN": you say you want a merged result drawn from two different data sets, and leave it up to the DBMS to determine the concrete techniques involved, deal with indexing, handle the difference between tables and subqueries, etc...
– Mason Wheeler
May 1 at 18:36
5
@MasonWheeler: I guess I was thinking of SQL more from the standpoint of the data it works on, not the implementation of the language features. Tables in a database do not seem like an abstraction. They are concrete, as in a table called "phone_numbers" contains phone numbers. A phone number is not an abstract concept.
– Greg Burghardt
May 1 at 19:06
"SQL only deals with concrete data structures, not abstract behavior (or an abstraction in any sense of the word)." This is a strange statement, as from my point of view SQL deals entirely with abstract behavior and not concrete programming in any sense of the word! Just consider all of the massive degrees of complexity that are abstracted into the simple word "JOIN": you say you want a merged result drawn from two different data sets, and leave it up to the DBMS to determine the concrete techniques involved, deal with indexing, handle the difference between tables and subqueries, etc...
– Mason Wheeler
May 1 at 18:36
"SQL only deals with concrete data structures, not abstract behavior (or an abstraction in any sense of the word)." This is a strange statement, as from my point of view SQL deals entirely with abstract behavior and not concrete programming in any sense of the word! Just consider all of the massive degrees of complexity that are abstracted into the simple word "JOIN": you say you want a merged result drawn from two different data sets, and leave it up to the DBMS to determine the concrete techniques involved, deal with indexing, handle the difference between tables and subqueries, etc...
– Mason Wheeler
May 1 at 18:36
5
5
@MasonWheeler: I guess I was thinking of SQL more from the standpoint of the data it works on, not the implementation of the language features. Tables in a database do not seem like an abstraction. They are concrete, as in a table called "phone_numbers" contains phone numbers. A phone number is not an abstract concept.
– Greg Burghardt
May 1 at 19:06
@MasonWheeler: I guess I was thinking of SQL more from the standpoint of the data it works on, not the implementation of the language features. Tables in a database do not seem like an abstraction. They are concrete, as in a table called "phone_numbers" contains phone numbers. A phone number is not an abstract concept.
– Greg Burghardt
May 1 at 19:06
add a comment |
The thing that I think you might be missing from your question / point of view is that SQL executes operations on sets (using set operations etc.).
When you operate on that level you, naturally, give up certain control over to the engine. You can still force some procedural style code using cursors but as experience shows 99/100 times you shouldn't be doing so.
Refactoring SQL is possible but it's not using the same code refactoring principles like we're used to in application level code. Instead you optimize how you use the SQL engine itself.
This can be done in various ways. If you're using Microsoft SQL Server you can use SSMS to provide you with an approximate execution plan and you can use that to see which steps you can do to tune your code.
In the case of splitting code out into smaller modules, as @greg-burghardt mentioned, SQL is generally a purpose built piece of code and as a result. It does that one thing you need it to do and nothing else. It's adhering to the S in SOLID, it has only one reason to be changed / affected and that's when you need that query to do something else. The rest of the acronym (OLID) doesn't apply here (AFAIK there's no dependency injection, interfaces or dependencies as such in SQL) depending on the flavor of the SQL you're using you might be able to extend certain queries by wrapping them in a stored procedure / table function or using them as sub-queries so, I'd say the open-closed principle would still apply, in a way. But I digress.
I think you need to shift your paradigm in terms of how you're viewing SQL code. Due to the set nature of it it can't provide a lot of the features application level languages can (generics etc.). SQL was never designed to be anything like that, it's a language to query sets of data, and each set is unique in its own way.
That being said, there are ways in which you can make your code look nicer, if readability is a high priority within the organization. Storing bits of frequently used SQL blocks (common data sets that you use) into stored procedures / table value functions and then querying and storing them in temporary tables / table variables, followed by using those to join up the pieces together into the one massive transaction that you'd otherwise write is an option. IMHO it's not worth doing something like that with SQL.
As a language it's designed to be easily readable and understandable by anyone, even non-programmers. As such, unless you're doing something very clever, there's no need to refactor SQL code into smaller byte size pieces. I've, personally, written massive SQL queries whilst working on a data warehouse ETL / Reporting solution and everything was still very clear in terms of what was going on. Anything that might have looked a bit weird to anyone else would get a brief set of comments alongside it to provide a brief explanation.
I hope this helps.
add a comment |
The thing that I think you might be missing from your question / point of view is that SQL executes operations on sets (using set operations etc.).
When you operate on that level you, naturally, give up certain control over to the engine. You can still force some procedural style code using cursors but as experience shows 99/100 times you shouldn't be doing so.
Refactoring SQL is possible but it's not using the same code refactoring principles like we're used to in application level code. Instead you optimize how you use the SQL engine itself.
This can be done in various ways. If you're using Microsoft SQL Server you can use SSMS to provide you with an approximate execution plan and you can use that to see which steps you can do to tune your code.
In the case of splitting code out into smaller modules, as @greg-burghardt mentioned, SQL is generally a purpose built piece of code and as a result. It does that one thing you need it to do and nothing else. It's adhering to the S in SOLID, it has only one reason to be changed / affected and that's when you need that query to do something else. The rest of the acronym (OLID) doesn't apply here (AFAIK there's no dependency injection, interfaces or dependencies as such in SQL) depending on the flavor of the SQL you're using you might be able to extend certain queries by wrapping them in a stored procedure / table function or using them as sub-queries so, I'd say the open-closed principle would still apply, in a way. But I digress.
I think you need to shift your paradigm in terms of how you're viewing SQL code. Due to the set nature of it it can't provide a lot of the features application level languages can (generics etc.). SQL was never designed to be anything like that, it's a language to query sets of data, and each set is unique in its own way.
That being said, there are ways in which you can make your code look nicer, if readability is a high priority within the organization. Storing bits of frequently used SQL blocks (common data sets that you use) into stored procedures / table value functions and then querying and storing them in temporary tables / table variables, followed by using those to join up the pieces together into the one massive transaction that you'd otherwise write is an option. IMHO it's not worth doing something like that with SQL.
As a language it's designed to be easily readable and understandable by anyone, even non-programmers. As such, unless you're doing something very clever, there's no need to refactor SQL code into smaller byte size pieces. I've, personally, written massive SQL queries whilst working on a data warehouse ETL / Reporting solution and everything was still very clear in terms of what was going on. Anything that might have looked a bit weird to anyone else would get a brief set of comments alongside it to provide a brief explanation.
I hope this helps.
add a comment |
The thing that I think you might be missing from your question / point of view is that SQL executes operations on sets (using set operations etc.).
When you operate on that level you, naturally, give up certain control over to the engine. You can still force some procedural style code using cursors but as experience shows 99/100 times you shouldn't be doing so.
Refactoring SQL is possible but it's not using the same code refactoring principles like we're used to in application level code. Instead you optimize how you use the SQL engine itself.
This can be done in various ways. If you're using Microsoft SQL Server you can use SSMS to provide you with an approximate execution plan and you can use that to see which steps you can do to tune your code.
In the case of splitting code out into smaller modules, as @greg-burghardt mentioned, SQL is generally a purpose built piece of code and as a result. It does that one thing you need it to do and nothing else. It's adhering to the S in SOLID, it has only one reason to be changed / affected and that's when you need that query to do something else. The rest of the acronym (OLID) doesn't apply here (AFAIK there's no dependency injection, interfaces or dependencies as such in SQL) depending on the flavor of the SQL you're using you might be able to extend certain queries by wrapping them in a stored procedure / table function or using them as sub-queries so, I'd say the open-closed principle would still apply, in a way. But I digress.
I think you need to shift your paradigm in terms of how you're viewing SQL code. Due to the set nature of it it can't provide a lot of the features application level languages can (generics etc.). SQL was never designed to be anything like that, it's a language to query sets of data, and each set is unique in its own way.
That being said, there are ways in which you can make your code look nicer, if readability is a high priority within the organization. Storing bits of frequently used SQL blocks (common data sets that you use) into stored procedures / table value functions and then querying and storing them in temporary tables / table variables, followed by using those to join up the pieces together into the one massive transaction that you'd otherwise write is an option. IMHO it's not worth doing something like that with SQL.
As a language it's designed to be easily readable and understandable by anyone, even non-programmers. As such, unless you're doing something very clever, there's no need to refactor SQL code into smaller byte size pieces. I've, personally, written massive SQL queries whilst working on a data warehouse ETL / Reporting solution and everything was still very clear in terms of what was going on. Anything that might have looked a bit weird to anyone else would get a brief set of comments alongside it to provide a brief explanation.
I hope this helps.
The thing that I think you might be missing from your question / point of view is that SQL executes operations on sets (using set operations etc.).
When you operate on that level you, naturally, give up certain control over to the engine. You can still force some procedural style code using cursors but as experience shows 99/100 times you shouldn't be doing so.
Refactoring SQL is possible but it's not using the same code refactoring principles like we're used to in application level code. Instead you optimize how you use the SQL engine itself.
This can be done in various ways. If you're using Microsoft SQL Server you can use SSMS to provide you with an approximate execution plan and you can use that to see which steps you can do to tune your code.
In the case of splitting code out into smaller modules, as @greg-burghardt mentioned, SQL is generally a purpose built piece of code and as a result. It does that one thing you need it to do and nothing else. It's adhering to the S in SOLID, it has only one reason to be changed / affected and that's when you need that query to do something else. The rest of the acronym (OLID) doesn't apply here (AFAIK there's no dependency injection, interfaces or dependencies as such in SQL) depending on the flavor of the SQL you're using you might be able to extend certain queries by wrapping them in a stored procedure / table function or using them as sub-queries so, I'd say the open-closed principle would still apply, in a way. But I digress.
I think you need to shift your paradigm in terms of how you're viewing SQL code. Due to the set nature of it it can't provide a lot of the features application level languages can (generics etc.). SQL was never designed to be anything like that, it's a language to query sets of data, and each set is unique in its own way.
That being said, there are ways in which you can make your code look nicer, if readability is a high priority within the organization. Storing bits of frequently used SQL blocks (common data sets that you use) into stored procedures / table value functions and then querying and storing them in temporary tables / table variables, followed by using those to join up the pieces together into the one massive transaction that you'd otherwise write is an option. IMHO it's not worth doing something like that with SQL.
As a language it's designed to be easily readable and understandable by anyone, even non-programmers. As such, unless you're doing something very clever, there's no need to refactor SQL code into smaller byte size pieces. I've, personally, written massive SQL queries whilst working on a data warehouse ETL / Reporting solution and everything was still very clear in terms of what was going on. Anything that might have looked a bit weird to anyone else would get a brief set of comments alongside it to provide a brief explanation.
I hope this helps.
answered May 1 at 13:09
Toni KostelacToni Kostelac
1495
1495
add a comment |
add a comment |
I'm am going to focus on the "subqueries" in your example.
Why are they used so often? Because they use the natural way of thinking of a person:
I have this set of data, and want to do an action on a subset of it and join that with a subset of other data.
9 out of 10 times that I see a subquery, it's used wrong. My running joke about subqueries is: people who are afraid of joins use subqueries.
If you see such subqueries it's also often a sign of non-optimal database design.
The more Normalized your Database is, the more joins you get, the more your database looks like a big excel-sheet, the more subselects you get.
Refactoring in SQL is often with a different goal: get more performance, better query times, "avoiding table scans". Those may even make the code less readable but are very valuable.
So why do you see so many huge monolithic non-refactored queries?
- SQL, in many ways is not a programming language.
- Bad database design.
- People not really fluent in SQL.
- No power over the database (for instance not being allowed to use views)
- Different goals with refactoring.
(for me, the more experienced I get with SQL, the less big my queries get, SQL has ways for people of all skill levels to get their jobs done no-matter what.)
6
"Subqueries" are just as likely to be some aggregation of a properly normalised db as they are to be ad-hoc normalisation of a non-normalised db
– Caleth
May 1 at 14:21
@Caleth that's so true.
– Pieter B
May 1 at 14:23
5
Even in well normalized databases it's still often necessary to join with subqueries, rather than joining directly with tables. E.g. if you need to join with grouped data.
– Barmar
May 1 at 18:07
1
@Barmar definitely, hence my 9 out of 10 comment. Sub-queries have their place but I see them overused by inexperienced people.
– Pieter B
May 1 at 22:28
I like your metric of "number of subqueries" as an indication of database normalization (or lack thereof).
– Jason
May 1 at 23:46
add a comment |
I'm am going to focus on the "subqueries" in your example.
Why are they used so often? Because they use the natural way of thinking of a person:
I have this set of data, and want to do an action on a subset of it and join that with a subset of other data.
9 out of 10 times that I see a subquery, it's used wrong. My running joke about subqueries is: people who are afraid of joins use subqueries.
If you see such subqueries it's also often a sign of non-optimal database design.
The more Normalized your Database is, the more joins you get, the more your database looks like a big excel-sheet, the more subselects you get.
Refactoring in SQL is often with a different goal: get more performance, better query times, "avoiding table scans". Those may even make the code less readable but are very valuable.
So why do you see so many huge monolithic non-refactored queries?
- SQL, in many ways is not a programming language.
- Bad database design.
- People not really fluent in SQL.
- No power over the database (for instance not being allowed to use views)
- Different goals with refactoring.
(for me, the more experienced I get with SQL, the less big my queries get, SQL has ways for people of all skill levels to get their jobs done no-matter what.)
6
"Subqueries" are just as likely to be some aggregation of a properly normalised db as they are to be ad-hoc normalisation of a non-normalised db
– Caleth
May 1 at 14:21
@Caleth that's so true.
– Pieter B
May 1 at 14:23
5
Even in well normalized databases it's still often necessary to join with subqueries, rather than joining directly with tables. E.g. if you need to join with grouped data.
– Barmar
May 1 at 18:07
1
@Barmar definitely, hence my 9 out of 10 comment. Sub-queries have their place but I see them overused by inexperienced people.
– Pieter B
May 1 at 22:28
I like your metric of "number of subqueries" as an indication of database normalization (or lack thereof).
– Jason
May 1 at 23:46
add a comment |
I'm am going to focus on the "subqueries" in your example.
Why are they used so often? Because they use the natural way of thinking of a person:
I have this set of data, and want to do an action on a subset of it and join that with a subset of other data.
9 out of 10 times that I see a subquery, it's used wrong. My running joke about subqueries is: people who are afraid of joins use subqueries.
If you see such subqueries it's also often a sign of non-optimal database design.
The more Normalized your Database is, the more joins you get, the more your database looks like a big excel-sheet, the more subselects you get.
Refactoring in SQL is often with a different goal: get more performance, better query times, "avoiding table scans". Those may even make the code less readable but are very valuable.
So why do you see so many huge monolithic non-refactored queries?
- SQL, in many ways is not a programming language.
- Bad database design.
- People not really fluent in SQL.
- No power over the database (for instance not being allowed to use views)
- Different goals with refactoring.
(for me, the more experienced I get with SQL, the less big my queries get, SQL has ways for people of all skill levels to get their jobs done no-matter what.)
I'm am going to focus on the "subqueries" in your example.
Why are they used so often? Because they use the natural way of thinking of a person:
I have this set of data, and want to do an action on a subset of it and join that with a subset of other data.
9 out of 10 times that I see a subquery, it's used wrong. My running joke about subqueries is: people who are afraid of joins use subqueries.
If you see such subqueries it's also often a sign of non-optimal database design.
The more Normalized your Database is, the more joins you get, the more your database looks like a big excel-sheet, the more subselects you get.
Refactoring in SQL is often with a different goal: get more performance, better query times, "avoiding table scans". Those may even make the code less readable but are very valuable.
So why do you see so many huge monolithic non-refactored queries?
- SQL, in many ways is not a programming language.
- Bad database design.
- People not really fluent in SQL.
- No power over the database (for instance not being allowed to use views)
- Different goals with refactoring.
(for me, the more experienced I get with SQL, the less big my queries get, SQL has ways for people of all skill levels to get their jobs done no-matter what.)
edited May 1 at 13:51
answered May 1 at 13:44
Pieter BPieter B
11.1k12761
11.1k12761
6
"Subqueries" are just as likely to be some aggregation of a properly normalised db as they are to be ad-hoc normalisation of a non-normalised db
– Caleth
May 1 at 14:21
@Caleth that's so true.
– Pieter B
May 1 at 14:23
5
Even in well normalized databases it's still often necessary to join with subqueries, rather than joining directly with tables. E.g. if you need to join with grouped data.
– Barmar
May 1 at 18:07
1
@Barmar definitely, hence my 9 out of 10 comment. Sub-queries have their place but I see them overused by inexperienced people.
– Pieter B
May 1 at 22:28
I like your metric of "number of subqueries" as an indication of database normalization (or lack thereof).
– Jason
May 1 at 23:46
add a comment |
6
"Subqueries" are just as likely to be some aggregation of a properly normalised db as they are to be ad-hoc normalisation of a non-normalised db
– Caleth
May 1 at 14:21
@Caleth that's so true.
– Pieter B
May 1 at 14:23
5
Even in well normalized databases it's still often necessary to join with subqueries, rather than joining directly with tables. E.g. if you need to join with grouped data.
– Barmar
May 1 at 18:07
1
@Barmar definitely, hence my 9 out of 10 comment. Sub-queries have their place but I see them overused by inexperienced people.
– Pieter B
May 1 at 22:28
I like your metric of "number of subqueries" as an indication of database normalization (or lack thereof).
– Jason
May 1 at 23:46
6
6
"Subqueries" are just as likely to be some aggregation of a properly normalised db as they are to be ad-hoc normalisation of a non-normalised db
– Caleth
May 1 at 14:21
"Subqueries" are just as likely to be some aggregation of a properly normalised db as they are to be ad-hoc normalisation of a non-normalised db
– Caleth
May 1 at 14:21
@Caleth that's so true.
– Pieter B
May 1 at 14:23
@Caleth that's so true.
– Pieter B
May 1 at 14:23
5
5
Even in well normalized databases it's still often necessary to join with subqueries, rather than joining directly with tables. E.g. if you need to join with grouped data.
– Barmar
May 1 at 18:07
Even in well normalized databases it's still often necessary to join with subqueries, rather than joining directly with tables. E.g. if you need to join with grouped data.
– Barmar
May 1 at 18:07
1
1
@Barmar definitely, hence my 9 out of 10 comment. Sub-queries have their place but I see them overused by inexperienced people.
– Pieter B
May 1 at 22:28
@Barmar definitely, hence my 9 out of 10 comment. Sub-queries have their place but I see them overused by inexperienced people.
– Pieter B
May 1 at 22:28
I like your metric of "number of subqueries" as an indication of database normalization (or lack thereof).
– Jason
May 1 at 23:46
I like your metric of "number of subqueries" as an indication of database normalization (or lack thereof).
– Jason
May 1 at 23:46
add a comment |
Segregation of duties
In the SQL spirit, the database is a shared asset that contains the company's data, and protecting it is of vital importance. Enters the DBA as guardian of the temple.
Creating a new view in the database is understood to serve a lasting purpose and to be shared by a community of users. In the DBA view, this is acceptable only if the view is justified by the structure of the data. Every change of a view is then associated with risks for all its current users, even those not using the application but who have discovered the view. Finally, creation of new objects requires manage authorisations, and in the case of view, consistently with the authorisations of the underlying tables.
All this explains why DBAs don't like adding views that are just for the code of some individual application.
SQL design
If you decompose one of your nice complex query, you might find out that the subqueries will often need a parameter that depends on another subquery.
So transforming subqueries in view is not necessarily as simple as stated. You must isolate the variable parameters, and design your view so that the parameters can be added as selection criteria on the view.
Unfortunately, in doing so, you sometimes impose to access more data and less effectively than in a tailored query.
Proprietary extensions
You could hope some refactoring, by transfering some responsibilities to procedural extensions of SQL, like PL/SQL or T-SQL. However, these are vendor dependent and create a technological dependency. In addition, these extension execute on the database server, creating more processing load on a resource that is much more difficult to scale than an application server.
But what's the problem in the end ?
Finally, are segregation of duties and the SQL design with its strength and limitations a real problem ? In the end, these databases proved to successfully and reliably handle very critical data including in mission critical environments.
So in order to achieve a successful refactoring:
consider a better communication. Try to understand your DBA's constraints. If you prove to a DBA that a new view is justified by the data structures, that it is not a throw-away workaround, and that it doesn't have a security impact, he/she will certainly agree to let it be created. Because, then it would be a shared interest.
clean your own house first: Nothing forces you to generate a lot of SQL in a lot of places. Refactor your application code, to isolate the SQL accesses, and to create the classes or functions to provide reusable subqueries, if these are frequently used.
improve team-awareness: make sure that your application is not performing tasks that could be performed more efficiently by the DBMS engine. As you rightly pointed out, the procedural approach and the data oriented approach are not equally mastered by different members of the team. It depends on their background. But in order to optimize the system as a whole, your team need to understand it as a whole. So create awareness, so to be sure that less experienced players do not reinvent the wheel and share their DB thoughts with more experienced members.
+1 Some great points here. Given how bad some SQL is, the reticence of DBAs to allow views is often entirely understandable. Also, SQL can definitely benefit from peer review if it is resource hungry and/or it is going to be run frequently.
– Robbie Dee
May 3 at 13:09
add a comment |
Segregation of duties
In the SQL spirit, the database is a shared asset that contains the company's data, and protecting it is of vital importance. Enters the DBA as guardian of the temple.
Creating a new view in the database is understood to serve a lasting purpose and to be shared by a community of users. In the DBA view, this is acceptable only if the view is justified by the structure of the data. Every change of a view is then associated with risks for all its current users, even those not using the application but who have discovered the view. Finally, creation of new objects requires manage authorisations, and in the case of view, consistently with the authorisations of the underlying tables.
All this explains why DBAs don't like adding views that are just for the code of some individual application.
SQL design
If you decompose one of your nice complex query, you might find out that the subqueries will often need a parameter that depends on another subquery.
So transforming subqueries in view is not necessarily as simple as stated. You must isolate the variable parameters, and design your view so that the parameters can be added as selection criteria on the view.
Unfortunately, in doing so, you sometimes impose to access more data and less effectively than in a tailored query.
Proprietary extensions
You could hope some refactoring, by transfering some responsibilities to procedural extensions of SQL, like PL/SQL or T-SQL. However, these are vendor dependent and create a technological dependency. In addition, these extension execute on the database server, creating more processing load on a resource that is much more difficult to scale than an application server.
But what's the problem in the end ?
Finally, are segregation of duties and the SQL design with its strength and limitations a real problem ? In the end, these databases proved to successfully and reliably handle very critical data including in mission critical environments.
So in order to achieve a successful refactoring:
consider a better communication. Try to understand your DBA's constraints. If you prove to a DBA that a new view is justified by the data structures, that it is not a throw-away workaround, and that it doesn't have a security impact, he/she will certainly agree to let it be created. Because, then it would be a shared interest.
clean your own house first: Nothing forces you to generate a lot of SQL in a lot of places. Refactor your application code, to isolate the SQL accesses, and to create the classes or functions to provide reusable subqueries, if these are frequently used.
improve team-awareness: make sure that your application is not performing tasks that could be performed more efficiently by the DBMS engine. As you rightly pointed out, the procedural approach and the data oriented approach are not equally mastered by different members of the team. It depends on their background. But in order to optimize the system as a whole, your team need to understand it as a whole. So create awareness, so to be sure that less experienced players do not reinvent the wheel and share their DB thoughts with more experienced members.
+1 Some great points here. Given how bad some SQL is, the reticence of DBAs to allow views is often entirely understandable. Also, SQL can definitely benefit from peer review if it is resource hungry and/or it is going to be run frequently.
– Robbie Dee
May 3 at 13:09
add a comment |
Segregation of duties
In the SQL spirit, the database is a shared asset that contains the company's data, and protecting it is of vital importance. Enters the DBA as guardian of the temple.
Creating a new view in the database is understood to serve a lasting purpose and to be shared by a community of users. In the DBA view, this is acceptable only if the view is justified by the structure of the data. Every change of a view is then associated with risks for all its current users, even those not using the application but who have discovered the view. Finally, creation of new objects requires manage authorisations, and in the case of view, consistently with the authorisations of the underlying tables.
All this explains why DBAs don't like adding views that are just for the code of some individual application.
SQL design
If you decompose one of your nice complex query, you might find out that the subqueries will often need a parameter that depends on another subquery.
So transforming subqueries in view is not necessarily as simple as stated. You must isolate the variable parameters, and design your view so that the parameters can be added as selection criteria on the view.
Unfortunately, in doing so, you sometimes impose to access more data and less effectively than in a tailored query.
Proprietary extensions
You could hope some refactoring, by transfering some responsibilities to procedural extensions of SQL, like PL/SQL or T-SQL. However, these are vendor dependent and create a technological dependency. In addition, these extension execute on the database server, creating more processing load on a resource that is much more difficult to scale than an application server.
But what's the problem in the end ?
Finally, are segregation of duties and the SQL design with its strength and limitations a real problem ? In the end, these databases proved to successfully and reliably handle very critical data including in mission critical environments.
So in order to achieve a successful refactoring:
consider a better communication. Try to understand your DBA's constraints. If you prove to a DBA that a new view is justified by the data structures, that it is not a throw-away workaround, and that it doesn't have a security impact, he/she will certainly agree to let it be created. Because, then it would be a shared interest.
clean your own house first: Nothing forces you to generate a lot of SQL in a lot of places. Refactor your application code, to isolate the SQL accesses, and to create the classes or functions to provide reusable subqueries, if these are frequently used.
improve team-awareness: make sure that your application is not performing tasks that could be performed more efficiently by the DBMS engine. As you rightly pointed out, the procedural approach and the data oriented approach are not equally mastered by different members of the team. It depends on their background. But in order to optimize the system as a whole, your team need to understand it as a whole. So create awareness, so to be sure that less experienced players do not reinvent the wheel and share their DB thoughts with more experienced members.
Segregation of duties
In the SQL spirit, the database is a shared asset that contains the company's data, and protecting it is of vital importance. Enters the DBA as guardian of the temple.
Creating a new view in the database is understood to serve a lasting purpose and to be shared by a community of users. In the DBA view, this is acceptable only if the view is justified by the structure of the data. Every change of a view is then associated with risks for all its current users, even those not using the application but who have discovered the view. Finally, creation of new objects requires manage authorisations, and in the case of view, consistently with the authorisations of the underlying tables.
All this explains why DBAs don't like adding views that are just for the code of some individual application.
SQL design
If you decompose one of your nice complex query, you might find out that the subqueries will often need a parameter that depends on another subquery.
So transforming subqueries in view is not necessarily as simple as stated. You must isolate the variable parameters, and design your view so that the parameters can be added as selection criteria on the view.
Unfortunately, in doing so, you sometimes impose to access more data and less effectively than in a tailored query.
Proprietary extensions
You could hope some refactoring, by transfering some responsibilities to procedural extensions of SQL, like PL/SQL or T-SQL. However, these are vendor dependent and create a technological dependency. In addition, these extension execute on the database server, creating more processing load on a resource that is much more difficult to scale than an application server.
But what's the problem in the end ?
Finally, are segregation of duties and the SQL design with its strength and limitations a real problem ? In the end, these databases proved to successfully and reliably handle very critical data including in mission critical environments.
So in order to achieve a successful refactoring:
consider a better communication. Try to understand your DBA's constraints. If you prove to a DBA that a new view is justified by the data structures, that it is not a throw-away workaround, and that it doesn't have a security impact, he/she will certainly agree to let it be created. Because, then it would be a shared interest.
clean your own house first: Nothing forces you to generate a lot of SQL in a lot of places. Refactor your application code, to isolate the SQL accesses, and to create the classes or functions to provide reusable subqueries, if these are frequently used.
improve team-awareness: make sure that your application is not performing tasks that could be performed more efficiently by the DBMS engine. As you rightly pointed out, the procedural approach and the data oriented approach are not equally mastered by different members of the team. It depends on their background. But in order to optimize the system as a whole, your team need to understand it as a whole. So create awareness, so to be sure that less experienced players do not reinvent the wheel and share their DB thoughts with more experienced members.
edited May 1 at 16:48
answered May 1 at 15:17
ChristopheChristophe
27k32974
27k32974
+1 Some great points here. Given how bad some SQL is, the reticence of DBAs to allow views is often entirely understandable. Also, SQL can definitely benefit from peer review if it is resource hungry and/or it is going to be run frequently.
– Robbie Dee
May 3 at 13:09
add a comment |
+1 Some great points here. Given how bad some SQL is, the reticence of DBAs to allow views is often entirely understandable. Also, SQL can definitely benefit from peer review if it is resource hungry and/or it is going to be run frequently.
– Robbie Dee
May 3 at 13:09
+1 Some great points here. Given how bad some SQL is, the reticence of DBAs to allow views is often entirely understandable. Also, SQL can definitely benefit from peer review if it is resource hungry and/or it is going to be run frequently.
– Robbie Dee
May 3 at 13:09
+1 Some great points here. Given how bad some SQL is, the reticence of DBAs to allow views is often entirely understandable. Also, SQL can definitely benefit from peer review if it is resource hungry and/or it is going to be run frequently.
– Robbie Dee
May 3 at 13:09
add a comment |
Re points 1 & 3: Views aren't the only way. There are also temporary tables, marts, table variables, aggregated columns, CTEs, functions, stored procedures and possibly other constructs depending on the RDBMS.
DBAs (and I'm speaking as someone who has been both DBA and developer) tend to view the world in a pretty binary way so are often against things like views and functions due to the perceived performance penalty.
Latterly, the need for complex joins has reduced with the recognition that denormalised tables despite being sub-optimal from a NF point of view, are highly performant.
There is also the trend for doing queries client side with technologies like LINQ which you raise in point 2.
While I agree that SQL can be challenging to modularise, great strides have been made although there will always be a dichotomy between client side code and SQL - although 4GL has blurred the lines somewhat.
I guess it really depends on how far your DBAs/architects/tech leads are willing to cede in this regard. If they refuse to allow anything but vanilla SQL with lots of joins, huge queries could result. If you're stuck with this, don't bang your head on a brick wall, escalate it. There are generally better ways of doing things with a bit of compromise - especially if you can prove the benefits.
1
I've never heard of a "mart" construct. What is that?
– bishop
May 1 at 17:47
1
Marts are just a subset of the repository (master database). If there are specific complex queries that need to run, a special database can be created specifically to service those requests. A very common example is a reporting mart.
– Robbie Dee
May 1 at 20:30
Confused why this was downvoted. Doesn't directly answer the question, but gives a fairly clear implicit answer of "option 3: there are many ways of handling this, which are widely used".
– Dewi Morgan
May 1 at 21:48
TIL about data marts. Have a +1!
– bishop
May 1 at 22:44
add a comment |
Re points 1 & 3: Views aren't the only way. There are also temporary tables, marts, table variables, aggregated columns, CTEs, functions, stored procedures and possibly other constructs depending on the RDBMS.
DBAs (and I'm speaking as someone who has been both DBA and developer) tend to view the world in a pretty binary way so are often against things like views and functions due to the perceived performance penalty.
Latterly, the need for complex joins has reduced with the recognition that denormalised tables despite being sub-optimal from a NF point of view, are highly performant.
There is also the trend for doing queries client side with technologies like LINQ which you raise in point 2.
While I agree that SQL can be challenging to modularise, great strides have been made although there will always be a dichotomy between client side code and SQL - although 4GL has blurred the lines somewhat.
I guess it really depends on how far your DBAs/architects/tech leads are willing to cede in this regard. If they refuse to allow anything but vanilla SQL with lots of joins, huge queries could result. If you're stuck with this, don't bang your head on a brick wall, escalate it. There are generally better ways of doing things with a bit of compromise - especially if you can prove the benefits.
1
I've never heard of a "mart" construct. What is that?
– bishop
May 1 at 17:47
1
Marts are just a subset of the repository (master database). If there are specific complex queries that need to run, a special database can be created specifically to service those requests. A very common example is a reporting mart.
– Robbie Dee
May 1 at 20:30
Confused why this was downvoted. Doesn't directly answer the question, but gives a fairly clear implicit answer of "option 3: there are many ways of handling this, which are widely used".
– Dewi Morgan
May 1 at 21:48
TIL about data marts. Have a +1!
– bishop
May 1 at 22:44
add a comment |
Re points 1 & 3: Views aren't the only way. There are also temporary tables, marts, table variables, aggregated columns, CTEs, functions, stored procedures and possibly other constructs depending on the RDBMS.
DBAs (and I'm speaking as someone who has been both DBA and developer) tend to view the world in a pretty binary way so are often against things like views and functions due to the perceived performance penalty.
Latterly, the need for complex joins has reduced with the recognition that denormalised tables despite being sub-optimal from a NF point of view, are highly performant.
There is also the trend for doing queries client side with technologies like LINQ which you raise in point 2.
While I agree that SQL can be challenging to modularise, great strides have been made although there will always be a dichotomy between client side code and SQL - although 4GL has blurred the lines somewhat.
I guess it really depends on how far your DBAs/architects/tech leads are willing to cede in this regard. If they refuse to allow anything but vanilla SQL with lots of joins, huge queries could result. If you're stuck with this, don't bang your head on a brick wall, escalate it. There are generally better ways of doing things with a bit of compromise - especially if you can prove the benefits.
Re points 1 & 3: Views aren't the only way. There are also temporary tables, marts, table variables, aggregated columns, CTEs, functions, stored procedures and possibly other constructs depending on the RDBMS.
DBAs (and I'm speaking as someone who has been both DBA and developer) tend to view the world in a pretty binary way so are often against things like views and functions due to the perceived performance penalty.
Latterly, the need for complex joins has reduced with the recognition that denormalised tables despite being sub-optimal from a NF point of view, are highly performant.
There is also the trend for doing queries client side with technologies like LINQ which you raise in point 2.
While I agree that SQL can be challenging to modularise, great strides have been made although there will always be a dichotomy between client side code and SQL - although 4GL has blurred the lines somewhat.
I guess it really depends on how far your DBAs/architects/tech leads are willing to cede in this regard. If they refuse to allow anything but vanilla SQL with lots of joins, huge queries could result. If you're stuck with this, don't bang your head on a brick wall, escalate it. There are generally better ways of doing things with a bit of compromise - especially if you can prove the benefits.
edited May 2 at 7:44
answered May 1 at 13:27
Robbie DeeRobbie Dee
8,13011548
8,13011548
1
I've never heard of a "mart" construct. What is that?
– bishop
May 1 at 17:47
1
Marts are just a subset of the repository (master database). If there are specific complex queries that need to run, a special database can be created specifically to service those requests. A very common example is a reporting mart.
– Robbie Dee
May 1 at 20:30
Confused why this was downvoted. Doesn't directly answer the question, but gives a fairly clear implicit answer of "option 3: there are many ways of handling this, which are widely used".
– Dewi Morgan
May 1 at 21:48
TIL about data marts. Have a +1!
– bishop
May 1 at 22:44
add a comment |
1
I've never heard of a "mart" construct. What is that?
– bishop
May 1 at 17:47
1
Marts are just a subset of the repository (master database). If there are specific complex queries that need to run, a special database can be created specifically to service those requests. A very common example is a reporting mart.
– Robbie Dee
May 1 at 20:30
Confused why this was downvoted. Doesn't directly answer the question, but gives a fairly clear implicit answer of "option 3: there are many ways of handling this, which are widely used".
– Dewi Morgan
May 1 at 21:48
TIL about data marts. Have a +1!
– bishop
May 1 at 22:44
1
1
I've never heard of a "mart" construct. What is that?
– bishop
May 1 at 17:47
I've never heard of a "mart" construct. What is that?
– bishop
May 1 at 17:47
1
1
Marts are just a subset of the repository (master database). If there are specific complex queries that need to run, a special database can be created specifically to service those requests. A very common example is a reporting mart.
– Robbie Dee
May 1 at 20:30
Marts are just a subset of the repository (master database). If there are specific complex queries that need to run, a special database can be created specifically to service those requests. A very common example is a reporting mart.
– Robbie Dee
May 1 at 20:30
Confused why this was downvoted. Doesn't directly answer the question, but gives a fairly clear implicit answer of "option 3: there are many ways of handling this, which are widely used".
– Dewi Morgan
May 1 at 21:48
Confused why this was downvoted. Doesn't directly answer the question, but gives a fairly clear implicit answer of "option 3: there are many ways of handling this, which are widely used".
– Dewi Morgan
May 1 at 21:48
TIL about data marts. Have a +1!
– bishop
May 1 at 22:44
TIL about data marts. Have a +1!
– bishop
May 1 at 22:44
add a comment |
12
There are organisations which only let you query a database through views and modify it through stored procedures.
– Pieter B
May 1 at 12:51
3
SQL got a lot more enjoyable for me when I finally accepted that it was never going to be as DRY as my normal procedural code.
– Graham
May 1 at 13:23
1
4. SQL is really old and hasn't materially been updated in decades. For super complex stuff, a lot of teams opt for stored procedures. You can add different clauses for that. Sometimes you just have to run jobs to stage data in a temp table and then join on that. Behold how different declarative and procedural languages are.
– Berin Loritsch
May 1 at 13:42
8
Also one reason is that there is a horrible performance issue called a "triangular join" which can happen when you use views (quite by accident of course). If your query joins View A and View B, but View A also in its implementation re-uses View B, you start to see that problem. So folks often start off by writing a single monolithic query to be able to see what would actually work best in terms of refactoring to views, and then their deadline hits, and the monolith goes to production. Kind of like 98% of all software dev, really :) :)
– Stephen Byrne
May 1 at 14:51
3
"Imagine if other types of programmers had to submit a request each time they created a function" ... umm. You don't do code reviews?
– svidgen
May 1 at 21:14