| Good day to all! |
At the interview they asked the question - what are the features of designing and creating a database for very large amounts of data.
Could not answer.
Who in the subject, enlighten on the issue for the future ..)
Interested in answers from both dba and developer
| https://en.wikipedia.org/wiki/Very_large_database |
| Freeboard |
I would decide that this is a trick question. Perhaps they wanted from you counter-questions about the definition of "very large amounts of data." For example, the question of how to work with this data: will it be dwh or oltp or a mixed system. What are the expectations for access speed / latency. What do they need from the database itself from CAP and so on.
Maybe. As a developer, I couldn’t find an answer ..) I decided that basically it was in the competence of dba - planning, some preliminary actions with a reserve for the subsequent amount of data.
| Freeboard |
It seems to me that there is a lot of work for dba here anyway, but the basic techniques and good practices are less well known and standardized.
From the point of view of dbd, it is already more difficult here, since it is necessary to look at the specific requirements of the task, propose solutions and discuss the pros and cons of specific implementations.
|To make it work in a reasonable time with 100 entries and 100 billion entries. It is desirable immediately.|
Is there a compendium worthy of trust and respect for configuring MS SQL Server for very large databases?
I think that here it’s probably not about setting up a server, but about such things as replication clusters of alwayson cubernetis skh you-name-it. When that makes sense to apply and how to customize.
From the developer’s side - active use of partitioning of large tables and file groups.
On the DBA side, the layout of the “archived” sections of large tables on HDDs and “fresh” sections on SSDs, a separate volume (and LUN on a disk shelf) for the transaction log, a separate volume under tempdb (the fastest), getting the budget out of license SQL Enterprise Edition, preparation AlwaysOn AvGroups or failover cluster (FCI), MSDN regularly read and agreed with the developers by rolling fresh SP and the CU, deployment monitoring rendering important indicators agreed with the developers, to dashboards and providing access for developers tors and the Service Desk.
Attentive setting together with the Resource Governor developers and dividing all users of such a large database into groups with different priority access to server resources.
For DBA, monitoring free space on volumes with all database files, including the MSSQL instance databases, and sending regular email notifications, including to developers (not everyone watches dashboards, and everyone reads service mail).
Since for sure such a database will be in the AlwaysOn group, you will need to think in advance who should be given access to which replicas. And the most important thing is to minimize the use of triggers (then say thank you more than once).
| Andy_OLAP, |
There are no features.
Maybe except for one thing - if you don’t make indexes, it won’t work at all
Any database must be designed with a view to the likely significant growth.
Provide opportunities for transparent data trimming / archiving without loss of functionality.
on the other hand, the same requirements can be set to work with the minimum baseline, but with archival information or simply heavily loaded)
AlwaysOn hardly pulls even large bases (~ 10Tb), and very large ones are a big question. Read about the limitations of REDO.
To the author of the topic: SQL Server is a delicate matter ... and when the bases exceed dozens of Terabytes, everything becomes an order of magnitude more complicated, and you need to pay close attention to even the smallest details (in the sense of ordinary size).
Very many things just will not, like the high availability and (often) readiness. We'll have to "cut" everything, scale and stay away from virtualization;)
|My opinion is that the developer develops as usual, the questions of a large base belong to 99% to the architect, not to the developer. If the employer does not see the difference, then it is better to give up this job, for you will turn from such an employer from a specialist into a laborer in the field.|
It was about who develops a specific database.
| L_argo, |
in what sense is cleverness? If you don’t know something, that doesn’t mean that others are stupid too. The requirements for the competence of the architect of the database are strikingly different from the requirements for the developer of the code, and such a thing as a “developer of the database” exists only on projects of the SOHO level, but there are no big bases there.
| L_argo, |
In my opinion, the architect is really thinking about the structure of the base being created. The developer implements ideas in terms of code, from the database administrator needs recommendations for fault tolerance, setting up backup, monitoring, setting up the ideas of the architect on the database parameters.
But more often there are no such posts. And everything is customizable: ok, further, further ok.
When it comes to OLTP, then look in the direction of the Microservice Architect,
if speech about HD look aside Data Vault and Anchor Model
And such things as partitioning, column indexes become relevant
In general, it is better not to make one big database.
It was a question rather on "measuring the depth of understanding and general competence"))
"what are the features of designing and creating a database for very large amounts of data.
I couldn’t find an answer ”- the author of the topic would have to answer“ After my appointment to the position of the main developer to create a database for a very large amount of data, I will hire Voronin’s deputies first. ”
Next time I will do it ..!;))
| There are many subtleties, for example: |
If the database is very large, then the company most likely has a test environment, and often not one.
Therefore, you need to think about the placement of data on FG in advance, to leave on the test only a small piece of data (for example, the last 2 years). Otherwise, the test environment will become golden in disk space.
When administrative operations need to very much consider their consequences, otherwise you can simply block everything for a few hours.
But in general, the oltp systems try not to bring this up, they simply cut them off after a year or two or five, and the old ones are sent to the archive. Therefore, most likely the question was about the difference between oltp-systems and data storage.
I don’t know, maybe the questioners wanted to understand whether the TS worked from the “base for very large amounts of data”?
What is “big” in his understanding?
What role did the vehicle have, did he just participate (and then what experience did he have), or was he one of the project leaders (and what solutions did he find, approaches to work with such bases)?
In general, questions about the experience.