~ INTRODUCTION ~
Karen introduced Jon Card, V.P. of Special Projects at Combined Transport (CT), and explained that his Tech Talk is about a rather unique database architecture marketed as MultiValue.
Jon began by explaining that he will speak about MultiValue Database Architecture, also known as the eponymous Pick Database. [In 1965, Dick Pick launched a company based upon the public domain software he and Don Nelson had written for the government; the resulting product was named the Pick Operating System.]
Jon: There are multiple implementations of the Pick O.S. licensed to various computer manufacturers: UniVerse, UniData, D3, JBASE, Mvbase, OpenQM. UniVerse is the version we use, so specific details refer to UniVerse; that’s all under the Rocket [License] Umbrella. Zumasys, another large company, owns one quarter of the Pick Databases.
~ ARCHITECTURAL DETAILS ~
With SQL, a database has rows and columns. In a MultiValue database, rows are referred to as records and there’s no practical limit to the number of rows you can have. Columns are considered fields. Data is stored in a file (as opposed to a [structured] table) and identified with a unique RecordID. The RecordID could be anything. One example is the CT Comp Trip Detail Record which is part of the load for moving a truckload item somewhere. The RecordID is simply the CT LoadID and we use that as a reference to another, associated file. Comp Trip Detail handles the truck information. The associate master file, Compensation Master, handles the load information. They both use the same RecordID so the same RecordID can be used to reference both files.
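[Illustration, not from the talk: a minimal Python sketch of two files that share one RecordID. The dictionaries stand in for the Comp Trip Detail and Compensation Master files; the load ID, field names, and values are hypothetical.]

```python
# Two "files" modeled as dictionaries keyed by the same RecordID.
# All IDs, field names, and values here are made up for illustration.
comp_trip_detail = {"L100234": {"truck": "T-417", "driver": "Smith"}}
compensation_master = {"L100234": {"commodity": "Steel coil", "weight": 42000}}


def load_view(load_id):
    """Read both files with the same RecordID and merge what is found."""
    truck_info = comp_trip_detail.get(load_id, {})
    load_info = compensation_master.get(load_id, {})
    return {"load_id": load_id, **truck_info, **load_info}


print(load_view("L100234"))
```

[Because the key is identical in both files, no separate cross-reference field is needed to get from the truck information to the load information.]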
~ FORMATTING THE RECORDS ~
The way they build the records is to use field marks (FM), value marks (VM), sub-value marks (SVM), and text marks. Field marks separate each field in a record. Within a particular field, you can have multi-value fields and sub-value fields. An example of this format: if you think about a truck going to a truck stop and buying fuel, we want to record the truck stop number, the name of the truck stop, city and state of the truck stop, and how much fuel the driver bought. All that [varied] information can be stored in one record. Then, if the driver stops at another location and buys fuel, we can list the same categories of information within the same fields, with data from each location segregated by value marks. When you look at the raw data, you may see a string of OR]CA]OR]WA]OR]OR, and each one of those references a specific location where the driver bought fuel. And then, we can break down the data again if we put a second truck’s data on that record and have a separator (another value mark) and put all the states inline again. We can differentiate between truck #1, truck #2, etc., and all the data is contained within one record.
~ VISUAL EXAMPLE ~
Here is a visual example of the way data might be stored in a file. Note that, in this example, the control character used to separate values is displayed as a right bracket.
~ RAW DATA ~
|Truck Stop Number||1743]862]1743]12345|
|Truck Stop Name||Joe’s Gas Station]CA Truck Stop]Joe’s Gas Station]WA Truck Stop|
~ REPORT DISPLAY DATA ~
|Truck Stop #||Truck Stop Name||City||ST||
|1743||Joe’s Gas Station||Ashland||OR||
|862||CA Truck Stop||Sacramento||CA||
|1743||Joe’s Gas Station||Ashland||OR||
|12345||WA Truck Stop||Spokane||WA||
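[Illustration, not from the talk: a minimal Python sketch of how the raw data above turns into report rows. It assumes the conventional Pick delimiter characters (254 for the field mark, 253 for the value mark, 252 for the sub-value mark); the function names are hypothetical, and the City and ST values are taken from the report display.]

```python
FM, VM, SVM = chr(254), chr(253), chr(252)  # field, value, sub-value marks


def parse_record(raw):
    """Split a raw record into fields -> values -> sub-values (nested lists)."""
    return [[value.split(SVM) for value in field.split(VM)]
            for field in raw.split(FM)]


def report_rows(raw):
    """Line up the Nth value of every field as one report row."""
    fields = [field.split(VM) for field in raw.split(FM)]
    return list(zip(*fields))


# The example data, typed with "]" for readability and then converted
# to the real value mark; the fields are joined with the field mark.
display = [
    "1743]862]1743]12345",
    "Joe's Gas Station]CA Truck Stop]Joe's Gas Station]WA Truck Stop",
    "Ashland]Sacramento]Ashland]Spokane",
    "OR]CA]OR]WA",
]
raw = FM.join(line.replace("]", VM) for line in display)

for row in report_rows(raw):
    print(" | ".join(row))
```

[Each report row is simply "value number N of every field", which is why one record can hold any number of fuel stops.]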
~ DATA FORMAT DISCUSSION ~
C: That seems way cleaner than SQL, in my opinion. SQL is so convoluted when you start talking about time tables together…
Jon: The advantage to using multi-value is: I have here a printout of one record that you can pass around. If you look at the last page, 176 lines.
Q: This is only one record that’s 176 lines long?
Jon: Yes. And it probably started out with 100 lines, and we decided to add a few fields to the same record. You don’t have to rebuild anything when you want to add more items to this record; for example, date and time. You simply add a field and define the field in the [associated] dictionary.
Q: Is the 3 a sub-value mark?
Jon: Yes. In this printout, the 3 is a sub-value mark and the 2 is a multi-value mark. [In other implementations, the printout might display a value mark as a right bracket and a sub-value mark as a caret; both are control characters.]
Q: Is there a database-like schema that is defined?
C: This kinda sounds like the MongoDB Paradigm with document databases.
Jon: A Multi-Value Database can be faster than SQL, thanks to its ability to store complex data in a single file—or table, depending on what you want to call it. All the data is variable length and it’s not typed data. If a phone number, for example, is turned in with letters, we could enter that format in the file; the system wouldn’t care. Your business logic is where you declare the data type. If your business logic says the data must be 7 digits–or, 9 digits—then the business logic defines the data, not the data layer.
C: I have a comment: You can store the data in one format and display it in multiple formats, via your business logic.
Jon: Yes. We’ll talk about dictionaries in a minute. Data is defined with dictionaries on each file, rather than via a schema for the entire database. And, there can be multiple dictionary items associated with one field; each dictionary item may define the same data in different ways. The only reason you need a dictionary is for [ad hoc] reports. If you’re going to use business logic to build a report, you don’t even need a dictionary, but the programmer must know how the value is stored. Normally, we don’t store numbers with decimal places; we store the number(s) without decimal places and define the display format (e.g., number of decimal places) in a dictionary item. When the system accepts [for input] a dollar amount with a format that includes a dollar sign, commas and decimal point, that data is converted [stripped of formatting] to a base format, consisting of only the digits. When you want to display the number in a particular format, you simply specify the format [e.g., MD2; Masked Decimal with 2 decimal places].
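[Illustration, not from the talk: a minimal Python sketch of the store-plain, format-on-display idea behind a conversion such as MD2. The functions to_internal and to_display are hypothetical stand-ins, not the UniVerse conversion routines, and the commas and dollar options are assumptions added for the example.]

```python
from decimal import Decimal


def to_internal(text, places=2):
    """Strip '$' and ',' and scale to an integer, e.g. '$1,234.50' -> 123450."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    return int(Decimal(cleaned).scaleb(places).to_integral_value())


def to_display(stored, places=2, commas=False, dollar=False):
    """Format a stored integer with a fixed number of decimal places."""
    value = Decimal(stored).scaleb(-places)
    text = f"{value:,.{places}f}" if commas else f"{value:.{places}f}"
    return ("$" + text) if dollar else text


stored = to_internal("$1,234.50")
print(stored)                                        # digits only
print(to_display(stored))                            # two decimal places
print(to_display(stored, commas=True, dollar=True))  # fully formatted
```

[The stored value never changes; each display format would correspond to a different dictionary item pointing at the same field.]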
C: That sounds like it might be a real issue, in terms of training, because I know that I might put in one dollar in a variety of formats, with or without the decimal amount…
Jon: Well, you can do that but it’s your business logic that determines whether you can put in 25 as 25 dollars or 25 cents.
Q: How would the program know that I meant to put in 25 dollars without specifying cents, vs. a quarter?
Jon: Again, this is your business logic. If you define a format of 2N, your input must be 2 Numbers; if you define a format of 2N.2N, you would be required to enter 02.50 to store $2.50; the program would not let you get away with entering 2.5 alone. But if you defined the data as 0N, then the input field would permit an unlimited number of digits, with or without a decimal point. When you enter 25, the program assumes $25.00 and not 25 cents.
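[Illustration, not from the talk: a minimal Python sketch of input validation against masks written the way Jon says them (2N, 2N.2N, 0N). The translation to a regular expression is an assumption made for the example: a count followed by N means that many digits, 0N means any number of digits, and every other character is taken literally. The real pattern syntax in MV Basic may differ in detail.]

```python
import re


def mask_to_regex(mask):
    """Translate a mask such as '2N.2N' into a compiled regular expression."""
    parts = []
    pos = 0
    for token in re.finditer(r"(\d+)N", mask):
        parts.append(re.escape(mask[pos:token.start()]))  # literal text
        count = int(token.group(1))
        parts.append(r"\d*" if count == 0 else r"\d{%d}" % count)
        pos = token.end()
    parts.append(re.escape(mask[pos:]))
    return re.compile("".join(parts))


def matches(mask, text):
    """True when the whole input conforms to the mask."""
    return mask_to_regex(mask).fullmatch(text) is not None


for mask, entry in [("2N", "25"), ("2N.2N", "02.50"),
                    ("2N.2N", "2.5"), ("0N", "12345")]:
    print(mask, repr(entry), matches(mask, entry))
```

[This check lives in the application layer; the file itself would accept any of these strings.]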
Q: So, wouldn’t the program processing the data entry define the format of the data?
Jon: Yes. That’s the whole point; it’s the front end—the business logic—that defines the data, not the data itself. We perform data validation at the application layer, not the data layer.
Most of MultiValue is programmed in MV Basic. They’ve added Python support so you can program with Python, if you want. With MultiValue, you can actually read and write to an SQL Database directly, if you want. But it’s designed to work with the MV Databases.
~ DATA STORAGE ~
Some of the interesting things include the fact that dates are stored as a number. If you remember Y2K, the Y2K issues were immaterial to Pick Databases. Day 0 is December 31, 1967; the day they launched the product. Today is 18904.
Q: Are negative dates possible if you have a date prior to 12/31/67?
Jon: Oh, yes! Absolutely! Any date prior to 12/31/67 is stored as a negative number. This makes it very easy because, when you’re doing math with dates, you’re working with a number; adding or subtracting numbers. Super easy! Then you simply convert the number for the date display; 2-digit year, 4-digit year, month first, day first, whatever [format] you want.
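[Illustration, not from the talk: a minimal Python sketch of the internal date format, assuming only what Jon states, that day 0 is December 31, 1967 and earlier dates are negative. The function names are hypothetical.]

```python
from datetime import date, timedelta

PICK_DAY_ZERO = date(1967, 12, 31)


def to_internal_date(d):
    """Number of days since day 0; negative for earlier dates."""
    return (d - PICK_DAY_ZERO).days


def from_internal_date(n):
    """Convert a stored day number back to a calendar date."""
    return PICK_DAY_ZERO + timedelta(days=n)


today = from_internal_date(18904)
print(today.strftime("%m/%d/%y"))   # month first, 2-digit year
print(today.strftime("%d/%m/%Y"))   # day first, 4-digit year
print(to_internal_date(date(1967, 12, 25)))  # a date before day 0

# Date math is plain arithmetic on the stored numbers.
print(from_internal_date(18904 + 30))
```

[The stored value is the same in every case; only the display conversion differs.]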
C: I’m assuming the underlying system doesn’t care what is stored; it’s simply a number.
C: There’s no notion of a date/time schema at all.
Jon: No, there isn’t.
C: All the schema stuff is unnecessary.
C: And, you can have the same numeric value displayed in different formats, by creating multiple dictionary items that define different formats: 2-digit year, 4-digit year, month first, day first, whatever [format] you want.
Jon: They store the time the same way. There are 86,400 seconds in a day; that’s how they store time. You can have less than a second by putting a decimal place in the number. For example, our onboard computers send values that are less than a second. We happen to store them that way, but most of the time we display the times in minutes and don’t bother with the seconds; they’re immaterial, as far as we’re concerned.
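[Illustration, not from the talk: a minimal Python sketch of time stored as seconds within the day, with fractional seconds kept in storage and dropped on display. The function names are hypothetical.]

```python
SECONDS_PER_DAY = 86_400


def to_internal_time(hours, minutes, seconds=0.0):
    """Seconds since midnight; fractions of a second are allowed."""
    return hours * 3600 + minutes * 60 + seconds


def display_hhmm(internal):
    """Show hours and minutes only, ignoring seconds and fractions."""
    whole = int(internal)
    return f"{whole // 3600:02d}:{(whole % 3600) // 60:02d}"


stamp = to_internal_time(13, 30, 15.25)
print(stamp)                # stored value, including the quarter second
print(display_hhmm(stamp))  # what a report would show
```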
Dictionaries offer users a quick way to produce reports—or, you can write a MultiValue Basic program. You can execute a Retrieve command; Retrieve is the name of the report writer. [Equivalent to SQL’s Report Builder.]
You are able to associate fields. If you have a field that’s empty, e.g., Commodity; Estimated Weight; Flat Rate/By Mile (if By Mile, Miles are stored); Rate Per Mile; and Extension -or- Dollar Amount. The Retrieve report writer is smart enough to figure out that it can line up all that data because all these fields are associated and belong on one line [row] in a report. If a couple of the fields are empty, the report writer will simply skip those fields, instead of moving the lines [rows] up to fill-in the empty space. So, we can have one line that’s Flat Rate and doesn’t have Miles (no Rate Per Mile) and the next line would show up with your Rate Per Mile—and the system doesn’t get confused.
Q: And that’s at the database level, counting Value Marks?
Jon: Yes. Counting the Value Marks and considering the association. You define dictionary items that declare the numbers that are associated with each other. Merely because I’ve got a file that’s 174 lines, I might have 3 or 4 lines that are associated and then have another 3-4 lines that have a different association.
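[Illustration, not from the talk: a minimal Python sketch of lining up associated multi-value fields by value position, so that an empty value keeps its place instead of letting later values slide up. The field contents are hypothetical, and amounts are written without decimal places as described earlier.]

```python
from itertools import zip_longest

VM = chr(253)  # value mark


def associated_rows(*fields):
    """Pair up the Nth value of each associated field; blanks stay blank."""
    columns = [field.split(VM) for field in fields]
    return list(zip_longest(*columns, fillvalue=""))


commodity = VM.join(["Steel coil", "Lumber"])
rate_type = VM.join(["Flat Rate", "By Mile"])
miles = VM.join(["", "310"])          # empty for the flat-rate line
rate_per_mile = VM.join(["", "250"])  # 2.50 stored without decimals
amount = VM.join(["120000", "77500"])

for row in associated_rows(commodity, rate_type, miles, rate_per_mile, amount):
    print(row)
```

[The empty strings between value marks are what let the flat-rate line and the by-mile line print correctly side by side.]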
You can also write programs inside dictionaries. This permits you to write some complicated choices instead of trying to do that manually. Or, you can manually write a little program inside the dictionary. For example, I want to add-up axles and the weight associated with them; this permits me to get the total gross weight for a truck. But the truck can have drive and steering and jeep and booster, so you have to add-up all of those to get the gross weight.
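[Illustration, not from the talk: a minimal Python sketch of the gross-weight calculation. In UniVerse this logic would sit in a dictionary item; Python stands in here, and the assumption that the axle weights share one multi-value field, along with the weights themselves, is hypothetical.]

```python
VM = chr(253)  # value mark


def gross_weight(axle_weights_field):
    """Sum every non-empty value in a multi-value axle-weight field."""
    return sum(int(value) for value in axle_weights_field.split(VM) if value)


# Hypothetical weights in pounds: steering, drive, jeep (none), booster.
axles = VM.join(["12000", "34000", "", "20000"])
print(gross_weight(axles))
```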
Q: Is this still called a PROC (abbreviation for PROCess)?
Then, everyone wants to get away from the green screen. Personally, I love the green screen!
C: It’s fast.
Jon: Yes! It’s much faster than doing a Windows-type interface. But the training is part of the issue, also. When you have someone who’s new and they’ve never used this [system] before, it takes a bit more training. If you give them a Windows Interface, they understand that a lot easier. So our long-time employees want to go back to the green screens but the newer employees like the Windows Interface.
Q: And this is without containerization?
Jon: Yes; without containerization. Just flat running on the machine.
We’re moving to Red Hat; Red Hat is going to be virtualized. They’ve come up with cloud experiences where you can build your package with a UniVerse Package, and choose which cloud service you want to go to and the system will fix everything you want to fix in your system and put it right out into the cloud.
Q: Is that your vendor that’s calling it my cloud experience?
Jon: Well, there are two different ones. I selected Pick Cloud; they’re a generic company. For $299/month, they will offer you a cloud package to do it, and they’ve got different pricing. They also do data backup, recovery, cold standby, hot standby, tape backup; however you want to backup your system. But you’ve got problems with printing, of course, when you’re in the cloud. So you have to figure out how you’re going to print from the cloud.
Most of my stuff, nowadays, I either print it out in Tab-delimited, Comma-delimited, or .pdf, and email it to the users as an email attachment and the users can print out the file if they want to, or not, to save paper. We still print 80,000 sheets of paper per month; a reduction of 8% from last month. That was more because the business slowed-down so we conserved on paper. But actually, 80,000 sheets of paper is a fairly small cost, right now. We’re not trying to reduce our paper usage right now.
We do document imaging and indexing, so we actually have a problem right now with the drivers sending in their paperwork so fast that the dispatchers don’t have time to make changes because the paperwork was sent in five minutes after the driver unloaded, and the billing clerk started billing the load that fast and so, once you start the billing process, you can’t change the dispatch process … so you have big arguments between Dispatch and Accounting. The new systems, where the driver takes a picture of his document, sends it into our system and the Billing Clerk is sitting there waiting for the next load to bill. The system is that fast!
C: Wow! [One solution to that might be to have a flag set that triggers an approval from Dispatch, and the system won’t initiate billing process until that flag is reset.]
Jon: That’s our process: being able to integrate everything into UniVerse. We use EDI (Electronic Data Interchange) on loads, and that’s normally an FTP (File Transfer Protocol) or an SFTP Process, and we build the flat files and bill it out automatically into a load. And, if we’ve got a contract on the customer, then we automatically validate the rates for the shipment.
Q: Does the MultiValue Database allow you to do some of the SQL functions like averages and other built-in functions like summary functions? SQL [Originally developed at IBM—based on Codd’s Relational Model—in the early 1970’s, at least five years after Dick Pick and Don Nelson’s GIRLS (Generalized Information Retrieval Language System) model was designed (on an IBM System/360) and used by the U.S. Military to control inventory of Cheyenne Helicopter Parts], is pretty powerful if you know how to use it.
Jon: Right. You can do that. Most of the time, I end up doing [those functions] in the business logic.
~ REPORTING FLEXIBILITY ~
Jon: Using this method gives me a bit more flexibility and I can pick-and-choose. Instead of trying to have this table that I’m trying to average, I can look at the table and decide that, since I don’t like one or two pieces of data, I’m going to exclude them from this report when I’m doing an average. They also have what are called Dynamic Files. We only have one dynamic file in our system and it has about 7,500,000 records in it. So what it does is: breaks-up each of the sections into about 1,500,000 in each one of the five partitions, and then divides it up based upon Record ID. And you can build any kind of Record ID you want.
And, if you have data that you retrieve a lot, e.g., on the Comp Trip Detail or even on the distributed file, we will index the files (usually on dates) so, if you want to pull a date range, the index already has it calculated so you merely have to tell the index to give you those files.
And, as long as you have that as the first criterion in your data selection, the system will use the index and use the other criteria to select, but only on the sub-set defined by the first criterion. So it can speed-up a request remarkably; anything that might otherwise take a couple of minutes might now take only a couple of seconds if you use an index with it.
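[Illustration, not from the talk: a minimal Python sketch of why the indexed criterion goes first. The index narrows the candidates to a date range, and the remaining criteria are tested only on that subset. The records, field names, and the select function are hypothetical; dates are internal day numbers.]

```python
from collections import defaultdict

# Hypothetical records keyed by RecordID.
records = {
    "L1": {"arrival": 18900, "state": "OR"},
    "L2": {"arrival": 18903, "state": "CA"},
    "L3": {"arrival": 18904, "state": "OR"},
    "L4": {"arrival": 18850, "state": "OR"},
}

# Build the index once: arrival date -> list of RecordIDs.
date_index = defaultdict(list)
for rec_id, rec in records.items():
    date_index[rec["arrival"]].append(rec_id)


def select(first_day, last_day, state):
    """Use the date index first, then filter only the narrowed subset."""
    candidates = [rec_id
                  for day in range(first_day, last_day + 1)
                  for rec_id in date_index.get(day, [])]
    return [rec_id for rec_id in candidates
            if records[rec_id]["state"] == state]


print(select(18900, 18904, "OR"))
```

[Record L4 matches the state but is never examined, because the index already excluded it by date.]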
~ STORAGE CAPACITY ~
Q: How many records are in your system right now?
Jon: Total? The UTM is 7.5 million. Our Comp Trip Detail is 28,000, and that’s every load that we’ve moved for the last five years or so.
Q: How does that compare with your system?
C: I write one million coordinates per day.
C: A day.
Jon: The biggest thing, though, is simply making sure your disk files are properly-sized. Some of our files need to be resized. You set up the maximum size of the file envelope and, if you exceed that level, it still works but slows-down the record processing. The goal is to set the size for a maximum plus a bit extra but not too much extra because then you are reserving more space than necessary. I can take a record that uses a modulo of 2 and make it 50 and that doesn’t hurt the system any but you’ve got all that unused disk space [reserved] that you can’t access.
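[Illustration, not from the talk: a minimal Python sketch of what the modulo controls in a hashed file, namely how many groups the RecordIDs are spread across. The CRC-32 hash is a stand-in chosen for the example and is not the hashing algorithm UniVerse uses; the RecordIDs are hypothetical.]

```python
import zlib


def group_for(record_id, modulo):
    """Pick a group for a RecordID (CRC-32 used as a stand-in hash)."""
    return zlib.crc32(record_id.encode("utf-8")) % modulo


def group_sizes(record_ids, modulo):
    """Count how many records land in each group."""
    sizes = [0] * modulo
    for rec_id in record_ids:
        sizes[group_for(rec_id, modulo)] += 1
    return sizes


ids = [f"LOAD{n:05d}" for n in range(1000)]
for modulo in (2, 50):
    sizes = group_sizes(ids, modulo)
    print(modulo, "groups; fullest group holds", max(sizes), "records")
```

[With too few groups, each group holds many records and lookups slow down; with far too many, most of the reserved space sits empty, which is the trade-off Jon describes.]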
~ POWERFUL SELECT STATEMENT ~
C: So you were talking about a Select statement and how that reduces the number of records you have to work with.
Jon: Right. When you do a Select, you must specify the index item first in your Select statement, if you want to take advantage of the index. You could write a poor Select statement by sticking the index anywhere, but putting the index first results in a proper use of the Select process. We use Assign Date and Arrival Date Detail on the CTD record, so if you’re going to do a selection, you want to select the date range first and then move on to the other criteria. And you have to have a dictionary item for each item in the Select statement. That’s one reason why you need dictionaries.
~ COSTS ~
Q: I have a question about costs. I never had responsibility for costs when I was contracting; I simply wrote code and got paid. But from a business perspective, now that I’ve heard the kinds of things that you all encounter (annual licensing fees, etc.) when you’re running the operation, how do costs with a Pick Operating System compare with other systems?
Jon: As far as the software itself, if you have a maintenance agreement, you can upgrade to any version you want with no additional cost. The one issue that I have is that our users usually log in three times [three simultaneous sessions], so we have to have three licenses per user. We limit our users to three logins. Some of the other Pick Implementations permit ten multiple sessions for one license. Those other implementations would be much better for our situation; I have 185 licenses, right now. And I only have 100 users. Some people run only one session, others run up to three sessions, and we hold the limit to three for most users. In addition, we operate two telnet sessions for the green screens and one for Windows. The nice thing about the Windows Session is: you can open that session and then initiate another process while the Windows Session remains running, and then return to the Windows Session once the secondary process has been shut down.
Obviously with Telnet, when you’re doing the green screen you can only pull up one thing at a time. The nice thing that we have though is that our Telnet Program stores your previous screens, so you can scroll back up and look at the results of previous activities. That can be convenient, sometimes.
~ TRANSACTION MANAGEMENT ~
Q: Is it safe to assume that the management system does all the transaction handling? Like ACID transactions and all?
Jon: Yes. When you set up a UniVerse account, you set up accounts level two. For our company, we have one system that does everything. We have an Accounts Payable Section, Accounts Receivable Section, and it locks all of them.
Q: You have Read/Write transactions, too?
Jon: Yes. Locks.
Q: Financial transactions?
Jon: Yes. It locks all of them.
Q: It does Resource locks?
Q: Well, a lot of it sounds really primitive so I didn’t know how much of that is already built-in.
Jon: It’s built-in if you use it. I can read a record that’s locked … I can always read a record that’s locked if I’m only reading it. But I can also say I want [the system] to read and lock that record to prevent someone else from changing that record. But you can simply read a record and write a record without ever checking unless you wrote in the program to validate for locks. And then, if it’s locked, you can get a report and either back out of it or send an email to somebody notifying them that they are locking you out of that record; the notification process depends on how you write the program. You can also write one specific record in a file. For example, if I want to update Field 174, I can simply update Field 174. I don’t have to read and write the whole record.
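[Illustration, not from the talk: a minimal Python sketch of the locking behavior described above. A plain read always succeeds, a read-and-lock fails for a second session, and a single field can be written without rewriting the whole record. The class and method names are hypothetical, and the in-memory lock table is an assumption, not how UniVerse implements locks.]

```python
class LockedError(Exception):
    """Raised when another session holds the record lock."""


class MvFile:
    def __init__(self):
        self.records = {}  # RecordID -> list of fields
        self.locks = {}    # RecordID -> session holding the lock

    def read(self, rec_id):
        """Plain read: always allowed, even while the record is locked."""
        return list(self.records[rec_id])

    def read_and_lock(self, rec_id, session):
        """Read and take the lock, or fail if another session has it."""
        holder = self.locks.get(rec_id)
        if holder not in (None, session):
            raise LockedError(f"{rec_id} is locked by {holder}")
        self.locks[rec_id] = session
        return list(self.records[rec_id])

    def write_field(self, rec_id, field_no, value, session):
        """Update one field (1-based) and release this session's lock."""
        holder = self.locks.get(rec_id)
        if holder not in (None, session):
            raise LockedError(f"{rec_id} is locked by {holder}")
        fields = self.records[rec_id]
        fields.extend([""] * (field_no - len(fields)))  # pad if needed
        fields[field_no - 1] = value
        self.locks.pop(rec_id, None)


loads = MvFile()
loads.records["L1"] = ["Ashland", "OR"]
loads.read_and_lock("L1", "dispatch")
print(loads.read("L1"))  # a plain read still works
try:
    loads.read_and_lock("L1", "billing")
except LockedError as err:
    print("billing must wait:", err)
loads.write_field("L1", 4, "DELIVERED", "dispatch")
print(loads.records["L1"])
```

[What happens on a lock conflict (back out, retry, or notify someone) is left to the program, as Jon says.]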
~ MODERN DATABASE ~
C: It does seem like replication and triggering and caching and all the other vast features of a modern database.
Jon: I object to the term that it’s not a modern database. You wouldn’t call SQL … I mean, everybody assumes SQL is the modern database. Well, it was launched about the same time as Pick. [Actually, about five years after Pick.]
C: SQL is a language. MySQL and PostGres…PostGres is more modern than MySQL.
C: And, a lot faster.
C: That’s still based on the original database.
~ DATABASE SELECTION CRITERIA ~
C: The reason I’m wondering is: a lot of the stuff you’re describing would be implemented in a lot of the popular JSON databases like MongoDB or something, and it has all of those features of a highly-scalable modern database so I’m trying to figure out where this fits in to “where would I select this over something like Mongo?”
Jon: Well, my experience has been that, when you get a new I.T. Manager in a business that has a Pick Database, the first thing the new I.T. Manager wants to do is replace the Pick Database instead of trying to modernize it. And, the way I feel about it is: you have all this business logic that you’ve already written and you know that it works and, hopefully, that you’ve continued to modernize and update it. Why do you want to throw away all that business logic to rewrite what you already have when all you need to do is put a new front end on it and continue to use the same business logic?
Q: If I wasn’t working on a legacy project, though?
Jon: If you were doing something brand new?
C: Where would this fit in if I’m doing something new in my database selection process?
Jon: I’m not a good proponent of it, either way. I love multi-value databases; that’s all I’ve done for the past 24 years. There are a lot of insurance companies, rental car companies, and parts stores to cite some example applications. Those types of applications involve huge databases. They work fast and, as long as you properly-maintain them, I would put them up against any database. But it’s difficult to find programmers that work in MultiValue Basic these days.
~ EXAMPLE USERS ~
- International NonProfit Organizations: General NonProfit Business Activities
- International Insurance and Corporate Service Brokers: Insurance Business Model
- International Real Estate Brokerage: Real Estate Transaction Activities
- Online Stock Trades Clearing House: General Clearing House Activities for Stocks & Bonds
- Convention & Visitors Bureau: Coordinating Scheduling and Management of City Events
- Wholesale Auto Parts Distributors: General Distribution Model, nationwide
- International Paging Company: General Business Activities
- Scientific Research Agency: Tracking Secret Stuff
- Government Management: Managing Local Government Activities
- Production Soldering Materials & Supplies Distributor: Specialty Distribution Model
- Mini Computer Manufacturer: General Computer Manufacturer Model
- Industrial Flow Measurement Products Manufacturer: General Manufacturing Model
- Industrial Supply Distribution: General Distribution Model
- Major University: Material Distribution; aggregating supply orders (from hundreds of departments: office supplies, gas cylinders for chem labs, tech equipment, etc.); ordering from vendors; receiving merchandise from vendors; delivering merchandise to ordering department; billing the ordering department; and transmitting overall accounting information to Accounting Dept.
~ SUPPORT OPTIONS ~
C: That’s kind of what I was thinking. When you are ready to retire, unless you train somebody else, the next person to come in is going to want to replace it, simply because the architecture is unfamiliar to most people.
C: So, to find somebody who’s familiar with the design, it’s going to be somebody like you who’s facing retirement. The age of the database is your challenge. With SQL, they’ve continued to evolve but not pushed it real hard, I guess. Microsoft has made it their baby, and they push it out there. But there are other databases as well. This is the first time I’ve heard of Pick O.S.
C: Worst-marketed product, ever!
C: I agree. Because I’ve been in the business for 27 years and I’ve never heard of it.
C: Dick was not a marketer; he assigned that job to his son.
C: Who did not do well, obviously. Is this a real common database, like on the AS/400? Or, do they have their own database on the 400?
Jon: I think they have their own database. I don’t see an option to get a UniVerse Database on an AS/400. But there are people who are using it and are current on it. It’s got all the SOAP interfaces and you can use Java and whatever you want.
~ BLOBs ~
Q: Does it store any binary objects (BLOBs)?
Jon: That’s the one object that there’s a problem with. [Possible Solution.] I can handle the headers for all my .pdf files but I simply store the .pdf files on a NAS Drive. So I know where each file is and I pull it out. I don’t actually store the BLOB in the actual database itself.
C: Unless you convert it to Base64?
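[Illustration, not from the talk: a minimal Python sketch of the Base64 suggestion. One likely reason raw binary is awkward in a MultiValue record (an assumption here, not something stated in the talk) is that arbitrary bytes can collide with the mark characters; Base64 text uses only plain ASCII, so it cannot. The sample bytes are made up.]

```python
import base64

MARKS = {chr(252), chr(253), chr(254)}  # sub-value, value, field marks


def encode_blob(data: bytes) -> str:
    """Turn binary data into ASCII text that is safe to store in a field."""
    return base64.b64encode(data).decode("ascii")


def decode_blob(text: str) -> bytes:
    """Recover the original bytes from the stored text."""
    return base64.b64decode(text)


# Made-up binary content that happens to contain the mark byte values.
pdf_bytes = b"%PDF-1.4\n" + bytes([252, 253, 254]) + b"rest of file"
stored = encode_blob(pdf_bytes)

print(any(ch in MARKS for ch in stored))  # no mark characters in the text
print(decode_blob(stored) == pdf_bytes)   # round trip is lossless
```

[The cost is size: Base64 text is roughly a third larger than the original bytes, which is one reason storing only a path to the file on the NAS remains attractive.]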
Jon: Right. So, there are those issues … but … a lot of those issues are … our OBC (OnBoard Computer) won’t work with our UniVerse Database because we won’t store the BLOB. The problem is: if they want me to simply take the document and send it to them and put it in a container for them, that’s easy for me to do; it’s simply not connecting. What they make me do is: I’m writing to MariaDB [MySQL replacement] now on SQL Red Hat. (MariaDB is the new version because SQL quit being updated.) So, as far as cost, it is less cost than some of the other systems but I do have to pay maintenance for my licenses…and they just notified me that they’re raising rates seven percent this year.
~ MAINTENANCE COSTS ~
Q: So what’s the yearly maintenance on that?
~ PRESERVING APPLICATION SOFTWARE INVESTMENT ~
C: Another thing: you talked about investing in your application software; all your business rules. I think the biggest thing that’s kept so many businesses on this system for years is that, if they decide they don’t want to do business with UniData on their hardware, they can go to IBM, Prime or whomever. The point is: you can pick up your application software and move it to a whole different hardware platform, with some minor conversion issues to handle custom features on either the old or new system. [i.e., workarounds to handle lost features on old system, new code to take advantage of features on new system.]
Jon: There are a lot of people who switch from one version of Pick to another version of Pick. It’s either because the old version is no longer being maintained or the hardware is no longer being supported. Before I joined the company, we started with another version [implementation] of Pick. Then we moved to the IBM Power PC implementation and then on to UniVerse. The only reason I moved away from Power PC is that Power PC is costing so much more money these days. They want $20,000-plus for a hardware upgrade, and I can get a much cheaper Dell Server—even with SSD Drives—a lot cheaper than $20,000. On the other hand, I’ve never had my AIX Machine fail.
~ VIRTUALIZATION ~
Q: It’s bulletproof?
Jon: But once we get to virtualization, then I can simply move it over to another piece of hardware and keep on going.
C: You’re no longer limited by the hardware, once you virtualize. Your architecture is on a virtualized platform, at that point. You can have that run as many copies as you want.
Jon: And, they have full failover. They’ve got a system in place where you can have two machines running and you can set it up so that every time you write a record in one, it will send a packet over to the other machine and update it so, if your machine fails, it will fall over to the other machine.
I don’t think it is as robust as some of the others. One of the problems you have is: if you can’t keep up with your package, that’s a problem. You can’t take your old server and stick it out there and expect your new server to run as slow as your old server. You want it to run fast. As it starts sending more transactions, you start running into a problem where this machine might have to slow down if your packet storage is maxed-out, you won’t be able to write anything until you make more room; that’s an issue.
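[Illustration, not from the talk: a minimal Python sketch of the failover problem Jon describes. Every write on the primary queues a packet for the standby, and when the standby cannot keep up and the packet storage fills, the primary can no longer write. The class, the capacity, and the decision to raise an error (rather than block) are all assumptions for the example.]

```python
from collections import deque


class ReplicationBufferFull(Exception):
    """Raised when queued packets have used up the packet storage."""


class Primary:
    def __init__(self, capacity):
        self.data = {}          # records on the primary machine
        self.pending = deque()  # packets not yet applied on the standby
        self.capacity = capacity

    def write(self, rec_id, record):
        """Write locally and queue a packet for the standby."""
        if len(self.pending) >= self.capacity:
            raise ReplicationBufferFull("standby has fallen behind")
        self.data[rec_id] = record
        self.pending.append((rec_id, record))

    def ship(self, standby, max_packets):
        """Apply up to max_packets queued updates on the standby."""
        sent = 0
        while self.pending and sent < max_packets:
            rec_id, record = self.pending.popleft()
            standby[rec_id] = record
            sent += 1
        return sent


primary, standby = Primary(capacity=2), {}
primary.write("A", "first")
primary.write("B", "second")
try:
    primary.write("C", "third")  # packet storage is already full
except ReplicationBufferFull as err:
    print("write refused:", err)
primary.ship(standby, max_packets=1)  # a slow standby drains one packet
primary.write("C", "third")           # now there is room again
```

[A slow standby limits how fast the primary can accept writes, which is why an old server makes a poor failover partner for a new one.]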
Q: Do you guys need that kind of availability, though?
Jon: No. Not for us. There are some banks and other businesses that obviously can’t afford to have it fail. For us, it is a concern if we go down but our biggest problem … Today, we had an issue that we thought was sourced in the AIX machine and we were going to re-boot it until we realized the source of the problem was our active server and when we rebooted it, the network card didn’t activate.
C: That’s a problem.
Jon: All we had to do was boot the Active Directory again and everything started working. But that’s my problem with running UniVerse on Windows … it’s just … when was the last time I rebooted my AIX machine? Three months ago? Four months ago?
C: And, with Windows, you have to do that every month for Patch Tuesday.
C: Back when I first moved to Clinton Valley, early August, late 90s, the first company I worked for had NT IIS4, I think, or 3, and we literally had to re-boot it every 4 hours.
[Lots of laughter]
C: HPC Admin hasn’t been re-booted in 651 days.
Q: Is it a Windows box?
C: We’ve come so far…think about the NT Days…
Q: Any other specific questions, right now?
C: It’s such a different … it seems like it’s so basic that it’s amazing that it works.
C: But I can see it working. I mean, it’s worked for 20-something years.
C: Since 1965…going on 55 years—and five years prior to SQL. I think the key is that they designed the database first … Originally, it was designed for the government to track inventory for Cheyenne Helicopter parts. They designed the database and then connected it to an IBM System/360 operating system. Simply the fact that they did it in that order … reverse from what normal developers do … may be the reason the system works so well. I think the best book I’ve ever read about synchronized hardware/software design was: Soul of a New Machine by Tracy Kidder, from way back in the 1970s. I happened to be working at Data General at the time they were working on that design; creating the DG Eclipse MV/8000 [to compete with DEC’s VAX] and AOS [code name Eagle], the companion operating system.
C: It’s fun seeing old stuff described as cutting edge.
Jon: Well, I will say, from my standpoint, since the entire company runs on this software, that anytime we decide that we want to add something, we simply add it. We don’t have to rely on an outside vendor to assist us. If somebody wants something done, they simply come to me or to our consultant for our bigger projects, and we simply get it done. To me, that’s a very positive situation for me and our company.
~ USER EXPERIENCE ~
Q: So, you said you have a Windows GUI? And then you have Telnet, obviously.
Q: And, so you see most of your new people using Windows and never going to Telnet?
Jon: There are some things we haven’t moved to the Windows version yet. We’re actually using a separate product called System Builder to display the windows. It has three different clients: a web browser client, so if you want to do something inside a web browser, you can; a rich client, which is more like .NET-type stuff; and the older-style System Builder client that looks like a clunky Windows 95 (or so) display. We pretty much use the rich client. All it needs, in addition to what’s running on the AIX machine, is a .NET module that you have to put on a Windows machine. That’s why we rebooted the Active Server: we accidentally put the .NET program on our Active Server. When we upgrade to the new version, we’re going to put it on a different machine. But it’s a very lightweight application.
Q: You only need one instance of the .NET application for the whole company?
Jon: Right. And we run two physical sites in the Valley, and we’ve got dark fiber between the two sites. We only have a couple of people who are offsite who log in and use it. Unfortunately, they’re using Charter, so they reboot about twice a day. I don’t know why they can’t stay up longer than that.
~ SESSION COSTS ~
Q: They don’t give an SLA on the business class?
Jon: Well, they do. We have Spectrum with fiber. But if you’re doing cable, they don’t.
Q: What are you paying for your fiber Charter?
Jon: $1200, I think.
Q: One gig?
C: Pretty good. I’m only paying $800; not for Charter, for Hunter.
Jon: Actually, Hunter’s providing all the fiber for us. But they had to run the fiber at the other site because they didn’t have fiber in place, so we’re paying for part of that. In fact, they had to run it to our site, too. They had to tunnel under the highway from their current run. But now, all I’m worried about is: we’ve got new construction on site, and I’m hoping they don’t hit the fiber.
~ EASE OF USE ~
Q: What about your clerks being able to do reports with things like creating their own PROCs [in a dictionary]?
Jon: We don’t allow any of the users to create their own reports.
C: But they could…
Jon: They could.
C: So that’s a company business rule, not something that’s restricted by the system.
Jon: We could; we simply don’t want to give them the training to do that.
C: I understand.
Jon: Our president knows how to do that and he does it occasionally, but he’s pretty much the only one who does that. Especially now that we’re doing stuff in Excel—or, tab delimited—if they want an extra field, I’ll add it. If one person wants the first five fields or another person wants the first ten fields, I’ll use the same report and [use dictionary items to] ignore the other fields, if it’s not confidential.
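[Editor’s sketch: the "one report, different numbers of fields" idea can be illustrated outside the database. This is Python, not UniVerse code, and the field names and rows are invented for illustration; it only shows a single report routine producing tab-delimited output that keeps the first N fields and ignores the rest.]

```python
import csv
import io

# Hypothetical field names and rows, invented for this illustration.
FIELDS = ["load_id", "driver", "truck", "origin", "destination", "miles", "fuel_gallons"]
ROWS = [
    ["L1001", "Smith", "T12", "OR", "CA", "642", "98.5"],
    ["L1002", "Jones", "T07", "WA", "OR", "310", "47.2"],
]


def build_report(rows, fields, n_fields):
    """Return a tab-delimited report containing only the first n_fields columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(fields[:n_fields])      # header row
    for row in rows:
        writer.writerow(row[:n_fields])     # drop the fields this reader doesn't need
    return buffer.getvalue()


# The same routine serves a reader who wants three fields and one who wants five.
short_report = build_report(ROWS, FIELDS, 3)
long_report = build_report(ROWS, FIELDS, 5)
```

[The output opens directly in Excel as a tab-delimited file, which matches the export format Jon describes.]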
C: The point is that somebody without too much computer training (without knowing a programming language) could actually do something productive with your database; that’s one of the key features of Pick.
Jon: Yes. New versions of AccuTerm actually come with a report program where you allow the users to build their own reports. So you can have them use that if they want to. And, if you get the paid version, it will actually convert it into a full Excel version, instead of a tab-delimited or a comma-delimited format.
Jon wrapped up his presentation and his newly-educated audience gave him a round of applause. He had explained the design so well that the others in the group began to understand the concept—the similarities and differences—and everyone realized the benefits of MVA.
~ NEW DATA SCIENCE CLUB ~
Cody, a newcomer to the group, took the floor to promote his new Data Science Group.
Cody’s background: he has worked as an investment analyst for the last five years, augmented by schooling in AI and machine learning. He’s always liked the whole field of data science: using computers to build domain knowledge, combined with statistics (because he’s also a math major). He’s been trying to find people who are also interested in working with data. He and his roommate are trying to make their place into a sort of pre-incubator, almost like a nerd clubhouse. They’ve got 3D printers and oscilloscopes and a server room.
C: The ladies must flock to your house.
Cody: Oh, yeah! So, we’re just having fun … We decided to see if we could find some other people who are interested in this type of thing. I decided I’ll teach the first couple of lessons and, as people get interested, we’ll expand. I’ve got about 19 R.S.V.P.s so far, so we’re going to be cozy. We’ve got a twelve-foot projector and it should be pretty fun.
Karen: And you said that if we R.S.V.P., you’ll live-stream it?
Cody: Yeah. Shoot me an email and I’ll send you the address. If you can’t make it and you want to watch, we’ll set up a GoToMeeting live stream.
Karen: I’m interested. I’ll send you an email tonight.
Cody: We want to get as many people as possible interested. We’ve got a lot of programmers; I think that more than half the people who are coming are programmers. We also have people who work in public health data and in financial analysis. Anybody who works with data and wants to learn new techniques for analyzing it. So, tomorrow, my goal is to do a half-hour session giving a broad overview of all the different techniques and technologies that exist in modern data science.
The second half will be a live demo based on a Kaggle competition. Kaggle runs machine learning competitions; they got bought by Google. A lot of companies do their recruiting for machine learning experts through these Kaggle competitions. There’s one that’s about analyzing real estate prices. I was going to do a quick demo on multiple regression, a learning technique, to show people how to partition their data, build a training set and a test set, and try to predict what price a house will sell for. I’m going to take a prediction set and build a model for that.
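[Editor’s sketch: the workflow Cody describes (partition the data, fit a multiple regression on the training set, check predictions on the held-out test set) might look roughly like the following. This is not Cody’s demo or the Kaggle data set; the houses are synthetic, the pricing rule is invented, and every function name is hypothetical. It uses only the Python standard library and fits ordinary least squares through the normal equations.]

```python
import random


def make_houses(n, noise_sd, seed=42):
    """Synthetic (features, price) rows; the pricing rule is invented for illustration."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        sqft = rng.uniform(1000, 3000)
        bedrooms = rng.randint(1, 5)
        age = rng.uniform(0, 60)
        price = 50000 + 120 * sqft + 10000 * bedrooms - 800 * age + rng.gauss(0, noise_sd)
        rows.append(((sqft, bedrooms, age), price))
    return rows


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and partition them into a training set and a test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit(features, prices):
    """Ordinary least squares via the normal equations; weights[0] is the intercept."""
    xs = [[1.0] + list(f) for f in features]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, prices)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, feature_row):
    """Predicted price for one house."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], feature_row))


def mean_abs_pct_error(weights, features, prices):
    """Average of |predicted - actual| / actual over a set of houses."""
    errors = [abs(predict(weights, f) - p) / p for f, p in zip(features, prices)]
    return sum(errors) / len(errors)


houses = make_houses(200, noise_sd=15000)
train, test = train_test_split(houses)
weights = fit([f for f, _ in train], [p for _, p in train])
test_error = mean_abs_pct_error(weights, [f for f, _ in test], [p for _, p in test])
print(f"mean absolute percentage error on the test set: {test_error:.1%}")
```

[The error is measured only on houses the model never saw during fitting, which is the point of building a separate test set.]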
Karen: There’s a new company out now that’s doing that; matching buyers and sellers.
Cody: When I had this idea, I was on my lunch break, and I was able to work it out in one hour. I pulled down some of the tools I use for rapid prototyping and machine learning, and produced something that could guess a selling price to within 8%.
C: And, of course, that can be fine-tuned to be more accurate.
Cody: Well, the guys that are fine-tuning it are getting down to within 2%. I’m simply trying to demonstrate that you don’t have to be a genius programmer to be able to take a big chunk of data and get cool information out of it.
Q: How big would you say a data set would have to be to get decent results?
Cody: It depends on the noise; if there’s a strong signal-to-noise ratio in the data you’re trying to look at, then you’re going to be able to extract that out a lot easier with a smaller data set. But with really noisy stuff … I’m toying with stock market predictions … that stuff is noisy. So I have millions upon millions of rows of data I’m parsing through whenever I run a single learner on it. Because it’s got to be able to find patterns and sift through the noise.
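[Editor’s sketch: one rough way to see why noisier data needs far more rows is the textbook sample-size formula for estimating a simple average to within a chosen margin at roughly 95% confidence. This is an assumption-laden illustration, not a rule for machine learning models in general; the function name and the numbers are hypothetical.]

```python
import math


def required_samples(noise_sd, margin, z=1.96):
    """Smallest n for which z * noise_sd / sqrt(n) is no larger than margin."""
    return math.ceil((z * noise_sd / margin) ** 2)


# Same target precision, but the second data source is ten times noisier.
quiet = required_samples(noise_sd=1.0, margin=0.5)
noisy = required_samples(noise_sd=10.0, margin=0.5)
print(quiet, noisy)
```

[Because the noise level is squared in the formula, ten times the noise calls for roughly a hundred times the data to reach the same precision.]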
Q: Is that data available through stock market indexing? Stock market pricing going way back?
Cody: It is if you pay for it. And I work for a finance firm.
C: Yes. I think to get anything useful, you’d have to go way back because the day-to-day stuff is one thing, but balances finally correct and there’s a big crash. It’s not something you see every day.
Cody: Yes. And, if your learner has never seen a crash, how can it predict one?
Cody: So you’ve really gotta have massive data in order to figure out where those outliers live. I’m doing tons of work with all sorts of different learning systems on the finance stuff. But the types of learning systems that we do for other fields like vision and voice processing.
In my machine learning class, we built decision trees where you put in stuff like the color and smell of a mushroom and whether or not it’s poisonous. And then you can get characteristics of another mushroom and, based on what it’s seen in the past, the system will guess whether or not it’s poisonous. We hear about neural networks a lot; those are like black box systems. It’s really hard to derive how the network works; it’s hard to understand what the patterns are. There are other algorithms that you can get information out of, like decision tree learners, which analyze which characteristics reduce the entropy, and build a decision tree that’s the optimal decision tree for solving a problem, and it’s human-readable. That’s a machine learning technique. But it’s not neural networks; it’s not sexy buzz words. There’s all sorts of stuff going on in these fields and people don’t know about it. If I had the time, I could do this every week; there’s enough material, just in this field.
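[Editor’s sketch: "which characteristics reduce the entropy" can be made concrete with a few lines of Python. The mushroom rows below are invented for illustration and are not from Cody’s class; the function names are hypothetical. The code computes the entropy of the poisonous/not-poisonous labels and the information gain from splitting on each characteristic, which is how a decision tree learner typically chooses its first question.]

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, feature, label="poisonous"):
    """Entropy of the labels minus the weighted entropy after splitting on feature."""
    base = entropy([r[label] for r in rows])
    remainder = 0.0
    for value in {r[feature] for r in rows}:
        subset = [r[label] for r in rows if r[feature] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Invented toy data: in this sample, smell separates the classes and color does not.
MUSHROOMS = [
    {"color": "red", "smell": "foul", "poisonous": True},
    {"color": "red", "smell": "none", "poisonous": False},
    {"color": "brown", "smell": "foul", "poisonous": True},
    {"color": "brown", "smell": "none", "poisonous": False},
    {"color": "white", "smell": "foul", "poisonous": True},
    {"color": "white", "smell": "none", "poisonous": False},
]

# The learner asks first about the characteristic with the largest gain.
best = max(["color", "smell"], key=lambda f: information_gain(MUSHROOMS, f))
print(best)
```

[A full learner repeats this choice on each resulting subset, and the chain of questions it produces is the human-readable tree.]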
C: I’m very interested because I would like to build a predictive analysis model for the next 911 call coming in…or crime reporting. We have all these years of localized crime data; can we guess where the next call is going to come in?
C: The Pre-Crime Division!
C: It would be something to see what the model looks like, with the data we have.
Cody: That’s the kind of stuff we do. If that’s the kind of research you want to do, it would be awesome if you want to present that analysis. I want this group to be less about the technology and more about the math and the data.
Karen: I’m glad you came and I hope you’re glad you came!
Copyright © 2019, FPP, LLC. All rights reserved.