case sensitivity in maps

gnucap provides a flag for case sensitivity that affects the parsing and interpretation of identifiers. It is user controlled. The main use case is the support for mostly obsolete but still widely used input languages such as SPICE. the case sensitivity flag affects declaration and lookup.

In case insensitive mode, an identifier is cast to lower case when it is entered, and when it is looked up in a dictionary. This can be seen when switching to the SPICE language, which turns case sensitivity off.

gnucap> spice
gnucap-spice>.param a=1.
gnucap-spice>.param B=2.
 a= 1. b= 2.

internally, these names are stored in lower case, not case preserving, similar to names in FAT12, but lower case. back to verilog mode, things look sensitive again.

gnucap-verilog>param C=2.
gnucap-verilog>param c=3.
gnucap-verilog>param b=4.
.C( 2.), .a( 1.), .b( 4.), .c(3.)

in particular, the actual case is lost, and the parameter B is gone.

gnucap-verilog>eval C
C= 1.
gnucap-verilog>eval B
gnucap-verilog>eval b
b= 4.

similarly, when a parameter “X” is declared in case sensitive mode, it is invisible in case insensitive mode. This can have surprising consequences, when running simulations (e.g. parameter sweeps) in spice mode.

Newer case insensitive file systems (e.g. NTFS, VFAT, HFS+) are case preserving, not without odd side effects. on NTFS, readme.txt and a Readme.txt can coexist, according to wikipedia. Since file systems usually fix the sensitivity mode, the side effects are limited. But the approaches do not work very well in gnucap.

gnucap should be case preserving internally, but must implement a lookup that depends on the current mode.

gnucap> options noinsensitive
gnucap> param A=1.
gnucap> options insensitive
gnucap> eval A
 A= 1.
gnucap> eval a
 a= 1.
gnucap> param A=2.
gnucap> eval A
 A= 2.
gnucap> eval a
 a= 2.
gnucap> param A=3.
gnucap> eval A
 A= 3.
gnucap> eval a
 a= 3.

With an approriate lookup, no mangling will be required in case insensitive mode. But how to implement an appropriate lookup? Gnucap stores parameters in an std::map<std::string, PARAMETER>. The problem with that is, if “AAAA” is a key, a case insensitive lookup for “aaaa” will be tricky. The expectation is, that “AAAA” will be found. however, Maps store keys in binary trees, and the search is implemented as a bisection, for performance reasons. In a linear search, we could simply cast the search string to upper, as well as all the keys that we compare it to. In the map above, we'd have to do a lookup for all strings that “aaaa” could possibly match. Sixteen in this case, but much worse for longer strings.

The issue with the lookup boils down to the fact that the insensitive matches could be anywhere in the map, which is sorted by the order induced by the ASCII encoding, 1 < 2 < … < 9 < A < B … < Z < a < b < … < z. The following example illustrates the pattern, which easily scales to any size.

AAaa *
Aaaa *
aaaa *

In contrast, if the keys were sorted insensitively first and blocks with congruent keys sorted individually, case sensitive, the order would be

AAaa *
Aaaa *
aaaa *

Putting all possible matches to our query next to each other. In an std::map sorted like this, lookups and inserts with either case (in)sensitivity mode will work, while the mode may change in between.

- implement the string comparison operator - encapsulate it in a custom string type - provide case sensitive and insensitive lookup, by default these should use OPT::case_insensitive.

(this is implemented in the paramap-* branch)


With this kind of string in place, it will be possible to also implement an efficient CARD_LIST::find, supporting both case sensitivities. CARD_LISTs implement the lists of CARDs, which represent all user input, including the order, for each individual scope. The CARDs could be models or components or subcircuit declarations or just comments.

Unlike in PARAM_LISTs there can be many cards with the same key (short_label) in one CARD_LIST, which is implemented as a linked list. The lookup is implemented as a linear search, and hence slow. And if there was no case sensitivity flag, this could be easily fixed. an std::multimap<string, list_iterator> or std::multiset<list_iterator> right next to the std::list could be used to lookup the position.

So the list needs to stay the same, and instead of the multiset, we need a custom multiset that works transparently through case sensitivity changes. One way of doing this, is to nest sets. One set, with elements sorted case insensitively holds (at most) one multiset for each string class, a “bucket”. a bucket may then hold arbitrarily many CARDs, but in case sensitive ordering.

For example, if there are cards A A a B b b c, they will end up in three buckets.

A - A
  - A
  - a
B - B
  - b
  - b
c - c

A case insensitive lookup for b will then try and find it in the B bucket, then pick any of the cards there. similarly, a case sensitive lookup will pick one of the lower case cards. since std::multiset preserves the insertion order, it will be possible to fine tune the lookup.

(this is implemented in the multimap-* branch)

gnucap/user/istring.txt · Last modified: 2018/09/24 07:42 by felixs
Recent changes RSS feed Donate Powered by PHP Valid XHTML 1.0 Valid CSS Run by Debian Driven by DokuWiki